This is the ninth post in a series about Microsoft SQL Server R Services, and the eight post that drills down into the internal of how it works. To see other posts (including this) in the series, go to: SQL Server R Services
So far in this series we have been looking at what happens in SQL Server as well as the launchpad service when we execute
sp_execute_external_script, and we have still no real “clue” to where the R engine comes into play.
Well, hopefully that will change (at least a little bit) with this post, as we here will look at what happens when we leave the launchpad service.
Before we dive into the “juicy” stuff, let’s remind ourselves where we are.
Normally in the recaps in this series, we have looked at what was covered in the previous “episode”. In this recap, let’s look at he “full” picture up until now.
The first post in the series - Microsoft SQL Server 2016 R Services Installation - covered the installation of SQL Server 2016 R Services, and it also touched upon the external procedure which allows us to execute external scripts;
sp_execute_external_script. We looked at the signature of the procedure as well as executing the equivalent to a “Hello World” script.
In the subsequent posts we talked about - when executing
sp_execute_external_script - how SQL Server calls into the launchpad service, and how the launchpad service - through the
rlauncher.dll creates multiple
Rterm.exe processes as in Figure 1 below. One of the processes will be used to run the external script:
Figure 1: Rterm.exe Processes
I addition to creating
Rterm.exe processes, the launchpad service also creates backing directories for those processes. These backing directories are used for saving output, intermediate results etc. The following figure was used to show what the call flow looks like:
Figure 2: Call Flow Executing sp_execute_external_script
We discussed how the number of processes can be controlled by the
PROCESS_POOL_SQLSATELLITE_GROWTH setting in
rlauncher.config file, and how it defaults to 5 if nothing is set.
In Internals - VI we came back to the backing directories, and we realized that in addition to the backing directories created for the Rterm processes, one more directory is created - which will be the “official” working directory for the session, and we showed this using this figure:
Figure 3: Launchpad, Directories and Processes
While we were investigating the directories created in Internals - VI we saw that, while we executed an external script, files and sub-directories were created in the various backing directories:
Figure 4: Contents Process Directory
In Figure 4 we see he content of the directory which is the processing directory, and in Internals - VII we looked into what creates those files/directories and what they are for. We came to the conclusion that both the launchpad service (probably through the
rlauncher.dll) created some files, whereas
Rterm.exe created others.
That’s where we are, and now it is time to look at the world outside of the launchpad, and try to figure out what is being used when executing an external script.
In Internals - III we saw how the launchpad service creates Rterm processes, and how the launchpad service calls
launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo followed by
launchpad!CSQLSatelliteConnection::WriteMessage to call into the Rterm process chosen for execution of the external script. So what does
Rterm.exe and the RTerm process do?
To answer that let us first look at what the R “engine” really is, and let’s do that by looking at standard CRAN R, and for this I use RStudio as an IDE. So, let’s open RStudio and make sure it uses the CRAN R engine:
Figure 5: RStudion CRAN R
At that stage if I check in Process Explorer to see what processes are loaded I see something like this:
Figure 6: Processes Loaded
So in Figure 6 we see how the
rstudio.exe is host for
rsession.exe, which in turn host
conhost.exe. For this discussion
conhost.exe has no real impact, but if you want to know more about it, here is a good article explaining why we have
Going back to
rsession.exe; in Process Explorer, we look at what dll’s it has loaded (click on `
rsession.exe followed by Ctrl-D ) we’ll see something like this:
Figure 7: Loaded dll’s
Figure 7 above shows 7 dll’s loaded by
rsession.exe, and what is interesting is the second from bottom, which has a path pointing to the R libraries. When we look further down in the list we’ll see more dll’s from the R libraries:
Figure 8: R Dlls
Among the dll’s we see the
R.dll, which is the main dll for R. So, it seems that
rsession.exe acts as a host for the R engine, and the R engine is a “bunch” of dll’s.
Coming back to
rterm.exe and what it is, we can now assume it acts in the same way as
rsession.exe. Let us go back to the code that we used in Internals - III when we captured the figure we above in Figure 1:
EXEC sp_execute_external_script @language =N'R', @script=N'OutputDataSet<-InputDataSet; Sys.sleep(120);', @input_data_1 =N'SELECT 42' WITH RESULT SETS (([TheAnswer] int not null));
Code Snippet 1: Execute with Sys.Sleep
Like we did in Internals - III, we’ll execute the code in Code Snippet 1, and while the code is running we’ll look in Process Explorer and see if we find anything of interest:
- Stop the launchpad service.
- Restart the launchpad service.
- Start Process Explorer, order by Process, and scroll down to where you see process names starting with “RT” (on my box there are none at this stage), or where the processes should be.
- If you at this stage see RTerm, restart the launchpad service again and kill those processes.
- Execute the code in Code Snippet 1.
While the code is running, take a quick look in Process Explorer, and you should see something like so:
Figure 9: RTerm Processes
The difference between Figure 1 and Figure 9, is that in Figure 1 I had collapsed the Rterm processes. In Figure 9 we see how the launchpad service is host for multiple Rterm processes, and how
Rterm.exe is host for
conhost.exe as well as
BxlServer.exe. We’ll come back shortly to what
Let’s look a bit closer at what dll’s we find for one of the active Rterm processes, click on the Rterm process which has a CPU value, and then click Ctrl-D.
NOTE: You may have to re-execute the code in Code Snippet 1, maybe even increase the sleep time to 3 or 4 minutes to be able to see everything.
When I scroll down in the dll list I see some familiar R dll’s, that we also saw in Figure 8:
Figure 10: Rterm R dll’s
At this stage we can safely say that the
Rterm.exe and its process hosts the R engine. That is cool (yeah I know - I need to get a life), but what does that
BxlServer.exe do in the mix?
So the question was, what is the
BxlServer.exe? To answer that let’s take a step back, and look at the R offerings Microsoft has:
- Microsoft R Open - a Microsoft open source offering of R
- Microsoft R Server - Microsoft’s commercial R offering.
- SQL Server R Services - the moral equivalent of Microsoft R Server, but with SQL Server as delivery vehicle.
Let’s see what happens if we pointed RStudio towards the Microsoft R Open environment:
Figure 11: RStudio Targeting MRO
In Process Explorer, looking at the rstudio process, it won’t look any different than what it did when Rstudio targeted CRAN R as in Figure 6, so obviously the Microsoft R Open offering does not “know” of the
BxlServer.exe we see in Figure 9. But, looking at the dll’s loaded by
rsession.exe, we see more or less the same as was loaded by
rterm.exe in Figure 10:
Figure 12: RStudio MRO dll’s
What would happen if RStudio targeted the Microsoft R Server (MRS) instead:
Figure 13: RStudio Targeting MRS
This time when looking at the rstudio process we do see
BxlServer.exe as a process hosted by
rsession.exe, so it seems that
BxlServer.exe is specifically for the commercial versions of Microsoft R.
NOTE: There is no difference between the
BxlServer.exein MRS and SQL Server R Services. Well, not much anyway :), in next post we’ll see some differences.
By now you are probably saying something like: “Niels, this is very informative - NOT, get to the point - what is
BxlServer.exe”. OK, so let us see some of the dll’s
BxlServer.exe has loaded (click on the
BxlServer.exe process and Ctrl-D):
Figure 13: BxlServer Loaded dll’s
Ooh, that’s interesting, look at the path and especially what is in the red rectangle:
RevoScaleR, is an R package providing both High Performance Computing (HPC) and High Performance Analytics (HPA) capabilities for R. HPC capabilities allow you to distribute the execution of essentially any R function across cores and nodes, and deliver the results back to the user. HPA adds the ability to handle big data in a high performance way. In addition to this (or rather as part of) RevoScaleR also has functionality to handle data access with high performance, and
BxlServer.exe is the host for all this.
In addition to run the HPC, HPA and data access,
BxlServer.exe also coordinates with the R runtime in order to manage exchanges of data with SQL Server, and it runs most of the R processing. A lot of the coordination and handling of result sets in
BxlServer.exe is handled by a dll loaded by
BxServerLink.dll. Below we see how the
BxServerLink.dll is loaded by
Figure 14: BxServerLink Loaded by BxlServer.exe
Let’s go back to Figure 13 , where we see how
BxlServer.exe. How does that work, I mean -
rsession.exe is a CRAN R executable, and it does not know anything about Microsoft R Services and
BxlServer.exe? The answer is
R.dll (which is a Microsoft dll), the
R.dll then starts to load other dll’s. But,
BxlServer.exe is not a dll, and it runs in a separate process, surely
R.dll cannot do that? No, it cannot - but what it does do is to load a dll acting as a conduit: the
RxLink.dll creates a new process in where
BxlServer.exe runs, and that is how communication happens between
BxlServer.exe. And that is also what happens in SQL Server R Services where
Rterm.exe is the “host”.
Rterm.exe loads the
R.dll which loads
RxLink.dll who then creates a process for
NOTE: In the next Internals post we’ll look more at
Finally, let’s make sure that
BxlServer.exe actually is doing something when we execute an external script in SQL Server R Services. For that we’ll use some code we wrote in Internals V, where we discussed parallelism:
USE master; GO SET NOCOUNT ON; GO DROP DATABASE IF EXISTS TestParallel; GO CREATE DATABASE TestParallel; GO USE TestParallel; GO DROP TABLE IF EXISTS dbo.rand_30M CREATE TABLE dbo.rand_30M(RowID bigint identity primary key, y int, rand1 int, rand2 int, rand3 int, rand4 int, rand5 int); INSERT INTO dbo.rand_30M(y, rand1, rand2, rand3, rand4, rand5) SELECT TOP(30000000) CAST(ABS(CHECKSUM(NEWID())) % 14 AS INT) , CAST(ABS(CHECKSUM(NEWID())) % 20 AS INT) , CAST(ABS(CHECKSUM(NEWID())) % 25 AS INT) , CAST(ABS(CHECKSUM(NEWID())) % 14 AS INT) , CAST(ABS(CHECKSUM(NEWID())) % 50 AS INT) , CAST(ABS(CHECKSUM(NEWID())) % 100 AS INT) FROM sys.objects o1 CROSS JOIN sys.objects o2 CROSS JOIN sys.objects o3 CROSS JOIN sys.objects o4; GO
Code Snippet 2: Setup of Database, Table and Data
The data we create in Code Snippet 2 makes absolutely no sense at all, but it will serve our purposes of have some volume of data to work with, and cause the execution to run for a little while.
NOTE: Go back to Internals - V if you are interested in seeing what this code does.
The code we want to execute in SQL Server looks like so:
EXEC sp_execute_external_script @language = N'R' , @script = N' #Sys.sleep(30) pid <- Sys.getpid() r <- rxLinMod(y ~ rand1 + rand2 + rand3 + rand4 + rand5, data=InputDataSet) coef <- r$coefficients icept <- coef\[1\]; OutputDataSet <- data.frame(pid=pid, nRows=r$nValidObs, intercept=icept)' , @input_data_1 = N' SELECT y, rand1, rand2, rand3, rand4, rand5 FROM dbo.rand_30M' WITH RESULT SETS ((pid INT NOT NULL, nRows INT NOT NULL, intercept FLOAT NULL) );
Code Snippet 3: Linear Regression in SQL Server
To run this:
- Restart the launchpad service (this is to clean-up any RTerm processes).
- Navigate to the
Launchpad.exeprocess in Process Explorer.
- Execute the code in Code Snippet 3.
- While the code is executing, look in Process Explorer for RTerm processes.
Among the Rterm and BxlServer processes you should see one
BxlServer.exe process that has a CPU value, as well as consuming some memory:
Figure 15: BxlServer Processing
So, as I mentioned above, it is the
BxlServer.exe which is doing he heavy lifting in SQL Server R Services.
In this post we have now seen what happens when executing
- SQL Server calls into the launchpad service.
- The launchpad service calls into the
Rlauncher.dllcreates Rterm processes.
- Through Rterm, the
R.dllis loaded together with
- To coordinate with SQL Server,
Below I have tried to describe it in a figure:
Figure 15: BxlServer Processing
Notice how, in Figure 15, there is nothing about how data gets back to SQL Server. That is something we will discuss in next post. Oh and yes, Figure 15 looks very much like one of the figures you can see in this post, and I freely admit I have looked at it :).
If you have comments, questions etc., please comment on this post or ping me.