This is the tenth post in a series about Microsoft SQL Server R Services, and the ninth post that drills down into the internal of how it works. To see other posts (including this) in the series, go to SQL Server R Services.
We are now nearing the end of the Internals part of this series (you probably say “about time”). When I started this series way back in March this year, I thought that the Internals would be two, maximum three episodes. Boy was I wrong!
In this post we’ll look at how the various components are communicating with each other.
In Internals - VIII we looked at what different components are involved when we execute
- SQL Server calls into the launchpad service.
- The launchpad service calls into the
Rlauncher.dllcreates Rterm processes.
- Through Rterm, the
R.dllis loaded together with
- To coordinate with SQL Server,
The above can be illustrated as in Figure 1 below
Figure 1: Components In Play
The questions are though how the components are communicating and what is involved in getting back to SQL Server after
BxServerLink.dll have done their “magic”?
Before we drill into the questions above, let’s look at the demo code we’ll use. Let’s start with setting up the database and a table with some data we can use. All in all 1 million rows:
USE master; GO SET NOCOUNT ON; GO DROP DATABASE IF EXISTS TestParallel; GO CREATE DATABASE TestParallel; GO USE TestParallel; GO DROP TABLE IF EXISTS dbo.rand_1M CREATE TABLE dbo.rand_1M(RowID bigint identity primary key, y bigint, rand1 bigint, rand2 bigint, rand3 bigint, rand4 bigint, rand5 bigint); INSERT INTO dbo.rand_1M(y, rand1, rand2, rand3, rand4, rand5) SELECT TOP(1000000) CAST(ABS(CHECKSUM(NEWID())) % 14 AS bigint) , CAST(ABS(CHECKSUM(NEWID())) % 20 AS bigint) , CAST(ABS(CHECKSUM(NEWID())) % 25 AS bigint) , CAST(ABS(CHECKSUM(NEWID())) % 14 AS bigint) , CAST(ABS(CHECKSUM(NEWID())) % 50 AS bigint) , CAST(ABS(CHECKSUM(NEWID())) % 100 AS bigint) FROM sys.objects o1 CROSS JOIN sys.objects o2 CROSS JOIN sys.objects o3 CROSS JOIN sys.objects o4; GO
Code Snippet 1: Setup of Database, Table and Data
The data is more or less the same data we used in Internals - V, and the code to execute looks also almost the same:
EXEC sp_execute_external_script @language = N'R' , @script = N' #Sys.sleep(10) pid <- Sys.getpid() d <- getwd() cat(paste0("ProcessId: ", pid)) cat("\n") cat(paste0("WorkDir: ", d)) cat("\n") r <- rxLinMod(y ~ rand1 + rand2 + rand3 + rand4 + rand5, data=InputDataSet) coef <- r$coefficients icept <- coef; OutputDataSet <- data.frame(pid=pid, nRows=r$nValidObs, intercept=icept)' , @input_data_1 = N' SELECT y, rand1, rand2, rand3, rand4, rand5 FROM dbo.rand_10M' WITH RESULT SETS ((pid BIGINT NOT NULL, nRows BIGINT NOT NULL, intercept FLOAT NULL));
Code Snippet 2: Code to Execute
In Code Snippet 2 we see how we read in data from the statement in the
@input_data_1 parameter. The actual R code retrieves the process id and work directory and assigns them to the
d variables which then are printed out through the
cat statements. After a model has been created in
rxLinMod some information about the model is returned as a resultset from the
OutputDataSet variable. The schema of the resultset is defined in
WITH RESULT SETS .... For us to be able to determine what is happening when executing the code, there is a
Sys.sleep statement, which for now is commented out. Before we go any further, let’s make sure it works:
- Restart the launchpad service (stop and start).
- Execute the code in Code Snippet 2.
If everything has worked OK, the Results tab in SSMS should look something like in Figure 2, and the Message tab like in Figure 3:
Figure 2: Results from Running the Code
Figure 3: Message Output from Running the Code
As we know that the execution of the R code happens outside of SQL Server, the question is what component(s) are responsible for returning results and messages to SQL Server. Before looking at that, let’s look at how communication happens between the various components.
The components we are interested in the communication between are:
- SQL Server -> Launchpad
- launchpad -> Rterm
- Rterm -> BxlServer
SQL Server -> Launchpad
So, let’s start with the communication between SQL Server and the launchpad service. In Internals - I and Internals - II we used WinDbg and we saw how SQL Server calls (among other things)
sqllang!CSQLSatelliteConnection::OpenNpConnection to create and open a named pipe to the launchpad service. Let us make double sure that there really is named pipe involved, and the way we’ll do that is by using WinDbg and Process Explorer.
What we’ll do is:
- Restart SQL Server and the launchpad service (please, do not do this on a production server :) ).
- Start WinDbg and Process Explorer as admin.
- In Process Explorer, ensure that the lower pane is visible (Ctrl + L).
To begin we’ll see what named pipes SQL Server has created “out of the box” so to speak. Named pipes are exposed as handles in Process Explorer, so by clicking on the
sqlserver.exe process in Process Explorer and then Ctrl + H you should see handles in the lower pane. By ordering on name and scrolling down to
Device\Named pipe, you should see something like so:
Figure 4: SQL Server Initial Named Pipes
As we see in Figure 4, we have (at least on my machine) six named pipes. There is nothing that would indicate a named pipe to the launchpad service.
Now you can go on and see what named pipes (if any) the launchpad service (
launchpad.exe) has. On my machine it looks like this:
Figure 5: Launchpad Named Pipes
Hmm, that is interesting - there are two named pipes with a name similar to what we have seen in previous posts: sqlsatellite. However, since the launchpad service is inactive right now, it seems those two named pipes are not connected to anything.
So far we do not have any evidence of named pipes between SQL Server and the launchpad service, but on the other hand we have not executed anything yet either (since the restart). Let us now go on and execute the code in Code Snippet 2 again, but before we do that let us do what we did in Internals - I and set a breakpoint at
sqllang!CSQLSatelliteConnection::OpenNpConnection. So attach WinDbg to the
sqlserver.exe process and set the breakpoint.
Before you execute the code, uncomment the
Sys.sleep call and change the time to sleep to 30 seconds (
Sys.sleep(30)), so you can investigate what is happening. Now, go ahead and execute.
The breakpoint is hit and execution stops. Look in Process Explorer and see what named pipes there are both for SQL Server as well as the launchpad service. At this stage there should not be any more named pipes than what you see in Figure 4 and Figure 5. Continue execution of the code, and look again in Process Explorer. You should now see for SQL Server something like in Figure 6:
Figure 6: SQL Server Named Pipes
For the launchpad service it looks something like Figure 7:
Figure 7: Launchpad Named Pipes During Execution
We see now how for SQL Server we have a new named pipe:
\Device\NamedPipe\sqlsatellitelaunch, and for the launchpad service a corresponding additional named pipe:
\Device\NamedPipe\sqlsatellitelaunch. By now we should be fairly certain that communication between SQL Server and the launchpad service is through named pipes.
Launchpad -> Rterm
NOTE: The heading above is somewhat misleading as it is not
Launchpad.exethat connects to Rterm but
Rlauncher.dll, however we’ll use launchpad for now.
When I started writing this section, I thought: “how hard can this be, the launchpad service obviously talks to Rterm via named pipes”. It turned out to be not that simple. The way I went about “proving” (or not) that named pipes were used was similar to what we did above between SQL Server and the launchpad service, but initially I didn’t use WinDbg:
- Restart SQL Server and the launchpad service.
- Make sure Process Explorer is up and running.
- Set a sufficiently high
Sys.sleepvalue in the code (I ended up using 120 seconds).
- Execute the code.
When I at this stage looked at the handles for the launchpad service, I saw the three named pipes we saw in Figure 7. That would make sense, we know that one of them is the named pipe connection from SQL Server, and then one of the other two would be used between the launchpad service and Rterm.
NOTE: When using named pipes, and you look at them in Process Explorer, you should see the same name in both source as well as destination.
The problem was that I did not see a named pipe in the executing Rterm process, with the same name as in the launchpad service. Hmm, there went that theory. Well, if named pipes are not used, what about TCP/IP and sockets? So I noted the PID of the launchpad service, executed the code again, and while the code was running I (from command line) used
netstat to see what socket connections there were. The command I used was
netstat -a -n -o, which means:
- show all connections and listening ports (
- display addresses and port numbers in in numerical form (
- display the owning PID associated with each connection (
I looked for the PID of the launchpad service to see if there were any connections, but none could be found. Now the original question of “how hard can this be” had an answer - “actually, pretty hard”! At this stage I resorted to some WinDbg “spelunking”, and looked for “promising” routine names. Seeing that a lot of the routines we have seen so far in this series comes from classes with “satellite” in them I executed:
x *!*satellite*. Eish, that returned a lot. However, while scrolling through the output, I came across following routine:
launchpad!CSQLSatelliteConnection::AddProviderDoneIOCPCallback. Hmm, something that has to do with callback from
IOCP, could it be that the launchpad service connects via IO Completion Ports (IOCP)? Only one way to find out - set some breakpoints in WinDbg.
In Internals - II where we discussed what happens inside the launchpad service, we had a diagram of the flow of a call looking like so:
Figure 8: Launchpad Service Call Flow
In Figure 8 we see how the communication to Rterm is done in the
launchpad!CSQLSatelliteConnection::WriteMessage call, so the setup of the IOCP should be done before the
WriteMessage call, and most likely after the
LaunchServTask call. So in the already attached WinDbg, I set three breakpoints:
bp KERNELBASE!CreateIoCompletionPort(I had read documentation of IOCP and saw that this was a likely call).
If my theory was to be correct, I would see the execution first stop at
LaunchServTask, followed by
CreateIoCompletionPort and then finally at
WriteMessage. I executed the code in Code Snippet 2 with
Sys.sleep(10) and this is what I saw in WinDbg:
0:028> g Breakpoint 3 hit launchpad!CLaunchContext::LaunchServTask: 0:009> g Breakpoint 0 hit KERNELBASE!CreateIoCompletionPort: 0:011> g Breakpoint 1 hit launchpad!CSQLSatelliteConnection::WriteMessage: 0:009> g
Code Snippet 3: Launchpad Breakpoints being Hit
This to me indicates that yes, IOCP is being used as communication mechanism between the launchpad service and Rterm.
Rterm -> BxlServer
In Internals - VIII we spoke about
BxlServer.exe, how it hosts RevoScaleR code and how it also coordinates with the R runtime in order to manage exchanges of data with SQL Server. The
BxlServer.exe process is created by the Rterm process via the
RxLink.dll which acts as a conduit. The source code for
RxLink.dll is available, and when I had a look at the source, I saw quite a lot of code related to named pipes. So my assumption is that communication between the Rterm process (via
RxLink.dll) and the BxlServer process happens through named pipes.
Seeing that I don’t have any debug symbols for
BxlServer.exe we’ll try and use other methods to prove (or disprove) that named pipes are used. The way we’ll do it is to use Process Explorer as we did with the communication between SQL Server and the launchpad service. Now is probably a good time to restart both the SQL Server service as well as the launchpad service (just to start afresh).
After the SQL Server and the launchpad service has been restarted bring back Process Explorer again (as admin). This time though we’ll look at the Rterm process and the hosted BxlServer process:
- In Process Explorer navigate to the launchpad process.
- Set big enough
Sys.sleepvalue in the code. 90 seconds worked fine for me.
When executing the code, look in Process Explorer for the Rterm process and BxlServer process where you have a
CPU value greater than 0:
Figure 8: Rterm and BxlServer Processes
In Figure 8 we see the Rterm and BxlServer processes that are active. When you have found the active processes check if there are any named pipe handles for those two processes (Ctrl+H):
Figure 9: Rterm Named Pipes
Figure 10: BxlServer Named Pipes
Figure 9 and Figure 10 show the named pipes file handles for the Rterm process (Figure 9), and BxlServer process (Figure 10). As we can, the same named pipes exists in both processes (they have the same names), so we can with some certainty say that communication between Rterm and BxlServer is done via named pipes. With this in mind we can enhance Figure 1 and show the communication mechanisms:
Figure 11: Communication Mechanisms
In Figure 11 the numbers indicate:
- 1 - named pipe.
- 2 - IOCP.
- 3 - named pipe.
Now, we’ve seen how the various components are communicating with each other, but - how is data exchanged? E.g, how does a resultset get back to SQL Server, and does the print (
cat) statements travel back the same way? In Figure 11 wee see how the BxlServer is doing “stuff” and then interacts with BxServerLink, but it kind of stops there. Surely a dataset like in our code wouldn’t go back over the IOCP? Would there be any other components in play? Let’s see what we can find out.
Once again use the Process Explorer, but now we are interested in what dll’s the BxlServer loads while it’s running. Execute the code again, and when you have found the “active” BxlServer process do a Ctrl+D to find the loaded dlls, and see what “pops up”:
Figure 12: SqlSatellite.dll
When we look at the various dll’s that are loaded by BxlServer we come across a name we have seen and briefly mentioned before:
SqlSatellite.dll. In Internals II, III and VII we saw how paths were set, pointing to where the
sqlsatellite.dll is located. In Internals - III we also mentioned how SqlSatellite is an API to support external code and external run times. It is loaded by the BxlServer process and it handles data transfers between SQL Server and external languages (in this case R).
Now we have an idea who handles data exchange (at least some of it), but what mechanism is used? Named pipes could be an option, and we saw in SQL Server that named pipe:
\Device\NamedPipe\sqlsatellitelaunch. However, that pipe is used by SQL Server for communication with the launchpad service, and there were really no other named pipes that seemed to fit in. How about sockets and TCP/IP? That could be an option, let’s see.
Once again, do a restart of SQL Server as well as the launchpad service to make sure we are in a clean state. Make a note of the process id (PID) of SQL Server, well use it to see what socket connections SQL Server has. If Process Explorer is not running, start it up as admin. When SQL Server is restarted check what TCP connections we have by running the
netstat -a -n -o -p TCP command.
TCP 127.0.0.1:1434 0.0.0.0:0 LISTENING 2900 TCP 127.0.0.1:52637 0.0.0.0:0 LISTENING 2900
Code Snippet 4: TCP Connections SQL Server
The PID for SQL Server after I restarted was 2900, and from Code Snippet 4 we can see two listening connections, with no source as of yet. The listener for port 1434 is for the SQL Server Browser Service, whereas port 52637 is something of a mystery. Let’s now execute our code and see if anything changes (I set
Sys.sleep to 90 seconds to make sure I had enough time to see what happens). While the code is executing run
netstat -a -n -o -p TCP -b (
-b will display the executable involved in creating the connection). When I executed the code and ran the netstat command I saw the following:
TCP 127.0.0.1:1434 0.0.0.0:0 LISTENING 2900 [sqlservr.exe] TCP 127.0.0.1:52637 0.0.0.0:0 LISTENING 2900 [sqlservr.exe] TCP 127.0.0.1:52637 127.0.0.1:64511 ESTABLISHED 2900 [sqlservr.exe] TCP 127.0.0.1:64511 127.0.0.1:52637 ESTABLISHED 8088 [BxlServer.exe]
Code Snippet 5: TCP Connections SQL Server while Executing
That is interesting! From Code Snippet 5 we see how SQL Server now has an established TCP connection on port 52637, and the connection originates from PID 8088. That PID is
BxlServer.exe, and - as we know from earlier -
SqlSatellite.dll. We now have a TCP/IP connection from SqlSatellite to SQL Server, and it is over that connection data is exchanged.
NOTE: Not all data is exchanged over the TCP/IP socket connection, as we will see in next post.
Finally, let’s do some more WinDbg “spelunking” in order to understand how and when the TCP connection is established:
- Restart SQL Server and the launchpad service, to clean things up.
- Attach WinDbg to the SQL Server process
- Browse symbols with the name TCP
When browsing for the symbols, you can use this command:
x /1 *!TCP*. By using the option
/1 you’ll only see the names, and no addresses. On my machine that gives me quite a lot, but there are two entries that catch my eye:
sqllang!Tcp::Close. So let us set breakpoints at those two symbols, and see what happens when we execute our code.
The result when executing the code is that we initially break at
sqllang!Tcp::AcceptConnection. Followed somewhat later by breaking at
sqllang!Tcp::Close. Cool, this seems to work - let us set some more breakpoints and try to figure out the flow of events.
To be able to see the flow, we need to involve the launchpad service as well, as it is that service who “spins up” the Rterm process who in turn creates the BxlServer process who loads BxServerLink as well as SqlSatellite. So, remember from Internals - III what the code looked like (from a very high level) to create the executing Rterm processes, it was something like this:
launchpad!SQLSatellite_LaunchProcess launchpad!CreateProcessForSatelliteSession launchpad!PhysicalUserContext::LaunchProcess KERNELBASE!CreateProcessInternalW launchpad!Satellite_ResourceManager::AssociateProcessToJobObject launchpad!SatelliteJobObject::AssociateProcess launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo launchpad!CSQLSatelliteConnection::WriteMessage
Code Snippet 6: Launchpad Code Flow
As discussed in Internal - II and Internals - III, when an external script is executed 6 different Rterm processes are “spun” up, bit only one will be the executing process, and the code in Code Snippet 6 illustrates the code creating the executing Rterm process.
With this in mind, let us:
- Stop the launchpad service.
- Delete any sub folders of the user account folders in the
C:\<path_to_sql_server_instance>\MSSQL\ExtensibilityDatadirectory. Do NOT delete the user account folders themselves.
- Restart the launchpad service.
- Attach WinDbg to the launchpad process.
Set breakpoints in the attached launchpad process at (make sure that the SQL Server breakpoints are enabled):
What we now want to do is to run the two WinDbg instances side by side and see when the individual breakpoints are hit, and in what order. Before we begin, clear out the WinDbg command output (
.cls). Execute the code and look closely at what happens in the two WinDbg instances.
What I see on my machine is that I break two or three times in launchpad at
launchpad!SatelliteJobObject::AssociateProcess, and after the second or third break, I break at
sqllang!Tcp::AcceptConnection in SQL Server. That is followed by
launchpad!CSQLSatelliteConnection::WriteMessage, plus more
launchpad!SatelliteJobObject::AssociateProcess. After a while I receive two more
launchpad!CSQLSatelliteConnection::WriteMessage in the launchpad service and a
sqllang!Tcp::Close in SQL Server.
The conclusion I draw from this is that after the executing Rterm process has been created (and BxlServer, BxLinkServer and SqlSatellite), SqlSatellite connects into SQL Server.
You may notice that the
launchpad!CSQLSatelliteConnection::WriteMessage breakpoint will be hit a couple of times after the SQL Server
sqllang!Tcp::AcceptConnection. We’ll discuss that in the next Internals blogpost.
In this post we set out trying to determine what communication mechanisms there are between the various components interacting when
sp_execute_external_script is called. We also saw how a “new” component
sqlsatellite.dll is in the mix, and in the next blogpost in the Internals series we’ll look more in detail what it does.
So, Figure 13 below illustrates quite well the communication mechanisms between the various components:
Figure 13: How Communication Happens
The figure looks quite like Figure 11 and the numbers indicate:
- 1 - named pipe.
- 2 - IOCP.
- 3 - named pipe.
- 4 - TCP/IP
If you have comments, questions etc., please comment on this post or ping me.