This is the tenth post in a series about Microsoft SQL Server R Services, and the ninth post that drills down into the internal of how it works. To see other posts (including this) in the series, go to SQL Server R Services.
We are now nearing the end of the Internals part of this series (you probably say “about time”). When I started this series way back in March this year, I thought that the Internals would be two, maximum three episodes. Boy was I wrong!
In this post we’ll look at how the various components are communicating with each other.
Recap
In
Internals - VIII we looked at what different components are involved when we execute sp_execute_external_script
:
- SQL Server calls into the launchpad service.
- The launchpad service calls into the
Rlauncher.dll
. - The
Rlauncher.dll
creates Rterm processes. - Through Rterm, the
R.dll
is loaded together withRxLink.dll
. - The
RxLink.dll
creates theBxlServer.exe
process. - To coordinate with SQL Server,
BxlServer.exe
loadsBxServerLink.dll
.
The above can be illustrated as in Figure 1 below
Figure 1: Components In Play
The questions are though how the components are communicating and what is involved in getting back to SQL Server after BxlServer.exe
and BxServerLink.dll
have done their “magic”?
Demo Code
Before we drill into the questions above, let’s look at the demo code we’ll use. Let’s start with setting up the database and a table with some data we can use. All in all 1 million rows:
|
|
Code Snippet 1: Setup of Database, Table and Data
The data is more or less the same data we used in Internals - V, and the code to execute looks also almost the same:
|
|
Code Snippet 2: Code to Execute
In Code Snippet 2 we see how we read in data from the statement in the @input_data_1
parameter. The actual R code retrieves the process id and work directory and assigns them to the pid
and d
variables which then are printed out through the cat
statements. After a model has been created in rxLinMod
some information about the model is returned as a resultset from the OutputDataSet
variable. The schema of the resultset is defined in WITH RESULT SETS ...
. For us to be able to determine what is happening when executing the code, there is a Sys.sleep
statement, which for now is commented out. Before we go any further, let’s make sure it works:
- Restart the launchpad service (stop and start).
- Execute the code in Code Snippet 2.
If everything has worked OK, the Results tab in SSMS should look something like in Figure 2, and the Message tab like in Figure 3:
Figure 2: Results from Running the Code
Figure 3: Message Output from Running the Code
As we know that the execution of the R code happens outside of SQL Server, the question is what component(s) are responsible for returning results and messages to SQL Server. Before looking at that, let’s look at how communication happens between the various components.
Communication
The components we are interested in the communication between are:
- SQL Server -> Launchpad
- launchpad -> Rterm
- Rterm -> BxlServer
SQL Server -> Launchpad
So, let’s start with the communication between SQL Server and the launchpad service. In
Internals - I and
Internals - II we used WinDbg and we saw how SQL Server calls (among other things) sqllang!CSQLSatelliteConnection::OpenNpConnection
to create and open a named pipe to the launchpad service. Let us make double sure that there really is named pipe involved, and the way we’ll do that is by using WinDbg and
Process Explorer.
What we’ll do is:
- Restart SQL Server and the launchpad service (please, do not do this on a production server :) ).
- Start WinDbg and Process Explorer as admin.
- In Process Explorer, ensure that the lower pane is visible (Ctrl + L).
To begin we’ll see what named pipes SQL Server has created “out of the box” so to speak. Named pipes are exposed as handles in Process Explorer, so by clicking on the sqlserver.exe
process in Process Explorer and then Ctrl + H you should see handles in the lower pane. By ordering on name and scrolling down to Device\Named
pipe, you should see something like so:
Figure 4: SQL Server Initial Named Pipes
As we see in Figure 4, we have (at least on my machine) six named pipes. There is nothing that would indicate a named pipe to the launchpad service.
Now you can go on and see what named pipes (if any) the launchpad service (launchpad.exe
) has. On my machine it looks like this:
Figure 5: Launchpad Named Pipes
Hmm, that is interesting - there are two named pipes with a name similar to what we have seen in previous posts: sqlsatellite. However, since the launchpad service is inactive right now, it seems those two named pipes are not connected to anything.
So far we do not have any evidence of named pipes between SQL Server and the launchpad service, but on the other hand we have not executed anything yet either (since the restart). Let us now go on and execute the code in Code Snippet 2 again, but before we do that let us do what we did in
Internals - I and set a breakpoint at sqllang!CSQLSatelliteConnection::OpenNpConnection
. So attach WinDbg to the sqlserver.exe
process and set the breakpoint.
Before you execute the code, uncomment the Sys.sleep
call and change the time to sleep to 30 seconds (Sys.sleep(30)
), so you can investigate what is happening. Now, go ahead and execute.
The breakpoint is hit and execution stops. Look in Process Explorer and see what named pipes there are both for SQL Server as well as the launchpad service. At this stage there should not be any more named pipes than what you see in Figure 4 and Figure 5. Continue execution of the code, and look again in Process Explorer. You should now see for SQL Server something like in Figure 6:
Figure 6: SQL Server Named Pipes
For the launchpad service it looks something like Figure 7:
Figure 7: Launchpad Named Pipes During Execution
We see now how for SQL Server we have a new named pipe: \Device\NamedPipe\sqlsatellitelaunch
, and for the launchpad service a corresponding additional named pipe: \Device\NamedPipe\sqlsatellitelaunch
. By now we should be fairly certain that communication between SQL Server and the launchpad service is through named pipes.
Launchpad -> Rterm
NOTE: The heading above is somewhat misleading as it is not
Launchpad.exe
that connects to Rterm butRlauncher.dll
, however we’ll use launchpad for now.
When I started writing this section, I thought: “how hard can this be, the launchpad service obviously talks to Rterm via named pipes”. It turned out to be not that simple. The way I went about “proving” (or not) that named pipes were used was similar to what we did above between SQL Server and the launchpad service, but initially I didn’t use WinDbg:
- Restart SQL Server and the launchpad service.
- Make sure Process Explorer is up and running.
- Set a sufficiently high
Sys.sleep
value in the code (I ended up using 120 seconds). - Execute the code.
When I at this stage looked at the handles for the launchpad service, I saw the three named pipes we saw in Figure 7. That would make sense, we know that one of them is the named pipe connection from SQL Server, and then one of the other two would be used between the launchpad service and Rterm.
NOTE: When using named pipes, and you look at them in Process Explorer, you should see the same name in both source as well as destination.
The problem was that I did not see a named pipe in the executing Rterm process, with the same name as in the launchpad service. Hmm, there went that theory. Well, if named pipes are not used, what about TCP/IP and sockets? So I noted the PID of the launchpad service, executed the code again, and while the code was running I (from command line) used netstat
to see what socket connections there were. The command I used was netstat -a -n -o
, which means:
- show all connections and listening ports (
-a
). - display addresses and port numbers in in numerical form (
-n
). - display the owning PID associated with each connection (
-o
).
I looked for the PID of the launchpad service to see if there were any connections, but none could be found. Now the original question of “how hard can this be” had an answer - “actually, pretty hard”! At this stage I resorted to some WinDbg
“spelunking”, and looked for “promising” routine names. Seeing that a lot of the routines we have seen so far in this series comes from classes with “satellite” in them I executed: x *!*satellite*
. Eish, that returned a lot. However, while scrolling through the output, I came across following routine: launchpad!CSQLSatelliteConnection::AddProviderDoneIOCPCallback
. Hmm, something that has to do with callback from IOCP
, could it be that the launchpad service connects via IO Completion Ports (IOCP)? Only one way to find out - set some breakpoints in WinDbg.
In Internals - II where we discussed what happens inside the launchpad service, we had a diagram of the flow of a call looking like so:
Figure 8: Launchpad Service Call Flow
In Figure 8 we see how the communication to Rterm is done in the launchpad!CSQLSatelliteConnection::WriteMessage
call, so the setup of the IOCP should be done before the WriteMessage
call, and most likely after the LaunchServTask
call. So in the already attached WinDbg, I set three breakpoints:
bp launchpad!CLaunchContext::LaunchServTask
bp KERNELBASE!CreateIoCompletionPort
(I had read documentation of IOCP and saw that this was a likely call).bp launchpad!CSQLSatelliteConnection::WriteMessage
If my theory was to be correct, I would see the execution first stop at LaunchServTask
, followed by CreateIoCompletionPort
and then finally at WriteMessage
. I executed the code in Code Snippet 2 with Sys.sleep(10)
and this is what I saw in WinDbg:
|
|
Code Snippet 3: Launchpad Breakpoints being Hit
This to me indicates that yes, IOCP is being used as communication mechanism between the launchpad service and Rterm.
Rterm -> BxlServer
In
Internals - VIII we spoke about BxlServer.exe
, how it hosts RevoScaleR code and how it also coordinates with the R runtime in order to manage exchanges of data with SQL Server. The BxlServer.exe
process is created by the Rterm process via the RxLink.dll
which acts as a conduit. The source code for RxLink.dll
is
available, and when I had a look at the source, I saw quite a lot of code related to named pipes. So my assumption is that communication between the Rterm process (via RxLink.dll
) and the BxlServer process happens through named pipes.
Seeing that I don’t have any debug symbols for Rterm.exe
, R.dll
, RxLink.dll
or BxlServer.exe
we’ll try and use other methods to prove (or disprove) that named pipes are used. The way we’ll do it is to use Process Explorer as we did with the communication between SQL Server and the launchpad service. Now is probably a good time to restart both the SQL Server service as well as the launchpad service (just to start afresh).
After the SQL Server and the launchpad service has been restarted bring back Process Explorer again (as admin). This time though we’ll look at the Rterm process and the hosted BxlServer process:
- In Process Explorer navigate to the launchpad process.
- Set big enough
Sys.sleep
value in the code. 90 seconds worked fine for me.
When executing the code, look in Process Explorer for the Rterm process and BxlServer process where you have a CPU
value greater than 0:
Figure 8: Rterm and BxlServer Processes
In Figure 8 we see the Rterm and BxlServer processes that are active. When you have found the active processes check if there are any named pipe handles for those two processes (Ctrl+H):
Figure 9: Rterm Named Pipes
Figure 10: BxlServer Named Pipes
Figure 9 and Figure 10 show the named pipes file handles for the Rterm process (Figure 9), and BxlServer process (Figure 10). As we can, the same named pipes exists in both processes (they have the same names), so we can with some certainty say that communication between Rterm and BxlServer is done via named pipes. With this in mind we can enhance Figure 1 and show the communication mechanisms:
Figure 11: Communication Mechanisms
In Figure 11 the numbers indicate:
- 1 - named pipe.
- 2 - IOCP.
- 3 - named pipe.
SqlSatellite
Now, we’ve seen how the various components are communicating with each other, but - how is data exchanged? E.g, how does a resultset get back to SQL Server, and does the print (cat
) statements travel back the same way? In Figure 11 wee see how the BxlServer is doing “stuff” and then interacts with BxServerLink, but it kind of stops there. Surely a dataset like in our code wouldn’t go back over the IOCP? Would there be any other components in play? Let’s see what we can find out.
Once again use the Process Explorer, but now we are interested in what dll’s the BxlServer loads while it’s running. Execute the code again, and when you have found the “active” BxlServer process do a Ctrl+D to find the loaded dlls, and see what “pops up”:
Figure 12: SqlSatellite.dll
When we look at the various dll’s that are loaded by BxlServer we come across a name we have seen and briefly mentioned before: SqlSatellite.dll
. In Internals
II,
III and
VII we saw how paths were set, pointing to where the sqlsatellite.dll
is located. In
Internals - III we also mentioned how SqlSatellite is an API to support external code and external run times. It is loaded by the BxlServer process and it handles data transfers between SQL Server and external languages (in this case R).
Communication
Now we have an idea who handles data exchange (at least some of it), but what mechanism is used? Named pipes could be an option, and we saw in SQL Server that named pipe: \Device\NamedPipe\sqlsatellitelaunch
. However, that pipe is used by SQL Server for communication with the launchpad service, and there were really no other named pipes that seemed to fit in. How about sockets and TCP/IP? That could be an option, let’s see.
Once again, do a restart of SQL Server as well as the launchpad service to make sure we are in a clean state. Make a note of the process id (PID) of SQL Server, well use it to see what socket connections SQL Server has. If Process Explorer is not running, start it up as admin. When SQL Server is restarted check what TCP connections we have by running the netstat -a -n -o -p TCP
command.
|
|
Code Snippet 4: TCP Connections SQL Server
The PID for SQL Server after I restarted was 2900, and from Code Snippet 4 we can see two listening connections, with no source as of yet. The listener for port 1434 is for the SQL Server Browser Service, whereas port 52637 is something of a mystery. Let’s now execute our code and see if anything changes (I set Sys.sleep
to 90 seconds to make sure I had enough time to see what happens). While the code is executing run netstat -a -n -o -p TCP -b
(-b
will display the executable involved in creating the connection). When I executed the code and ran the netstat command I saw the following:
|
|
Code Snippet 5: TCP Connections SQL Server while Executing
That is interesting! From Code Snippet 5 we see how SQL Server now has an established TCP connection on port 52637, and the connection originates from PID 8088. That PID is BxlServer.exe
, and - as we know from earlier - BxlServer.exe
loads SqlSatellite.dll
. We now have a TCP/IP connection from SqlSatellite to SQL Server, and it is over that connection data is exchanged.
NOTE: Not all data is exchanged over the TCP/IP socket connection, as we will see in next post.
Finally, let’s do some more WinDbg “spelunking” in order to understand how and when the TCP connection is established:
- Restart SQL Server and the launchpad service, to clean things up.
- Attach WinDbg to the SQL Server process
- Browse symbols with the name TCP
When browsing for the symbols, you can use this command: x /1 *!TCP*
. By using the option /1
you’ll only see the names, and no addresses. On my machine that gives me quite a lot, but there are two entries that catch my eye: sqllang!Tcp::AcceptConnection
and sqllang!Tcp::Close
. So let us set breakpoints at those two symbols, and see what happens when we execute our code.
The result when executing the code is that we initially break at sqllang!Tcp::AcceptConnection
. Followed somewhat later by breaking at sqllang!Tcp::Close
. Cool, this seems to work - let us set some more breakpoints and try to figure out the flow of events.
To be able to see the flow, we need to involve the launchpad service as well, as it is that service who “spins up” the Rterm process who in turn creates the BxlServer process who loads BxServerLink as well as SqlSatellite. So, remember from Internals - III what the code looked like (from a very high level) to create the executing Rterm processes, it was something like this:
|
|
Code Snippet 6: Launchpad Code Flow
As discussed in Internal - II and Internals - III, when an external script is executed 6 different Rterm processes are “spun” up, bit only one will be the executing process, and the code in Code Snippet 6 illustrates the code creating the executing Rterm process.
With this in mind, let us:
- Stop the launchpad service.
- Delete any sub folders of the user account folders in the
C:\<path_to_sql_server_instance>\MSSQL\ExtensibilityData
directory. Do NOT delete the user account folders themselves. - Restart the launchpad service.
- Attach WinDbg to the launchpad process.
Set breakpoints in the attached launchpad process at (make sure that the SQL Server breakpoints are enabled):
bp launchpad!SatelliteJobObject::AssociateProcess
.bp launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo
.bp launchpad!CSQLSatelliteConnection::WriteMessage
.
What we now want to do is to run the two WinDbg instances side by side and see when the individual breakpoints are hit, and in what order. Before we begin, clear out the WinDbg command output (.cls
). Execute the code and look closely at what happens in the two WinDbg instances.
What I see on my machine is that I break two or three times in launchpad at launchpad!SatelliteJobObject::AssociateProcess
, and after the second or third break, I break at sqllang!Tcp::AcceptConnection
in SQL Server. That is followed by launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo
and launchpad!CSQLSatelliteConnection::WriteMessage
, plus more launchpad!SatelliteJobObject::AssociateProcess
. After a while I receive two more launchpad!CSQLSatelliteConnection::WriteMessage
in the launchpad service and a sqllang!Tcp::Close
in SQL Server.
The conclusion I draw from this is that after the executing Rterm process has been created (and BxlServer, BxLinkServer and SqlSatellite), SqlSatellite connects into SQL Server.
You may notice that the launchpad!CSQLSatelliteConnection::WriteMessage
breakpoint will be hit a couple of times after the SQL Server sqllang!Tcp::AcceptConnection
. We’ll discuss that in the next Internals blogpost.
Summary
In this post we set out trying to determine what communication mechanisms there are between the various components interacting when sp_execute_external_script
is called. We also saw how a “new” component sqlsatellite.dll
is in the mix, and in the next blogpost in the Internals series we’ll look more in detail what it does.
So, Figure 13 below illustrates quite well the communication mechanisms between the various components:
Figure 13: How Communication Happens
The figure looks quite like Figure 11 and the numbers indicate:
- 1 - named pipe.
- 2 - IOCP.
- 3 - named pipe.
- 4 - TCP/IP
~ Finally
If you have comments, questions etc., please comment on this post or ping me.
comments powered by Disqus