This is the 20:th post in a series about Microsoft SQL Server R Services, and the 19:th post that drills down into the internal of how it works. To see other posts (including this) in the series, go to SQL Server R Services.
We are nearing the end of the Internals series, and if we look at what we have covered, we have almost come full circle.
Recap
In this Recap, let’s look back at some of the posts that has “bearing” on this particular post.
The
Internals - I post discussed how SQL Server opened a named pipe connection into, and called the, launchpad service when the proc sp_execute_external_script
is executed. In
Internals - VIII we looked at what different components are involved when the procedure executes:
- SQL Server calls into the launchpad service.
- The launchpad service calls into the
Rlauncher.dll
. - The
Rlauncher.dll
creates Rterm processes. - Through Rterm, the
R.dll
is loaded together withRxLink.dll
. - The
RxLink.dll
creates theBxlServer.exe
process. - To coordinate with SQL Server,
BxlServer.exe
loadsBxServerLink.dll
.
The above can be illustrated as in Figure 1 below
Figure 1: Components In Play
Based on the above, the question we then tried to answer in Internals - IX was how the components are communicating and what sort of communication mechanisms there are. We determined that there were different communication mechanisms, and used a figure to illustrate those various mechanisms:
Figure 2: How Communication Happens
The numbers indicate:
- 1 - named pipe.
- 2 - IOCP.
- 3 - named pipe.
- 4 - TCP/IP
In Internals - XVI and XVII we looked at how result sets were returned from the SqlSatellite to SQL Server, and we realized the data were delivered via data packets over the TCP connection between the SqlSatellite and SQL Server. In Internals - XVII, we also tried to understand what happened, code wise, in SQL Server when data packets were returned, and the following figure were used to illustrate the flow:
Figure 3: Code Flow Receiving Data from SqlSatellite
Finally, in Internals - XVIII we talked about how output parameters are handled between the SqlSatellite and SQL Server and we saw how the output parameters come back in a data packet over the TCP connection between the SqlSatellite and SQL Server. They are handled in SQL Server as per this figure:
Figure 4: Code Flow Receiving Output Arguments from SqlSatellite
The numbered arrows in both Figure 3 and Figure 4 show the communication out to and in from SQL Server, and in what order it happens:
- SQL Server opens named pipe connection to the launchpad service.
- Message sent to the launchpad service.
- After the call above, the SqlSatellite opens a TCP/IP connection to SQL Server.
- SQL Server sends the first packet to the SQL Satellite for authentication purposes.
- A second authentication packet is sent to SqlSatellite.
- The script is sent from inside
sqllang!CSatelliteProxy::PostSetupMessage
- The data for
@input_data_1
is sent together with two end packets. - Packet are coming back to SQL Server containing meta-data and the actual data.
PullRow
is being signaled.- Execution continues to process meta-data as well as result rows.
- After processing result rows, output arguments are processed.
So, we have seen in various posts how meta-data, result-sets and output parameters, etc. are send to SQL server from the SqlSatellite using the TCP connection. In this post we’ll look at other types of data going back to SQL server.
Housekeeping
Before we “dive” into todays topics let’s look at the code and the tools we’ll use. This section is here here for those of who want to follow along in what we’re doing in the posts.
Helper Tools
To help us figure out the things we want, we will use Process Monitor, and WinDbg:
- Process Monitor, will be used it to filter TCP traffic. If you want to refresh your memory about how to do it, we covered that in Internals - X as well as in Internals - XIV.
- In the last few posts I have used WinDbg preview a my debugger, but in this post I am reverting back to WinDbg “classic”. If you need help with setting it up, that is covered in Internals - I.
Code
In this post we won’t need a specific database, we’ll just execute variants of some very, very basic code:
|
|
Code Snippet 1: Base Code to Execute
The Sys.sleep
is in the script to make it easier for us to see what happens, and in what order.
Messages
What is this Messages in the header of this section? I am using that name for data that is not result-set data or output parameter values. E.g “stuff” that appears in the Messages tab in SSMS. Execute the code in Code Snippet 2 and you see something like so (you can comment out the Sys.sleep
if you wish):
Figure 5: Message in SSMS
The question is that, were does the message in Figure 5 originate from? What you see in Figure 5 may not be the best example, so let’s look at the following code:
|
|
Code Snippet 2: Code with Print
When the code in Code Snippet 2 is executed (once again, you can comment out the Sys.sleep
), the output looks like so:
Figure 6: Message From cat Statement
In Figure 6 it is clear that the “message” comes from the R engine, and the question is how does it get to SQL Server? As mentioned above, we have seen in previous posts how the data has been sent in data packets over the TCP connection. Is that perhaps the same thing here, and how do we find out? We’ll start with what we saw in Figure 5 - the output from Code Snippet 1.
Acknowledgments
When writing this post I realized that what we see in Figure 5 is the result of an acknowledgment from the R “environment”, that the “job” is finished, and below we’ll see how I came to that conclusion.
We’ll use Process Monitor to begin with:
- Run Process Monitor as admin and load the filter we used in
Internals - XVI where we filtered for
TCP Send
andTCP Receive
forBxlServer.exe
. - Execute the code in Code Snippet 1 (without the
Sys.sleep
if you want) and look at the output from Process Monitor:
Figure 7: Return Packets
As you see in Figure 7 three packets are returned to SQL Server after execution (outlined in red), and we have discussed in previous posts what they are. Potentially the “Commands completed successfully.” message could have been part of one of those three packets, but from what we know it is unlikely. We should be able to determine if a message is part of any of those three packets by executing the code in Code Snippet 2, but with a cat
statement like this: cat("A very, very, very long string to see if the packet sizes differ")
.
The idea is that having a long message string, we should see an impact on either of the data packets being sent back to SQL Server. However after we execute, the packets sent back to SQL Server look no different, so it seems that the message is sent back by other means, or generated in some other way.
Ok, so if the data is not sent back on the TCP connection between the SqlSatellite and SQL Server, then the data has to go back over the launchpad connection. Can we somehow see if that is what’s happening? If we believe that some data is passed from the R engine to SQL Server via the launchpad service, then the data has to be passed over the IOCP as well as the named pipes. So if we can “trap” the message when it arrives in the launchpad service, also see if/when it is sent back to SQL Server, and then finally “catch” it in SQL Server. That should “prove” out theory.
NOTE: As we only have symbol files for SQL Server and the launchpad service, we need to do it the way it is outlined above.
Right:
- Start up two instances of WinDbg as admin.
- Attach WinDbg to the SQL Server process (
sqlservr.exe
), and the launchpad process (launchpad.exe
) - DO NOT DO THIS ON A PRODUCTION SERVER.
Let’s start with the launchpad service and search for routines that might be involved in reading data coming into the launchpad service as well as sending data out to SQL Server. So let’s start with looking for routines in the launchpad
module: x launchpad!*Read*
. When you execute that, you see there are quite a few. However when browsing through the output, there are some that looks interesting in the CSQLSatelliteConnection
class:
launchpad!CSQLSatelliteConnection::ReadCallBack
launchpad!CSQLSatelliteConnection::ReadCallBackInternal
launchpad!CSQLSatelliteConnection::ReadOneFullMessage
Let’s set a breakpoint at launchpad!CSQLSatelliteConnection::ReadOneFullMessage
and execute the code in Code Snippet 1, this time ensure that the Sys.sleep
statement is in the code, so that if the breakpoint is hit, we’ll know that it is after execution. Sure enough the breakpoint is hit almost immediately, and after letting the code continue, there is a pause and the breakpoint is hit a second time. At the second “hit” have a look at the call-stack: k
:
|
|
Code Snippet 3: Call Stack at ReadOneFullMessage
The call-stack we see in Code Snippet 3 clearly shows that our “hunch” about IOCP seems correct (sqldk!SOS_Node::ListenOnIOCompletionPort
), and we see that the call flow includes all three Read...
routines that we found above. If we wanted to see what happens inside the routines, we can disable the break-point at ReadOneFullMessage
and set a break-point at ReadCallBackInternal
instead, and then execute the code in Code Snippet 1 again. When the ReadCallBackInternal
break-point is hit the second time (after the pause), do a “trace and watch” wt
and let the code run to completion.
NOTE: The reason why you set the break-point on
ReadCallBackInternal
, and notReadOneFullMessage
is that awt
onReadOneFullMessage
won’t give you much information at all, whereasReadCallBackInternal
shows you quite a lot of interesting “stuff”.
When the code has run to completion, copy the trace to a separate file, to make it easier to investigate:
|
|
Code Snippet 4: Routines During ReadCallBackInternal
When I looked at the trace, I found some interesting routines, which are shown in Code Snippet 4. When I see the code above, I can’t help but to think about what we saw in
Internals - XVII, where sqllang!CUDXR_ExternalScript::PullRow
was signaled by sqllang!EventInternal<SuspendQueueSLock>::SignalAll
. Here the signal happens through launchpad!EventInternal<SuspendQueueSLock>::SignalAll
, and after the signal, “stuff” is enqueued and locks are released. If this is the case, what is being signaled then?
Hmm, let us think back one of the posts in the beginning of the Internals “journey”, and specifically Internals - II. In that post we looked at the code-flow, when SQL Server called into the launchpad service, and we saw a figure like so:
Figure 8: Launchpad Service Call Flow
From Figure 8 we see how the routine launchpad!CLaunchContext::LaunchServTask
seems to be the “main” method. Maybe that’s what’s being signaled, let’s see if we can figure it out:
- Disable the existing break-points in the launchpad process.
- Set a break-point at
launchpad!CLaunchContext::LaunchServTask
. - Change the
Sys.sleep
in Code Snippet 1 toSys.sleep(120)
.
Execute the code and when you hit the LaunchServTask
break-point, do a wt -l4
(trace four levels deep). The execution will continue, until the control has been passed over to the R engine, and its various components. At this stage LaunchServTask
has not finished execution, and you’ll see something like so at the end of the trace:
|
|
Code Snippet 5: Wait for Continuation
So, in Code Snippet 5, we see where LaunchServTask
“waits” for something to complete. When the script code has finished executing (after the Sys.sleep
), you’ll see the trace continue - and it will continue with:
|
|
Code Snippet 6: Code Continues
What we see in Code Snippet 6 is the continuation of the code, based on the “signal” in Code Snippet 4. If we then investigate the latter part of the trace further we’ll see:
|
|
Code Snippet 7: Tear Down Connection and Send Ack
In Code Snippet 7 we see how after continuing the execution and being signaled in Code Snippet 6, we:
- Retrieve the message
- Tear down the connection
- Removes the session
- Sending an ack message, through
WriteMessage
NOTE: Seeing the
launchpad!CSQLSatelliteCommunication::SendAckMessage
was what made me realize that we were looking at acknowledgments.
Seeing that the connection to the SqlSatellite is torn down, we can with some certainty say that the ack will be sent to SQL Server. So now, let’s include SQL Server in our debug session as well:
- Disable the
LaunchServTask
break-point in the launchpad process. - Set a break-point in the launchpad process at
launchpad!CSQLSatelliteCommunication::SendAckMessage
. - In the SQL Server process set a break-point at
sqllang!NP::ReadAsync
.
The theory with the above is that after we break at the SendAckMessage
; when we continue we should immediately break at the SQL Server ReadAsync
break-point. Change the Sys.sleep
back to Sys.sleep(10)
and execute the code in Code Snippet 1. You’ll now break in the SQL Server process at ReadAsync
, this is when the connections etc., are set up between SQL Server and the launchpad. Just continue on from there, and there will be a couple of more breaks before SendAckMessage
- three more on my box, so four in total. Now wait until the break-point is hit at SendAckMessage
, continue on from there and watch how you immediately break in the SQL Server process at ReadAsync
. The call-stack at this stage:
|
|
Code Snippet 8: Call Stack at ReadAsync
Hmm, there are some routines we recognize in the call-stack in Code Snippet 8, we see the ReadCallback...
which we also saw in the launchpad process and if you remember
Internals - XVII, we realized that ReadCallbackInternal
was used to signal to PullRow
, that data had arrived. The question is if something is signaled here as well? I doubt it is PullRow
, seeing that this has nothing to do with processing result-sets. How can we find out about this?
In
Internals - XVII we saw how after PullRow
was signaled, ProcessResultRows
were called and in there as one of the first calls were a call to sqllang!CSQLSatelliteCommunication::GetSQLSatelliteMessage
. Maybe something similar is happening here. The way we can find is to set a break-point in the SQL Server process at sqllang!CSQLSatelliteCommunication::GetSQLSatelliteMessage
, but disable it initially.
Execute the code in Code Snippet 1 again, and continue until you hit the ReadAsync
break-point in the SQL server process, after the SendAckMessage
(as we did above). While you are in the ReadAsync
break-point, enable the GetSQLSatelliteMessage
in the SQL Server process, and continue the code execution. You’ll see how the GetSQLSatelliteMessage
break-point is hit immediately after you have enabled it and continued from ReadAsync
. When you look at the call-stack (k
), it looks something like so:
|
|
Code Snippet 9: Call Stack at GetSQLSatelliteMessage
It looks like we are onto something here, e.g there is the sqllang!CSQLSatelliteCommunication::WaitForAckMessage
routine, which seems to be called during shutdown of the session. From what we can see from the call-stack it also looks like the “main” routine is sqllang!CXStmtQuery::ErsqExecuteQuery
, and my guess it is that routine which is being signaled. That’s what it looks like for code like in Code Snippet 1:
In the launchpad service:
- It gets an IOCP notification.
- It reads the notification through the
Read...
routines, and signals it. - The
LaunchServTask
wakes up and callslaunchpad!CSQLSatelliteCommunication::SendAckMessage
.
In SQL Server:
- It reads the message sent from the launchpad service.
- Signals into
ErsqExecuteQuery
- Handles the message.
What about code like in Code Snippet 2 where there is the equivalent of a “print” statement - what happens there?
Messages Generated in the External Script
What I noticed when I was playing researching this post, was that when I executed the code in Code Snippet 1 - I saw one sqllang!Np::ReadAsync
in the SQL Server process after the sleep in the code, and we now know it is because of the the SendAckMessage
. However, when executing the code in Code Snippet 2 (with the cat
statement), all of a sudden I saw two ReadAsync
but only one SendAckMessage
. So something else seems to send a packet to SQL Server, but what?
I started trying to trace through launchpad!CLaunchContext::LaunchServTask
with wt
, but couldn’t do it completely as I received errors. So instead I looked at sending of packets in the launchpad service, and I set a break-point at launchpad!CSQLSatelliteConnection::WriteMessage
. When executing the code in Code Snippet 1, the break-point was hit initially once immediately and the once after the sleep. The call-stack at this point looked like so:
|
|
Code Snippet 10: Call Stack in Launchpad SendAckMessage
The call-stack we see in Code Snippet 10, is more or less what we see in Code Snippet 7. However, when we execute the code in Code Snippet 2, and we break at the first WriteMessage
after the sleep, the call-stack looks as follows:
|
|
Code Snippet 11: Call Stack in Launchpad with Print Message
So the call-stack in Code Snippet 11 is completely different than in Code Snippet 10. Well, we are still inside LaunchServTask
, but LaunchServTask
calls a cleanup routine, and within that routine calls to log routines are called: launchpad!SQLSatellite_LogWString
, and launchpad!CSatelliteCargoContext::SendLogMessage
. The SendLogMessage
routine is the routine which ultimately sends the message to SQL Server. When the code continues, the WriteMessage
breakpoint is hit the second time, and at this stage the breakpoint looks exactly the same as in Code Snippet 10. This then means that the ack message is sent after the message from which to log.
What is worth noticing is that even though the launchpad service sends two messages to SQL Server - there is only one message coming in from the external engine, e.g. the actual “print” message comes together with the ack message, as in Code Snippet 3.
Now, let’s see what it looks like at the SQL Server side, when we execute code like in Code Snippet 2:
- Disable all breakpoints in the launchpad service, except for
SendAckMessage
. - Set a break-point at
launchpad!CSatelliteCargoContext::SendLogMessage
. - Keep the break-point in the SQL Server process for
ReadAsync
. - Disable all other breakpoints in the SQL server process, including the
GetSQLSatelliteMessage
break-point.
Execute the code in Code Snippet 2, and let the code continue until you hit the SendLogMessage
break-point in the launchpad process. Continue the execution, and when you hit ReadAsync
in SQL server, enable the GetSQLSatelliteMessage
break-point. Continue the execution and when the GetSQLSatelliteMessage
breakpoint is hit, check the call-stack. Continue execution until the SendAckMessage
break-point is hit in the launchpad process, and then continue until GetSQLSatelliteMessage
in SQL Server is hit again. Check the call-stack, and compare them.
You’ll see how they are identical, and exactly like what we see in Code Snippet 9. So if the call-stacks look exactly the same (from what we can see), what then causes the “print” statement to be output? It turns out that it is somewhat similar to what happens in the launchpad process where the Cleanup
routine leads into the SendLogMessage
call. So in the SQL Server process, the sqlmin!CQueryExecContext::FinalCleanup
routine calls into sqllang!CUDXR_ExternalScript::ShutdownService
, which - dependent on if a message comes back from the R engine - will read it in and then call a routine which handles the message: sqllang!CUDXR_ExternalScript::HandleLogMessage
:
|
|
Code Snippet 12: Call Flow in SQL Server Log Message
When a “log” message is passed into SQL Server the code path is as in Code Snippet 12, when no log message is passed in after the BindReadSNIPacket
in Code Snippet 12, the code will drop out until the GetNextMessage
at the end of Code Snippet 12
We started this blog-post by asking where the message we see in Figure 5 comes from, and what causes it. By now is should be clear that what causes it; is the SendAckMessage
routine in the launchpad service. The message indicates that all work has been done. For the actual message we see, I believe it is generated by SSMS directly, it does not come from the launchpad service.
When a message is explicitly generated in the R script, that message goes back to the launchpad service which calls SendLogMessage
. After the SendLogMesage
the SendAckMessage
will be called as well.
Error Messages
So far we’ve seen how “print” messages generated from R are handled. What about error messages?
That has to be covered in next post. When I started this post I thought I knew how error messages was handled - but when I looked at it closer, I realized there was more to it than I thought, so I need to do some more investigations. Sorry!
Summary
In this post we have seen how “print” messages originating form the external script as well as acknowledgment messages from the external environment are handled by the launchpad service as well as SQL Server.
Launchpad
The launchpad service:
- Listens on callbacks from the external engine
- The callback signals
launchpad!CLaunchContext::LaunchServTask
. LaunchServTask
continues after signal intolaunchpad!CLaunchContext::Cleanup
- If a “print” message comes from the external engine,
launchpad!CSatelliteCargoContext::SendLogMessage
is called. - Regardless if there is a “print” message or not, launchpad will tell SQL Server that the external engine is “done” via
launchpad!CSQLSatelliteCommunication::SendAckMessage
.
The figure below tries to illustrate the launchpad part:
Figure 9: Launchpad Flow
In Figure 9 we see the main calls of leading up to and including when messages come back from the external engine. The numbered arrows show some of the calls and the communication in the launchpad service:
- During startup an IOCP “listener” is set up.
- Creates supporting directories and process, and initiates the connection to the external engine.
- After work is done, a call comes back from the external engine.
- This causes the “listener” to call
Read...
routines. LaunchServTask
is signaled.- Execution in
LaunchServTask
continues into launchpad!CLaunchContext::Cleanup. - If a “print” message has been passed back from the external engine, eventually
launchpad!CSatelliteCargoContext::SendLogMessage
is called. launchpad!CSatelliteCargoContext::SendLogMessage
sends the message to SQL Server.LaunchServTask
continues intolaunchpad!CSQLSatelliteCommunication::SendAckMessage
, which sends an ack that the “job is done” to SQL Server.
SQL Server
In SQL Server the flow is something like so:
- As in the launchpad service, SQL Server listens for callbacks, on the named pipe connection to the launchpad service.
- A callback comes in, is being read and signals
sqllang!CXStmtQuery::ErsqExecuteQuery
. - Execution continues and
sqlmin!CQueryExecContext::FinalCleanup
is called. FinalCleanup
calls intosqllang!CUDXR_ExternalScript::ShutdownService
.- If a “print” message has been sent,
sqllang!CUDXR_ExternalScript::HandleLogMessage
is called. - Independent of whether a “print” message is sent or not,
sqllang!CSQLSatelliteCommunication::WaitForAckMessage
will handle the final ack message.
As with the launchpad service I have tried to create a figure illustrating the flow in SQL Server:
Figure 10: SQL Server Flow
The numbered arrows in Figure 10 illustrates:
- During launch of the launchpad service, an IOCP “listener” is set up.
- Message sent to the launchpad service.
- After the call above, the SqlSatellite opens a TCP/IP connection to SQL Server.
- Authentication packets and script packet is sent to the SqlSatellite.
- If there is data for
@input_data_1
, it is sent, together with two end packets. - Packets are coming back to SQL Server containing meta-data and the actual data.
PullRow
is being signaled, and processing of meta-data as well as result rows continues.- The launchpad service sends “print”/ack messages to SQL Server. In the figure it is illustrated as one packet, but there is one message for “print” and one for ack.
- A signal is sent to
ErsqExecuteQuery
. - Execution continues by calling into
sqllang!CXStmtQuery::ShutdownNormal
, which calls intoFinalCleanup,
ShutdownService, and a call tosqllang!CSQLSatelliteCommunication::WaitForAckMessage
is made followed bysqllang!CSQLSatelliteConnection::GetNextMessage
. - If a “print” message has been sent to SQL Server
ReadPayload
is called and thensqllang!CUDXR_ExternalScript::HandleLogMessage
. If no “print” message has been sent in, the code loops onsqllang!CSQLSatelliteCommunication::WaitForAckMessage
. - SQL Server loops on
sqllang!CSQLSatelliteCommunication::WaitForAckMessage
and waits for the ack message, - When the ack message has been received, the eventual messages are sent to the calling code.
~ Finally
If you have comments, questions etc., please comment on this post or ping me.
comments powered by Disqus