Microsoft SQL Server R Services - Internals XI

Posted by nielsb on Friday, October 20, 2017

This is the twelfth post in a series about Microsoft SQL Server R Services, and the eleventh post that drills down into the internal of how it works. To see other posts (including this) in the series, go to SQL Server R Services.

In the last post we spoke about how the transfer of data happens between SQL Server and the R components, and we saw how data was transferred over the TCP connection between the SqlSatellite and SQL Server. In the post we mentioned that the protocol being used is Binary eXchange Language protocol(BXL), and this post was supposed to talk about BXL.

However, while I was investigating BXL I did some WinDbg “spelunking” in SQL Server and came across some interesting “stuff” that ties into how data is being sent, so I decided to post about that instead (a future post will definitely cover BXL).

Recap

In Internals - X, as mentioned above, we looked at how data was transferred from SQL Server to R. By using WireShark we could see how the data was transferred over the TCP connection between the SqlSatellite and SQL Server.

NOTE: We discussed the communications mechanisms in general and the TCP connection specifically in Internals - IX.

To see what happens in SQL Server as well as the launchpad service when a script executes we set some break-points in WinDbg, for both processes:

  • SQL Server: sqllang!SpExecuteExternalScript
  • SQL Server: sqllang!CSQLSatelliteConnection::OpenNpConnection
  • SQL Server: sqllang!CSQLSatelliteConnection::WriteMessage
  • SQL Server: sqllang!CUDXR_ExternalScript::ConnectToSatellite
  • Launchpad: launchpad!Np::AcceptConnection
  • Launchpad: launchpad!Np::ReadAsync
  • Launchpad: launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo
  • Launchpad: launchpad!CSQLSatelliteConnection::WriteMessage

The script used when investigation what happens is very simple:

1
2
3
4
exec sp_execute_external_script
                 @language = N'R' ,
                 @script = N'Sys.sleep(30)
                 d<-42'

Code Snippet 1: Basic External Script

The reason for using simple code like in Code Snippet 1, is that it might make it easier to understand what is happening, and we can compare with what is happening when executing some other, not so basic, code. Notice how in Code Snippet 1 there is a Sys.sleep. It is there to make it easier to determine - when debugging - when data is sent to R and when data is coming back.

When we executed the code in Code Snippet 1, we saw how the breakpoints where hit as below:

  1. sqllang!SpExecuteExternalScript is called.
  2. sqllang!CSQLSatelliteConnection::OpenNpConnection
  3. launchpad!Np::AcceptConnection
  4. launchpad!Np::ReadAsync
  5. launchpad!Np::ReadAsync
  6. launchpad!Np::ReadAsync
  7. sqllang!CSQLSatelliteConnection::WriteMessage
  8. launchpad!Np::ReadAsync
  9. launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo
  10. launchpad!CSQLSatelliteConnection::WriteMessage - notice how nothing happens in the SQL process until WriteMessage is executed.
  11. sqllang!CUDXR_ExternalScript::ConnectToSatellite
  12. sqllang!CSQLSatelliteConnection::WriteMessage
  13. sqllang!CSQLSatelliteConnection::WriteMessage
  14. Pause for Sys.sleep
  15. sqllang!CSQLSatelliteConnection::WriteMessage
  16. sqllang!CSQLSatelliteConnection::WriteMessage
  17. launchpad!Np::ReadAsync
  18. launchpad!CSQLSatelliteConnection::WriteMessage

The above led us to the conclusion - which I voiced in the start of this recap - that data may not be sent via the launchpad service, but over the SqlSatellite TCP connection. The reason that conclusion was achieved was that some of the sqllang!CSQLSatelliteConnection::WriteMessage did not have a corresponding launchpad!Np::ReadAsync.

In Internals - X we further proved that theory by using Process Monitor and capturing what TCP packets were sent to the SqlSatellite (or rather BxlServer.exe - which hosts SqlSatellite). When executing the code in Code Snippet 1 we saw Process Monitor output like so:

Figure 1: Process Monitor Base Output

Combined with the breakpoints the flow looked like below:

  1. sqllang!SpExecuteExternalScript is called.
  2. sqllang!CSQLSatelliteConnection::OpenNpConnection
  3. launchpad!Np::AcceptConnection
  4. launchpad!Np::ReadAsync
  5. launchpad!Np::ReadAsync
  6. launchpad!Np::ReadAsync
  7. sqllang!CSQLSatelliteConnection::WriteMessage
  8. launchpad!Np::ReadAsync
  9. TCP Connect
  10. launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo
  11. launchpad!CSQLSatelliteConnection::WriteMessage
  12. sqllang!CUDXR_ExternalScript::ConnectToSatellite
  13. TCP Receive
  14. TCP Receive
  15. sqllang!CSQLSatelliteConnection::WriteMessage
  16. TCP Receive
  17. sqllang!CSQLSatelliteConnection::WriteMessage
  18. TCP Receive
  19. Pause for Sys.sleep
  20. sqllang!CSQLSatelliteConnection::WriteMessage
  21. TCP Receive
  22. TCP Receive
  23. sqllang!CSQLSatelliteConnection::WriteMessage
  24. launchpad!Np::ReadAsync
  25. launchpad!CSQLSatelliteConnection::WriteMessage

From what we can see, the three first TCP events happens while we connect. The fourth happens after a WriteMessage, and the fifth after the second WriteMessage before Sys.sleep. The remainders will be covered when we investigate how data is returned to SQL Server.

By using *WireShark we saw how the actual script was sent to SqlSatellite in one of the TCP packets, and if data (@input_data_1 = SELECT ...) was sent, that also was transferred in TCP packet(s). In Figure 2 below we see what it looks like in Process Monitor when both script and data is sent:

Figure 2: Process Monitor Output Data Select

The two highlighted packets (with length of 350, and 6300 respectively) contains the data that is sent to SqlSatellite, but what are the two first packets being sent? That, and what in SQL Server that sends these packets are what we’ll try to figure out in the rest of this post.

Packets Sent from SQL Server

So let’s start at the top; when we look at Figure 1 we see two TCP Receive after the connection is opened, and these two have a length of 217 and 17 respectively. Regardless what we execute, we’ll always see these two “events” and they will always have the same lengths. Seeing that they appear after sqllang!CUDXR_ExternalScript::ConnectToSatellite, it may have something to do with setup of the connection, or something like that. To find out what these events are, let’s do some “spelunking” using WinDbg. Run WinDbg as admin and attach to the sqlservr.exe process.

NOTE: The Internals - I has more information how to attach WinDbg to a process, and what commands to use.

When we look at the code flow above combining breakpoints together with the Process Monitor output we see that the two first TCP Receive events (the ones we are interested in appears after sqllang!CUDXR_ExternalScript::ConnectToSatellite, but there are no sqllang!CSQLSatelliteConnection::WriteMessage preceding them. Let’s see if we can find out something based on what sqllang!CUDXR_ExternalScript::ConnectToSatellite calls. In Internals - I we used the uf [options] <address> WinDbg command to see what a particular routine did, so let’s do that for sqllang!CUDXR_ExternalScript::ConnectToSatellite. In the WinDbg console execute uf /c sqllang!CUDXR_ExternalScript::ConnectToSatellite (we’re using the /c options which displays only the call instructions). That returns something like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
sqllang!CUDXR_ExternalScript::ConnectToSatellite
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0x30:
    call to sqllang!CSatelliteProxy::RetrieveConnection
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0x5d:
    call to sqllang!CSatelliteProxy::PostSetupMessage
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0x90:
    unresolvable call: call    qword ptr [rax+48h]
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0x9e:
    call to sqllang!ex_raise_oom (00007ff8`3d183fe0)
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0xab:
    call to sqllang!CServerCargoContext::CServerCargoContext
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0xc2:
    call to sqllang!CSatelliteCargoContext::Init
  sqllang!CUDXR_ExternalScript::ConnectToSatellite+0x111:
    call to sqllang!CSatelliteCargoContext::SetupMessageWriter

Code Snippet 2: ConnectToSatellite Calls

Nothing, in Code Snippet 2, really stands out and screams “pick me, pick me” (think the donkey in Shrek) - for being responsible for the TCP events, so let’s approach it from another angle. Let’s try and see if we can find some routine that actually sends the TCP packets.

For that we’ll search in WinDbg for some routines that has TCP and Write in their names. So in the WinDbg console, execute the following: x *!*TCP*Write*. Hmm, that returned some interesting “stuff” (the result below is somewhat reduced for readability):

1
2
3
4
5
6
7
00007ff8\`3d1bac30 sqllang!Tcp::WriteSync (<no parameter info>)
00007ff8\`3d1ba9b0 sqllang!Tcp::WriteDone (<no parameter info>)
00007ff8\`3d1baf30 sqllang!Tcp::WriteSyncPost (<no parameter info>)
00007ff8\`3d1b6f40 sqllang!Tcp::GatherWriteAsync (<no parameter info>)
00007ff8\`3d1bb200 sqllang!Tcp::WriteSyncWait (<no parameter info>)
00007ff8\`3cf79e40 sqllang!Tcp::WriteAsync (<no parameter info>)
...

Code Snippet 3": Partial Output for TCP::Write

The output in Code Snippet 3 has some “promising” routines, where sqllang!Tcp::WriteAsync looks very interesting. Let’s set some breakpoints:

  • bp sqllang!CSQLSatelliteConnection::WriteMessage
  • bp sqllang!CUDXR_ExternalScript::ConnectToSatellite
  • bp sqllang!Tcp::WriteAsync

The breakpoints above are “loosely” based on what we did in [Internals - X], and hopefully they’ll give us some clarity in what happens. After having set the breakpoints execute the code in Code Snippet 1 and notice how the breakpoints are hit:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
sqllang!CSQLSatelliteConnection::WriteMessage:
sqllang!CUDXR_ExternalScript::ConnectToSatellite:
sqllang!Tcp::WriteAsync:
sqllang!Tcp::WriteAsync:
sqllang!CSQLSatelliteConnection::WriteMessage:
sqllang!Tcp::WriteAsync:
sqllang!CSQLSatelliteConnection::WriteMessage:
sqllang!Tcp::WriteAsync:
# here is sys.Sleep
sqllang!CSQLSatelliteConnection::WriteMessage:
sqllang!Tcp::WriteAsync:
sqllang!CSQLSatelliteConnection::WriteMessage:

Code Snippet 4": Breakpoints Hit

The code in Code Snippet 4 matches quite well up with what we saw in the flow in the beginning; ConnectToSatellite followed by two TCP events, followed by WriteMessage, TCP event, WriteMessage and TCP event, and then Sys.sleep. Here we see two sqllang!Tcp::WriteAsync calls after ConnectToSatellite without any WriteMessage. Maybe if we investigated the call-stack at the first sqllang!Tcp::WriteAsync, that could give us some more information:

  1. Execute the code in Code Snippet 1 again.
  2. Let it run through the WriteMessage and ConnectToSatellite breakpoints.
  3. When the first sqllang!Tcp::WriteAsync breakpoint is hit, check the call-stack: kc.

NOTE: By using kc we display a clean stack trace where each display line includes only the module name and the function name.

The call-stack after step 3 above, looks like so (I have “snipped” out some of the initial calls):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Call Site
00 sqllang!Tcp::WriteAsync
01 sqllang!CryptoBase::HandshakeWriteToken
02 sqllang!CryptoBase::ProcessAndWriteSecurityToken
03 sqllang!CryptoBase::ProcessAddProvider
04 sqllang!CryptoBase::NegotiateSecurityContext
05 sqllang!Sspi::InitX
06 sqllang!AddProvider
07 sqllang!SNIAddProviderEx
08 sqllang!CSQLSatelliteConnection::AuthenticateConnection
09 sqllang!CSQLSatelliteConnection::DataCargoAcceptTask
0a sqldk!SOS_Task::Param::Execute
[snip]

Code Snippet 5: Call Stack at First WriteAsync

When we look at the call-stack we see some routines that have to do with security; NegotiateSecurityContext, ProcessAndWriteSecurityToken, etc. We also see a routine: sqllang!SNIAddProviderEx. This refers to the SQL Server Network Interface which is a protocol layer that establishes the network connection between the client and the server. Paired this with the routine sqllang!CSQLSatelliteConnection::AuthenticateConnection it seems that our initial thoughts that the first packet has something to do with setting up the connection to the satellite may not be totally off. Now, let the code run through by pressing F5 or g in the WinDbg console?

NOTE: At this stage, the TCP connection may be closed due to timeout and you eventually will get an error.

So let’s ensure that we have the part about sqllang!CSQLSatelliteConnection::AuthenticateConnection correct, by setting a breakpoint at it (bp sqllang!CSQLSatelliteConnection::AuthenticateConnection). When you execute you will see how you hit that breakpoint breakpoint almost immediately. You can ignore that, and continue executing until you hit sqllang!CUDXR_ExternalScript::ConnectToSatellite. After you have hit ConnectToSatellite you break at sqllang!CSQLSatelliteConnection::AuthenticateConnection. When you now continue execution you’ll see something like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
sqllang!CSQLSatelliteConnection::AuthenticateConnection:
00007ff8\`9fbc1990 488bc4          mov     rax,rsp
0:054> g
Breakpoint 2 hit
sqllang!Tcp::WriteAsync:
00007ff8\`9e864a40 4c8bdc          mov     r11,rsp
0:054> g
Breakpoint 2 hit
sqllang!Tcp::WriteAsync:
00007ff8\`9e864a40 4c8bdc          mov     r11,rsp
0:050> g
Breakpoint 2 hit
sqllang!Tcp::WriteAsync:
00007ff8\`9e864a40 4c8bdc          mov     r11,rsp

Code Snippet 6: Breakpoints Hit

What happens is that straight after the AuthenticateConnection you will hit the WriteAsync breakpoint twice, the same way as we see in Code Snippet 4. The first hit at WriteAsync sort of makes sense, as it ties up with the call-stack we see in Code Snippet 5. But what about the second WriteAsync (the one that causes the second package to be sent), where does that come from? To try to figure that out, we start with the call-stack for that particular WriteAsync.

Execute the code in Code Snippet 1 again, and continue to the second WriteAsync. When you hit the breakpoint do a kc again. The call-stack should now look somewhat like this (this time it is the full call-stack):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
 # Call Site
00 sqllang!Tcp::WriteAsync
01 sqllang!CryptoBase::HandshakeWriteToken
02 sqllang!CryptoBase::ProcessAndWriteSecurityToken
03 sqllang!CryptoBase::ProcessAddProvider
04 sqllang!SNIProcessAddProviderOnWorker
05 sqldk!SOS_Task::Param::Execute
06 sqldk!SOS_Scheduler::RunTask
07 sqldk!SOS_Scheduler::ProcessTasks
08 sqldk!SchedulerManager::WorkerEntryPoint
09 sqldk!SystemThread::RunWorker
0a sqldk!SystemThreadDispatcher::ProcessWorker
0b sqldk!SchedulerManager::ThreadEntryPoint
0c KERNEL32!BaseThreadInitThunk
0d ntdll!RtlUserThreadStart

Code Snippet 7: Call Stack at Second WriteAsync

Comparing the call-stack in Code Snippet 7 with the one in Code Snippet 5 we see that they are somewhat similar. We also see that sqllang!CSQLSatelliteConnection::AuthenticateConnection is nowhere to be seen in the second call-stack. However it looks like this call has something to do with SNI as there is the sqllang!SNIProcessAddProviderOnWorker call. So what causes the second WriteAsync? As we did with sqllang!CUDXR_ExternalScript::ConnectToSatellite let’s see what calls AuthenticateConnection makes (uf /c sqllang!CSQLSatelliteConnection::AuthenticateConnection):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
sqllang!CSQLSatelliteConnection::AuthenticateConnection
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x9d:
    call to sqllang!SNISetInfo
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0xbc:
    call to sqllang!SNIAddProviderEx
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x100:
    call to sqllang!EventInternal<SuspendQueueSLock>::Wait
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x13c:
    call to sqllang!RefCountImpl<CSQLSatelliteConnection>::Release
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x14a:
    call to sqllang!CSQLSatelliteConnection::FireAuthenticationXEvent
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x163:
    call to KERNEL32!TlsGetValueStub
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x17a:
    call to sqldk!SystemThread::MakeMiniSOSThread
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x189:
    call to KERNEL32!TlsGetValueStub
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x1bf:
    call to sqllang!AutoSwitchPreemptive::AutoSwitchPreemptive
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x1d5:
    call to sqllang!CSQLSatelliteConnection::AuthorizeNamedPipeConnection
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x204:
    call to sqllang!SNIGetInfo
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x21c:
    call to SSPICLI!QueryContextAttributesW
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x239:
    call to ADVAPI32!CheckTokenMembershipStub
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x241:
    call to KERNEL32!GetLastErrorStub
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x26c:
    call to sqllang!CSQLSatelliteConnection::FireAuthorizationXEvent
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x2a8:
    unresolvable call: call    qword ptr [rax+10h]
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x2b0:
    call to sqllang!SOS_ExternalAutoWait::~SOS_ExternalAutoWait
  sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x2da:
    call to sqllang!SNIRemoveProvider

Code Snippet 8: Calls from AuthenticateConnection

Looking at the code in Code Snippet 8, we can see the call to sqllang!SNIAddProviderEx (line 5), but nowhere we see sqllang!SNIProcessAddProviderOnWorker. A first thought would be that SNIProcessAddProviderOnWorker is called from another routine, but looking at the call-stack for the second WriteAsync call (Code Snippet 7), we cannot see any particular method that would call SNIProcessAddProviderOnWorker. The only thing we see is that the first call in the call-stack is a start of a new thread: ntdll!RtlUserThreadStart.

Hmm, based on the fact that a new thread is started and we’re doing SNI “stuff”, we may assume that we are still inside AuthenticateConnection. With that in mind we can now start “spelunking” away in AuthenticateConnection. From the code in Code Snippet 8 wee see that the call to SNIAddProviderEx happens at sqllang!CSQLSatelliteConnection::AuthenticateConnection+0xbc. We know that the first call to WriteAsync happens after SNIAddProviderEx, which then tells us that the second WriteAsync must happen after sqllang!CSQLSatelliteConnection::AuthenticateConnection+0xbc. Next call is at sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x100, so let us set a break-point there.

When you execute again, (and ignore the first breaks on AuthenticateConnection and AuthenticateConnection+0x100), you will eventually see the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
sqllang!CUDXR_ExternalScript::ConnectToSatellite:
mov     qword ptr [rsp+8],rcx ss:0000010d\`6e4fc2e0=000002ab0d57d620

sqllang!CSQLSatelliteConnection::AuthenticateConnection:
mov     rax,rsp

sqllang!Tcp::WriteAsync:
mov     r11,rsp

sqllang!CSQLSatelliteConnection::AuthenticateConnection+0x100
call    sqllang!EventInternal<SuspendQueueSLock>::Wait

sqllang!Tcp::WriteAsync:
mov     r11,rsp

Code Snippet 9: Calls from AuthenticateConnection 2

Our hunch was right, we can see how we are breaking at AuthenticateConnection+0x100 right before the second WriteAsync. In other words the two packets we are interested in are both sent during authentication of the TCP/IP connection between SQL Server and SqlSatellite.

Summary

In this post we set out to find out more about the two TCP/IP packets that are sent from SQL Server to the SqlSatellite, before the actual packages with the script etc., are sent. After some WinDbg “spelunking” we figured out that the packets were sent during the connection to the SqlSatellite, more specifically during sqllang!CSQLSatelliteConnection::AuthenticateConnection.

In Internals - I I had a figure which tried to illustrate what happens in SQL Server when executing sp_execute_external_script, and - based on that figure - I created a new figure where I have included what we found out in this post:

Figure 3: Flow in SQL Server

The numbered arrows shows the communication out from SQL Server, and in what order it happens:

  1. sqllang!CSQLSatelliteConnection::OpenNpConnection - SQL Server opens named pipe connection to the launchpad service.
  2. sqllang!CSQLSatelliteConnection::WriteMessage - Message sent to the launchpad service.
  3. After the call above, the SqlSatellite opens a TCP/IP connection to SQL Server.
  4. During sqllang!CSQLSatelliteConnection::AuthenticateConnection SQL Server sends the first packet to the SQL Satellite. The packet has something to do with authentication.
  5. After the first packet is sent, a second packet (still within sqllang!CSQLSatelliteConnection::AuthenticateConnection), which also is related to authentication, is sent on a separate thread to SqlSatellite.

Thus we have figured out what the two first packets are, and where they are sent from.

In the next post we’ll look at what is sending the packets containing the scripts, and what the flow looks like.

~ Finally

If you have comments, questions etc., please comment on this post or ping me.


comments powered by Disqus