Installing R Packages in SQL Server Machine Learning Services - II

Posted by nielsb on Saturday, June 30, 2018

This post is the second post in a series about installing R packages in SQL Server Machine Learning Services.

To see all posts in the series go to Install R Packages in SQL Server ML Services Series.

Why this series came about is a colleague of mine Dane pinged me and asked if I had any advice as he had issues installing an R package into one of their SQL Server instances. I tried to help him and then thought it would make a good topic for a blog post. Of course, at that time I didn’t think it would be more posts than one, but here we are.

Recap

In Installing R Packages in SQL Server Machine Learning Services - I we introduced Dane and how he wanted to install some R packages into SQL Server Machine Learning Service instance. I said that before you start installing packages into SQL Server Machine Learning Services, you should make sure you have Rtools installed on the machine SQL Server ML Services runs on.

You need RTools if you ever need to compile R packages from source, as Rtools contains compilers and utilities to do this. You do not install Rtools for each SQL Server ML Services instance you have on the server, but just once, and you need to install it from an account with elevated permissions.

In the first post I said that there were three ways to install packages into SQL Server ML Services:

  • R packet managers
  • T-SQL
  • RevoScaleR

So in the first post, I covered R packet managers, where an R packet manager is an R command line tool or GUI installed on the SQL Server Machine Learning Services machine. The packet manager should be run with elevated permissions and target the R engine for the instance on which you want to install the package. The easiest is to use either of the R tools that come as part as part of SQL Server’s R service:

  • The command line tool: Rterm.exe.
  • The GUI: Rgui.exe.

These two packet managers lives in the \\<path_to_SQL_Server_instance>\R_SERVICES\bin\x64 directory. When you install packages via an R packet manager, they can only be installed to the default packet library for that instance. The file system folder for this library has restricted access and to write to this folder you need elevated permissions. Typical code for installing packages from a packet manager can look like so:

1
2
3
4
5
# get the library path
libPath <- .libPaths()[1]
install.packages("pkg_name", lib = libPath, 
                  repos = "url_for_the_repo", 
                  dependencies = TRUE)

Code Snippet 1: Install Packages Command

In Code Snippet 1 we see how I first retrieve the library path. The library path is where I install the package to, and once again - I can only install to the default instance directory. Then in the install.packages call I use these parameters:

  • First parameter is always the name of the package(s) to install.
  • lib: is for the library folder to install to.
  • repos: the base URL(s) of the repositories to use. If left out, the repo used is the Microsoft MRAN repo, which may not be what you want.
  • dependencies: indicating whether to also install missing packages which these packages depend on/link to/import/suggest (and so on recursively).

The big drawback with using an R packet manager is that the user who does it need to be able to run the chosen packet manager from an elevated prompt and from the box SQL Server is installed on!

In this post, we look at how we can install without being admin, and without having to be logged onto the box as admin.

Housekeeping

Before we “dive” into today’s topics let us look at the code we use today. This section is here for those of who want to follow along in what we are doing in the post.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
IF NOT EXISTS(SELECT 1 FROM sys.server_principals 
              WHERE name = 'dane')
BEGIN
  CREATE LOGIN dane
  WITH PASSWORD = 'password1234$';
END

IF NOT EXISTS(SELECT 1 FROM sys.server_principals 
              WHERE name = 'nielsb')
BEGIN
  CREATE LOGIN nielsb
  WITH PASSWORD = 'password1234$';
END

IF NOT EXISTS(SELECT 1 FROM sys.server_principals 
              WHERE name = 'user1')
BEGIN
  CREATE LOGIN user1
  WITH PASSWORD = 'password1234$';
END

IF NOT EXISTS(SELECT 1 FROM sys.server_principals 
              WHERE name = 'user2')
BEGIN
  CREATE LOGIN user2
  WITH PASSWORD = 'password1234$';
END

IF NOT EXISTS(SELECT 1 FROM sys.server_principals 
              WHERE name = 'user3')
BEGIN
  CREATE LOGIN user3
  WITH PASSWORD = 'password1234$';
END

DROP DATABASE IF EXISTS DataScienceDB;
GO

CREATE DATABASE DataScienceDB;
GO

USE DataScienceDB;
GO

CREATE USER dane
FROM LOGIN dane;

CREATE USER nielsb
FROM LOGIN nielsb;

CREATE USER user1
FROM LOGIN user1;

CREATE USER user2
FROM LOGIN user2;

CREATE USER user3
FROM LOGIN user3;
GO

ALTER ROLE db_owner
  ADD MEMBER nielsb;
GO

Code Snippet 2: Create Logins, Database and Users

In Code Snippet 2 we create some logins as well as a database and in that database users for the logins. As you see, we do continue with the “theme” of Dane the data scientist wanting to do “stuff” in the database. As nielsb is seen to be “trustworthy” (take that Dane), we add him to the db_owner role.

Permissions and sp_execute_external_script

This topic is not directly related to installing R packages via RevoScaleR, but unless you get the permissions right for users that want to call sp_execute_external_script (SPEES), the rest of the post will not make any sense anyhow. Fortunately I wrote a blog post about SPEES and the permissions required, so let us follow what is in that post and set the permissions for the users we created in Code Snippet 2. The code below must be run by sysadmin:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
-- the grant to execute SPEES to public only 
-- needs to be run once on the instance
USE master;
GO

GRANT EXECUTE ON sp_execute_external_script TO public;
GO

USE DataScienceDB;
GO

GRANT EXECUTE ANY EXTERNAL SCRIPT TO dane;
GRANT EXECUTE ANY EXTERNAL SCRIPT TO user1;
GRANT EXECUTE ANY EXTERNAL SCRIPT TO user2;
GRANT EXECUTE ANY EXTERNAL SCRIPT TO user3;
GO

Code Snippet 3: Grant Execute Permissions to Users

We see how the code in Code Snippet 3 first, from within the master database, grants execution rights on SPEES to public. This only needs to be run once, and arguably this should be part of enabling SPEES.

We then switch over to the database where the users exist, and we grant the EXECUTE ANY EXTERNAL SCRIPT rights to the individual users that should be able to use SPEES. However, what about nielsb, should he not be granted the permission as well? No, as he is part of db_owner he automatically has the necessary permissions.

RevoScaleR Package Management

In the recap above I mentioned about being able to install from a remote box and not being an admin on the SQL Server box, and that is what RevoScaleR gives you. RevoScaleR 9.0.1 and later includes functions for R package management where these functions can be used by remote, non-administrators to install packages on SQL Server without direct access to the server. To install you can use both remote R clients as well as sp_execute_external_script from SSMS. To install using RevoScaleR functions you need to execute inside a SQL Server Compute Context (SQLCC).

NOTE: To read more about SQLCC look at the sp_execute_external_script and SQL Compute Context - I post.

Remember from my SQLCC post how we set up an SQLCC:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# set up the connection string
sqlServerConnString <- "Driver=SQL Server;server=<instance>;
                        database=<the_db>;
                        uid=<the user id>;pwd=<the password>"

# set up a compute context
sqlCtx <- RxInSqlServer(connectionString = sqlServerConnString, 
                        numTasks = 1)
# set the compute context to be the sql context
rxSetComputeContext(sqlCtx)

Code Snippet 4: R Code - SQL Server Compute Context

In Code Snippet 4 we see how we set up an SQLCC by defining a connection string which then implies that any R package installation is specific to a SQL Server instance as well as a database.

Configuration

Above I wrote that even non-admins could install R packages but that requires configuration and to use RevoScaleR you need to enable remote management of R packages both on the SQL Server instance as well as for a particular database. To enable, an admin on the SQL Server box uses a RevoScaleR command line utility: RegisterRext.exe. The same utility can also be used to disable package management on a database and/or instance level.

The location for RegisterRext.exe is at: <path_to_sql_instance>\R_SERVICES\library\RevoScaleR\rxLibs\x64\RegisterRext.exe and when you run it you need to run it as admin. These are the available commands when executing:

  • /install - copies the launcher binaries to the right location for a SQL server instance. Not applicable in this post.
  • /uninstall - removes the launcher binaries from the SQL server instance. Not applicable in this post.
  • /installpkgmgmt - installs package management support for given SQL server instance and database also if specified.
  • /uninstallpkgmgmt - uninstalls package management support from given SQL server instance and database also if specified.
  • /installrts - installs real time scoring support for given SQL server instance and database. Topic for a future post.
  • /uninstallrts - uninstalls real time scoring support for given SQL server instance and database. Topic for a future post.

We send in parameters to tell RegisterRext where to execute:

  • /instance:value is an optional parameter, we use <value> as the instance name, else the commands are performed on the default instance. Please note that the name is just that; not /instance:server_name\instance_name but /instance:instance_name. It took me a while to figure that out.
  • /database:value is an optional parameter where <value> indicates what database to run on.
  • /user:username is an optional parameter, and we use the <username> specified to connect via the SQL authentication mode. Note that once this parameter is specified, /password:<value> is a required parameter.
  • /password: * | <password> is a required parameter once the /user parameter is provided. You can use /password:* in order to be prompted for the password when running the tool instead of providing it in cleartext as a parameter. Note that providing the password without the user is ignored.

If no username and password is supplied, then Windows authentication is used (in the context of the user which is logged on to the SQL Server box and runs RegisterRext.exe).

Let us go back to Dane and his situation and see what we can do. Admin of the SQL Server instance Dane plays on works hard on has created a database for Dane with a lot of data where Dane now want to do data science “things”. Part of the things Dane want to do requires packages that are not part of a SQL Server ML Services installation.

So the admin decides to enable Dane’s database for remote package management. As this is the first time a database on the instance is enabled, the admin needs to enable the instance itself before the database.

To enable the instance a user with admin rights on the box, logs on and runs RegisterRext.exe from an elevated command prompt:

1
> RegisterRext.exe /installpkgmgmt

Code Snippet 5: Enable Default Instance

In Code Snippet 5 we enable packet management on the default instance (as that is where Dane’s database is), and we enable it using Windows Authentication. Regardless of type of authentication, the user has to have VIEW SERVER STATE permissions on the SQL Server instance. The output from Code Snippet 5 is like so (edited for readability):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
>RegisterRExt /installpkgmgmt

Source directory to pick the RExtension binaries determined 
to be "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\
R_SERVICES\library\RevoScaleR\rxLibs\x64\".

Connecting to SQL server...

Sql server binn directory is 
"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn".

Stopping service MSSQLLaunchpad...
Copied RLauncher.dll from C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\R_SERVICES\library\RevoScaleR\rxLibs\x64\ 
to C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn.

Installing SQL R services package management for instance ''

Installing version 1.0 artifacts

Creating package management account MSSQLSERVERPKG...

Saving package management user account configuration...

Working directory long path:'C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData' and short path:
'C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1'.

Working directory long path:'C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData' and short path:
'C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1'.

Working directory long path:'C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData' and short path:
'C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1'.

Configuring security for folder C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData...

SQL R services package management installed successfully.

Starting service MSSQLLaunchpad...

Code Snippet 6: Output Enable Instance

A couple of interesting points from what we see in Code Snippet 6:

  • The launchpad service is stopped when the install process starts and at the end restarted.
  • The install process creates a new packet management account. This account appears together with the worker accounts in the user accounts pool that you find in the ExtensibilityData folder.

NOTE: As the instance only is enabled once, this could potentially be part of when we enable an instance for external scripts.

Having enabled the instance, admin can now go ahead and enable the database:

1
> RegisterRext.exe /installpkgmgmt /database:DataScienceDB

Code Snippet 7: Enable Database

In Code Snippet 7 we see how we enable packet management on the DataScienceDB database, once again using Windows Authentication. Whereas enabling an instance requires VIEW SERVER STATE permissions for the authenticated user, when enabling a database the user needs both VIEW SERVER STATE as well as being part of the db_owner database role.

The output from Code Snippet 7 is like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
>RegisterRExt /installpkgmgmt /database:DataScienceDB

Source directory to pick the RExtension binaries determined 
to be "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\
R_SERVICES\library\RevoScaleR\rxLibs\x64\".

Connecting to SQL server...

Sql server binn directory is "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn".

Installing SQL R services package management for database 'DataScienceDB'

Installing version 1.0 artifacts

Working directory long path:'C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData' and short path:
'C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1'.

Creating package management folder C:\Program Files\Microsoft SQL Server\
MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData\
rpkgs\4ecd5724-d180-47da-a8e0-01d6c1b9bd0f...

Creating package management table rpackages...

Creating package management role rpkgs-users...

Creating package management role rpkgs-private...

Creating package management role rpkgs-shared...

Creating package management stored procedures ...

Installing version 1.1 artifacts

Altering package management table rpackages to include attributes flags...

Creating package management stored procedures ...

SQL R services package management installed successfully for database 'DataScienceDB'

Code Snippet 8: Output Enable Database

As with the output from enabling an instance (Code Snippet 6) we see some interesting things in Code Snippet 8 for when enabling a database, and we see how the process:

  • Creates a table: rpackages.
  • Creates roles.
  • Creates stored procedures:

Figure 1: Stored Procedures Created

In Figure 1 we see the procedures that the enabling process created. In a future post, we talk more about these procedures

RevoScaleR Functions, Scopes & Roles

RevoScaleR exposes functions similar to CRAN R functions for package management:

  • rxSqlLibPaths: Retrieves the path of the instance library on the remote server.
  • rxFindPackage: retrieves the path for one or more packages on the remote SQL Server.
  • rxInstallPackages: Install packages and their dependencies in an SQLCC from a remote client. You can specify which repo to install from or also install from locally saved zipped packages.
  • rxInstalledPackages: Retrieves a list of packages installed in the specified compute context.
  • rxSyncPackages: Copy information about a package library between the file system and database.
  • rxRemovePackages: Removes packages from a specified compute context. It also removes dependencies if other packages on SQL Server no longer use them.

Back to our “hero” Dane. Admin has just told him that the SQL Server instance together with his database are now package management enabled, so he can begin to do his awesome “stuff”. So what Dane first want to do is to see what packages are available to him in his database, and he does not really want to use SSMS, but instead use his R client of choice; Visual Studio together with Microsoft R Client:

Figure 2: Visual Studio with Microsoft R Client

In Figure 2 we see how Dane’s R project in Visual Studio targets Microsoft R Client. Dane now wants to use the following code to see packages available on his database:

1
2
3
4
5
6
> sqlCCConnString <- "Driver=SQL Server; server=<server_name>; 
                     database=daneb; uid=daneb; pwd=<danes_pwd>"
> sqlCC <- RxInSqlServer(connectionString = sqlCCConnString, numTasks = 1)
> rxSetComputeContext(sqlCC)
> sqlPackages <- rxInstalledPackages(computeContext = sqlCC)
> sqlPackages

Code Snippet 9:

In Code Snippet 9 we see how Dane:

  • Sets up a connection string pointing to his database for the SQLCC.
  • Creates the SQLCC and sets it.
  • Calls rxInstalledPackages to retrieve packages.

When Dane executes the code in Code Snippet 9 it does not go as he wants - he gets an error at the rxInstalledPackages call:

1
2
3
4
Error in rxCheckPackageManagementVersion
     (connectionString = computeContext@connectionString) : 
  The package management feature is not enabled for the current 
  user or not supported on SQL Server version 14.0.1000.169

Code Snippet 10: Execution Error

Hmm, the error we see in Code Snippet 10 says something about “The package management feature is not enabled…” What is this all about?

Scopes & Roles

The error above is related to what happened when we enabled the database. Remember how the process created roles in the database and the user that tries to execute some of the package management functions need to be in some of the roles unless the user is part of db_owner. In Code Snippet 8 we see the roles, and there is a role named rpkgs-users. When I see that role, it seems like a good candidate for Dane to be part of, so I add Dane to the role:

1
2
ALTER ROLE [rpkgs-users] 
  ADD MEMBER dane;

Code Snippet 11: Add Member to Role

Notice in Code Snippet 11 how I have to enclose the role name in brackets([]) as the name has a hyphen in it. When Dane now tries to execute the code:

Figure 3: Error

Ok, we seem to be a little bit further than before, and it seems we now can execute, but there is still an error, and it looks related to “scope”, whatever that is.

Scopes has to do with installation of packages via the SQLCC and the usage of those packages. There are two scopes:

  • Shared: shares packages with other users in the database.
  • Private: accessible only to the user installing the package.

Scope and the roles we see above are related in that the roles define what the user can do and in what scope:

  • rpkgs-users: users in this role can use packages installed by users belonging to the rpkgs-shared role. Cannot read via the rx... functions.
  • rpkgs-private: this role has all permissions as the rpkgs-users role has. A user in this role can furthermore, install, remove and use private packages installed by the user in question.
  • rpkgs-shared: The same permissions as the rpkgs-private role. Users in this role can install, remove, and use shared packages.

So, let us now see what we can do with these roles:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
ALTER ROLE [rpkgs-users] 
  ADD MEMBER user1;

ALTER ROLE [rpkgs-shared] 
  ADD MEMBER dane;

ALTER ROLE [rpkgs-shared] 
  ADD MEMBER user2;

ALTER ROLE [rpkgs-private] 
  ADD MEMBER user3;

Code Snippet 12: Add Users to Roles

After we have added users to various roles, let us see what happens when different users want to install packages.

RevoScaleR Package Installation

In the following examples, I have chosen a package abc mostly due to it not having too many dependencies, which makes it quicker to install. The abc package contains tools for “Approximate Bayesian Computation” (ABC).

Something to think about are the versions of RevoScaleR on your development box vs the version on the remote SQL Server. My, admittedly, limited investigations lead me to believe that the version on the local machine cannot be an earlier version than on the remote SQL Server. If the version is earlier on the local machine, you get permission errors. From what I can tell, the local version can be the same, or newer.

Ok, so to install packages we use rxInstallPackagaes():

1
2
3
4
5
rxInstallPackages(pkgs, skipMissing = FALSE, 
                  repos = getOption("repos"), 
                  verbose = getOption("verbose"), 
                  scope = "private", owner = '', 
                  computeContext = rxGetOption("computeContext"))

Code Snippet 13: Signature rxInstallPackages

Let us look at the parameters we see in Code Snippet 13:

pkgs:

String array of the packages you want to download and install. By default, it downloads the latest version of the package(s). When installing through an SQLCC the arrays can be .zip files as well as URL’s, http:// and file://. If the URL is a file, the location must be available to SQL Server.

skipMissing:

Boolean (TRUE, FALSE) indicating whether not to download dependent packages. The argument defaults to false.

repos:

String array of base URL’s of repos to use. It defaults to the MRAN repo. If you want to install from local files repos has to be explicitly set to NULL.

verbose:

Boolean (TRUE, FALSE) indicating whether to output progress during installation.

scope:

Related to what I said above about scope. The scope argument indicates whether to install packages, so they are available for other users on the database: shared, or only accessible to the user calling rxInstallPackages: private. The default value is private. To install to shared scope the user has to be in the rpkgs-shared or db_owner role.

owner:

A user in the db_owner role can install packages on behalf of other users, and set the owner parameter to a valid database user account.

computeContext:

The compute context to install under.

Install Already

Finally, we can talk about actually installing a package. Dane now wants to install the abc package:

1
2
3
4
5
6
7
8
sqlCCConnString <- "Driver=SQL Server; 
                    server=192.168.57.3;
                    database=DataScienceDB; 
                    uid=dane; pwd=password1234$"
sqlCC <- RxInSqlServer(connectionString = sqlCCConnString, numTasks = 1)
rxSetComputeContext(sqlCC)
pkgs <- c("abc")
rxInstallPackages(pkgs = pkgs, verbose = TRUE, computeContext = sqlCC)

Code Snippet 14: Install R Package

In Code Snippet 14 we see how Dane:

  • Sets up the connection string for the SQLCC.
  • Created the SQLCC and sets it.
  • Defines the package(s): pkgs <- c("abc").

When he calls rxInstallPackages, he follows the progress in the Visual Studio R interactive window:

Figure 4: Installation Progress

Dane sees in Figure 4 how the various packages needed for abc downloads and subsequently installed. After a couple of minutes the installation is completed:

Figure 5: Installation Finished

After the installation has finished, let us see how it works.

Usage

So in Figure 5 we see that the installation finished successfully. Dane wants now to start using the new packages he installed in SQL Server, and for testing purposes, he uses the abc.data package (installed as a dependency of the abc package):

1
2
3
4
5
6
7
EXEC sp_execute_external_script 
         @language = N'R',
         @script = N'
                library("abc.data")
                data(human)
                OutputDataSet <-  data.frame(stat.voight)'
WITH RESULT SETS(([pi] float, [TajD.m] float, [TajD.v] float ))

Code Snippet 15: Using R Package

In Code Snippet 15 it is assumed that Dane has logged in as himself and we see how Dane uses human which is a set of R objects containing observed data from three human populations. The call data(human), loads in four R objects and Dane is interested in stat.voight, which is a data frame with 3 rows and 3 columns. The data frame contains the observed summary statistics for three human populations. In his code, Dane outputs the stat.voight data frame. However, when Dane executes the code, this happens:

Figure 6: Package does not Exist

Hmm, what happened here? We know that Dane installed the package, so why does he get an exception that the package does not exist? The reason is that the package he installed is not installed in the R engine, but installed on the database and is not visible. To be able to use the package Dane needs to execute under the SQLCC.

NOTE: In future posts we talk more about where the packages are installed and how they get loaded etc.

So, how does Dane make the abc.data package execute in the SQLCC. Well, the abc.data package does not know about SQLCC, so Dane needs to use a RevoScaleR wrapper function: rxExec. The rxExec function allows distributed execution of a function, and the signature looks like so:

1
2
3
4
5
6
7
rxExec(FUN,   ...  , elemArgs, elemType = "nodes", 
       oncePerElem = FALSE, timesToRun = -1L,
       packagesToLoad = NULL, execObjects = NULL, 
       taskChunkSize = NULL, quote = FALSE, 
       consoleOutput = NULL, autoCleanup = NULL, 
       continueOnFailure = TRUE, RNGseed = NULL, 
       RNGkind = NULL, foreachOpts = NULL)

Code Snippet 16: Signature of rxExec

In Code Snippet 16 we see rxExec’s signature. In a future post we look at rxExec in more detail, but for now what interests us are the two first parameters:

  • FUN: The function to execute.
  • ...: Arguments passed to the function (FUN).

The first parameter is the function we want rxExec to execute and the second parameter passes in the argument(s) the function requires. So, if we go back to what Dane wants to do, he re-writes his code like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
EXEC sp_execute_external_script 
 @language = N'R',
 @script = N'
   # define a function to be passed to rxExec
   usePackageInRxFunction <- function()
   {
     library("abc.data")
     data(human)
     return(stat.voight)
   }
   # the "normal" SQLCC stuff
   sqlCCConnString <- "Driver=SQL Server; server=.; 
                       database=DataScienceDB; 
                       uid=dane; pwd=password1234$"
   sqlCC <- RxInSqlServer(connectionString = sqlCCConnString, 
                          numTasks = 1)
   rxSetComputeContext(sqlCC)

   ret <- rxExec(usePackageInRxFunction)
   OutputDataSet <- data.frame(ret)'
WITH RESULT SETS(([pi] float, [TajD.m] float, [TajD.v] float ))

Code Snippet 17: Executing in SQLCC

We see how Dane in Code Snippet 17:

  • Creates a function usePackageInRxFunction where he does the abc.data related work.
  • Creates and sets up the SQLCC.
  • Calls rxExec with usePackageInRxFunction as argument.
  • Returns a data set.

When Dane executes the code the result is:

Figure 7: Success

So, everything works fine, and Dane gets a result back. What happens if someone else wants to execute the same code, for example, nielsb who is in the db_owner role? The only thing nielsb changes is the sqlCCConnString parameter:

1
2
3
 sqlCCConnString <- "Driver=SQL Server; server=.; 
                       database=DataScienceDB; 
                       uid=nielsb; pwd=password1234$"

Code Snippet 18: SQLCC Connectionstring for nielsb

When nielsb executes the code in Code Snippet 17 but with the sqlCCConnString parameter looking like in Code Snippet 18 the outcome is:

Figure 7: No Package Exception

Hmm, that is strange - we know Dane installed the package (and he just used it), so it has to exist. Dane is part of the rpkgs-shared role so he has permissions to create shared packages? The reason here is even though Dane was in rpkgs-shared role when he ran rxInstallPackages (Code Snippet 14) he did not define the scope for the package. Remember when we looked at the signature and the arguments for rxInstallPackages in Code Snippet 13 we said that default value for the scope argument is private and that is how the package is installed regardless of what role Dane is in.

So to finish off, let user2 who is in rpkgs-shared install the same packages in shared scope. Does that work, can you have the same packages installed multiple times? Yes, you can in that you can have the same packages installed in both private as well as shared scope. So user2 executes this code:

1
2
3
4
5
6
7
8
sqlCCConnString <- "Driver=SQL Server; 
                    server=192.168.57.3;
                    database=DataScienceDB; 
                    uid=user2; pwd=password1234$"
sqlCC <- RxInSqlServer(connectionString = sqlCCConnString, numTasks = 1)
rxSetComputeContext(sqlCC)
pkgs <- c("abc")
rxInstallPackages(pkgs = pkgs, verbose = TRUE, scope = "shared", computeContext = sqlCC)

Code Snippet 19: Install R Package

The only changes here in Code Snippet 19 compared to the code in Code Snippet 14 are the user id (uid -obviously), and scope = "shared" in the rxInstallPackages call. The abc package is now installed in shared scope, and anyone in any of the rpkgs-... roles can use the package.

Summary

This turned out to be a much longer post than I thought it would be, and there are still topics related to RevoScaleR installation functionality that I want to cover, but that comes in future posts.

So, here is what we covered:

  • To use RevoScaleR for package installation both the SQL Server instance as well as the database need to be enabled for package management. You enable package management via RegisterRExt.exe tool and the /installpkgmgmt option. There are additional flags for database enabling, authentication and so forth.
  • When enabling the database the process creates a table, stored procedures and roles.
  • For a user to be able to install packages he needs to have necessary permissions on sp_execute_external_script1 as well as the EXECUTE ANY EXTERNAL SCRIPT` permission. He also needs to be in a role which allows him to install packages.
  • The roles that the enabling process creates are: rpkgs-users, rpkgs-private and rpkgs-shared.
  • The roles which allows the user to install packages are rpkgs-private and rpkgs-shared (and db_owner).
  • The roles define the scope of the installed packages: private and shared.
  • When a user installs a package with private scope, only he can see and use the package.
  • If the user installs a package with shared scope, all users in any of the roles, including rpkgs-users can use that package. The user needs to be in the rpkgs-shared (or db_owner) to install a shared package.
  • You use the function rxInstallPackages to install a package, and the function needs to run in an SQLCC.
  • When the user calls rxInstallPackages he needs to define which scope the package has through the scope argument. If the scope is not defined, it defaults to private.
  • To use a package, either in private or shared scope, the code needs to run in SQLCC.
  • For a package that has no knowledge about SQLCC, the functions in the package can be run via rxExec.

~ Finally

If you have comments, questions etc., please comment on this post or ping me.


comments powered by Disqus