In September 2020, Microsoft open-sourced SQL Server Machine Learning Services, (SQLMLS), language extensions for R and Python. If you want more information, here are some blog posts I have written about it:
- Bring Your Own R & Python Runtimes to SQL Server Extensibility Framework
- Write a Python 3.9 Language Extension for SQL Server Machine Learning Services
In the last post, which looks at using Python 3.9 in SQL Server Machine Learning Services, I wrote this at the very end:
It looks like all is good, but maybe not? In a future post we’ll look at an issue we have introduced - but for now, let us bask in the glory of having created a new Python language extension.
In the post, we wrote a new language extension to handle Python 3.9, and that just worked fine. However, when I was doing some other things, I noticed some side effects, and in this post, we look at those side effects and how to solve them.
NOTE: The two posts mentioned above has been updated as a result of this post. If you now follow the posts above you may not see the errors we discuss here.
Pre-reqs
Before we dive in, let us look at the pre-reqs (if you want to follow along).
In this post, I am not going into detail about language extensions and external languages and such. If you are unsure about this, please read the posts linked to above.
Environment Variables
If you followed along in post about Python 3.9 and created the PYTHONHOME
environment variable, please delete it. After you have deleted it restart the Launchpad service for the SQL Server instance where you run the code.
SQL Server
We need SQL Server CU3+ with Machine Learning Services and Language Extensions installed together with Python. Execution of external scripts needs to be enabled as well. You also need a database where to deploy the external languages to. In this post I use a database called ExtLangDB
:
|
|
Code Snippet 1: Create Database
Since in my previous posts I have used the same database name we see in Code Snippet 1 we see how I drop the database if it exists before I create it. I do this to clear out any old “cruft” in the db.
NOTE: Above I say Machine Learning Services and Language Extensions installed together with Python. I mean that, during the installation of SQL and when you tick the box for Machine Learning Services and Language Extensions, you also choose Python. I refer to this as PythonSQLMLS - Python for SQL Server Machine Learning Services, the “native” Python.
Let us make sure that everything looks OK in your database:
|
|
Code Snippet 2: Make Sure All is OK
In Code Snippet 2 we look for what external languages we have installed, and the result is like so:
Figure 1: External Languages
OK, so according to Figure 1 all seems OK in the database. We have both R, and Python outlined in red, installed as part of Machine Learning Services and Language Extensions.
The last thing we want to do related to SQL Server is to install Python 3.7.9, and Python 3.9.1 on the SQL Server box. Both installations need pandas
, and Python needs to be added to the PATH
, (root directory, and .\Scripts
directory). The Python installations need to be done with administrator permissions: “Run as administrator”.
Language Extensions
We need the open-sourced Microsoft Python language extension, which you download from here, (click on “Python language extension”). As I mentioned in this post, it supports Python 3.7.x. In the rest of this post, I call it the Python37 extension.
We also need an extension for Python 3.9. You can follow my post here, and build it yourself. If you don’t feel like building it, you can download the extension we built in the post in the previous sentence from here. Going forward I call this extension Python39.
After downloading/building, place them in a location from where SQL Server can access them, and name them so you can see who is who.
PythonSQLMLS
Let us make sure that the “native” Python, the one that comes as part of SQLMLS, (seen in Figure 1), works correctly in our database:
|
|
Code Snippet 3: Get Path & Executable
The code we see in Code Snippet 3 is the same as in the previous posts.
You see in Code Snippet 3 how the @language
parameter is set to Python
. That indicates using the Python installed as part of Machine Learning Services and Language Extensions when we check the Python check-box, (PythonSQLMLS).
The result of running the code in Code Snippet 3 is like so:
Figure 2: Python SQLMLS
We see in Figure 2 how the SQLMLS Python is version 3.7.1.
Python 3.7 as Language Extension
We did the same as above in the Bring Your Own R & Python Runtimes to SQL Server Extensibility Framework, and we then continued with creating an external language from the Microsoft Python language extension:
|
|
Code Snippet 4: Create Python 3.7 External Language
In Code Snippet 4 we see how:
- we create an external language with the name of
p37
. We cannot usePython
as it is a reserved word. - I renamed the Microsoft language extension to
python-lang-extension-windows-3.7.zip
.
We check that the creation of p37
succeeded by executing the code in Code Snippet 2. In the result, we should now see p37
in addition to what we see in Figure 1.
When we see that we indeed have a new external language we:
- create a system environment variable
PYTHONHOME
pointing to the Python 3.7 install directory on the SQL box (in my caseC:\Python37
). This is required as per the Microsoft documentation. We’ll see later that this may not be required. - assign read and write permissions to the Launchpad server for the SQL Server instance and
ALL APPLICATIONS GROUP
. The permissions are given to the root directory and underlying files and directories ofPYTHONHOME
, (read more about assigning permissions in this post).
Having created the PYTHONHOME
variable, we restart the Launchpad service. We are now ready to execute the code in Code Snippet 3, but we need to change the @language
parameter to p37
instead of Python
.
The result we get back when executing the code should look like so:
Figure 3: Python 3.7.9 External Language
Yay, we see from Figure 3 how we now execute against Python 3.7.9 instead of 3.7.1. So yes, we are executing against another Python runtime than PythonSQLMLS.
If you get an error when you execute the code and the error looks something like this: “A
- Your Python installation is not on the
PATH
. - Permissions have not been set on the Python directories and files.
Cool, we have now executed using a later version of the Python runtime than what is installed through SQLMLS. That’s where we finished the Bring Your Own R & Python Runtimes to SQL Server Extensibility Framework post.
What we didn’t do was to see what happens if we now want to run the code in Code Snippet 3 and use PythonSQLMLS, (@language=Python
):
Figure 4: Python Error - I
This.Is.Not.Good! We get a ModuleNotFoundError
, as in Figure 4, and the module we cannot find is revoscalepy
, which one of the proprietary Microsoft Python modules.
What we see here is one of the side effects I observed when doing my previous blog posts. We’ll come back to this after looking at using the Python 3.9.1 runtime.
Python 3.9 as Language Extension
In the Write a Python 3.9 Language Extension for SQL Server Machine Learning Services post we said that the Microsoft open-sourced language extension can only be used for Python 3.7.x. If we want to use another runtime, like 3.9.x, we need to re-compile the source code against the Python version we want to use. In the post we did that and saw how we executed against Python 3.9.1.
For Python 3.9.1, let us do what we did above with 3.7.9:
- create an external language. Let’s call it
p39
, based on the extension for Python 3.9.1 I built, which is [here][]. - change/create the
PYTHONHOME
variable to point to the Python 3.9 installation. - assign the necessary permissions against the root directory, subdirectories, and Python 3.9 installation files.
- restart the Launchpad service.
- change the @language parameter in Code Snippet 3 to
@language=p39
.
When we run the code, we see:
Figure 5: Python Version
In Figure 5 we see, as we did in the Python 3.9 post, how we get back Python 3.9.1. We are running against the Python 3.9 runtime. Awesome!
What happens if we now change back and execute with @language=Python
, or @language=p37
:
|
|
Code Snippet 5: Python Error - II
Once again - not good! As we see in Code Snippet 5 we get a new error back saying something about not initialising something.
The question now is if this elated to Python in SQL Server only, or to Python in general on the SQL Server box?
Python Stand-alone
As Python 3.9 seems to work, let us just confirm that. From command prompt we cd
into the Python 3.9 installation. For me, it looks like: cd C:\Python39
.
In there, we execute:
|
|
Code Snippet 6: *Execute Python 3.9 from Command Prompt"
In Code Snippet 6 we see how we want to execute some Python code that prints out the Python version we execute. When we run the code, we get back the same we see in Figure 5. So that works.
What about Python 3.7? We do the same:
cd
into the Python 3.7 installation, (for me:C:\Python37
).- execute the code in Code Snippet 6.
Hmm, that - once again - does not look good:
Figure 6: Python Error - III
We see in Figure 6 how we get exactly the same error as in Code Snippet 5. What is going on here?
PYTHONHOME
A clue to the issue is highlighted in yellow in Figure 6: File "C:\Python39\lib\io.py"
. We are executing against Python 3.7, but for some reason Python tries to touch a file in the Python 3.9 lib
directory. Huh?
The culprit here is the PYTHONHOME
system environment variable we created. We created it as the documentation around the Python language extension states it is required.
The PYTHONHOME
system environment variable is a well known variable to the Python engine. The variable changes the location of the standard Python libraries. In the example where the variable was set to Python 3.9 and we executed against the Python 3.7 executable, Python tried to load files from the 3.9 installation. Now we have a version mismatch, and that’s why we got the error.
In the PythonSQLMLS example where the variable was set to Python 3.7, we still got an error, why is that? In this case, the PythonSQLMLS runtime tried to load a Microsoft specific module that does not exist in the “normal” Python 3.7 runtime.
Solution
How do we solve this? Well, above I say that the PYTHONHOME
system environment variable is required. That is not entirely true. It is required if you want to create an external library for your language. If you only want to execute Python code, then you don’t need it. Having Python on the PATH
is required though in either case.
So to confirm that I am not lying, (too much), to you let’s go ahead and:
- delete the
PYTHONHOME
system environment variable. - restart the Launchpad service.
When we now execute the code in Code Snippet 3 with the @language
parameter being Python
, p37
, and p39
respectively we see the expected result for all three executions!
But what do we do if we also want to create external libraries for for example Python 3.9, and at the same time in the database where we want to create the external library also need to execute PythonSQLMLS code?
In that case, we can not use PYTHONHOME
as an environment variable, so we need to create a variable with a random name pointing to the Python extension.
Having done that we:
- go to the source code for the extension.
- change the variable name in the method where it is used:
std::string PythonExtensionUtils::GetPathToPython()
. This is in the source filePythonExtensionUtils_win.cpp
, (remember, I do this for SQL on Windows). - recompile the extension, and redeploy.
In a future post, we will look more in detail at how to do this.
Summary
This post came about as I saw strange errors after having deployed Python language extensions. Well, what I saw was that if I deployed a Python 3.7 extension, (Python37), I could not execute code with the “native” Python for SQL Server Machine Learning Services, (PythonSQLMLS). If I deployed an extension for Python 3.9, I could not execute code for either PythonSQLMLS, or Python37.
Based on the errors we received, we determined the issue was with the PYTHONHOME
system environment variable. The official Microsoft documentation says this variable needs to be set for the Python extension. That requirement is only partially true, yes - it needs to be set if you want to create an external library for the language extension. However, if you just want to execute code, it is not needed.
NOTE: Something that is not entirely clear in the official documentation is that the Python installation needs to be on the
PATH
in either case.
~ Finally
If you have comments, questions etc., please comment on this post or ping me.
comments powered by Disqus