Interesting Stuff - Week 2, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

Data Architecture

Data Mesh Simplified: A Reflection Of My Thoughts On Data Mesh. The post linked to here gives an interesting take on what a Data Mesh is, and what problems it solves.
Lakehouse Architecture Realized: Enabling Data Teams With Faster, Cheaper and More Reliable Open Architectures. This post by Databricks reviews what happened at Databricks, and in the Big Data world, during 2020.

Data

around-dataengineering. The link here is not so much a blog post as a page with links to various interesting machine learning and data engineering technologies. When reading this, I found a lot of new interesting “stuff”.
Change Data Analysis with Debezium and Apache Pinot. This blog post looks at real-time analytics based on combining Debezium with the real-time OLAP datastore Apache Pinot. Very, very cool!

Streaming

TwelveDaysOfSMT. Kafka Connect has a feature called Single Message Transforms (SMT). Using SMT, you can modify the data and its characteristics as it passes through the Kafka Connect pipeline, without needing additional stream processors. The page I have linked to here contains a list of blog posts by Robin Moffatt, where he looks at various SMT types.
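To give a feel for what an SMT looks like in practice, here is a minimal sketch in Python that builds a Kafka Connect connector configuration with one transform chained onto it. It uses the `InsertField` transform and the file sink connector that ship with Apache Kafka; the helper function, the connector name, the topic, the file path, and the inserted field are all made up for illustration and are not taken from the linked posts.

```python
import json


def with_insert_field_smt(connector_config, alias, field, value):
    """Return a copy of a Kafka Connect config with an InsertField SMT appended.

    Hypothetical helper for illustration; Kafka Connect itself only sees
    the resulting key/value configuration.
    """
    config = dict(connector_config)
    # "transforms" holds a comma-separated list of aliases, applied in order.
    existing = [t for t in config.get("transforms", "").split(",") if t]
    config["transforms"] = ",".join(existing + [alias])
    prefix = f"transforms.{alias}"
    # InsertField$Value adds a field to the value part of each record.
    config[f"{prefix}.type"] = "org.apache.kafka.connect.transforms.InsertField$Value"
    config[f"{prefix}.static.field"] = field
    config[f"{prefix}.static.value"] = value
    return config


# Assumed example connector: topic name and file path are placeholders.
base = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "topics": "orders",
    "file": "/tmp/orders.txt",
}

config = with_insert_field_smt(base, "addSource", "source_system", "webshop")

# The JSON below is the shape of payload you would submit to the Connect REST API.
print(json.dumps({"name": "orders-file-sink", "config": config}, indent=2))
```

The point of the sketch is that the transform lives entirely in the connector configuration: each record gets the extra field on its way through Kafka Connect, and no separate stream processing job is involved.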

WIND (What Is Niels Doing)

Recently I have looked quite a bit at the open-sourced SQL Server Language extensions, and the Python one specifically. In last week's roundup I mentioned I had written a blog post looking at how to use the language extension with a Python runtime other than the one shipping in SQL Server Machine Learning Services. That post resulted in a couple of follow-up posts, published the last few days:

How to build Boost.Python with the view to be able to create a Python SQL Language extension. Since the SQL Server language extensions are open-sourced, you can build your own language extension. I started a post looking at recompiling the Python extension to cater for a newer Python version. To recompile the Python extension, you need to use Boost.Python. It turned out that was more complex than I initially thought, so it deserved its own post. In the post linked to, we look at building Boost.Python so we can create a Python SQL Language extension.
Write a Python 3.9 Language Extension for SQL Server Machine Learning Services. This post is the one that led to the Boost.Python post. It looks at how to write a Python 3.9 SQL Server Language extension to use in SQL Server Machine Learning Services.
Solve Python Issues in SQL Server Machine Learning Services After Deploying Python 3.9. Having written the posts above and trying to use the deployed languages, I realized that I could not execute against other Python languages after deploying a new language. After some investigation, I managed to figure out why, and the post linked to tries to explain what the problem is, and how to solve it.

There are a couple more things I would like to look at around language extensions, so you can expect some more posts about the new open-sourced language extensions in the following weeks.

~ Finally

That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.
