Interesting Stuff - Christmas, New Year, Week 1, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This is the “roundup” of the posts that have been most interesting to me over the 2020 Christmas and New Year period and the first week of 2021.

Data Architecture

Data Mesh Principles and Logical Architecture. The post here is a follow-up to How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. It summarizes the data mesh approach by enumerating its underpinning principles and the high-level logical architecture that those principles drive. If you are into data architecture, you have to read this post!

Streaming

Handling Late Arriving Dimensions Using a Reconciliation Pattern. The blog post linked to here looks at a few use cases of late-arriving dimensions and potential solutions for handling them in Apache Spark pipelines.
Introducing the Confluent Parallel Consumer. This blog post looks at a new Kafka consumer client: the Confluent Parallel Consumer. The post covers why a new consumer client is needed and the use cases for this consumer. Very interesting!
Event Streaming with Kafka Streams and ksqlDB. The link here is to the revised new edition of Kafka Streams in Action. It has been expanded to cover more of the Kafka platform used for building event-based applications, including full coverage of ksqlDB. I bought it, and you should buy it as well!
Announcing ksqlDB 0.14.0. As the title implies, a new version of ksqlDB is out in the wild. This post looks at some of the most notable changes, and new features of this release. Some quite “juicy stuff” in the release!

WIND (What Is Niels Doing)

When I went on leave for Christmas and New Year, I said to myself that I had to get some blog-posts out, and for once my plans came together:

A Lap Around SQL Server 2019 Big Data Cluster: Architecture. Finally, finally, finally! This post is a follow on from A Lap Around SQL Server 2019 Big Data Cluster: Background & Technology, and it has been in the works for nearly eight months. What can I say? In the post, we look at the architecture of a SQL Server 2019 Big Data Cluster, and the various components of a BDC.
Bring Your Own R & Python Runtimes to SQL Server Extensibility Framework. In September 2020, Microsoft announced that they had open-sourced the R and Python language extensions for SQL Server Machine Learning Services. As a result, we can now bring our own versions of R and Python to SQL Server 2019. In the post linked to, I look at how to use a Python runtime with a later version than the one shipped by default in SQL Server Machine Learning Services.

So that’s what I have done.

I am now working on a couple of posts on how to create your own Python language extension from the open-sourced code Microsoft released. Expect something to be out fairly soon. Yeah, yeah, I know - that’s what I said about the Big Data Cluster architecture post as well back in April 2020. I guess we’ll see.

~ Finally

That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.
