Interesting Stuff - Week 47, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

AI/data science
data in general
data architecture
streaming
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.

Machine Learning / AI

SynapseML: A simple, multilingual, and massively parallel machine learning library. This post introduces SynapseML, previously known as MMLSpark. It is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. SynapseML unifies several ML frameworks and new Microsoft algorithms in a single, scalable API usable across Python, R, Scala, and Java.

Databases

What’s Really New with NewSQL? In this post, Murat looks at the evolution of NoSQL into NewSQL and what NewSQL is. Very informative; I liked the post a lot.

Distributed Computing

Ray on Databricks. No, this is not a post where someone named Ray talks about Databricks. Ray is an open-source project that makes it simple to scale any compute-intensive Python workload. Running Ray on top of an Apache Spark cluster makes it possible to distribute the internal code of PySpark UDFs, as well as Python code that used to run only on the driver node. But hang on a sec; Spark is a distributed framework. Why would I want to run another distributed framework on top of Spark? Well, read the post and find out.

Streaming

Streaming Data Exchange with Kafka and a Data Mesh in Motion. In quite a few roundups, I have linked to posts about Data Mesh. In even more roundups, I have linked to Kafka material and posts about streaming data. The linked post looks at the principles behind the Data Mesh and why we need multiple technologies to build one. It then dives into why Kafka is a good foundation for a Data Mesh.
Announcing ksqlDB 0.22.0. I guess the post title says it all: it looks at some of the new features of ksqlDB 0.22. And some very cool new features they are as well! Please read the post to find out more!
How to Efficiently Subscribe to a SQL Query for Changes. This post looks at one of the new features in ksqlDB 0.22: enhancements to push queries and the increased scalability of said queries. Very, very cool!
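
If push queries are new to you, the idea is that a client subscribes to a query and then gets matching changes pushed to it as new data arrives, instead of polling. Below is a toy, in-memory sketch of that idea in Python. It is purely conceptual: `ToyStream`, `PushQuery`, and the order rows are hypothetical names I made up for the illustration, and none of this is ksqlDB's API or how ksqlDB implements the feature.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class PushQuery:
    """Hypothetical stand-in for a push query: a filter plus a callback."""
    predicate: Callable[[Row], bool]
    on_change: Callable[[Row], None]


@dataclass
class ToyStream:
    """Toy in-memory stream (an assumption for illustration, not ksqlDB)."""
    subscriptions: List[PushQuery] = field(default_factory=list)

    def subscribe(self, query: PushQuery) -> None:
        # Register a subscriber; it only sees rows appended from now on.
        self.subscriptions.append(query)

    def append(self, row: Row) -> None:
        # Each new row is checked against every subscription and pushed
        # to the subscribers whose predicate matches.
        for query in self.subscriptions:
            if query.predicate(row):
                query.on_change(row)


received: List[Row] = []
stream = ToyStream()

# Roughly "give me every order with amount > 100, as it happens".
stream.subscribe(PushQuery(lambda r: r["amount"] > 100, received.append))

stream.append({"order_id": 1, "amount": 50})
stream.append({"order_id": 2, "amount": 250})
stream.append({"order_id": 3, "amount": 120})
```

After the three appends, `received` holds only the orders that matched the subscription. Note that this naive version evaluates every subscription for every row, so the work grows with the number of subscribers.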

WIND (What Is Niels Doing)

It is not so much what I am doing as what I did:

Figure 1: Cloud Data Driven

I did a presentation a little while back at Cloud Data Driven, where I looked at Customer Lifetime Value (CLV) and how you can use Azure Databricks to calculate it. As I said, it was a week or two ago. The reason I mention it now is that the recording of the webinar is up on YouTube. So, if you are interested, go and have a look at Improve Customer Lifetime Value using Azure Databricks Delta Lake.

While you are at it, register with Cloud Data Driven’s Meetup group. The group is awesome if you are interested in everything data!

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.