Interesting Stuff - Week 24, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

AI/data science
data in general
data architecture
streaming
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.

Distributed Computing

Introduction to Chaos Engineering. The post linked here looks at the origin, principles, and benefits of Chaos Engineering. Chaos Engineering is the practice of deliberately disrupting and breaking an application system in order to build resilience. Note that the post is behind a paywall.

Machine Learning / AI

Flink-powered model serving & real-time feature generation at Razorpay. This post, which is from back in December 2020, looks at how Apache Flink is being utilized as a way to overcome challenges around feature generation and machine learning model serving in real-time. Very interesting!
How to Build a Scalable Wide and Deep Product Recommender. A Wide and Deep Learning Model consists of two parts: a machine learning part (linear model) and a neural network part. This type of model is often used in recommender systems, and the blog post linked looks at how you can build one using Databricks.
Automate Machine Learning using Databricks AutoML — A Glass Box Approach and MLFLow. During the Data + AI Summit 2021, Databricks announced their Databricks AutoML platform. This post looks at using Databricks AutoML Platform to automatically apply machine learning to a dataset and deploy the model to production using the REST API.
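
To make the "wide and deep" idea mentioned above a bit more concrete, here is a minimal sketch in plain Python of how the two parts can be combined into one score. This is not the code from the linked post and not how Databricks implements it. The function names, the parameter dictionary, and the hand-picked weights are all hypothetical, and nothing is trained here.

```python
import math


def relu(values):
    # Element-wise ReLU activation.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    # One fully connected layer: each row of `weights` feeds one output unit.
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_and_deep_score(wide_x, deep_x, params):
    # Wide part: a plain linear model over the wide features.
    wide_logit = sum(w * v for w, v in zip(params["wide_w"], wide_x))
    # Deep part: one hidden layer followed by a linear output unit.
    hidden = relu(dense(deep_x, params["hidden_w"], params["hidden_b"]))
    deep_logit = sum(w * v for w, v in zip(params["out_w"], hidden))
    # The two logits are summed and squashed into a value between 0 and 1.
    return sigmoid(wide_logit + deep_logit + params["bias"])


# Hypothetical, hand-picked parameters purely for illustration.
example_params = {
    "wide_w": [0.5, -0.25],
    "hidden_w": [[1.0, 0.0], [0.0, 1.0]],
    "hidden_b": [0.0, 0.0],
    "out_w": [0.5, 0.5],
    "bias": 0.0,
}

score = wide_and_deep_score([1.0, 0.0], [1.0, -1.0], example_params)
print(score)
```

In a real recommender, the wide features would typically be crossed categorical features, the deep features would be embeddings, and both parts would be trained together.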

Streaming

Block Aggregator: Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Retries. This post discusses the message-processing engine eBay developed to avoid data loss or duplication during delivery from Kafka to ClickHouse. I found the post very interesting as we are looking at similar things at Derivco right now.
Serverless Event Driven Systems with Confluent Cloud and AWS Lambda. This post presents an end-to-end example of a Serverless event-driven architecture using Confluent Cloud for stream processing paired with AWS Lambda for event responsive logic using the Serverless Application Model (SAM) framework.
How to Better Manage Apache Kafka by Creating Kafka Messages from within Control Center. This post is the first in a series looking at some new features in Control Center, introduced in Confluent Platform 6.2.0. Some very cool stuff here!

WIND (What Is Niels Doing)

In last week's roundup, I mentioned I was looking into Azure Data Explorer as I was thinking about creating some presentations for upcoming conferences. I did submit a couple of topics to some conferences, and two of them accepted talks! Woohoo! The conferences and talks are:

2021 Data Platform Summit: How to do Real-Time Analytics Using Apache Kafka and Azure Data Explorer.
Future Data Driven: Analyze Billions of Rows of Data in Real-Time Using Azure Data Explorer.

So now that the above talks have been accepted, I really need to get going with prep. I am stoked!

Oh, and I am still doing the Big Data & Analytics with SQL Server 2019 Big Data Cluster training class for the 2021 Data Platform Summit, and if you sign up for the class you get free access to the summit itself!

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.
