Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

AI/data science
data in general
data architecture
streaming
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.

Machine Learning / Data Science

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform. This Netflix post looks at the Metaflow GUI. The GUI for their open-sourced Metaflow library allows data scientists to monitor their workflows in real-time, track experiments, and see detailed logs and results for every executed task. The GUI can be extended with plugins, allowing the community to build integrations with other systems.
Moneyball 2.0: Real-time Decision Making With MLB’s Statcast Data. Back in 2003, Michael Lewis wrote the book Moneyball. The book was about how a baseball manager used data analysis to identify undervalued players. The post here looks at how baseball teams today use streaming data and Databricks to do real-time analysis and make decisions. Very interesting! Oh, BTW, Michael Lewis is an excellent author, and the book mentioned is great!

Azure Data Explorer

How to do Real-Time Analytics Using Apache Kafka and Azure Data Explorer. In a couple of blog posts, I have mentioned how I did a session about Apache Kafka and Azure Data Explorer at the Data Platform Summit 2021. The recordings from the Summit have now been made available for FREE, and the link is to my session. Notice that you need to join Data Platform Geeks unless you are a member already, but it is free, and by joining, you get access to all recordings!
How to Ingest Into Azure Data Explorer From Apache Kafka using Kafka Connect. This post is also from “yours truly”. In the post, we look at how to configure and set up Kafka Connect to allow ingestion into Azure Data Explorer.

Streaming

Architecting a Kafka-centric Retail Analytics Platform — Part 1. This post is the first of a series looking at building a Kafka-centric analytics platform that ingests and processes business data at scale. By the way, the author of the post, Dunith Dhanushka, is someone you should follow if you are interested in event-driven architecture, streaming, Kafka, etc. He is excellent, and I am subscribing to his writings on Medium!
Stream Governance – How it Works. I have written previously about the new Stream Governance functionality Confluent introduced recently. This post is the first in a series about Stream Governance and how it works, and it looks at some of the key features of Stream Governance. At Derivco, we are highly interested in the topic of data governance. I will follow this closely!

WIND (What Is Niels Doing)

Well, I have mentioned it before, but:

Figure 1: PASS Session

Yeah, I am delivering Analyze Billions of Rows of Data in Real-Time Using Azure Data Explorer. The session is recorded and will be available for viewing from when the conference starts. Then, on Thursday, Nov 11, 15:15 - 15:45 UTC (you have to register to access this link), there is a virtual live Q&A with me where we discuss Azure Data Explorer.

So, register now, view the recorded video, and come and chat with me on Thursday, Nov 11, 15:15 - 15:45 UTC. The registration is FREE, and besides me, you get to hear from the people that really know what they are talking about!

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.

Interesting Stuff - Week 44, 2021