Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
- This Is How You Can Build a Churn Prediction Model Using Apache Spark. At Derivco, we’re looking at creating ML models for churn, so lately, I’ve been reading posts around this subject. I came across this post, a tutorial on building a churn prediction classifier using the ML stack from Spark. Very interesting and informative!
- Introduction to Confusion Matrix. When I first heard the term confusion matrix, I thought it was called so because you became confused trying to interpret it 😄. Anyway, this post tries to explain what a confusion matrix is and how you can plot one using Python. When reading the post, keep in mind that the algorithm in the example predicts whether someone is sick.
- Raft - Understandable Distributed Consensus. Raft is a consensus algorithm that is designed to be easy to understand. Going forward, Kafka will use Raft instead of ZooKeeper. The page linked to is a visualization of how Raft works. My first thought was that “this looks very much like a distributed transaction”. Anyway, the visualization is very cool!
- Patterns of Distributed Systems. If you think: “has Niels not linked to this before”, you are absolutely correct. In 2020 I linked to this in one of my roundups. At that stage, the compendium about the patterns for distributed systems was a work in progress. Now it is completed, and if you are interested in distributed systems, you must read it!
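As a small aside on the confusion matrix post above: here is a minimal sketch of what the matrix actually counts for a binary "sick / not sick" classifier. The labels are made up for illustration, and the `confusion_matrix` helper is my own hypothetical function, not code from the linked post.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted sick: correct if actually sick, otherwise a false alarm
            counts["tp" if a == positive else "fp"] += 1
        else:
            # Predicted healthy: a miss if actually sick, otherwise correct
            counts["fn" if a == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


# Made-up labels for illustration: 1 = sick, 0 = healthy
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_matrix(actual, predicted)

# Lay the counts out as the usual 2x2 grid
print("                 predicted sick  predicted healthy")
print(f"actually sick    {tp:>14}  {fn:>17}")
print(f"actually healthy {fp:>14}  {tn:>17}")

precision = tp / (tp + fp)  # of those flagged sick, how many really were
recall = tp / (tp + fn)     # of those really sick, how many were flagged
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The point of the grid is that the two kinds of mistakes (false alarms and misses) are kept apart, which matters a lot when missing a sick person is worse than a false alarm.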
Azure Data Explorer
- Check it out: Azure Data Explorer MS Learn modules. The title says it all! Azure Data Explorer (ADX) has new MS Learn modules. If you want to learn ADX - check them out!
- Keeping Multiple Databases in Sync Using Kafka Connect and CDC. This blog post looks at how to keep databases in sync using Kafka technologies and CDC. It reviews the advantages and disadvantages of using JDBC and CDC for moving data. It then explores the real use case of how a legacy bank used Kafka Connect to bridge the silos and keep multiple applications/databases in sync.
- Error Handling with Apache Kafka extension for Azure Functions and more!!. The Kafka extension for Azure Functions was released recently with some cool features. As a side note, we (Derivco) are using it - awesome! Anyway, this post looks at some features in the Kafka extension. I particularly like the error handling (not that my code has any errors, but … 😄).
WIND (What Is Niels Doing)
Figure 1: Azure Data Explorer
The Azure Durban User Group are back to in-person meetings - yay! This Wednesday (Sep 28), I am doing an Azure Data Explorer presentation.
That’s all for this week. I hope you enjoy what I did put together. Please comment on this post or ping me if you have ideas for what to cover.