Interesting Stuff - Week 5, 2020

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me.

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

Data Science / Machine Learning

CI/CD for Machine Learning. The presentation this links to is an InfoQ presentation where the presenter discusses the challenges with CI/CD for machine learning and shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability.

The ultimate performance for your big data with SQL Server 2019 Big Data Clusters. This post summarizes a Microsoft white paper discussing the performance of SQL Server 2019 Big Data Clusters. After I read the post, I went back and looked at the white paper. The Big Data Cluster offers quite impressive performance, I must say!

Microservices architecture on Azure Kubernetes Service (AKS). The link here is to a Microsoft document covering a reference architecture for microservices applications running on Azure Kubernetes Service. I found the document quite interesting, and I hope to be able to do some POCs around this shortly.

How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. This is a very interesting post, looking at the state of today’s enterprise data architecture. It is a must-read for anyone interested in the subject.
What is a Lakehouse? The post linked to here is similar to the one above in that it looks beyond data lakes. From the post: “The lakehouse is a new data management paradigm that radically simplifies enterprise data infrastructure and accelerates innovation in an age when machine learning is poised to disrupt every industry.”

Streaming Machine Learning with Tiered Storage and Without a Data Lake. Once again, a post which discusses data lakes, or rather the lack thereof. This post introduces a new feature in Kafka: the ability to add external storage to a Kafka broker. A very interesting topic (pun intended), and this definitely moves Kafka towards being a complete data store. My only concern when thinking about this is how one would query the data in Kafka. I guess time will tell.
Streams and Monk – How Yelp is Approaching Kafka in 2020. This is a very interesting post in that it describes how Yelp is moving towards data as a service using Kafka and some internal applications. I will recommend this post to the people at Derivco working with Kafka.

I just came back from the Johannesburg leg of Microsoft Ignite The Tour.

I want to thank those of you who came to my sessions; you guys rocked!

At the moment I am cleaning up my presentation decks and the demo code. I’ll publish them for download in a couple of days’ time.

That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.