Interesting Stuff - Week 9, 2020

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

Azure

Tips for Creating Custom Azure DevOps Build Tasks. At Derivco we have started using Azure DevOps in earnest, so this post by Travis comes in really handy.

Big Data

Data Mesh Paradigm Shift in Data Platform Architecture. In my weekly roundup for week 5, I mentioned a post about the state of today’s data architecture. The InfoQ presentation I link to here is by the same person who wrote that post, and the presentation is essentially the content of the blog post (or the other way around :)).

Machine Learning / Data Science

How to embed a Spark ML Model as a Kafka Real-Time Streaming Application for Production Deployment. This is a very interesting post. As the title says, it covers the use of Spark ML together with Kafka, and how a streaming application can make ML predictions in real-time.

Streaming

How to implement retry logic with Spring Kafka. An informative post with ideas on how to implement retry logic (exactly as the title says).
99th Percentile Latency at Scale with Apache Kafka. The post linked to here is a must-read for those of you who want to get the best performance out of your Kafka clusters. The post discusses how to configure Kafka to minimize latency.
Introducing Confluent Developer. In this post, Confluent’s director of developer relations, Tim Berglund, introduces the go-to place for everything Kafka - Confluent Developer. I’ve had a look over the weekend, and it is a treasure trove of material to go through if you are into Kafka.

WIND (What Is Niels Doing)

The title of this section is not entirely correct, as this is not so much about what I am doing right now as it is about what I did in the week just passed.

Anyway, in the week just passed, I did two webinars for DataPlatformGeeks (DPG):

Data Virtualization in SQL Server 2019 Big Data Cluster. A look at how we do data virtualization in SQL Server 2019 Big Data Cluster.
Deep Dives into the Storage and Data Pools in SQL Server 2019 Big Data Cluster. A closer look at the storage and data pools in SQL Server 2019 Big Data Cluster.

I recommend that all of you register with DPG, as they have a plethora of free learning resources! Oh, and they also run a yearly conference: Data Platform Summit. I hope to be able to deliver a couple of sessions at the conference this year.

~ Finally

That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.