Interesting Stuff - Week 44, 2020

Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:

  • data science
  • data in general
  • distributed computing
  • SQL Server
  • transactions (both db as well as non db)
  • and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

Big Data

  • Helios: hyperscale indexing for the cloud & edge – part 1. In this post, Adrian from the morning paper dissects a white-paper about Helios. Helios is a distributed, highly scalable system used at Microsoft for flexible ingestion, indexing, and aggregation of large streams of real-time data, and it is designed to plug into relational engines. Adrian is as thorough as usual, and the conclusions he draws are very interesting. I can’t wait for part 2.

Distributed Systems


  • Preparing Your Clients and Tools for KIP-500: ZooKeeper Removal from Apache Kafka. The Kafka community has for quite a while been talking about removing Kafka’s dependency on ZooKeeper (ZK), and it seems we are getting closer. In the post I have linked to here, the author looks at what needs to be done in Kafka clients and tools so that nothing “bad” happens when ZK is eventually removed.
  • Streaming Machine Learning with Kafka-native Model Deployment. Kafka is used more and more for real-time machine learning purposes, and we are moving towards Kafka as a native streaming model server. This blog post explores the architectures and trade-offs between various options for model deployment with Kafka.

~ Finally

That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.

