Interesting Stuff - Week 39, 2022

Posted by nielsb on Sunday, October 2, 2022

Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:

  • AI/data science
  • data in general
  • data architecture
  • streaming
  • distributed computing
  • SQL Server
  • transactions (both db and non-db)
  • and other “stuff”

This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.

Distributed Systems

  • An overview of distributed systems architectures. This post is the second in a series on system design, and I found it really useful as an introduction. Very good! Also, look at the links on the right of the post - some excellent “stuff” in there!

Azure Data Explorer

  • Ingesting Protobuf data from Kafka to Azure Data Explorer. So, we all know about Protocol Buffers (Protobuf): a language-neutral, platform-neutral, extensible mechanism for serializing and deserializing structured data. Protobuf is becoming increasingly popular in the IoT world, especially since Apache Kafka supports Protobuf. In the IoT world, we also see Azure Data Explorer as a popular choice for data processing and analytics. To ingest data from Kafka into Azure Data Explorer, we use the ADX Kafka sink connector. This post looks at how to set up and configure the ADX Kafka sink connector for ingesting Protobuf-serialized data.

Streaming

  • Comparing Stateful Stream Processing and Streaming Databases. When you build an event-driven stream processing system, the question is whether you should use a stateful stream processor or a streaming database. This post aims to clarify the difference between the two. After reading the post, you should know how these technologies work internally, how they differ, and when to use each.
  • Getting Started with the Confluent Terraform Provider. Terraform is an open-source infrastructure-as-code tool that lets you build, change, and version your cloud or on-prem data infrastructure safely and efficiently. Back in July, Confluent announced support for a Terraform provider for Confluent Cloud. This post takes a closer look at what the provider supports and how to use it.
  • Exploring Popular Open-source Stream Processing Technologies: Part 1 of 2. This post is part 1 of a two-part series demonstrating stream processing using Apache Spark Structured Streaming, Apache Kafka Streams, Apache Flink, and Apache Pinot with Apache Superset! The post is awesome! Part of what makes it extraordinary is the accompanying Streaming Synthetic Sales Data Generator, a Python project for generating synthetic data. I have forked the project and will use it whenever I need to generate streaming data.
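
To give a rough idea of what a synthetic streaming data generator does, here is a minimal sketch in Python. It is not code from the project mentioned above; the product catalogue, the field names, and the `sales_events` function are all hypothetical and made up for illustration. It yields an endless stream of random sales events and prints a few of them as JSON.

```python
import json
import random
from datetime import datetime, timezone
from itertools import islice

# Hypothetical product catalogue: product id -> unit price.
PRODUCTS = {"SKU-1": 4.99, "SKU-2": 12.50, "SKU-3": 29.00}


def sales_events(seed=None):
    """Yield an endless stream of synthetic sales events as dicts."""
    rng = random.Random(seed)  # seeded so that runs can be reproduced
    catalogue = sorted(PRODUCTS.items())
    while True:
        product_id, price = rng.choice(catalogue)
        quantity = rng.randint(1, 5)
        yield {
            "event_time": datetime.now(timezone.utc).isoformat(),
            "product_id": product_id,
            "quantity": quantity,
            "total": round(price * quantity, 2),
        }


if __name__ == "__main__":
    # Print five events; a real generator would typically publish
    # each event to a Kafka topic instead of printing it.
    for event in islice(sales_events(seed=42), 5):
        print(json.dumps(event))
```

Because the events come from a generator, the consumer decides how many to take and how fast, which makes it easy to plug into whatever producer you are testing with.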

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.

