Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
Data & Data Architecture
- The State of Data Infrastructure Landscape in 2022 and Beyond. This post looks at the evolution of the data infrastructure landscape over the last decade. It then goes on to look at key trends and what can be expected from now onwards.
- The Next Evolution of the Database Sharding Architecture. In this InfoQ article, the author discusses the data sharding architecture patterns in a distributed database system. She explains how the Apache ShardingSphere project solves the data sharding challenges. Also discussed are two practical examples of how to create a distributed database and an encrypted table with DistSQL. Very, very interesting!
Azure Data Explorer
- Big Data Analytics using Azure Data Explorer. Learn from the masters of Azure Data Explorer. This link is to register for an Azure Data Explorer talk by the Azure Data Explorer gurus Minni Walia and Uri Barash. Read the abstract on the sign-up page to see all the goodies you’ll hear about. See you there!
- Azure Data Explorer offers on AMD SKUs. The post linked to covers “what it says on the tin”; it talks about how we can now run Azure Data Explorer on AMD SKUs. This is cool as we can now gain higher performance while keeping the costs low.
- Confluent Streaming for Databricks: Build Scalable Real-time Applications on the Lakehouse. This post is about the fully managed Confluent connector against Databricks Delta Lake: the ability to ingest directly into Delta Lake tables from Kafka topics. I got really excited when I saw this post because this would solve some issues for us: us as in Derivco. Unfortunately, we cannot use it yet, as it is AWS only - for now. Regardless of that, this is really cool!
- Announcing ksqlDB 0.23.1. A new version of ksqlDB is out in the wild! Some of the new exciting features are: perform pull queries on streams, access topic partition and offset through pseudo-columns, and use grace periods when joining streams. All very cool!
- Scaling Kafka Consumer for Billions of Events. This post provides a “ton” of helpful information about configuring Kafka consumers for optimal throughput. I have made this post a mandatory read for my developers who write Kafka applications.
- Transform your Kafka data into real-time insights. The post looks at ksqlDB recipes for the most popular stream processing use cases. Each recipe provides a set of ksqlDB queries you can run to process real-time data streams and take immediate action.
WIND (What Is Niels Doing)
In last week’s roundup, I mentioned how I had started writing a post using Debezium and Kafka Connect to publish events to Event Hubs. The one blog post turned into two, and I published both during last week:
- How to Stream Data to Event Hubs from Databases Using Kafka Connect & Debezium in Docker - I. In this post I looked at the configuration of Kafka Connect in docker-compose.yml to enable connection to Event Hubs.
- How to Stream Data to Event Hubs from Databases Using Kafka Connect & Debezium in Docker - II. This, the second post, concluded the “adventure” of streaming data to Event Hubs using Debezium and Kafka Connect. More specifically, I looked at the configuration of the Debezium connector and the various properties required to push data to Event Hubs.
When I started with the two posts above, I was not 100% sure it would work (publishing to Event Hubs using Kafka Connect). Fortunately, it turned out that yes, you can use Kafka Connect to publish to Event Hubs.
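To give a feel for what connecting a Kafka client to Event Hubs involves, here is a minimal Python sketch that builds the standard Kafka client properties for the Event Hubs Kafka endpoint. It is an illustration only, not the exact configuration from the two posts: the function name `event_hubs_kafka_properties` and the namespace and connection string values are hypothetical placeholders. In a Kafka Connect worker, similar settings would typically also be needed for the worker's producer and consumer.

```python
def event_hubs_kafka_properties(namespace: str, connection_string: str) -> dict[str, str]:
    """Build Kafka client properties for an Event Hubs namespace.

    Hypothetical helper for illustration. Event Hubs exposes its Kafka
    endpoint on port 9093 and authenticates with SASL PLAIN over TLS,
    using the literal user name "$ConnectionString" and the namespace
    connection string as the password.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="$ConnectionString" '
        f'password="{connection_string}";'
    )
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own namespace and connection string.
    props = event_hubs_kafka_properties(
        "my-namespace",
        "Endpoint=sb://my-namespace.servicebus.windows.net/;"
        "SharedAccessKeyName=<key-name>;SharedAccessKey=<key>",
    )
    for key, value in props.items():
        print(f"{key}={value}")
```

Keeping the properties in a plain dictionary makes it easy to render them into whatever form the deployment needs, for instance environment variables in a docker-compose.yml file.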
That’s all for this week. I hope you enjoy what I did put together. Please comment on this post or ping me if you have ideas for what to cover.