Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both database and non-database)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
- Evolution to the Data Lakehouse. Data lakehouses have been a hot topic for the last year or two, and Databricks, with its lakehouse implementation Delta Lake, has been at the forefront. The linked post looks, as the title implies, at the evolution from data warehouses to data lakes to data lakehouses. Very interesting!
- 6 Event-Driven Architecture Patterns — Part 1. This post is the first of two. The author looks at key patterns of event-driven messaging designs that have facilitated creating a robust distributed system that can easily handle increasing traffic and storage needs. The second part of the series is here.
- “Harder, Better, Faster, Stronger”: Apache Pinot as a Kafka Consumer and Datastore for Fast On-the-Fly Aggregations. Those of you who read my blog have probably noticed that I am quite partial to Kafka and Apache Pinot. Well, in this blog post, we get the best of both worlds: it covers how to use Apache Pinot to do aggregations in “near” real-time. I found the post very interesting.
- Error Handling Patterns for Apache Kafka Applications. In one of the teams I worked in at Derivco, we had a saying for when we were about to do something we were not 100% sure of: "How hard can it be, what can possibly go wrong?" In some cases, quite a lot could go wrong, and in distributed systems, what can go wrong often does go wrong. This blog post covers different ways to handle errors and retries in event streaming applications.
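
To make the idea of retries in the last item a bit more concrete, here is a minimal sketch of one common approach: retry a failing event a bounded number of times, then park it on a "dead letter" list so the rest of the stream keeps flowing. This is my own illustration, not code from the linked post. It runs purely in memory and uses no Kafka client, and the names `RetryingConsumer`, `make_flaky_handler`, and the sample events are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetryingConsumer:
    """In-memory stand-in for a stream consumer (no real Kafka involved)."""
    handler: Callable[[str], None]
    max_attempts: int = 3
    dead_letters: List[str] = field(default_factory=list)

    def consume(self, events: List[str]) -> int:
        """Process each event and return how many were handled successfully."""
        succeeded = 0
        for event in events:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    self.handler(event)
                    succeeded += 1
                    break  # handled, move on to the next event
                except Exception:
                    if attempt == self.max_attempts:
                        # Retries exhausted: park the event instead of
                        # blocking everything queued behind it.
                        self.dead_letters.append(event)
        return succeeded


def make_flaky_handler() -> Callable[[str], None]:
    """Hypothetical handler: 'bad' always fails, 'flaky' fails only once."""
    already_failed = set()

    def handle(event: str) -> None:
        if event == "bad":
            raise ValueError("cannot process this event")
        if event == "flaky" and event not in already_failed:
            already_failed.add(event)
            raise RuntimeError("transient failure")

    return handle


consumer = RetryingConsumer(handler=make_flaky_handler())
handled = consumer.consume(["ok", "flaky", "bad"])
print(handled, consumer.dead_letters)  # prints: 2 ['bad']
```

In a real Kafka application, the dead letter list would typically be a separate topic, and you would probably want a delay or backoff between attempts. I left both out to keep the sketch short.
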
Figure 1: Big Data & Analytics
That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.