Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- streaming
- distributed computing
- SQL Server
- transactions (both database and non-database)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
Distributed Computing
- Saga Orchestration for Microservices Using the Outbox Pattern. For the last few weeks, at Derivco, I have been ~playing around~ researching the use of CDC, Debezium and the outbox pattern (a blog post or two may come soon). I’ve been looking at it in relation to publishing events from the database. It was therefore interesting to come across this post, which discusses how CDC and Debezium can be combined to implement the Saga pattern. Very cool!
Data Architecture
- Data Movement in Netflix Studio via Data Mesh. I have previously covered posts discussing Data Mesh. In this post, Netflix talks about their Data Mesh. Data Mesh, in this context, is a fully managed, streaming data pipeline product used for enabling Change Data Capture (CDC) use cases. The post is very informative, and there are quite a few concepts worth investigating!
Big Data
- Pinot Real-Time Ingestion with Cloud Segment Storage. This post, by Uber, discusses how Uber added a deep store to Pinot’s real-time ingestion protocol.
- Getting started with Azure Data Explorer and Azure Synapse Analytics for Big Data processing. Azure Data Explorer is a fully managed data analytics service that can handle large volumes of diverse data from any data source, such as websites, applications, IoT devices, and more. This post looks at leveraging the integration between Azure Data Explorer and Azure Synapse for processing data with Apache Spark.
Streaming
- Making Apache Kafka Serverless: Lessons From Confluent Cloud. From a developer’s perspective, serverless in the cloud is awesome and easy to use. However, the system designers and engineers who have to design and implement a serverless system face challenges. This post starts by looking at the Confluent Cloud architecture and then dives into how some of those challenges have been overcome.
- Speed, Scale, Storage: Our Journey from Apache Kafka to Performance in Confluent Cloud. Hmm, Confluent Cloud seemed popular this week. This post looks at optimizing Apache Kafka for Confluent Cloud. Even if you are not interested in the cloud, the post is full of good advice and best practices. Oh, and I have to look at the test framework mentioned in the post: Trogdor.
~ Finally
That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.