Throughout the week, I read a lot of blog posts, articles, and other material related to topics that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both database and non-database)
- and other “stuff”
This blog post is the “roundup” of the things I found most interesting during the week just ended.
- Building a Data Platform in 2021. This post looks at building a modern, scalable data platform to power analytics and data science projects. It was a handy read for me, as we are looking at these things at Derivco at the moment.
- What Is Starburst Data And Why You Should Use It – Data Engineering Consulting. Trino (the “artist” formerly known as PrestoSQL) is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. Because Trino was developed as a bare-bones SQL engine, you have to manage scaling, security, monitoring, and new connections on your own. That is where Starburst Data comes in: it makes quite a few of those things much easier, and this post looks in more detail at what Starburst Data can do.
- Under the Hood of Real-Time Analytics with Apache Kafka and Pinot. Recently I have linked to posts about Apache Pinot, and I do so here again. This post looks at the inner workings of Kafka and Pinot when using them together. Very interesting!
- Disaster Recovery for Multi-Region Kafka at Uber. Uber runs one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day, and aims to provide a scalable, reliable, performant, and easy-to-use messaging platform on top of it. This article describes how they recover from disasters such as cluster downtime, and how they built their multi-region Apache Kafka infrastructure.
- Integrating Apache Kafka Clients with CNCF Jaeger at Funding Circle Using OpenTelemetry. A key challenge in a Kafka-based microservice architecture is understanding the system as a whole, due to its decentralized nature and the constant evolution of new and existing services. This post covers the basics of what options are available for Apache Kafka telemetry when it comes to distributed tracing.
- How to Tune RocksDB for Your Kafka Streams Application. Stateful Kafka Streams applications use local state stores that are backed by RocksDB and made fault-tolerant by associated changelog topics stored in Kafka. The linked post covers the key concepts of how Kafka Streams uses RocksDB to maintain its state, and how RocksDB can be tuned for Kafka Streams’ state stores.
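The core idea behind distributed tracing for Kafka clients, whichever library provides it, is propagating trace context in record headers so a consumer’s span can join the producer’s trace. Here is a toy Python sketch of that mechanism, assuming the W3C `traceparent` header format; the helper names (`make_traceparent`, `inject`, `extract`) are mine for illustration, not the OpenTelemetry API:

```python
import uuid

def make_traceparent(trace_id: str, span_id: str) -> str:
    """Build a W3C traceparent header value: version-traceid-spanid-flags."""
    return f"00-{trace_id}-{span_id}-01"

def inject(headers: list, trace_id: str, span_id: str) -> None:
    """Producer side: attach the trace context to the record headers."""
    headers.append(("traceparent", make_traceparent(trace_id, span_id).encode()))

def extract(headers: list):
    """Consumer side: recover the trace id and parent span id from headers."""
    for key, value in headers:
        if key == "traceparent":
            _version, trace_id, span_id, _flags = value.decode().split("-")
            return trace_id, span_id
    return None

# A "record" travelling through Kafka carries its headers along.
trace_id = uuid.uuid4().hex          # 32 hex chars, like a real trace id
producer_span = uuid.uuid4().hex[:16]  # 16 hex chars, like a real span id
record_headers = []
inject(record_headers, trace_id, producer_span)

# The consumer extracts the same trace id, so its span joins the producer's trace.
assert extract(record_headers) == (trace_id, producer_span)
```

In a real setup the OpenTelemetry instrumentation does this injection and extraction for you, and the resulting spans are what Jaeger stitches together into an end-to-end trace.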
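The changelog mechanism behind those state stores can be sketched in a few lines. This is a toy model in which plain dicts and lists stand in for RocksDB and the Kafka changelog topic, and the class name `ChangelogBackedStore` is mine, not a Kafka Streams API — it only illustrates the dual-write-then-replay idea:

```python
class ChangelogBackedStore:
    """Toy local state store made fault-tolerant by a changelog."""

    def __init__(self, changelog: list):
        self.local = {}             # stands in for the RocksDB instance
        self.changelog = changelog  # stands in for the changelog topic

    def put(self, key, value):
        self.local[key] = value
        self.changelog.append((key, value))  # every write also goes to the changelog

    def get(self, key):
        return self.local.get(key)

    @staticmethod
    def restore(changelog: list):
        """Rebuild the local store after a crash by replaying the changelog."""
        store = ChangelogBackedStore([])
        for key, value in changelog:
            store.local[key] = value  # later records overwrite earlier ones
        store.changelog = changelog   # keep appending to the surviving changelog
        return store

changelog = []
store = ChangelogBackedStore(changelog)
store.put("user-1", 3)
store.put("user-1", 4)  # the local value is updated in place...
# ...but both updates were appended to the changelog, so after a "crash"
# the store can be rebuilt from it.
recovered = ChangelogBackedStore.restore(changelog)
assert recovered.get("user-1") == 4
```

In Kafka Streams the replay happens automatically on restore, and RocksDB tuning matters precisely because the local store absorbs all of those writes.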
That’s all for this week. I hope you enjoyed what I put together. If you have ideas for what to cover, please comment on this post or ping me.