Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
Big Data
- Real-time Analytics with Presto and Apache Pinot — Part I. I have written in previous roundups about Presto (whose PrestoSQL fork is now called Trino) and Apache Pinot. The blog post linked to here is the first in a two-part series about how to use Presto and Pinot together. The second part is here.
Data Architecture
- The 3 Things to Keep in Mind While Building the Modern Data Stack. Building a data stack, never mind a modern data stack, can be confusing and complicated. This post proposes a simplified framework for creating the stack. The post looks at a conceptual model to help us when we pick the tools for the stack. I found the post very informative!
- What Is Data Mesh? And Should You Mesh It Up Too? Recently I have mentioned data meshes quite a lot. Here is another post about data meshes. It looks at what a Data Mesh is, and why more and more companies are looking to implement one.
- How Lakehouses Solve Common Issues With Data Warehouses. In last week's roundup I linked to a video about data Lakehouses. The post I link to here is the first in a series about Lakehouses, and it is based on this white-paper. I am certainly looking forward to the other posts in the series.
Streaming
- Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure. Last week I posted a link about integration between Confluent Cloud and Microsoft Azure. I wrote that I hoped to see blog posts from the Confluent guys (and girls) where they do “cool stuff” on Azure and not only AWS and Google Cloud. Well, ask and you shall receive! The post linked to here discusses how to configure Azure Databricks to interact with Confluent Cloud so that you can ingest, process, and store your data, make real-time predictions, and gain business insights from it.
- Simplify Kafka at Scale with Confluent Tiered Storage. In October 2020, Confluent announced Confluent Platform 6.0, with tiered storage as one of the new features. This post looks at how tiered storage works, how to set it up, and its performance implications. Very interesting!
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.