Interesting Stuff - Week 9, 2023

Posted by nielsb on Sunday, March 5, 2023

This week it is mostly streaming-related “stuff” that has caught my eye. Regardless of that, we start with some Generative AI, or rather prompt engineering, where I link to a post about different types of “frameworks” for prompt engineering. When you read the post, you will see why “frameworks” is in quotes.

From there, we go into streaming “land” and look at stream processing vs. real-time OLAP databases. There’s an interesting article about building data lakes on AWS using open-source technologies. I round off the streaming section with a post about Spark Structured Streaming and REST APIs.

AI/ML

  • Learn To Master Prompt Engineering With This Singular (Triple) Framework. Lately, there has been a lot of “noise” around prompt engineering (I have linked to posts about it here and here). This post covers several aspects of prompt engineering, including the importance of understanding the task at hand, selecting the right dataset, and evaluating the model’s performance. The post also looks at different types of “frameworks” for prompt engineering. Very interesting.

Streaming

  • Stream Processing vs Real-time OLAP vs Streaming Database. This post examines the differences and similarities between stream processing and real-time OLAP databases. The post also provides an overview of the different tools and technologies available for implementing each technique, such as Apache Kafka for stream processing and Apache Druid for real-time OLAP. I found the post good! My only “gripe” was that the author didn’t include Azure Data Explorer in the mix.
  • Building Data Lakes on AWS with Kafka Connect, Debezium, Apicurio Registry, and Apache Hudi. As the title implies, the linked post provides a comprehensive guide to building a data lake on AWS using several open-source tools. The post is a must-read for those interested in data management and analysis.
  • Scalable Spark Structured Streaming for REST API Destinations. Spark Structured Streaming is the engine at the foundation of data streaming on the Databricks Lakehouse Platform. This post describes using Spark Structured Streaming to build a REST API by mapping HTTP requests to Spark Structured Streaming queries. The post discusses how to do this in theory and provides an example of using the API to process real-time data from Twitter. Very cool!

WIND (What Is Niels Doing)

I am still working on the next post in the Develop a Real-Time Leaderboard Using Kafka and Azure Data Explorer series. It is going slower than expected since “real work” has a way of rearing its “ugly” head 😄. I hope to publish the post by the end of this coming week.

~ Finally

That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.

