Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
Azure Data Explorer
- Azure Data Explorer Shorts: Managed Ingestion. An excellent short (~9 minutes) video explaining the ins and outs of data ingestion into Azure Data Explorer.
Streaming
- Apache Kafka and R: Real-Time Prediction and Model (Re)training. This blog post looks at how KStreams, ksqlDB, and R can be used to create a data pipeline in which a machine learning model is applied to streaming data. The post also looks at how the model can be automatically retrained once the prediction results exceed a certain threshold. Very Cool!
- Native Support of Session Window in Spark Structured Streaming. The linked post looks at a new window type in the upcoming Apache Spark 3.2 version. Before Spark 3.2, Spark supported tumbling and sliding windows. Version 3.2 introduces the session window. The interesting thing with a session window is that its length is dynamic: the window grows or closes depending on the input.
- Introducing Single Message Transforms and New Connector Features on Confluent Cloud. Managed Kafka Connect connectors are part of Confluent Cloud, and this post announces new features for most of them. I am quite “chuffed” about seeing Single Message Transforms as one such new feature.
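To make the session window idea from the Spark post above a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code and not how Spark implements it. The function name `session_windows`, the `gap` parameter, and the sample timestamps are all made up for illustration. The assumption is that a session starts at an event and is extended each time another event arrives before the current window end (last event time plus the gap).

```python
from typing import List, Tuple


def session_windows(timestamps: List[float], gap: float) -> List[Tuple[float, float, int]]:
    """Group event timestamps into sessions.

    Hypothetical helper for illustration only. Each session is returned as
    (start, end, event_count), where end = last event time + gap.
    """
    sessions: List[List[float]] = []  # each entry: [start, end, count]
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event falls inside the open session: push the end out.
            sessions[-1][1] = ts + gap
            sessions[-1][2] += 1
        else:
            # Gap exceeded (or first event): start a new session.
            sessions.append([ts, ts + gap, 1])
    return [(s[0], s[1], int(s[2])) for s in sessions]


# Made-up event times (in seconds) with a 5 second gap.
events = [0, 2, 3, 20, 21, 50]
windows = session_windows(events, gap=5)
print(windows)
```

Unlike a tumbling or sliding window, none of the resulting windows has a fixed length: a burst of events stretches a session, and silence longer than the gap ends it.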
That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.