Interesting Stuff - Week 13, 2023

Posted by nielsb on Sunday, April 2, 2023

Some interesting stuff this week: a “cool” post about Polar bears vs Pandas; no doubt who’ll win. Read about how to ingest data into Azure Data Explorer programmatically. On the streaming side, there is a really interesting post about how to do canary releases for streaming platforms, and something extremely interesting: Machine Learning and Data Streaming! Have fun!!

AI/ML

  • The 3 Reasons Why I Have Permanently Switched From Pandas To Polars. In this post, the author shares their experience of switching from the Python library Pandas to the newer and faster Rust-based library called Polars for data manipulation and analysis. The author outlines three main reasons for the switch: the performance benefits of Polars, better memory management capabilities, and more modern and concise syntax. The author also provides some code examples and benchmarks to demonstrate the superiority of Polars over Pandas in certain use cases.

Azure Data Explorer

  • Programmatically ingest data into Azure Data Explorer. Since, as you probably know by now 😄, I am “fond” of Azure Data Explorer, this post was very interesting. In the post, the author discusses how to programmatically ingest data into Azure Data Explorer (ADX). The author explains the ADX data model and how data can be stored in tables and databases. The post includes code examples and step-by-step instructions on setting up the necessary environment and configurations for the script to work. The author also provides tips and best practices for data ingestion into ADX.

Streaming

  • Canary release with Kafka. In this blog post, the author discusses implementing a canary release strategy for data streaming. The author describes how to set up a canary release pipeline using Kafka: create two separate Kafka topics, one for the canary environment and one for production, and route a portion of the traffic to the canary topic. Very interesting!
  • Uniting the Machine Learning and Data Streaming Ecosystems - Part 1. In this post, the author discusses integrating machine learning (ML) and real-time data streaming using Apache Kafka and its ecosystem. The post first outlines the benefits of combining ML with streaming data, such as improved prediction accuracy and real-time decision-making capabilities. The author then describes how Kafka can act as a central platform for collecting, processing and delivering streaming data to ML models, using the Kafka Connect framework and the KSQL streaming SQL engine. This post is extremely interesting for us at Derivco right now!
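
To make the canary idea from the Kafka post above a bit more concrete, here is a minimal sketch of the routing decision only. It is not the linked author's code: the topic names, the `choose_topic` function and the hash-based split are all assumptions made for illustration, and no Kafka client is involved. In a real setup, the returned topic name would be handed to whatever producer you use.

```python
import hashlib

# Hypothetical topic names, chosen for this example only.
PRODUCTION_TOPIC = "orders"
CANARY_TOPIC = "orders-canary"


def choose_topic(key: str, canary_percent: int) -> str:
    """Return the topic a record with this key should be sent to."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # A stable hash keeps every record for a given key on the same side,
    # so one key never flips between canary and production.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return CANARY_TOPIC if bucket < canary_percent else PRODUCTION_TOPIC


if __name__ == "__main__":
    keys = [f"player-{i}" for i in range(10_000)]
    canary = sum(choose_topic(k, 10) == CANARY_TOPIC for k in keys)
    print(f"{canary} of {len(keys)} keys routed to the canary topic")
```

Hashing the key, rather than picking a topic at random per record, is one possible design choice: it trades a perfectly even split for the guarantee that all records for a given key end up in the same topic.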

WIND (What Is Niels Doing)

Earlier today, I finished the third post in the Develop a Real-Time Leaderboard Using Kafka and Azure Data Explorer series. It is not published yet; I need to give it a “once-over”. Expect the post to be published in a couple of days.

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.

