Interesting Stuff - Week 3, 2023

So, what do we have for Week 3, 2023?

A mixed bag: cool visualizations in Azure Data Explorer using Plotly. An interesting post about the benefits and challenges of distributed computing. A comparison of different SQL-based streaming approaches and their use cases, including a look at a streaming database. An excellent post about time in Flink. As a side note, I wish Flink’s time functions supported a more granular precision than milliseconds.

Azure Data Explorer

Plotly visualizations in Azure Data Explorer. Azure Data Explorer (ADX) supports various data visualizations, including time charts, bar charts, scatter charts, maps, funnels and many more. The visualization you choose can be rendered using the render operator in KQL or selected when you build an ADX dashboard. This post announces the support for advanced interactive visualizations using the Plotly graphics library. I am not a visualization guy, but this sounds awesome!
Kusto Detective Agency (Part 3) - Challenge 2: Election fraud? The Kusto Detective Agency is an interactive big data contest with five different challenges, where you use Kusto Query Language to solve the cases. In the Christmas/New Year roundup, I wrote about how I had worked through and solved the challenges. The challenge that gave me the most trouble was Challenge 2: Election Fraud. The linked post looks at that challenge and proposes one way of solving it (I did it another way). If you want to learn KQL in a fun way, join the Kusto Detective Agency.
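
To make the Plotly item above a bit more concrete: as far as I understand it, ADX renders a Plotly visual from the figure's JSON representation, which a query returns as a string. The sketch below only builds such a JSON string with the Python standard library. The function name and the sample data are made up for illustration, and how you get the string into ADX (for example via the Python plugin) is not shown and is an assumption on my part.

```python
import json


def build_plotly_figure_json(x_values, y_values, title):
    """Return a Plotly figure as a JSON string.

    Hypothetical helper: a Plotly figure is a dict with a "data" list of
    traces and a "layout" dict, so plain json.dumps is enough here.
    """
    figure = {
        "data": [
            {
                "type": "scatter",
                "mode": "lines+markers",
                "x": list(x_values),
                "y": list(y_values),
            }
        ],
        "layout": {"title": {"text": title}},
    }
    return json.dumps(figure)


# Made-up sample data, purely for illustration.
figure_json = build_plotly_figure_json(
    ["Mon", "Tue", "Wed"], [3, 7, 5], "Events per day"
)
print(figure_json)
```

The same string can be loaded by any Plotly front end, which is handy for checking the figure locally before trying it in a dashboard.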

Distributed Computing

Distributed Computing Concepts. This post discusses the concept of distributed computing and how it allows for the distribution of tasks across multiple computers or devices. This can increase efficiency and reduce the workload on a single machine. The article mentions different types of distributed systems, such as peer-to-peer networks, grid computing, and cluster computing, and also covers the benefits and challenges of distributed computing. I especially like the part where the author discusses network partitions.

Streaming

Comparing SQL-based streaming approaches. This article is from back in April 2022, and I must have missed it somehow back then. Anyway, the article compares different SQL-based streaming approaches and their use cases. The author discusses Apache Kafka, Apache Flink, Apache Spark, and the streaming database Materialize and compares their features and capabilities. The article highlights each approach’s pros and cons and provides guidance on when to use them. The article also points out that the best approach for data streaming depends on the specific use case and requirements.
Flink SQL: Queries and Time. Since I have started looking into Flink, more specifically Flink SQL, this post comes at the right time. It discusses the use of Flink SQL for querying time-based data and the various time attributes available in Flink SQL. The article explains that Flink SQL supports event time and processing time, which are used to determine when events occurred and when they were processed, respectively. It also mentions the concept of watermarks and how they are used to ensure that late-arriving events are handled correctly. The article provides examples of how to use these time attributes in Flink SQL queries and covers other time-related features of Flink SQL, such as windowing and sessionization. Very good!
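
To get a feel for how event time, watermarks and windows fit together, here is a small plain-Python simulation. It is not Flink code and not taken from the linked post. It is a simplified sketch that assumes a tumbling window, a watermark that trails the highest event time seen so far by a fixed delay, and late events that are simply dropped. All names and the sample events are made up.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_ms, max_out_of_orderness_ms):
    """Count events per (window, key) using event time and a watermark.

    events: iterable of (event_time_ms, key) in ARRIVAL order.
    Returns (emitted, dropped), where emitted is a list of
    (window_start_ms, key, count) in emission order and dropped is the
    list of events that arrived after their window had closed.
    """
    open_windows = defaultdict(lambda: defaultdict(int))
    emitted = []
    dropped = []
    watermark = float("-inf")

    for event_time, key in events:
        window_start = event_time - (event_time % window_size_ms)
        window_end = window_start + window_size_ms

        if window_end <= watermark:
            # The window was already emitted: this event is late.
            dropped.append((event_time, key))
        else:
            open_windows[window_start][key] += 1

        # Simplified watermark: highest event time seen minus a fixed delay.
        watermark = max(watermark, event_time - max_out_of_orderness_ms)

        # Emit every window whose end the watermark has passed.
        closable = sorted(
            start for start in open_windows
            if start + window_size_ms <= watermark
        )
        for start in closable:
            for k, count in sorted(open_windows.pop(start).items()):
                emitted.append((start, k, count))

    # End of input: flush whatever is still open.
    for start in sorted(open_windows):
        for k, count in sorted(open_windows[start].items()):
            emitted.append((start, k, count))

    return emitted, dropped


# Made-up events: (event_time_ms, key), listed in arrival order.
sample_events = [
    (1000, "a"), (4000, "b"), (9000, "a"), (11000, "a"),
    (8000, "b"),   # out of order, but its window is still open
    (13000, "b"),  # pushes the watermark past the first window's end
    (5000, "a"),   # late: the first window has already been emitted
    (21000, "a"),
]
emitted, dropped = tumbling_window_counts(sample_events, 10_000, 2_000)
print(emitted)
print(dropped)
```

The event at 8000 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The event at 5000 arrives after that window was emitted, so it ends up in the dropped list.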

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.
