Interesting Stuff - Week 46, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

AI/data science
data in general
data architecture
streaming
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.

Distributed Computing

Azure Chaos Studio. At Microsoft Ignite, a week or two ago, Microsoft announced the public preview of Azure Chaos Studio. Azure Chaos Studio is a fully-managed experimentation service to help customers track, measure, and mitigate faults with controlled chaos engineering to improve the resilience of their cloud applications. This looks very interesting, and we will definitely have a look at it.

Azure Data Explorer

Long-term security log retention with Azure Data Explorer. Having access to long-term security logs is essential. Querying long-term logs is critical for identifying the impact of threats and investigating illicit access attempts. This post outlines a solution for long-term retention of security logs where Azure Data Explorer is at the core of the architecture.
Query past data with hot windows. Azure Data Explorer has the notion of hot and cold data. Hot data is stored on SSDs on cluster nodes, whereas cold data is stored in Azure Blob Storage. Hot data offers the best query performance: an order of magnitude more performant than cold data. Sometimes you may want to query the hot data together with some of the cold. This post looks at recently added functionality in Azure Data Explorer that lets you define a time window in the past whose data should be treated as hot data: Hot Window.
Train your Model on Spark/Databricks, score it on ADX. Recently, I have been doing conference talks about Azure Databricks, Apache Spark, and Azure Data Explorer. How cool would it be if you could combine them?! The post linked to does just that. It looks at training and creating Machine Learning models using Azure Databricks and Spark and then using those models from Azure Data Explorer. Very cool! Oh, BTW - with Azure Data Explorer pools being made available in Azure Synapse, you no longer need Azure Databricks for this. You can do the same thing with Azure Synapse Analytics. The post Azure Synapse Analytics - Operationalize your Spark ML model into Data Explorer pool for scoring looks at that.
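To make the hot window idea from the second post above a bit more concrete, here is a minimal sketch of what setting one up might look like. It only builds the text of a caching-policy management command; it does not connect to a cluster. The helper name hot_window_command, the table name SecurityLogs, the 7-day hot period, and the dates are all assumptions made up for illustration, and the command shape follows my reading of the Azure Data Explorer caching policy documentation, so check it against the docs before running it.

```python
from datetime import date


def hot_window_command(table: str, hot_days: int, start: date, end: date) -> str:
    """Build the text of a caching-policy command with one hot window.

    Hypothetical helper for illustration only: it formats a string and
    never talks to an Azure Data Explorer cluster.
    """
    if end <= start:
        raise ValueError("hot window end must be after its start")
    return (
        f".alter table {table} policy caching "
        f"hot = {hot_days}d, "
        f"hot_window = datetime({start.isoformat()}) .. datetime({end.isoformat()})"
    )


if __name__ == "__main__":
    # Keep the last 7 days hot, plus January 2021 as an extra hot window.
    # Table name and dates are invented for this example.
    print(hot_window_command("SecurityLogs", 7, date(2021, 1, 1), date(2021, 2, 1)))
```

The resulting text would then be run as a management command against the database, after which queries touching the chosen past period should be served from the hot cache alongside the most recent data.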

Streaming

Building Real-Time Hybrid Architectures with Cluster Linking and Confluent Platform 7.0. Confluent recently released Confluent Platform 7.0, and this post looks at one of the new features in detail, the ability to directly connect clusters and mirror topics from one cluster to another: Cluster Linking. This is something that we at Derivco are really interested in.

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.
