Interesting Stuff - Week 32, 2021

Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:

AI/data science
data in general
data architecture
streaming
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.

Big Data / Data Analytics

Building a Low-Latency Fitness Leaderboard with Apache Pinot. The terms user-facing/site-facing analytics are “popping up” more and more. When I first heard them, I was pretty confused (pretty typical for me) about what they mean - analytics is analytics, after all. But when reading this post, it dawned on me what user-facing analytics is. However, I won’t “spoil” the explanation here. Apart from explaining what user-facing analytics means, this post covers using Apache Pinot to ingest fitness band events from a Kafka topic and make them available for immediate querying. Very cool!
‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data. When it rains, it pours, hey? Another post about user-facing analytics and Apache Pinot. In this post, Uber explains the implementation of the ‘Orders Near You’ feature and how they generate insights across geospatial data.
Uber’s Finance Computation Platform. A company of Uber’s size and scale needs robust, accurate, and compliant accounting and analytics. The post looks at how they built their own in-house platform - the Finance Computation Platform - to meet those demanding requirements.

Streaming

The New One-Stop Shop for Learning Apache Kafka. This is awesome, awesome, awesome! Did I say it was awesome? OK, Niels, calm down - what is this? The post announces an all-new website dedicated to Apache Kafka, event streaming, and associated cloud technologies. As the title says, the site is really a one-stop-shop for everything Kafka! Have a look at the various courses they offer - it is a gold mine!

WIND (What Is Niels Doing)

Well, apart from spending waaaayyy too much time on Confluent Developer, I am prepping for the upcoming 2021 Data Platform Summit where I am doing one conference presentation:

How to do Real-Time Analytics Using Apache Kafka and Azure Data Explorer: This session shows how to do near-real-time analysis on data streaming from Apache Kafka (running on Confluent Cloud in Azure) using Azure Data Explorer.

In addition to the session above, I am also doing an eight-hour post-con training class (split over two days):

Big Data & Analytics with SQL Server 2019 Big Data Cluster: This training covers big data, data virtualization and analytics in SQL Server 2019 Big Data Cluster (BDC). There are still some seats left, so you can sign up here if you are interested. Apart from getting to know BDC, an added benefit of signing up is getting free admission to the summit!

Lately, I have been investigating SQL Server CDC and the use of Debezium to publish data from SQL Server. For my investigations, I have used Kafka running in Docker. Every time I have set this up, I have struggled with deploying the Debezium SQL Server Connector to the Kafka Connect container. I finally decided to write a blog post about it, so I have something to go back to next time, and I published the post yesterday:

How to Deploy the Debezium SQL Server Connector to Docker
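
For context: once the connector plugin is available inside the Kafka Connect container, the connector itself is typically created with a POST to the Kafka Connect REST API. The sketch below is not taken from the linked post; it only illustrates the general shape of such a request. The host names, credentials, database, table and topic names are all hypothetical placeholders, and the property names assume the Debezium 1.x releases that were current in 2021 (later releases renamed some of them), so check them against the Debezium version you run.

```python
import json
import urllib.request


def build_connector_payload(name, host, database, tables):
    """Build the JSON body for POST /connectors on Kafka Connect.

    Every value below is a placeholder; property names assume Debezium 1.x.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
            "database.hostname": host,
            "database.port": "1433",
            "database.user": "sa",                # placeholder credentials
            "database.password": "<password>",    # placeholder, never hard-code
            "database.dbname": database,
            "database.server.name": name,         # used as the topic prefix
            "table.include.list": ",".join(tables),
            # Assumed broker address inside a Docker network
            "database.history.kafka.bootstrap.servers": "broker:29092",
            "database.history.kafka.topic": f"dbhistory.{name}",
        },
    }


def register_connector(payload, connect_url="http://localhost:8083"):
    """POST the payload to Kafka Connect (8083 is its default REST port)."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the payload; call register_connector(payload) against a
    # running Kafka Connect instance to actually create the connector.
    payload = build_connector_payload("sqlserver1", "mssql", "testdb", ["dbo.orders"])
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect the configuration before sending it to a live Kafka Connect instance.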

~ Finally

That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.
