Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
Big Data / Data Analytics
- Building a Low-Latency Fitness Leaderboard with Apache Pinot. The terms user-facing/site-facing analytics are “popping up” more and more. When I first heard them, I was pretty confused (pretty typical for me) about what they mean - analytics is analytics, after all. But when reading this post, it dawned on me what it is. However, I won’t “spoil” the explanation here. Apart from explaining what user-facing analytics means, this post covers using Apache Pinot to ingest fitness band events from a Kafka topic and make them available for immediate querying. Very cool!
- ‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data. When it rains, it pours, hey? Another post about user-facing analytics and Apache Pinot. In this post, Uber explains the implementation of the ‘Orders Near You’ feature and how they generate insights across geospatial data.
- Uber’s Finance Computation Platform. A company of Uber’s size and scale needs robust, accurate, and compliant accounting and analytics. The post looks at how they built their own in-house platform - the Finance Computation Platform - to meet their demanding requirements.
- The New One-Stop Shop for Learning Apache Kafka. This is awesome, awesome, awesome! Did I say it was awesome? OK, Niels, calm down - what is this? The post announces an all-new website dedicated to Apache Kafka, event streaming, and associated cloud technologies. As the title says, the site is really a one-stop-shop for everything Kafka! Have a look at the various courses they offer - it is a gold mine!
WIND (What Is Niels Doing)
- How to do Real-Time Analytics Using Apache Kafka and Azure Data Explorer: This session shows how to do near-real-time analysis on data streaming from Apache Kafka (running on Confluent Cloud in Azure) using Azure Data Explorer.
In addition to the session above, I am also doing an eight-hour post-con training class (split over two days):
- Big Data & Analytics with SQL Server 2019 Big Data Cluster: This training covers big data, data virtualization and analytics in SQL Server 2019 Big Data Cluster. There are still some seats left, so you can sign up here if you are interested. Apart from getting to know BDC, an added benefit of signing up is getting free admission to the summit!
Lately, I have been investigating SQL Server CDC and the use of Debezium to publish data from SQL Server. For my investigations, I have used Kafka running in Docker. Every time I have set this up, I have struggled with deploying the Debezium SQL Server Connector to the Kafka Connect container. I finally decided to write a blog post about it so I have something to go back to next time, and I published the post yesterday:
That’s all for this week. I hope you enjoy what I did put together. Please comment on this post or ping me if you have ideas for what to cover.