Interesting Stuff - Week 40, 2022

Posted by nielsb on Sunday, October 9, 2022

Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:

  • AI/data science
  • data in general
  • data architecture
  • streaming
  • distributed computing
  • SQL Server
  • transactions (both db as well as non db)
  • and other “stuff”

This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.

SQL Server 2022

  • Data Virtualization with PolyBase for SQL Server 2022. One of the big things back in the day (well, maybe a year or two ago) for SQL Server 2019 Big Data Cluster (BDC) was the PolyBase support for External Tables against a lot of data sources. That support has now been introduced in your “normal” SQL Server 2022, and this post looks at this new PolyBase and what you can do with it.

Data Architecture

  • The InfoQ eMag: Modern Data Architectures, Pipelines, & Streams. This InfoQ post contains a download link for an eMag book: Modern Data Architectures, Pipelines, & Streams. I downloaded the book and found it useful. The book looks at up-to-date case studies and real-world data architectures. Very cool!
  • Data lake architecture. This post gives an excellent overview of the various parts of a data lake. It looks at things like the raw data layer, cleansed data layer, and presentation data layer and links to useful resources.

Streaming

  • What’s New in Apache Kafka 3.3. I guess the title says it all. Kafka 3.3 was just released, and this post looks at some of the new features. The big one in this release is that KRaft is production ready, i.e., you can now use KRaft instead of ZooKeeper as your metadata controller.
  • Introducing Stream Designer: The Visual Builder for Streaming Data Pipelines. This post looks at something I can’t wait to start “playing” with: the Stream Designer. The Stream Designer is a visual interface for rapidly building, testing, and deploying streaming data pipelines natively on Kafka. It is of particular interest as we at Derivco are now doing some very interesting “stuff” related to data pipelines.

WIND (What Is Niels Doing)

This:

Figure 1: Azure Data Explorer Ingestion

The next meeting of the Azure Durban User Group takes place this Wednesday (Oct 12). At this meeting, I continue my Azure Data Explorer investigations, and we look at how to ingest data into ADX. Things I cover are:

  • Batch Ingestion
  • Streaming Ingestion
  • Ingestion from Event Hubs

Come and join us if you are in the ‘hood. The event is FREE, and you register here. See you on Wednesday!

~ Finally

That’s all for this week. I hope you enjoy what I did put together. Please comment on this post or ping me if you have ideas for what to cover.

