Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
Azure Data Explorer
- Use fresh and unlimited volume of ADX data (Kusto) from your favorite analytic tool - Excel pivot. For a while, Excel has had the ability to query data in Azure Data Explorer - yay! The slightly negative aspect of this was that typically data in ADX is counted in billions, and to query the data in Excel, you had to bring the data into Excel first - goodbye memory! The post linked to looks at a way to query ADX data in real-time without importing any data and without any volume limitations.
Data Science / Machine Learning / AI
- 6 Papers Every Modern Data Scientist Must Read. This post brings up a point which is very dear to me - that to be on top of your chosen technologies, you need to understand the foundation and building blocks of said technologies. Anyway, the technology referred to in this post is deep learning, and the post lists white papers covering some of the most essential modern fundamentals of Deep Learning everyone in the field should be familiar with.
Data Architecture / Big Data
- Building a Data Mesh Architecture in Azure – Part 13 – How To Organise Data Domains In Practice. This post is part of a series on how to build a Data Mesh on Azure. As the post’s subtitle implies, this “episode” looks at data domains. Use this link if you are interested in reading more in this series. As a side note - there has been quite a debate (link here) around this implementation vs. the original data mesh “principles” as outlined by Zhamak Dehghani here.
- Announcing the Preview of Serverless Compute for Databricks SQL on Azure Databricks. Databricks SQL is a serverless data warehouse on the Databricks Lakehouse Platform. It was initially previewed in late 2020, and this post announces the public preview on Azure. We at Derivco have been waiting for this!
- Getting Started with Database Modernization. Before you read the post linked to, I suggest you read Accelerate Cloud Database Modernizations and Migrations with Confluent, which starts looking at the journey to easily migrate data to any cloud database for real-time data streaming, integration, and analytics. The post linked to builds on that one and looks at the steps you need to take to get started on your database modernization journey.
- Data Mesh — A Data Movement and Processing Platform @ Netflix. Another post about Data Mesh, but in this case, the Data Mesh is not what we would think it is, i.e. what’s mentioned above. Data Mesh is the Netflix Data Mesh, a fully managed streaming data pipeline product. The Data Mesh was introduced around a year ago. Since then, it has evolved, and this post is the first in a series covering different aspects of Data Mesh and lessons learned throughout the journey.
That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.