Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
- A New Microsoft Platform in Town: the Microsoft Intelligent Data Platform. This InfoQ post looks at the recently introduced Microsoft Intelligent Data Platform. This new platform fully integrates Microsoft’s database, analytics, and governance offerings. It encompasses everything from what is already available in the Azure Data space (Azure Data Factory, Azure Data Explorer, etc.) to the Synapse Analytics products, Power BI, and the newly rebranded Purview data governance service.
- Announcing the Availability of Data Lineage With Unity Catalog. This post talks about the Databricks Data Lineage offering using the Unity Catalog. It looks at how the Unity Catalog provides automated and real-time data lineage at a granular level for all workloads (SQL, R, Python, Scala) and across all asset types (notebooks, workflows, dashboards). Very cool!
- Debezium to Snowflake: Lessons learned building data replication in production. This blog looks at lessons learned when using Debezium to replicate data at scale in near real-time to Snowflake. There are some very useful tidbits in the post, even though we don’t use Snowflake at Derivco!
- How to Elastically Scale Apache Kafka Clusters on Confluent Cloud. In last week’s roundup, I mentioned Confluent Cloud’s elasticity compared with on-prem Kafka, and we saw how much better Confluent Cloud fared. The post here covers how we can easily resize a Confluent Cloud cluster and how the resizing works internally. Very interesting!
That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.