Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
SQL Server 2019 Big Data Cluster
- What’s new with SQL Server Big Data Clusters—CU10 release. This post announces the release of SQL Server 2019 Big Data Cluster CU10 and some of the new and improved functionality. As soon as I have time, I will install it and “take it for a ride”.
- Data Discovery: The Future of Data Catalogs for Data Lakes. The post linked to here discusses how we can prevent our data lakes from becoming data swamps. The key to this is data discovery and data catalogs. I like this post, and it has given me a lot to think about.
- Introduction to Upserts in Apache Pinot. In version 0.6 of Apache Pinot, a new feature was made available for stream ingestion, allowing you to upsert events from an immutable log. You may be familiar with upserts from the database world. However, in Apache Pinot an upsert is somewhat different from what you have in a database, and this post looks at what it is and why it is exciting.
- Integrating whylogs into your Kafka ML Pipeline. WhyLogs is an open-source data quality library that uses advanced data science statistics to log and monitor data used in AI/ML applications. This blog post looks at how we can integrate WhyLogs with Kafka to evaluate, monitor, and detect statistical anomalies in streaming data. This is very interesting!
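To make the upsert idea from the Apache Pinot item above a bit more concrete, here is a minimal sketch in plain Python. It is not Pinot's API or implementation; it only illustrates the general notion of building a "latest value per key" view on top of a log that is never modified. The `Event` type, its field names, and the rule that the newest timestamp wins are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    key: str        # hypothetical primary key
    timestamp: int  # hypothetical event time used to pick the winner
    value: float


def latest_by_key(log: Iterable[Event]) -> Dict[str, Event]:
    """Return an upserted view: one event per key, the one with the newest timestamp."""
    view: Dict[str, Event] = {}
    for event in log:
        current = view.get(event.key)
        # Insert a new key, or replace the stored event when the
        # incoming one is at least as new (assumed rule).
        if current is None or event.timestamp >= current.timestamp:
            view[event.key] = event
    return view


# The log itself is append-only: nothing in it is updated or deleted.
log = [
    Event("order-1", 1, 10.0),
    Event("order-2", 2, 5.0),
    Event("order-1", 3, 12.5),  # a later event for an existing key
]

view = latest_by_key(log)
```

The point of the sketch is that the "update" happens only in the view that queries see, while every event stays in the log as it was written.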
That’s all for this week. I hope you enjoyed what I put together. If you have ideas for what to cover, please comment on this post or ping me.