Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- streaming
- distributed computing
- SQL Server
- transactions (both db and non-db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
Machine Learning / Data Science / AI
- Building Real-Time ML Pipelines with a Feature Store. The term Feature Store is gaining popularity in the Machine Learning world. It is - as the name implies - something that stores feature data. However, it also runs pipelines that transform raw data into feature values, and it serves feature data for training and inference purposes. Most feature stores are batch-oriented, but they need to move beyond batch and handle real-time data as well. This blog post looks at transitioning from batch to real-time.
- Data + AI Summit Is Back. This post leads to a link for registration for the North American leg of Data + AI Summit. The schedule looks awesome, and I’ll definitely register!
Data Architecture
- Open sourcing Querybook, Pinterest’s collaborative big data hub. Pinterest is a data-driven company, and it is more important than ever for teams to be able to compose queries, create analyses, and collaborate with one another. To enable that, Pinterest built Querybook. This post looks at what Querybook is and how they got to the point of open-sourcing it.
- Top Questions from Customers about Delta Lake. Databricks Delta Lake is a hot topic, and many people have questions about it. This post aims to answer some of those questions.
Streaming
- Apache Kafka Made Simple: A First Glimpse of a Kafka Without ZooKeeper. There has been lots of talk about removing ZooKeeper as a dependency for Kafka. Finally, we are almost there, and the upcoming Kafka release will have the ability to run without ZooKeeper - yay! The blog post linked to looks at the implications of the removal and its impact on - among other things - scalability and performance (spoiler alert: improvements!). Very cool “stuff”!
- Monitoring Your Event Streams: Integrating Confluent with Prometheus and Grafana. Managing and monitoring a system like Kafka is not an easy feat. But there is help; this is part 1 of a three-part blog series that explains how to effectively monitor your event streams. This post looks at integration with third-party tools such as Prometheus and Grafana.
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.