Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
Machine Learning
- A Comprehensive Look at Dates and Timestamps in Apache Spark 3.0. Apache Spark 3.0 was released recently with quite a lot of new features and also changes to existing functionality and data types. This blog post deep dives into the Date and Timestamp types and tries to explain their behavior.
Streaming
- Data Privacy, Security, and Compliance for Apache Kafka. There are increased regulatory demands for protecting personal/sensitive data. The blog post linked to here introduces the Privitar Data Privacy Platform and looks at how it integrates with Confluent Platform through the Privitar Kafka Connector.
- Measuring and Monitoring a Stream Processing Cloud Service: Inside Confluent Cloud ksqlDB. This is the third in a series of posts on enhancements to ksqlDB to enable its offering in Confluent Cloud. Very interesting read!
- Improved Robustness and Usability of Exactly-Once Semantics in Apache Kafka. Back in 2017, Confluent introduced exactly-once semantics in Kafka. Initially, it was met with a healthy dose of scepticism, but after a while, the doubt died away when people saw it working. This post discusses the recent improvements to exactly-once semantics (EOS) that make it simpler to use and more resilient. We are getting there, my friends!
~ Finally
That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.