Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
- Microsoft Build 2020: Highlights. A couple of weeks ago Microsoft held its annual developer conference, Build. Due to the pandemic, it was a virtual, 48-hour, around-the-clock conference, and it was free. The article linked to here is an InfoQ article listing noteworthy “stuff” from the conference. As a side note: I wonder how many changes we will see in the conference landscape once this pandemic is over. Will there be “on-prem” conferences any more, or will most conferences go virtual?
- Apache Arrow and Java: Lightning Speed Big Data Transfer. Apache Arrow is a cross-language, cross-platform, columnar in-memory data format. The InfoQ article linked to here introduces Apache Arrow and gets you acquainted with the basic concepts of the Apache Arrow Java library. If you’re not a Java developer, don’t worry - Apache Arrow offers libraries for many other languages as well: C, C++, C#, and Go, to name a few.
- Modernizing Risk Management Part 1: Streaming data-ingestion, rapid model development and Monte-Carlo Simulations at Scale. This blog post demonstrates how to modernize the traditional value-at-risk (VaR) calculation using various components of the Databricks Unified Data Analytics Platform: Delta Lake, Apache Spark™ and MLflow. This enables a more agile and forward-looking approach to risk management.
- Building a Clickstream Dashboard Application with ksqlDB and Elasticsearch. This post shows an example of how you can build an event-driven application to help you unlock insights contained in the event streams of your business. As the title implies, Kafka, ksqlDB, and Elasticsearch are the components used. Very cool post!
- Best Practices to Secure Your Apache Kafka Deployment. So you are building your event streaming platform on Kafka - awesome! When you are ready to go live, your “pesky” security department asks you about security: “How is the data that flows through Kafka secured, and what have you done to prevent data breaches?” Trust me, they will ask you! Fear not, the post I link to here reviews five security categories and the essential features of Kafka and Confluent Platform that enable you to secure your event streaming platform. If you work with Kafka, READ the post!
- Derivco Webinar - Kafka Masterclass. As I mentioned in last week’s roundup, Charl Lamprecht and I were going to do a Derivco Webinar about Kafka: Conquer Your Data with Kafka and ksqlDB. Well, we did it, and the link here is to the uploaded YouTube video. Both Charl and I had a blast, and we believe the webinar was a success - attendance peaked at around 700.
WIND (What Is Niels Doing)
Still lockdown! Apparently, some of the lockdown restrictions will be eased on June 1, but I don’t see myself being back in the office for at least a month, most likely two.
I am still working on the follow-up to the A Lap Around SQL Server 2019 Big Data Cluster: Background & Technology post. The upcoming post looks at the architecture. So when will it be published, you may ask? I have learnt from my mistakes, so at this stage, I have no idea.
Our local Azure Meetup group has a virtual catchup on Tuesday (June 1), where we will discuss any news or anything else interesting that came out in the industry in May 2020. If you are interested, please register here.
That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.