Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
NOTE: It is now coming up on Christmas and New Year, so I will take a break from these posts and come back at the beginning of next year.
SQL Server 2019 Big Data Cluster (BDC)
- SQL Server Big Data Clusters CU8 release surfaces Encryption at Rest capabilities and more. The latest cumulative update (CU8) for SQL Server 2019 BDC includes several fixes and optimizations, and adds two main capabilities for SQL Server BDC. This post looks at some of the major improvements, provides additional context to help you understand the design behind these capabilities, and points you to relevant resources to learn more and get started.
- The InfoQ eMag - Real World Chaos Engineering. This InfoQ post links to a download of an “eMag” around chaos engineering. The eMag pulls together a variety of case studies to show mechanisms by which you can implement chaos engineering.
- Grafana Announces Grafana Tempo, a Distributed Tracing System. The InfoQ article linked to here looks at Grafana Tempo, the distributed tracing backend recently released by Grafana Labs. Grafana Tempo integrates with any existing logging system to create links from trace IDs in log lines, and it only requires object storage like Amazon S3 or Google Cloud Storage (GCS) to operate.
- How Can Presto And Starburst Data Improve Your Data Analytics. At Derivco we have started looking at Presto. Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. This post looks in more detail at what Presto is and why companies are using it.
- Mix SQL and Machine Learning and leverage your computation cluster. The post linked to above discussed Presto. This post looks at another distributed SQL query engine - dask-sql. In the post, the author examines what dask-sql is and how it can be used in machine learning scenarios. Very cool!
- No Code Data Enrichment with Azure Synapse and Azure Machine Learning. This post walks through how to train and evaluate an Azure ML AutoML regression model on your data using Azure Synapse Analytics Spark and SQL pools. Quite interesting!
- Transactional Machine Learning at Scale with MAADS-VIPER and Apache Kafka. The post linked to here shows how transactional machine learning (TML) integrates data streams with automated machine learning (AutoML). Apache Kafka is used as the data backbone, and it allows the creation of a frictionless machine learning process.
- Event Streaming Applications with Zero Infrastructure. In this YouTube video, Tim Berglund (from Confluent) demos how you can quickly spin up new event streaming applications with ksqlDB, Kafka, and connectors, all in a fully managed way on Confluent Cloud.
- Apache Kafka Lag Monitoring at AppsFlyer. One crucial aspect of every distributed system is visibility - how do you see what’s going on? In streaming applications, it is vital that we can see if consumers are lagging. The post linked to here looks at how one can implement a system for monitoring lag in Kafka.
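To make the idea of consumer lag in the last item a bit more concrete: lag for a partition is usually taken as the log end offset minus the consumer group's last committed offset. The snippet below is only a minimal sketch of that arithmetic in plain Python. It is not how the AppsFlyer post implements monitoring, the function names and sample offsets are made up for illustration, and treating a partition without a committed offset as fully unread is an assumption on my part.

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus last committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as fully unread.
        done = committed.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[partition] = max(end - done, 0)
    return lag


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the partitions whose lag exceeds the threshold, in sorted order."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 2100}
committed = {0: 1490, 1: 980}

lag = consumer_lag(end_offsets, committed)
print(lag)
print(lagging_partitions(lag, threshold=100))
```

In a real setup, the two dictionaries would be filled from the cluster (end offsets and committed offsets per partition), and the threshold check would feed an alert or a dashboard.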
That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.