Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
This week I do not have that much to share, partly because I have been occupied with writing a blog post about deploying SQL Server 2019 Big Data Cluster using Azure Data Studio (see below). It has also been SQL Saturday, which I have been “prepping” for.
Data Science / Machine Learning
- Experiences with approximating queries in Microsoft’s production big-data clusters. This is a dissection, by Adrian, of a white paper about approximating queries at Microsoft. Approximation of queries is used when you want to run analysis / OLAP queries against massive datasets, where a query could potentially run for hours. By using approximation, the time to run the query is reduced significantly, in exchange for some loss of accuracy.
- Reflections on Event Streaming as Confluent Turns Five – Part 1. This is a blog post by Tim Berglund (awesome name, by the way), where he looks back at how Apache Kafka and the Confluent Platform have changed the way we build event-driven systems. Happy 5th Birthday to Confluent!
SQL Server 2019 Big Data Cluster
- Install SQL Server 2019 Big Data Cluster using Azure Data Studio. I had to tear down the SQL Server 2019 Big Data Cluster Andrew and I used for our workshop A Day of SQL Server 2019 Big Data Cluster in Johannesburg and rebuild it for the Cape Town leg of SQL Saturday. While I did the rebuild, I thought it would be a good idea to document what I did, and this blog post is the result.
The South African leg of SQL Saturday is done and dusted. We were in Cape Town yesterday (Saturday, September 14), and I delivered two conference talks (in addition to Andrew’s and my workshop mentioned above):
- A Lap Around SQL Server 2019 Big Data Cluster: The new release of SQL Server, SQL Server 2019, includes Apache Spark and Hadoop Distributed File System (HDFS) for scalable compute and storage. This new architecture, which combines the SQL Server database engine, Spark, and HDFS into a unified data platform, is called a “big data cluster.” This session gives you an overview of what a SQL Server Big Data Cluster is, and what you can do with it.
- What is the PiRate, Snake, and Cup of Coffee Doing in My Database?: In this session we looked at the SQL Server Extensibility Framework, and we saw how we can call out to external languages from inside SQL Server. We looked at R, Python, and Java, and at what we can do from SQL Server when we have access to those languages.
That’s all for this week. I hope you enjoy what I did put together. If you have ideas for what to cover, please comment on this post or ping me.