Throughout the week, I read a lot of blog-posts, articles, and so forth that have to do with things that interest me:
- AI/data science
- data in general
- data architecture
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me for the week just ending.
- Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data. I think we all agree that Big Data is good. However, there is no doubt that Big Data incurs costs, especially in large organisations. This post from Uber looks at the top challenges they had when assessing their Big Data Platform’s costs and the overall strategy they devised to address them. Very interesting!
- Cost-Efficient Open Source Big Data Platform at Uber. Another post by Uber. In the previous post, Uber discussed their initiative to reduce costs on their data platform. They looked at three broad pillars: platform efficiency, supply, and demand. In this post, they discuss the efforts to improve the efficiency of the data platform and bring down costs.
- Lambda Learner: Nearline learning on data streams. In this post, LinkedIn discusses an in-house system called Lambda Learner. Lambda Learner is a library for iterative, incremental training of a class of supervised machine learning models. The discussion is about how the Lambda Learner system allows for near real-time re-training of machine learning models. This is a very interesting post!
- Run Confluent Cloud & Serverless Apache Kafka on Azure. This is a post by yours truly. As you may know, I have some conferences coming up, and Azure features in quite a few of the talks, together with Apache Kafka. I thought it would be cool if I could run Apache Kafka on Azure, with bonus points if I could run it as SaaS, i.e. Confluent Cloud. So in this post, I look at what it takes to deploy Confluent Cloud on Azure.
WIND (What Is Niels Doing)
Apart from publishing the blog post mentioned above, I am prepping for the upcoming 2021 Data Platform Summit. Speaking of the Data Platform Summit, the organizers have managed to increase the capacity of the virtual platform to 10,000! So, they have opened up FREE booking for LIVE attendance for a limited time. They have an internal quota, and once that is full, the free booking will close. Hurry over to https://dataplatformgeeks.com/dps2021/complimentary-registration to register for FREE!
Related to conferences: during the last couple of weeks, I did two webinars, one of which is up on YouTube (I expect the other one to be up soon as well):
- Stream Processing with Apache Kafka and .NET. A presentation about Apache Kafka for the .NET developer and some stuff about stream-processing and ksqlDB. Due to a power failure, there is a break and some distortion in this video; sorry about that.
Obviously, I’ll let you know when the second webinar is up.
That’s all for this week. I hope you enjoy what I put together. Please comment on this post or ping me if you have ideas for what to cover.