Interesting Stuff - Week 24, 2023

Posted by nielsb on Sunday, June 18, 2023

This week, I came across debugging Jupyter notebooks in VSCode. I learned about some amazing applications of generative AI for analytics, content creation, and natural language queries. I also explored how to train a machine-learning model on a Kafka stream using Python libraries.

The CfS for Data Saturday Durban 2023 is still open, and we have already received quite a few submissions. If you are interested in attending, I suggest you book your seat NOW, as there are a limited number of seats for this FREE event, and the event promises to be awesome!

If you are interested in these topics, read further, and you see the links to the articles and resources that inspired me.


  • Debug your code and notebooks by using Visual Studio Code. Earlier this year, Databricks released a VS Code extension for working with Databricks. This post announces that they have added new features to the extension: the ability to debug your code and notebooks and local Jupyter notebook development. The post also provides a step-by-step tutorial on how to install and use the extension. Very cool!


  • OpenAI Just Introduced Function Callings Feature: Everything You Need to Know. This blog post explains how OpenAI has added a new feature to its API that allows users to call functions within the text input. This feature enables users to perform tasks such as data analysis, text summarization, translation, and more using natural language commands. The post also provides examples of using function callings with different models and parameters.
  • Generative AI for Analytics: Performing: Natural Language Queries on Amazon RDS using SageMaker, LangChain, and LLMs. The blog post linked demonstrates how to use LangChain’s SQL Database Chain and SQL Database Agent with OpenAI’s text-davinci-003 model to ask natural language questions of an Amazon RDS for a PostgreSQL database. The post also shows how to use LangChain’s Prompt Template, Query Checker, few-shot prompting, and retrieval-augmented generation (RAG) to improve the accuracy and quality of the generated SQL queries and textual explanations. This is so awesome!


  • Training a Machine Learning Model on a Kafka Stream. The blog post Training a Machine Learning Model on a Kafka Stream shows how to use Kafka and River Python libraries to train a machine learning model on streaming data. The post explains how to set up a Kafka producer and consumer, generate synthetic data using River, and train and evaluate a logistic regression model using River’s online learning API. The post also shows how to use the model to make predictions on streaming data. This is an excellent example of using Kafka and River to train a machine-learning model on streaming data.

WIND (What Is Niels Doing)

In the roundup from last week, I wrote about how Data Saturday Durban is happening on August 19 and how we were looking for Speakers. The Call for Speakers is still open, and we have already received quite a few submissions. If you have some cool data-related stuff you would like to present, please submit your session here. Having looked at the submissions earlier today, we have some really exciting sessions in the pipeline.

Figure 1: Data Staurday Durban 2023

Without officially announcing the event (apart from the CfS), we already have 50+ registrations! As there are a limited number of seats for this FREE event, I suggest you register ASAP to take advantage of what looks to be THE Data Event of the Year in Durban. I will be posting more information about the event in the coming weeks.

~ Finally

That’s all for this week. I hope you enjoy what I did put together. Please comment on this post or ping me if you have ideas for what to cover.

comments powered by Disqus