Interesting Stuff - Week 22, 2023

Posted by nielsb on Sunday, June 4, 2023

In this week’s blog post, I share some interesting links and insights on generative AI for analytics, Microsoft Fabric, and real-time streaming. You will see links about using natural language queries to access data in Amazon RDS using OpenAI and LangChain.

You will also see how to leverage Microsoft Fabric to simplify and enhance your data solutions with Power BI and other Azure products, and how to use Kusto Query Language and Kqlmagic in Fabric notebooks.

Finally, there are links on how to compare and choose stream processing platforms for your use cases, and how to detect and filter PII data from unstructured data streams using Confluent and machine learning.

Check out the full post for more details and examples.

OpenAI

  • Generative AI for Analytics: Performing Natural Language Queries on Amazon RDS using SageMaker, LangChain, and LLMs. The post linked to looks at using generative AI for analytics. It demonstrates how to use LangChain’s SQL Database Chain and Agent with OpenAI’s text-davinci-003, a large language model (LLM), to perform natural language queries (NLQ) of an Amazon RDS for PostgreSQL database. It also explains the benefits of using LLMs to ask questions of data in natural language, and how to use LangChain’s Prompt Template, Query Checker, few-shot prompting, and retrieval-augmented generation (RAG) to improve the results. The post includes code snippets, screenshots, and sample outputs. Very, very cool!
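
To make the few-shot prompting idea a bit more concrete, here is a minimal, library-free sketch. It is not LangChain's API and not the code from the linked post; the schema, the example question/SQL pairs, and the names `FEW_SHOT_EXAMPLES` and `build_prompt` are all made up for illustration. It only shows how a handful of worked examples can be placed in front of the user's question before the prompt is sent to an LLM.

```python
from string import Template

# Hypothetical question/SQL pairs for an imaginary schema (assumptions,
# not taken from the linked post).
FEW_SHOT_EXAMPLES = [
    ("How many artists are there?", "SELECT COUNT(*) FROM artists;"),
    ("List the five newest artworks.",
     "SELECT title FROM artworks ORDER BY acquired_on DESC LIMIT 5;"),
]

# The prompt template: instructions, schema, worked examples, then the
# new question with an open "SQL:" slot for the model to complete.
PROMPT = Template(
    "You are a PostgreSQL expert. Given the schema and the examples, "
    "write one query that answers the question.\n\n"
    "Schema:\n$schema\n\n"
    "$examples\n\n"
    "Question: $question\n"
    "SQL:"
)


def build_prompt(schema: str, question: str) -> str:
    """Return a few-shot prompt asking for SQL that answers `question`."""
    examples = "\n\n".join(
        f"Question: {q}\nSQL: {sql}" for q, sql in FEW_SHOT_EXAMPLES
    )
    return PROMPT.substitute(schema=schema, examples=examples, question=question)


if __name__ == "__main__":
    print(build_prompt("artists(id, name)\nartworks(id, artist_id, title, acquired_on)",
                       "Which artist has the most artworks?"))
```

The resulting string would then be passed to whichever model you use; the examples steer the model toward the dialect and table names you expect.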

Microsoft Fabric

  • Build announcement: Microsoft Fabric. This post discusses the announcement of Microsoft Fabric at Microsoft Build 2023. It talks about how Microsoft Fabric is an enhancement to Power BI that adds SaaS versions of many Microsoft analytical products to the Power BI workspace, now called a Fabric workspace. These products include Azure Synapse Analytics, Azure Data Factory, Azure Data Explorer, and Power BI. Microsoft Fabric simplifies the creation and management of data solutions by eliminating the need for subscriptions, storage, or configuration. It also uses the Delta Lake format for all data storage, which is open-sourced and compatible with many products. Microsoft Fabric is designed to run the entire data estate, from departmental projects to enterprise solutions.
  • End-to-end tutorials in Microsoft Fabric. Staying with Microsoft Fabric: the post linked to contains a comprehensive list of the end-to-end tutorials available for Microsoft Fabric. Each tutorial guides you through a scenario that covers the entire process, from data acquisition to data consumption. They’re designed to help you develop a foundational understanding of the Fabric UI, the various experiences supported by Fabric and their integration points, and the professional and citizen developer experiences that are available.
  • Kusto in Fabric, with Magic. This post looks at using Kusto Query Language (KQL) in Microsoft Fabric notebooks. The post explains how KQL can query and visualize data from various sources. It also shows how to use Kqlmagic, a command that extends the capabilities of the Python kernel in notebooks. The post provides some examples of KQL queries and how to render the results. It concludes by recommending KQL as a powerful data analysis and exploration tool.

Streaming

  • Real-Time Streaming Ecosystem - Part 4. This is part four of Hubert’s series about the real-time streaming ecosystem. He discussed various streaming platforms in the last part, and I covered it here. In that post, he compared streaming platforms with stream processing platforms and discussed the differences. In this post, he dives into the stream processing platforms, discusses the options available, how they differ, and when to use which. The post is very comprehensive and covers a lot of ground. It’s a must-read if you’re interested in stream processing.
  • Automatic Real-Time PII Detection. This post is about real-time PII detection via machine learning. It explains how Confluent’s PII Detection accelerator can help InfoSec teams respond to threats quickly by identifying and filtering sensitive information from unstructured data streams. It describes how Confluent augments existing SIEM and SOAR solutions by providing a data fabric for receiving, logging, processing, and sharing data with cyber-defence tools, and shows how to use stream processing and machine learning to normalize, enrich, and analyze data in motion. There is also an overview of the PII Detection stream processing app, which uses natural language processing and named entity recognition to detect PII entities in text data, and a demonstration of deploying and running the app using Confluent Cloud and Confluent Platform. The post includes code snippets, screenshots, and sample outputs.
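
As a rough illustration of the detect-and-filter step, here is a small sketch. Note that it swaps in plain regular expressions where the Confluent app uses NLP and named entity recognition, and it reads from an ordinary Python iterable rather than a Kafka topic. The patterns and the names `PII_PATTERNS`, `redact`, and `filter_stream` are assumptions made up for this example, not part of the accelerator.

```python
import re

# Simplified, hypothetical patterns; a real detector would use an NER model
# and cover far more entity types than these two.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each detected entity with its label; return text and labels found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"<{label}>", text)
    return text, found


def filter_stream(messages):
    """Yield one redacted record per incoming message, as a stream processor would."""
    for message in messages:
        clean, labels = redact(message)
        yield {"text": clean, "pii": labels}


if __name__ == "__main__":
    for record in filter_stream(["login ok", "contact jane.doe@example.com"]):
        print(record)
```

Downstream consumers could then route records with a non-empty `pii` list to a restricted topic, which is roughly the filtering idea the post describes.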

~ Finally

That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.

