🚀 Welcome to this week’s exploration of AI’s cutting edge! Join me as I delve into Simon Headley’s insights on scaling event-driven systems in Kubernetes with Dapr, and Youssef Hosni’s comprehensive overview of Artificial General Intelligence.
We’ll also uncover the potential of Retrieval Augmented Generation in enhancing LLM responses, guided by Eduardo Alvarez, and navigate through Leonie Monigatti’s strategies for optimizing RAG applications. Plus, don’t miss Andrej Karpathy’s unique take on the “hallucination problem” in LLMs, offering a fresh perspective on AI’s creative capabilities.
- Building event-driven systems at scale in Kubernetes with Dapr — Part III: What does “at scale” really mean? In this, the third part of his series on building event-driven systems in Kubernetes with Dapr, my colleague and mate Simon Headley delves into the practicalities of scaling such systems. He shares his experience as a Senior Software Engineer, focusing on the challenges and solutions in constructing a near real-time, AI/ML-driven event system. Headley highlights how Dapr helps avoid common pitfalls like creating a distributed monolith, simplifies service discovery, and enables parallel computing with virtual actors. He emphasizes Dapr’s role in building fault-tolerant, observable systems and illustrates the scale of their operations: 320 million events per day on Azure Kubernetes Service. This insightful piece sheds light on both the technical aspects of Dapr and the real-world implications of scaling complex systems in a cloud environment.
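The virtual-actor idea Headley leans on can be sketched in plain Python: each actor owns its own state and processes one message at a time from a mailbox, which is what makes parallel computing safe without locks on shared state. This is a minimal illustration of the actor pattern only, not Dapr’s actual actor runtime; all names below are invented.

```python
import queue
import threading

class CounterActor:
    """Toy actor: single-threaded message loop over a private mailbox."""

    def __init__(self) -> None:
        self.count = 0  # state touched only by the worker thread
        self.mailbox: queue.Queue = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self) -> None:
        while True:
            msg = self.mailbox.get()
            if msg is None:  # shutdown signal
                break
            self.count += msg  # no lock needed: one consumer thread

    def send(self, msg: int) -> None:
        """Callers enqueue messages; they never touch the state directly."""
        self.mailbox.put(msg)

    def stop(self) -> None:
        self.mailbox.put(None)
        self.worker.join()

actor = CounterActor()
for _event in range(10):
    actor.send(1)
actor.stop()
```

The serialized mailbox is the whole trick: many producers can send concurrently, but state mutations stay sequential inside the actor.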
- Comprehensive Introduction to Artificial General Intelligence (AGI). In this comprehensive overview, Youssef Hosni explores the intriguing and complex world of Artificial General Intelligence (AGI). He distinguishes AGI from narrow AI, emphasizing AGI’s goal to achieve human-level intelligence across various domains. The article delves into the current state of AGI research, highlighting the significant technical challenges that remain. Additionally, Hosni addresses the debates surrounding the potential implications of developing highly intelligent systems. This piece is an insightful introduction to AGI, offering a clear understanding of its objectives, progress, and the broader conversations it sparks in the AI community.
- Retrieval Augmented Generation (RAG) Inference Engines with LangChain on CPUs. Eduardo Alvarez’s article delves into the world of Retrieval Augmented Generation (RAG) and its application in AI, particularly focusing on its use in large language models (LLMs) on CPU platforms. He begins by refreshing our understanding of inference in AI, emphasizing its critical role in user experience. Alvarez then introduces RAG, a technique that enhances LLM responses by retrieving additional information from relevant data sources. The article highlights the architectural benefits of RAG, such as scalability and improved response fidelity, and discusses the efficiency of optimized models in enhancing performance. Alvarez also explores the role of CPUs in supporting RAG, noting their ubiquity and cost-efficiency. The piece concludes with a hands-on example of implementing RAG with LangChain on the Intel Developer Cloud, offering readers a practical insight into the operational aspects of RAG systems. This comprehensive article not only explains the technicalities of RAG but also demonstrates its practical applications in improving AI inference engines.
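The RAG pattern Alvarez describes boils down to two steps: retrieve relevant context, then prepend it to the prompt before inference. A minimal, dependency-free sketch of that pattern (using naive word overlap in place of a real embedding-based retriever; the function names and documents are illustrative, not from the article):

```python
# Illustrative RAG sketch: retrieve context, then augment the prompt.
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context so the LLM can ground its answer."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Dapr simplifies service discovery in Kubernetes.",
    "CPUs are ubiquitous and cost-efficient for inference.",
    "RAG retrieves extra information from relevant data sources.",
]
prompt = build_rag_prompt("Why are CPUs cost-efficient for inference?", docs)
```

In a real pipeline the overlap scorer would be replaced by a vector store and embedding model, and the assembled prompt handed to an LLM, but the retrieve-then-augment shape stays the same.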
- A Guide on 12 Tuning Strategies for Production-Ready RAG Applications. Leonie Monigatti’s article is a detailed guide on optimizing Retrieval-Augmented Generation (RAG) applications for production environments. She emphasizes the importance of tuning various “hyperparameters” and strategies across different stages of a RAG pipeline. In the ingestion stage, Monigatti suggests improvements through data cleaning, chunking, embedding models, metadata, multi-indexing, and indexing algorithms. For the inferencing stage, she focuses on query transformations, retrieval parameters, advanced retrieval strategies, re-ranking models, choice of Large Language Models (LLMs), and prompt engineering. This comprehensive guide is a valuable resource for data scientists and developers looking to enhance the performance of their RAG applications, providing a thorough understanding of the intricacies involved in fine-tuning each stage of the pipeline.
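One of the ingestion-stage knobs Monigatti covers, chunk size and chunk overlap, can be illustrated with a tiny character-based chunker. This is a simplified stand-in for real text splitters; the parameter names are my own, not hers:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that overlap by
    `overlap` characters, so context spanning a boundary isn't lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [
        text[i : i + chunk_size]
        for i in range(0, len(text), step)
        if text[i : i + chunk_size]
    ]
```

Tuning these two values trades off retrieval precision (smaller chunks) against context completeness (larger chunks with overlap), which is exactly the kind of pipeline-level experiment the guide encourages.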
- LLMLingua: Innovating LLM efficiency with prompt compression. In this article, the Microsoft Research team behind LLMLingua explores how prompt compression can improve the efficiency of large language models (LLMs). They begin by highlighting the importance of LLMs in AI applications and the efficiency challenges that long prompts pose. The article then introduces prompt compression, a technique that shrinks LLM prompts while largely preserving output quality. The team explains the technicalities of the approach, discusses the challenges of implementing prompt compression, and outlines potential ways to overcome them. This article is a valuable resource for data scientists and developers looking to make their LLM applications faster and cheaper to run.
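The core idea behind prompt compression can be conveyed with a toy sketch: drop low-information tokens until the prompt fits a token budget. LLMLingua uses a small language model to score token importance; the stopword heuristic below is purely illustrative and is not how LLMLingua actually works:

```python
# Toy prompt compression: discard low-information tokens to fit a budget.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "that", "and", "for"}

def compress_prompt(prompt: str, budget: int) -> str:
    """Return the prompt unchanged if it fits the budget; otherwise
    drop stopwords, then truncate to the budget as a last resort."""
    tokens = prompt.split()
    if len(tokens) <= budget:
        return prompt
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(kept[:budget])

compressed = compress_prompt(
    "the quick brown fox jumps over the lazy dog in the yard", budget=6
)
```

A real compressor ranks every token by how much the downstream model needs it, but the economics are the same: fewer prompt tokens means lower latency and cost per call.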
- Andrej Karpathy LLM Paper Reading List for LLM Mastery. In this compelling article, Youssef Hosni highlights a curated reading list by Andrej Karpathy, a renowned AI researcher of Tesla and OpenAI fame, focusing on Large Language Models (LLMs). This list serves as a valuable resource for those delving into the complexities of LLMs. The article lists the recommended papers and provides insightful commentary and context, enhancing the reader’s understanding of the pivotal concepts and methodologies in LLM research. This journey through Karpathy’s reading list offers novice and seasoned AI enthusiasts a comprehensive roadmap to understanding and mastering LLMs. It’s a compelling exploration of the frontiers of AI research, shedding light on the evolution of LLMs and their significant role in the broader field of artificial intelligence.
- On the “hallucination problem”. In this tweet, Andrej Karpathy offers a nuanced perspective on the “hallucination problem” in Large Language Models (LLMs). He suggests that hallucination is inherent to LLMs, describing them as “dream machines” that generate content based on a vague recollection of their training data. The issue of hallucination arises only when these “dreams” venture into factually incorrect realms. Karpathy contrasts this with search engines, which lack creativity and only regurgitate existing data, highlighting that LLMs are creative by nature but need help maintaining factual accuracy. He acknowledges that while LLMs themselves don’t have a hallucination problem, LLM Assistants like ChatGPT do, and suggests various research methods like Retrieval Augmented Generation (RAG) to mitigate hallucinations by anchoring responses in real data. He concludes that hallucination is not a bug but a feature of LLMs, though it poses a problem for LLM Assistants that needs addressing.
That’s all for this week. I hope you enjoyed what I put together. Please comment on this post or ping me if you have ideas for what to cover.