AI Minds #8: RAG & Information Retrieval - Giving AI more knowledge

RAG finally lets LLMs use knowledge beyond 2022, but does it live up to the hype?

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In each edition, we’ll highlight a special topic from the rapidly-evolving field of Language AI. We’ll also touch on timely issues the AI community is buzzing about. As a heads-up, you might want to brace yourself for the occasional spicy take or two along the way. 🌶️

Thanks for letting us crash your inbox; let’s party. 🎉

Introducing superhuman speech-to-text with Nova-2. Get a closer look at the fastest, most accurate speech model in this on-demand webinar.

🌶️ Is RAG the next big thing in LLMs?

We’ll cut to the chase: RAG is amazing. 

Is it a panacea? Not at all. However, retrieval-augmented generation is a framework that allows you to give the world’s latest AI models access to up-to-date information… as opposed to limiting them to “knowledge [that] only goes up until January 2022.”

Nevertheless, here’s what we have to say on the topic:

📚 On the same page: What is RAG?

Before we delve into the cool parts of retrieval-augmented generation, it’s important that we establish a common language. So here’s a quick vocabulary lesson before we move on to the flashier aspects of RAG:

  • Retrieval: The process by which an AI finds relevant information from an external source, such as a database.

  • Generation: Natural Language Generation (NLG) is a subfield of artificial intelligence focused on enabling machines to produce human-like text based on provided data or prompts.

  • Retrieval-augmented generation (RAG): RAG is a framework in which LLMs extract information from external knowledge databases.

  • Information Retrieval (IR): IR is the field concerned with the organization, retrieval, and evaluation of information from documents or collections of documents. RAG puts information retrieval to work inside a generation pipeline.
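To make that vocabulary concrete, here’s a deliberately toy sketch of the retrieve-then-generate loop. The word-overlap “retriever” and string-formatting “generator” are our own stand-ins for a real embedding search and LLM call:

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query -- a crude stand-in
    for the embedding similarity a real retriever would use."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]

def generate(query, context):
    """Stand-in for an LLM call: a real system would prompt the model
    with the retrieved passages and let it write the answer."""
    return f"Q: {query}\nContext: {' | '.join(context)}"

docs = [
    "RAG grounds language models in external, up-to-date knowledge.",
    "Vector databases store embeddings for fast similarity search.",
    "Speech-to-text converts audio into transcripts.",
]
question = "How does RAG use external knowledge?"
print(generate(question, retrieve(question, docs)))
```

Swap the two stand-ins for a vector database lookup and a chat-completion call, and you have the skeleton of every RAG system.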

🐦 Tweet alert: Building a RAG-enabled chatbot

🐶 Witnessing Information Retrieval in action with Virtual Assistants

If you have an Alexa, a Siri, a Cortana, or any other sci-fi-named AI assistant, you’ve already witnessed Information Retrieval in action. 

And to fully appreciate the complexity of RAG, we must first take a peek at just how hard it is to get an AI to look something up.

In this article, our very own Jose Francisco walks us through a study he conducted at Stanford. He compares which LLMs are best at (1) determining whether or not a user’s question requires the machine to look up information, and (2) actually finding that relevant information, when appropriate. 

This study was conducted in 2021, so the models are a bit dated, but the methodology and the science remain pertinent today. To replicate the study for the post-AI-boom world, simply swap out GPT-2 for GPT-4, and the experiment still stands.

Want to learn what MRR@5 means? Click the image to find out!
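(Short version: MRR@5 is the mean reciprocal rank of the first relevant result within the top 5. This tiny sketch is our own illustration, not code from the linked study:)

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=5):
    """Return 1/rank of the first relevant result in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mrr_at_k(all_rankings, all_relevant, k=5):
    """Average the reciprocal ranks over a whole set of queries."""
    scores = [reciprocal_rank_at_k(r, rel, k) for r, rel in zip(all_rankings, all_relevant)]
    return sum(scores) / len(scores)

# Query 1 finds a relevant doc at rank 1; query 2 at rank 3.
# MRR@5 = (1 + 1/3) / 2
print(mrr_at_k([["a", "b"], ["x", "y", "z"]], [{"a"}, {"z"}]))
```

The closer to 1, the higher your retriever ranks the documents that actually answer the question.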

🎻 The orchestration layer

Anyone who has built a RAG system from scratch knows one thing. 

The orchestration layer is one of the most pivotal parts of information retrieval. LangChain and LlamaIndex have been in the limelight for the past six months. These tools enable builders from all backgrounds to get up and running with AI quickly. 

But we have found a new love, and its name is DSPy.

Think of it as your open-source Swiss Army knife for language models. From prompting and fine-tuning LMs to reasoning and self-improvement, DSPy has got you covered. When it comes to building RAG frameworks there are a few reasons we enjoy working with it.

  • The right abstraction: You can instruct language models using systematic, modular, and trainable pieces. It's like having a language model instruction manual at your fingertips.

  • Compilers: You don't need to do some crazy tech-magic to get the best prompt for your RAG implementation. DSPy will go out there and figure it out for you.
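The modular style these tools champion is easy to sketch. What follows is not DSPy’s real API, just a toy illustration of the pattern: each stage reads and writes a shared state, so stages can be swapped, recombined, or (in DSPy’s case) optimized by a compiler:

```python
class Retrieve:
    """Toy retrieval stage: ranks a corpus by word overlap with the question."""

    def __init__(self, corpus, k=1):
        self.corpus, self.k = corpus, k

    def __call__(self, state):
        query_words = set(state["question"].lower().split())
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(query_words & set(doc.lower().split())),
            reverse=True,
        )
        state["context"] = ranked[: self.k]
        return state

class Generate:
    """Toy generation stage: stands in for the LLM call a compiled prompt drives."""

    def __call__(self, state):
        state["answer"] = f"Based on: {state['context'][0]}"
        return state

def run_pipeline(stages, question):
    """Thread a shared state dict through each stage in order."""
    state = {"question": question}
    for stage in stages:
        state = stage(state)
    return state["answer"]

corpus = ["DSPy compiles prompts for you.", "Bananas are yellow."]
print(run_pipeline([Retrieve(corpus), Generate()], "who compiles prompts"))
```

Because every stage has the same interface, a compiler (or a human) can tune or replace any piece without touching the rest of the pipeline, which is the core idea behind DSPy’s abstraction.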

DSPy, Langchain, Llamaindex, and Haystack are all awesome tools that are helping move the puck forward in the RAG ecosystem. However, if you’ve heard about any additional orchestration tools we haven’t yet seen, definitely let us know!

📊 Rise of the Vector Database

Let's take a trip into the world of vector databases. No RAG is complete without one. 

(Not to mention, the Tweet showcased above talks about vector databases quite frequently.)

So, what exactly is a vector database? 

Picture this: a vast digital library where every book (every piece of information, really) has an address. Now, imagine if, instead of a boring, old alphanumeric system, those addresses encoded meaning itself, so that similar pieces of information end up shelved near one another. That's the magic of a vector database.

🔑 Key Terms

To kick things off, here are a few key terms that you'll want to familiarize yourself with:

  • Vector: In the context of a vector database, a vector is a mathematical construct that represents multidimensional data. It's like a super-powered coordinate, capable of pointing to a specific location in a vast sea of information. (However, vectors are generally important in language AI as a whole.)

  • Database: This is the digital library we've been talking about. It's where all your vectors—your data—live. It's organized, searchable, and designed to help you find exactly what you're looking for.

  • Indexing: This is the process of organizing and mapping your database, creating those vector 'addresses' so you can find your data quickly and efficiently.

❓Why should you care about vector databases? 

Vector databases have become the unsung heroes of RAG. They provide an efficient, scalable, and flexible way to retrieve semantically relevant information, which is key to the performance of retrieval-augmented generation models.

Not only that, vector databases are the backbone of many modern applications, from search engines to recommendation systems. 

These days, it feels like there is a new open-source Vector DB that comes out every week. And all the traditional databases are scrambling to add vector functionalities to their offerings.

Oh yeah… and over 150 million dollars in VC cash has been invested in vector DBs since the start of 2023.

“Although RAG sounds impressive, a system is only as good as its data, and RAG is no exception. . . . So, the duty to keep databases comprehensive and assimilate real-time web content becomes all the more critical.”

Stephen Oladele, ML Developer Advocate

“The RAG framework gives the model the option to pull accurate information from external sources, meaning it is less likely to produce the incorrect responses it would have without that external information. RAG also reduces the need to retrain the language model to keep its knowledge up to date. Considering the training cost of GPT-4 reaches up to $100 million, RAG can save some serious money.”

🤖 Other bits and bytes

  • 🚢 Delving deeper into vectors - Vectors are a crucial part of AI. Whether we’re talking about vector databases or word vectorization, this one key concept of linear algebra is inescapable. To learn more about word vectorization, check out this video, this glossary entry, or this article.

  • 😯 AI and plagiarism - With the advent of RAG comes the superpower of using AI to generate works that would otherwise be accomplished by humans. After all, if you’re a student writing an essay, why do all the research manually when you can get a machine to look up and write everything for you? Our very own Victoria Hseih explores the ethical intersection of technological innovation and educational integrity in this article.

  • 🧠 Knowledge, AI, and education - And while we’re on the topic of education, check out this article from Tife Sanusi. Now that we’re in a world where AI can look up information, we’re one step closer to having machines as educational assistants. Here, Sanusi takes a glimpse into the future—where AI and humans hold hands in the classroom.