Real-Time AI, Karpathy on LLMs & the New RLHF

What is Human-Guided Deep RL? Is Hug-Deep-RL the new RLHF? What's the latest real-time AI solution? Learn here!

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🔊 The evolution of real-time voice AI solutions

  • 🎙️ What’s next in AI voice synthesis

  • 🚀 Conversational AI and Real-Time solutions

  • 🚗 Self-driving cars and the new RLHF: Human-guided deep reinforcement learning

  • 🛑 Potential threats of live, interactive AI agents

  • 📘 Andrej Karpathy and what you need to know about LLMs

  • 🍏 Open-source AI packages for closed-captioning

  • 🤖 More bonus content!

Thanks for letting us crash your inbox; let’s party. 🎉

We coded with the brand-new Whisper-v3 over the past week, and the results were not what we expected. Check it out here!

🐎 Why the world craves real-time AI solutions

Journeying from Wav2Vec to Real-time AI: The evolution of ASR - Currently, AI is good… but it can be even better. In a relatively short timespan, we’ve gone from simple, brute-force algorithms to dynamic, real-time solutions that make us crave even faster, more reactive AI. How much further do you think we can advance in the next 12 months?

The evolution of Speech Synthesis - Nowhere is real-time AI more desired than in the domain of voice technology. Text-to-Speech and Speech-to-Text AI are present in your GPS, your virtual assistants, and even your entertainment. As our technology evolves, so do our use cases. But have we slowed down along the way?

Developments in Real-Time AI make Conversational AI seem more Conversational - Most existing conversational AI solutions have a lag time of more than two seconds, but the conversational AI feature showcased here cuts the response lag to under 300 milliseconds! That is, the AI seems to respond at a pace much closer to a human’s. And that was two years ago!

🧑‍🔬 What researchers are keeping an eye on…

Toward Human-in-the-Loop AI: Enhancing Deep Reinforcement Learning via Real-Time Human Guidance for Autonomous Driving - Real-time solutions with Human Guidance (Hug) might just be the new RLHF! This paper covers only the autonomous driving case, but the implications of this real-time, Hug-based deep reinforcement learning method are mind-blowing all the same.

The Manipulation Problem: Conversational AI as a Threat to Epistemic Agency - In this paper, Rosenberg warns: “the emerging risk is that consumers will unwittingly engage in real-time dialog with predatory AI agents that can skillfully persuade them to buy particular products, believe particular pieces of misinformation, or fool them into revealing sensitive personal data.” Do you concur?

The Challenges of Real-Time AI - A blast from the past! This technical report from the University of Maryland was released in 1994, and it details the struggles of “real-time intelligent control” or “real-time AI.” Have we fully overcome these challenges? You be the judge!

🎥 Andrej Karpathy teaches what you need to know about LLMs

🐝 Social media buzz

While research on real-time AI achieves incredible results, there’s no feeling like seeing the technology in action! See the Tweets below about:

  1. PitchCompanion - An AI assistant for live business pitches

  2. React Native AI - A full-stack framework for building cross-platform mobile AI apps supporting LLM real-time / streaming text and chat UIs, image processing, and more!

🤖 Additional bits and bytes

Subtitles Made Easy: Deepgram's New Open Source Captioning Packages - Deepgram, a company known for its fast, inexpensive, and accurate Speech-to-Text technology, recently released a set of open-source packages for live captions. Check it out here!

Voice tech is growing. How can that happen equitably? - As mentioned earlier, voice technology is a key domain in which the need for real-time AI is non-negotiable. But we must not focus solely on technology. We need to look at the actual people who are affected by that technology as well. There are quite a few pitfalls to avoid.

Whisper-v3 Hallucinations on Real World Data - We mentioned this above, but it’s worth mentioning again: After OpenAI Dev Day, many jumped to their computers to try out some of the newly announced models. Whisper-v3, however, produced some surprising results, especially when compared to its predecessors.

How to Build Live-Subtitle Clothing! - Check out this video by Zack Freedman to see how you can build a hoodie that transcribes your voice and prints the subtitles on your chest in real time!

🖥️ An upcoming webinar! The magic of Multimodal data

Come join Vonage, Deepgram, Voicify, and Flowcode as they present “The Art of the Possible: Creating a Modern Multimodal Customer Journey,” a webinar that showcases ways to create a cohesive and personalized narrative for your customers using creative, multimodal means to drive engagement.

The webinar takes place on December 12, 2023 at 12pm ET. It will be one hour long, and access will be granted upon registration at the links above! 🚀