AI Minds Newsletter
Altman on Banks’ Voice Authentication, Elon Musk discusses humans’ purpose with AI, and OpenAI’s “Effective” Voice Agents

Sam Altman is shocked that banks still accept voice authentication, Elon Musk discusses humans' purpose with AI, and OpenAI's gpt-realtime is put to the test. Learn all this and more in this week's edition of AI Minds!

Jose Nicholas Francisco
September 04, 2025

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

🎥 Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI
⚡ Testing OpenAI’s gpt-realtime using the VAQI metric
🔐 Unlock the Power of Voice: Everything You Need to Know About AI Voice Agents
🌐 Multilingual and Multi-Accent Jailbreaking of Audio LLMs
📈 Optimizing Multilingual Text-To-Speech with Accents & Emotions
🐝 Social Media Buzz: Elon Discusses Humans’ Purpose with AI
💸 Sam Altman shocked that banks still allow voice authentication
🐍 How to build a simple voice assistant in 70 lines of Python code (X Post)
✈️ Upcoming event: VapiCon! Come see Deepgram there.
🔍 How We Achieved 99.99% Reliability At Vapi
🐱 Open Source Release: Pipecat - Gemini TTS, more metrics, performance updates
🎶 ElevenLabs launches a new Video-to-Music flow in ElevenLabs Studio
🔈️ Microsoft AI launches its first in-house models

Thanks for letting us crash your inbox; let’s party. 🎉

Want a single, unified conversational AI API for building real-time, enterprise-ready, and cost-effective voice AI agents? Check out this link!

🎥 Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI

OpenAI shares what it has learned about building production voice applications from working with customers. But is the tech as good as the demo video above suggests? Read on to find out!

⚡ Testing OpenAI’s gpt-realtime and Everything You Need to Know about AI Voice Agents

VAQI, Revisited: How OpenAI’s gpt‑realtime Stacks Up — With Sensitivity Analysis for Real‑World Priorities - The Voice‑Agent Quality Index (VAQI) is a single score that captures how smooth it feels to speak to an AI voice agent. OpenAI released gpt‑realtime, a production‑ready speech‑to‑speech model that promises lower latency, more natural prosody, and improved tooling for real‑time voice agents, which is exactly the kind of improvement VAQI is designed to measure. Let’s see how it does.

Unlock the Power of Voice: Everything You Need to Know About AI Voice Agents - "Explore the world of AI voice agents, from how they work to their transformative applications across industries. Learn about conversational AI, NLP, and the future of voice technology."

🔍 Multilingual Jailbreaking of Audio LLMs and Optimizing Multilingual TTS with Accents and Emotions

Multilingual and Multi-Accent Jailbreaking of Audio LLMs - “Large Audio Language Models (LALMs) have advanced audio understanding but introduce security risks, particularly through audio jailbreaks. While prior work has focused on English-centric attacks, this paper exposes a more severe vulnerability: adversarial multilingual and multi-accent audio jailbreaks. This paper introduces Multi-AudioJail, the first systematic framework to exploit these vulnerabilities.”

Optimizing Multilingual Text-To-Speech with Accents & Emotions - This paper introduces a new TTS architecture that integrates accent modelling, preserves transliteration, and adds multi-scale emotion modelling. It is tuned in particular for Hindi and Indian English accents.

Elon Musk: Maybe humans will be a source of will or purpose for AI. In a benign scenario, AI will maybe simply try to make the human limbic system happy, just like the cortex does.
— ELON CLIPS (@ElonClipsX)
10:35 AM • Aug 4, 2024

Sam Altman says it's insane that banks still accept voice prints for authentication.
AI has already broken most of the systems we use to prove who we are, and we're sleepwalking into a synthetic fraud crisis.
“Right now it's a voice call. Soon it's gonna be video. It'll be
— vitrupo (@vitrupo)
5:30 AM • Jul 23, 2025

I built a simple voice assistant in 70 lines of Python code.
It uses:
• @livekit - The voice agent
• @AssemblyAI - To turn your voice into text
• @OpenAI - The brain of the agent, and to turn text into audio
There's something really cool about this:
— Santiago (@svpino)
11:10 AM • Jul 30, 2025

🤖 Bonus Bits and Bytes!

📈 Upcoming event: VapiCon - Learn how leading experts are integrating voice with AI (and yes, Deepgram will be there!)
🔍 How We Achieved 99.99% Reliability At Vapi - Speaking of Vapi, check out their blog post about how they reached that level of uptime.
🐱 Open Source Release: Pipecat - Gemini TTS, more metrics, performance updates for avatars, dependency upgrades and bugfixes
🎶 ElevenLabs launches a new Video-to-Music flow in ElevenLabs Studio - The Eleven Music model generates a custom soundtrack based on your video’s context. After adding music, you can layer in voiceovers and SFX directly in Studio.
🔈️ Microsoft AI launches its first in-house models - “Microsoft’s complicated partnership with OpenAI is adding a new twist as it releases AI models that will compete with GPT-5, DeepSeek, and all the rest.”