Altman on Banks’ Voice Authentication, Elon Musk discusses humans’ purpose with AI, and OpenAI’s “Effective” Voice Agents

Sam Altman is shocked that banks still accept voice authentication, Elon Musk discusses humans' purpose with AI, and OpenAI's gpt-realtime is put to the test. Learn about all this and more in this week's edition of AI Minds!

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI

  • ⚡ Testing OpenAI’s gpt-realtime using the VAQI metric

  • 🔐 Unlock the Power of Voice: Everything You Need to Know About AI Voice Agents

  • 🌐 Multilingual and Multi-Accent Jailbreaking of Audio LLMs

  • 📈 Optimizing Multilingual Text-To-Speech with Accents & Emotions

  • 🐝 Social Media Buzz: Elon Discusses Humans’ Purpose with AI

  • 💸 Sam Altman shocked that banks still allow voice authentication

  • 🐍 How to build a simple voice assistant in 70 lines of Python code (X Post)

  • ✈️ Upcoming event: VapiCon! Come see Deepgram there.

  • 🔍 How We Achieved 99.99% Reliability At Vapi

  • 🐱 Open Source Release: Pipecat - Gemini TTS, more metrics, performance updates

  • 🎶 ElevenLabs launches a new Video-to-Music flow in ElevenLabs Studio

  • 🔈️ Microsoft AI launches its first in-house models

Thanks for letting us crash your inbox; let’s party. 🎉

Want a single, unified conversational AI API for building real-time, enterprise-ready, and cost-effective voice AI agents? Check out this link

🎥  Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI

OpenAI shares what it has learned about building production voice applications from its work with customers. But is the tech as good as the demo video suggests? Read on to find out!

⚡ Testing OpenAI’s gpt-realtime and Everything You Need to Know about AI Voice Agents

VAQI, Revisited: How OpenAI’s gpt‑realtime Stacks Up — With Sensitivity Analysis for Real‑World Priorities - The Voice‑Agent Quality Index (VAQI) is a single score that captures how smooth it feels to speak to an AI voice agent. OpenAI has released gpt‑realtime, a production‑ready speech‑to‑speech model that promises lower latency, more natural prosody, and improved tooling for real‑time voice agents, which is exactly the kind of improvement VAQI is designed to measure. Let’s see how it does.

Unlock the Power of Voice: Everything You Need to Know About AI Voice Agents - "Explore the world of AI voice agents, from how they work to their transformative applications across industries. Learn about conversational AI, NLP, and the future of voice technology."

🔍 Multilingual Jailbreaking of Audio LLMs and Optimizing Multilingual TTS with Accents and Emotions

Multilingual and Multi-Accent Jailbreaking of Audio LLMs - “Large Audio Language Models (LALMs) have advanced audio understanding but introduce security risks, particularly through audio jailbreaks. While prior work has focused on English-centric attacks, this paper exposes a more severe vulnerability: adversarial multilingual and multi-accent audio jailbreaks. This paper introduces Multi-AudioJail, the first systematic framework to exploit these vulnerabilities.”

Optimizing Multilingual Text-To-Speech with Accents & Emotions - This paper introduces a new TTS architecture that integrates accent modelling and transliteration preservation with multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents.

🐝 Social Media Buzz: Elon Discusses Humans’ Purpose with AI and more!

🤖 Bonus Bits and Bytes!