Jensen Huang and NVIDIA’s New Dataset, Apple’s new Siri, The Ultimate Guide to AI Voice Cloning

NVIDIA's new Granary dataset features around 1 million hours of audio. Apple's new Siri will be able to take actions on your behalf across various apps by following voice commands. And you can learn how to go easily from a beginner to a pro voice cloner with a simple, eleven-minute tutorial! All these features and more in today's edition of AI Minds.

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Ultimate AI Voice Cloning Guide - BEGINNER To Pro 2025

  • 🔈 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction

  • 🧠 Analyzing Conversational Context Recall and Utilization in Voice Interaction Models

  • 💻 Deepgram Saga: The Voice OS for Developers

  • 🎯 What Developers Need to Know About Speech Recognition Metrics

  • ⚡ Deepgram Partners with AWS to Accelerate Voice AI Deployment

  • 🐝 Social Media Buzz: Zero-Shot Voice Cloning, the smallest voice agent, and more!

  • 💻 New Webinar: Building AI Voice Agents with Deepgram + AWS Bedrock

  • 🎙️ The AI Minds Podcast with Thibault Mardinli, Explorer at Voice AI Space

  • 📈 Gupshup raises $60M+ to expand its conversational AI and messaging platform

  • 💰 Best AI Meeting Notes Assistants for Fintech Teams

  • 🔊 In Case You Missed It: GPT-5 and the Future of Voice AI 

  • 🤖 Voice Agent API Just Leveled Up: GPT-5 + GPT-OSS-20B

  • 🍎 Apple’s new Siri may allow users to operate apps just using voice

  • 💬 NVIDIA releases open dataset, models for multilingual speech AI

Thanks for letting us crash your inbox; let’s party. 🎉

Want a single, unified conversational AI API for building real-time, enterprise-ready, and cost-effective voice AI agents? Check out this link

🎥 Ultimate AI Voice Cloning Guide - BEGINNER To Pro 2025

Learn how to go easily from a beginner to a pro in this eleven-minute tutorial! The three tools featured are labeled as “Beginner AI Voice Cloning,” “Mid-Level AI Voice Cloning,” and “Professional AI Voice Cloning.” Which one do you think is best?

🔍 Voice-Language Foundation Models for Voice Role-Play and the Surprising Memory of Voice Assistants

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play - “The authors of this paper present Voila, a family of large voice-language foundation models that moves beyond traditional pipeline systems by adopting a new end-to-end architecture that enables full-duplex, low-latency conversations while preserving rich vocal nuances such as tone, rhythm, and emotion.”

Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models - “Recent advancements in multi-turn voice interaction models have improved user-model communication. However, while closed-source models effectively retain and recall past utterances, whether open-source models share this ability remains unexplored.” This paper fills this gap.

🧠 Deepgram Saga: The Voice OS for Developers

Deepgram Saga: Meeting You Where and How You Work - Execute complex workflows across your entire tech stack using voice or text—whatever feels natural in the moment. With one-click integrations to Gmail, Google Calendar, Slack, Notion, Linear, Perplexity, and more, Saga connects seamlessly to the tools you already use.

Deepgram Saga isn't just another voice assistant—it's a voice OS built specifically for developers. What you can do:

  • 📅 Manage calendars and schedule meetings without leaving your IDE

  • 📥 Send emails, create tickets, and update team channels

  • 🔍 Research solutions with Perplexity and save findings to Notion

  • ⏩ The result? Faster execution, fewer context switches, and development that truly happens at the speed of thought.

Ready to make your voice your most powerful development tool?

⚡ What Developers Need to Know About Speech Recognition Metrics and Deepgram Partners with AWS to Accelerate Voice AI Deployment

What Developers Need to Know About WER, KER, and KRR - This article will introduce three of the most used and prominent speech recognition metrics: Word Error Rate (WER), Keyword Error Rate (KER), and Keyword Recognition Rate (KRR).
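For a feel of how the three metrics relate, here is a minimal Python sketch. WER uses the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The keyword metrics use a simplified definition that is our assumption and may differ from the one in the article: KRR is the share of reference keyword occurrences that also show up in the transcript, and KER is taken as 1 - KRR. The function names and the sample sentences are illustrative only.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: (S + D + I) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def keyword_rates(reference, hypothesis, keywords):
    """Return (KER, KRR) under the simplified, assumed definition:
    KRR = matched keyword occurrences / keyword occurrences in reference,
    KER = 1 - KRR."""
    kw = {k.lower() for k in keywords}
    ref_counts = Counter(w for w in reference.lower().split() if w in kw)
    hyp_counts = Counter(w for w in hypothesis.lower().split() if w in kw)
    total = sum(ref_counts.values())
    if total == 0:
        raise ValueError("reference contains none of the keywords")
    matched = sum(min(c, hyp_counts[w]) for w, c in ref_counts.items())
    krr = matched / total
    return 1 - krr, krr


if __name__ == "__main__":
    reference = "please refill my metformin prescription today"
    hypothesis = "please refill my met forman prescription today"
    print("WER:", round(wer(reference, hypothesis), 3))
    ker, krr = keyword_rates(reference, hypothesis,
                             ["metformin", "prescription"])
    print("KER:", ker, "KRR:", krr)
```

In this made-up example, the transcript looks mostly right by WER, yet it misses one of the two keywords that matter. That gap is the reason keyword-focused metrics are worth tracking alongside WER.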

Deepgram Signs Strategic Collaboration Agreement with AWS to Accelerate Global Deployment of Voice AI - As a Generative AI Competency Partner and long-standing AWS Partner Network (APN) member, Deepgram offers a full-featured voice AI platform that includes speech-to-text (STT), text-to-speech (TTS), and speech-to-speech (STS) capabilities.

🐝 Social Media Buzz: Zero-Shot Voice Cloning and More!

Thibault Mardinli, Explorer at Voice AI Space. Voice AI Space is a beacon to master voice tech's wild seas. They guide developers, entrepreneurs, and enthusiasts to top tools, news, knowledge, and careers, empowering everyone in the vast voice AI ocean.

💻 New Webinar: Building AI Voice Agents with Deepgram + AWS Bedrock

Sign up here: Deepgram’s Voice Agent API brings lightning-fast speech-to-text and lifelike text-to-speech together with event hooks and speaker diarization, all in real time. Amazon Bedrock gives you instant access to leading foundation models like Claude and Titan, with built-in safety, compliance, and flexibility, perfect for powering voice agents with real intelligence.

Join us to learn how to build scalable, responsive AI voice agents that actually work in production.

When: Tuesday, September 9, 10:30am - 11:30am PDT

🧭 Where: Online

✅ Sign up here!

🤖 Bonus Bits and Bytes!