Lex Fridman & Zuckerberg speak Hindi with AI, Amazon buys wearable AI device “Bee,” and Meta closes deal to buy AI voice replicator PlayAI
Hear Fridman and Zuckerberg speak Hindi, see how Amazon plans to advance the wearable AI space, and learn how Meta plans to use AI voice replicators in the future.
Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of voice AI, brought to you by the Deepgram editorial team.
In this edition:
🎥 Amazon buying wearable AI device maker ‘Bee’
🔉 MultiVox: Benchmarking Voice Assistants for Multimodal Interactions
🎤 Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
⚡ Deepgram Expands Internationally and Wins Top Voice AI Award
📶 Vonage Partners with Deepgram for Real-Time Translation for Contact Centers
🐦 Social Media Buzz: Lex Fridman and Zuckerberg speak Hindi
‼️ Voxtral technical report released
👂 Speechmatics shipped realtime speaker diarization for voice agents
🎙️ AI Minds #073 | Brooke Hopkins, Founder at Coval
🧠 OpenAI CEO Sam Altman is right and very wrong about AI-faked voices
🤖 Microsoft is doubling down on multilingual LLMs (& Europe stands to benefit most)
🚒 AI voice company Hyper raises $6.3M to help automate 911 calls
📚 Meta closes deal to buy AI voice replicator PlayAI
Thanks for letting us crash your inbox; let’s party. 🎉
Want a single, unified conversational AI API for building real-time, enterprise-ready, and cost-effective voice AI agents? Check out this link!

🎥 Amazon buying wearable AI device maker ‘Bee’
KTLA's David Lazarus reports. July 23, 2025: Amazon is now acquiring a tech startup called “Bee,” which makes a bracelet—a wearable device that listens to and transcribes every conversation you have. Check out this news clip to learn more!

🔍 Benchmarking Voice Assistants for Multimodal Interactions and Controlling Zero-Shot Voice Imitation with Self-Supervised Disentanglement
MultiVox: Benchmarking Voice Assistants for Multimodal Interactions - This paper introduces MultiVox, the first omni voice assistant benchmark designed to evaluate how well voice assistants integrate spoken and visual cues, including paralinguistic speech features, for truly multimodal understanding.
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement - To effectively disentangle timbre and style in voice-imitation AI, the authors of this paper propose Vevo, a versatile zero-shot voice imitation framework with controllable timbre and style.

⚡ Deepgram Expands Internationally and Partners with Vonage for Real-Time Translation for Contact Centers
Deepgram Expands Internationally, Launches Managed Single-Tenant Deployment Option, and Wins Top Voice AI Award - Deepgram is expanding globally with two major infrastructure updates: Deepgram Dedicated, a fully managed single-tenant deployment, and an EU-hosted API for in-region inference. Learn more in this blog.
Vonage and Deepgram: Enabling Real-Time Translation for Contact Centers - Vonage and Deepgram have joined forces to bring multilingual voice agents to life. Together, we’ve developed a proof-of-concept that combines telephony, voice AI, and real-time translation into a single, seamless workflow.
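The workflow described above can be sketched as a simple three-stage pipeline. To be clear, the function names below are hypothetical placeholders for illustration only, not the actual Vonage or Deepgram APIs:

```python
# Hypothetical sketch of a telephony -> transcription -> translation pipeline.
# transcribe() and translate() are stand-ins for the real speech-to-text and
# machine-translation stages; they are NOT real Vonage/Deepgram calls.

def transcribe(audio_chunk: bytes) -> str:
    """Placeholder for streaming speech-to-text."""
    return audio_chunk.decode("utf-8")  # stub: pretend the audio is already text

def translate(text: str, target_lang: str) -> str:
    """Placeholder for real-time machine translation."""
    return f"[{target_lang}] {text}"  # stub: tag the text with the target language

def handle_call(audio_stream, target_lang="es"):
    """Run each incoming audio chunk through transcription, then translation."""
    for chunk in audio_stream:
        yield translate(transcribe(chunk), target_lang)

# Feed a fake two-utterance "call" through the pipeline.
call = [b"hello", b"how can I help you?"]
for line in handle_call(call):
    print(line)
```

The point of the single-workflow design is that each caller utterance flows through both stages with no intermediate hand-off, which is what keeps the translation real-time.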

🐝 Social Media Buzz: Lex Fridman and Zuckerberg speak Hindi with AI help, and more!
Synclabs develops AI-generated video lip-syncing for voice cloning, dubbing, and dialogue modification.
Within ~12mo, this technology will be indiscernible from the real thing.
Here’s Lex Fridman speaking to Mark Zuckerberg in lip-synced Hindi:
— AI Breakfast (@AiBreakfast)
3:12 PM • Feb 17, 2024
In our continued commitment to open-science, we are releasing the Voxtral Technical Report: arxiv.org/abs/2507.13264
The report covers details on pre-training, post-training, alignment and evaluations. We also present analysis on selecting the optimal model architecture, which
— Mistral AI (@MistralAI)
9:50 PM • Jul 22, 2025
The team at @Speechmatics just shipped a really clean integration of realtime speaker diarization for voice agents. I've tinkered quite a bit with multi-speaker voice agent pipelines, and this is the best implementation I've seen so far.
Voice AI in 2025 is at a really
— kwindla (@kwindla)
5:00 PM • Jul 23, 2025

🎙️ The AI Minds Podcast!
Podcast with Brooke Hopkins, Founder at Coval. Coval accelerates AI agent development with automated testing for chat, voice, and other objective-oriented systems.
Many engineering teams are racing to market with AI agents, but slow manual testing processes are holding them back. Teams currently play whack-a-mole just to discover that fixing one issue introduces another.
At Coval, they use automated simulation and evaluation techniques inspired by the autonomous vehicle industry to boost test coverage, speed up development, and validate consistent performance.
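The simulation-and-evaluation approach can be illustrated with a toy harness: replay scripted user turns against an agent, then make an objective check on the outcome. Everything below is a hypothetical sketch, not Coval’s actual product or API:

```python
# Toy sketch of simulation-based agent testing: scripted user turns are
# replayed against an agent, and an evaluator checks the final state.
# Purely illustrative; this is not Coval's API.

def toy_agent(message: str) -> str:
    """A trivial 'agent' that books an appointment when asked."""
    if "book" in message.lower():
        return "booked"
    return "how can I help?"

def simulate(agent, user_turns):
    """Replay scripted user turns and collect the agent's responses."""
    return [agent(turn) for turn in user_turns]

def evaluate(responses, expected_final):
    """Objective check: did the conversation end in the right state?"""
    return responses[-1] == expected_final

script = ["hi", "I'd like to book a slot for Tuesday"]
responses = simulate(toy_agent, script)
print(evaluate(responses, "booked"))  # True
```

Because the simulation is automated, a whole suite of scripts like this can run on every change, which is how test coverage scales past the whack-a-mole stage.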

🤖 Bonus Bits and Bytes!
🧠 OpenAI CEO Sam Altman is right and very wrong about AI-faked voices - “It’s getting scarily easy to create an artificial intelligence soundalike to fool your banks or loved ones. We need industry and government to fix this.”
🥐 Microsoft is doubling down on multilingual large language models – and Europe stands to benefit the most - “The tech giant wants to ramp up development of LLMs for a range of European languages”
🚒 AI voice company Hyper raises $6.3M to help automate 911 calls - “We train our models on real 911 calls with local agencies,” says Hyper CEO Ben Sanders
🤖 Meta reportedly closes deal to buy AI voice replicator PlayAI - Meta has finalized the agreement to purchase PlayAI, a California-based startup that offers an AI voice-cloning tool
