Satya Nadella on AI in healthcare, a16z discusses voice agents’ role in labor, and deep dives into speech modeling

Microsoft's CEO predicts how AI will impact healthcare. The team at a16z reveals what they think will happen to labor when voice agents hit the market full-force. VP of Research at Deepgram, Andrew Seagraves, goes in-depth with Agora on the challenges of speech modeling. And much, much more resides in this edition of AI Minds!

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of voice AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Podcast: One of the 'deepest dives' on the Science of Speech Modeling

  • 📈 Using NLP to Reduce Hallucinations in AI Agents

  • 🎤 From wearables to avatars, which AI Agent form is best at modeling the world?

  • 🌐 Deepgram Expands Nova-3 with Spanish, French, and Portuguese Support 

  • 💡 How to deploy a serverless transcription workflow with AWS Lambda + Deepgram STT

  • 🏥 How tech and healthcare companies evaluate Voice AI Healthcare Agents

  • 🐝 Social Media Buzz: Real-time Voice cloning

  • 🚑 Satya Nadella on AI in healthcare, education, and productivity

  • 🐦 Twitter: a16z on voice agents’ role in “swallowing” labor

  • 🚗 Volkswagen committing ~$1.2B into AI capabilities by 2030 

  • 💊 Eli Lilly’s TuneLab: democratizing drug discovery

  • 🔎 FTC probes AI chatbots’ safety (especially for minors)

  • 🔈 Microsoft introduces “Constrained Speech Recognition” for contact centers

  • 💸 Investors doubling down on voice-AI startups

Thanks for letting us crash your inbox; let’s party. 🎉

Want a single, unified conversational AI API for building real-time, enterprise-ready, and cost-effective voice AI agents? Check out this link

🎥  Podcast: One of the 'deepest dives' on the Science and Practical Challenges of Speech Modeling

Andrew Seagraves, VP of Research at Deepgram, joins Agora’s podcast to discuss the evolution of speech recognition technology.

The conversation explores innovative methods of data generation, the role of generative models in creating complex audio datasets, and the evaluation of open-source models. It also highlights real-world applications of Deepgram’s technology, the challenges of conversational dynamics such as end-of-turn detection, and the potential of emotion recognition as a frontier in AI development.

🔍 Using NLP to Reduce Hallucinations in Voice AI Agents, and Comparing Voice AI Agent Forms: Robots, Avatars, and More

Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks - Hallucinations remain a significant challenge in current Generative AI models, undermining trust in AI systems and their reliability. This study investigates how orchestrating multiple specialized Artificial Intelligence agents can help mitigate such hallucinations using OpenVoiceNetwork (OVON).

Embodied AI Agents: Modeling the World - This paper describes our research on AI agents embodied in visual, virtual or physical forms, enabling them to interact with both users and their environments. These agents, which include virtual avatars, wearable devices, and robots, are designed to perceive, learn and act within their surroundings.

🌐 Deepgram Expands Nova-3 with Spanish, French, and Portuguese Support

Deepgram Expands Nova-3 with Spanish, French, and Portuguese Support - Nova-3 brings enterprise-grade accuracy to three global bridge languages, connecting Europe, the Americas, and Africa. Learn more in this article!

⚡ How to deploy a serverless transcription workflow with AWS Lambda + Deepgram STT, and How we evaluate Voice AI Healthcare Agents

Deploy a Serverless Transcription Workflow with AWS Lambda + Deepgram STT - Pair AWS Lambda with Deepgram’s speech‑to‑text (STT) API and you get a push‑button, event‑driven workflow that scales up to meet bursts and drops to zero when idle. In this guide, you’ll deploy a production‑ready foundation: when audio lands in S3, Lambda sends a secure presigned URL to Deepgram’s /v1/listen endpoint, then writes both the raw JSON response and the cleaned transcript text to a transcripts/ prefix.
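
For a feel of what that flow looks like in code, here is a minimal Python sketch of such a Lambda handler. It is not the code from the guide. The helper names, the DEEPGRAM_API_KEY environment variable, the 15-minute URL expiry, and the nova-3 and smart_format query options are all assumptions made for illustration. The S3 client and the HTTP sender are passed in as optional arguments so the logic can be exercised without AWS or network access.

```python
import json
import os
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"
OUTPUT_PREFIX = "transcripts/"  # prefix named in the guide's description


def build_request(audio_url, api_key):
    """Build the POST asking Deepgram to fetch and transcribe a remote file."""
    # Query options are illustrative assumptions, not taken from the guide.
    query = urllib.parse.urlencode({"model": "nova-3", "smart_format": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response):
    """Pull the plain transcript text out of a pre-recorded response body."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def output_keys(audio_key):
    """Map an uploaded audio key to its raw-JSON and plain-text output keys."""
    stem = os.path.splitext(os.path.basename(audio_key))[0]
    return f"{OUTPUT_PREFIX}{stem}.json", f"{OUTPUT_PREFIX}{stem}.txt"


def post_json(request):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


def handler(event, context, s3=None, send=post_json):
    """Hypothetical Lambda entry point for S3 ObjectCreated notifications."""
    if s3 is None:
        import boto3  # available in the AWS Lambda Python runtime

        s3 = boto3.client("s3")
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            continue  # skip our own outputs so the function cannot re-trigger itself
        # Short-lived link that lets Deepgram read the private object.
        audio_url = s3.generate_presigned_url(
            "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=900
        )
        response = send(build_request(audio_url, os.environ["DEEPGRAM_API_KEY"]))
        json_key, text_key = output_keys(key)
        s3.put_object(
            Bucket=bucket,
            Key=json_key,
            Body=json.dumps(response).encode("utf-8"),
            ContentType="application/json",
        )
        s3.put_object(
            Bucket=bucket,
            Key=text_key,
            Body=extract_transcript(response).encode("utf-8"),
            ContentType="text/plain",
        )
        written.append(text_key)
    return {"written": written}
```

Handing Deepgram a presigned URL means the audio bytes never pass through the function, so the Lambda stays small and fast. The prefix check matters when the transcripts are written back to the same bucket that triggers the function; scoping the S3 trigger to an upload prefix achieves the same thing.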

Evaluating Healthcare AI Agents: Building Trustworthy Machines - In this final installment of our six-part series on Healthcare AI Agents, we look at how tech companies and healthcare companies alike ensure that the latest AI innovations are safe, efficient, and effective.

🐝 Social Media Buzz: Real-time Voice cloning, Satya Nadella on AI in healthcare, and a16z on voice agents’ role in labor

🤖 Bonus Bits and Bytes!