AI Minds Newsletter

GPT-4o Complete Review, Top 10 arXiv Papers on AI Agents, and new IoT Computer Vision Integrations: AI Agents

GPT-4o storms the AI community, Robot Humanoid AI Agents released, the Robinhood-to-AI journey, and more

Jose Nicholas Francisco & Marcel Santilli
May 14, 2024

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

🎦 Complete GPT-4o Showcase and Review
💾 Top 10 arXiv Papers on AI Agents
⌨️ How chatbots have improved (and regressed) since ChatGPT
🚀 New Webinar: The Future of Conversational Agents
👀 The Latest Research in AI Vision
📲 IoT in the Era of Generative AI
🚢 FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
🎤 AI Minds Podcast: Peter Dun, Robinhood, and Feathery
🐦 Underrated Tweets on GPT-4o
🤖 Bonus: Unitree Humanoid AI Agent Avatar
🔈 Navigating Deepfakes and Voice Cloning
💎 An overview of Synthetic Data for AI Training
⚕️ How AI could aid movement disorders

Thanks for letting us crash your inbox; let’s party. 🎉

Deepgram just released a brand new text-to-speech model called Aura! Check it out here. 🥳

🎥 GPT-4o Showcase and Review

GPT-4o seems magical. It’s articulate, multimodal, and “perfectly timed to steal the spotlight from Google.” This video walks you through all the benchmarks and demonstrations that OpenAI published on GPT-4 Omni so that you can more fully understand the latest that AI has to offer.

🏇 Top 10 arXiv Papers on AI Agents & How chatbots have improved (and regressed) since ChatGPT

The Top 10 arXiv Papers about AI Agents - With the release of GPT-4o, interest in multimodal AI agents is skyrocketing. Here are 10 of the best papers on AI agents, so you can gather context on how this technology works, where researchers are headed, and how you can use it to your advantage.

Chatbot improvements (and regressions) since ChatGPT - GPT-4 came out around a year ago. GPT-4o came out yesterday. But what exactly happened in between? Have other companies and developers improved upon the invention of the chatbot? Or have we introduced some regressions? Find out here.

💻 New Webinar: The Future of Conversational Agents

Discover the future of Voice AI agents with industry leaders Scott Stephenson, CEO of Deepgram, and Kwindla Hultman Kramer, CEO of Daily, in an upcoming panel conversation hosted by VUX World. RSVP today!

Where: Online

When: May 30th 9am PDT | 12pm EDT | 5pm BST | 6pm CEST

Sign-up here!

🧑‍🔬 The Latest Research in AI Vision

IoT in the Era of Generative AI: Vision and Challenges - Recent advancements in Generative AI, exemplified by GPT, LLaMA, DALL-E, and Stable Diffusion, hold immense promise to push IoT to the next level. This article shares the authors’ collective vision and views on the benefits that Generative AI brings to IoT, especially in its most important real-world applications.

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback - Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between the text and image modalities, which causes various hallucination problems. This paper proposes a reward-model-based solution.

🎙️ AI Minds Podcast!

Peter Dun shares insights on his experience working at Robinhood and how that led to the creation of Feathery, a startup that provides streamlined form and workflow solutions, especially for complex forms in areas such as insurance, healthcare, and finance.

Furthermore, Peter and Demetrios explore the crucial role of AI-powered voice recognition technologies and the potential for AI in form automation and digital transformation.

🐝 Underrated GPT-4o Tweets

You can build fun, responsive, conversational AI today even before Open AI ships API endpoints for streaming input to GPT-4o! Here's a demo that uses GPT-4o for text inference (but not transcription or voice generation).
— kwindla (@kwindla)
11:23 PM • May 13, 2024

I hooked gpt-4o up to a voice-gym I have going and the results are ⚡️. Getting gpt4 quality at speeds only available on OSS models before is pretty amazing.
Can't wait to explore more.
Please don't mind the dad jokes.
— Topper 👽 - soul/acc (@tobowers)
9:11 PM • May 13, 2024

GPT-4o is truly remarkable on 18th-century handwriting. I gave it the following letter and asked it for a transcription. A couple of very minor errors…amazing!
— Generative History (@HistoryGPT)
12:08 AM • May 14, 2024

🤖 Additional Bits and Bytes

⚡ Unitree Humanoid AI Agent Avatar - (Video above) The creators of our very own RoboDoge have released a new humanoid product. This robot boasts levels of dexterity and intelligence never seen before.
🗣️ Navigating Deepfakes and Voice Cloning: How to Safeguard - As synthetic media technologies develop, ethical debates will continue. These technologies must be handled carefully to benefit society without compromising personal and collective integrity. This article covers the key issues.
💠 An overview of Synthetic Data for AI Training - This glossary entry discusses the ins and outs of Synthetic Data when it comes to fine-tuning and pre-training new AI models.
🏥 How AI could aid movement disorders in the elderly - Using new AI video techniques, researchers are finding ways to help the elderly. Specifically, this review explores the advantages and challenges associated with using AI-driven video monitoring to care for elderly patients with movement disorders.