Computer Vision Advancements: Putting the “eye” in A.I. 👀

Computer Vision can read your car's plate number, distinguish animal breeds, and even tell when fruit is ripe to pick. What happens when LLMs come into the mix?

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎨 How good is GPT-4 at visual recognition and writing image-generation prompts?

  • 📈 Do Computer Vision and Deep Learning really go hand-in-hand?

  • 👓 GANs and their ties to vision models

  • 🤖 Robots: The intersection of LLMs and Vision

  • 🚗 Tutorial: Build a Visual AI Plate-Number Detector

  • 🍐 AI Drones can visually tell when a fruit is ripe to pick

  • 💊 Johns Hopkins tweets about surprising medical applications for Computer Vision

  • 🐄 Identify animal breeds with AI Vision apps

  • 🎟️ How Computer Vision impacts advertising

  • 🎥 Using AI vision to get the most out of CCTV cameras

Thanks for letting us crash your inbox; let’s party. 🎉

Oh yeah, and while you may be familiar with Deepgram’s speech-to-text API, you might want to check out our upcoming text-to-speech technology as well 🥳

🧑‍🔬 Research: Seeing the Cutting Edge of Computer Vision… Literally

A Vision Check-up for Language Models: The authors of this paper systematically evaluated LLMs’ ability to recognize and generate various visual concepts. From shapes to scenes, these models’ sight-based capabilities were tested through and through. Is GPT-4 visually competent? Find out here!

Integration and Performance Analysis of Artificial Intelligence and Computer Vision Based on Deep Learning Algorithms - “The successful experiences in the field of computer vision provide strong support for training deep learning algorithms. . . .  In this paper, typical image classification cases are combined to analyze the superior performance of deep neural network models while also pointing out their limitations in generalization and interpretability, proposing directions for future improvements.”

A survey on GANs for computer vision: Recent research, analysis and taxonomy - GANs, or Generative Adversarial Networks, are typically associated with AI art (and sometimes with TikTok filters). Nevertheless, GANs also go hand-in-hand with computer vision. This survey provides a general overview of GANs, covering the latest architectures, optimizations of the loss functions, validation metrics, and application areas of the most widely recognized GAN variants.

🐎 Computer Vision and AI Robots

Computer Vision and AI Robots: How LLMs and Sight Interact - In this piece, Tife Sanusi reveals the current state of the art of robotics, including Google’s RT-2 model. Of course, vision and robots go hand-in-hand, considering that these bots need to navigate physical space in some form or another. How advanced has current technology become? Find out here!

🎥 Watch this! Build a Visual AI Plate-Number Detector

Given an off-the-shelf computer vision AI model, how can we create an app that can do something worthwhile, entertaining, or interesting? Well, this video tutorial can show you how! And once you’ve implemented your version of the project, it becomes much easier to extrapolate and have fun with it 🥳

🐝 Social media buzz

What’s social media saying about the world of computer vision? From MIT to Jim Harris, here’s the latest buzz:

🧭 AI Apps in the world of Vision

LensAI is an AI-powered contextual computer vision advertising solution. It monetizes visual content by identifying objects, logos, actions, and context in images and videos. Then it matches them with relevant ads. If you’re an advertiser or publisher, check it out!

Foqus is a real-time, cloud-based video analytics service that provides valuable business insights through advanced AI and computer vision. It analyzes video feeds from IP cameras to generate actionable data and metrics. Foqus is designed for any business that wants to gain greater insight into their physical operations from video and CCTV cameras—from retailers & restaurants to hotels & hospitals.

Siwalu Software offers AI-based animal recognition apps that can identify dogs, cats, and horses. The apps use advanced machine learning algorithms to analyze photos of animals and determine their breed. If you, for example, need to figure out the breed of a rescue animal, Siwalu would definitely come in handy!