Multilingual speech recognition model with emotion recognition, LivePortrait, Lynx open source hallucination detection model, EchoMimic, In-browser speech recognition and more

Jul 12, 2024

Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.

In today’s issue (Issue #70):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week


🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

Alibaba Group introduced FunAudioLLM, a framework for natural voice interactions between humans and large language models. At its core are two new models: SenseVoice, for high-precision multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, for natural speech generation with multi-language, timbre, and emotion control. The related models have been open-sourced, along with the corresponding training, inference, and fine-tuning code [Details].
Poe added a new feature, Previews, that lets you view and interact with web applications generated directly in Poe chats. Previews can be shared with anyone via a dedicated link, and you can view your output in a new tab outside the chat. You can also use Poe features like multi-bot chat, file upload, and video input to help build your custom web applications [Details].
Anthropic added new features to the Anthropic Console. You can now generate, test, and evaluate your prompts. The Console includes a built-in prompt generator, powered by Claude 3.5 Sonnet, that lets you describe your task and have Claude generate a high-quality prompt. You can also automatically generate test cases and compare outputs [Details].
Patronus AI released Lynx, an open-source, open-weights hallucination detection model that excels in real-world domains like medicine and finance. Lynx is the first open-source hallucination detection model to outperform GPT-4o and other closed-source LLM-as-a-judge models. You can run the quantized Lynx-8B locally, deploy Lynx-70B on GPUs, or access it through the Patronus AI API [Details | Hugging Face].
Stability AI added two new features to its Stable Assistant tool: Search & Replace, and music generation using Stable Audio. Search & Replace lets you replace an object in an image with another one. Stable Audio enables the creation of high-quality audio up to three minutes long [Details].
SenseTime unveiled SenseNova 5o, which it describes as China’s first real-time multimodal model, designed especially for real-time conversation, and claims is on par with GPT-4o’s streaming interaction capabilities. As part of SenseNova 5.5, SenseTime also released Vimi, its first controllable AI avatar video generator. From a single photo, Vimi can generate short video clips with precise control over an avatar's facial expressions and upper-body movements [Details].
Odyssey, a startup that’s around a year old, is building Hollywood-grade visual AI that should be able to generate cinematic scenery, characters, and lighting, with users having fine-grained control over every element in the scene [Details].
Amazon launched a public preview of AWS App Studio. App Studio is a generative AI-powered service that uses natural language to create enterprise-grade applications in minutes, without requiring software development skills [Details].
Groq released a new LLM engine that lets users run lightning-fast queries against leading large language models (LLMs) directly on its website. Groq's developer base has grown past 280K in four months [Details].
Perplexity plans to launch a revenue-sharing program with web publishers next month [Details].
Alipay released EchoMimic, a model that can generate portrait videos driven by audio or by facial landmarks individually, or by a combination of audio and selected facial landmarks [Details].
Artifacts made with Claude can now be published and shared. You can also remix Artifacts shared by others.
Meta released the training codebase for its MobileLLM models [Link].
Red Rabbit Robotics introduced the RX1 humanoid, an open-source, full-human-scale, dual-arm robot that can be built for under $1,000 [video].
Kling, the text-to-video model from China’s Kuaishou Technology, is now accessible on the web, though access requires a Chinese phone number.
Stability AI revised its license for individual creators and small businesses. Models released under the new “Stability AI Community License” can be used for free much more broadly than under the previous licenses. It covers recent Stability AI models, including SD3 Medium [Details].
Magnific AI launched a Photoshop plugin that lets users access its AI upscaling and enhancement tool from within Photoshop [Details].
YouTube will use AI to snip out copyrighted music instead of silencing your whole video [Details].
The Voice Isolator model by ElevenLabs is now available via API. It removes background noise from audio [Details].
The OpenAI Startup Fund and Thrive Global created a new company, Thrive AI Health, to launch a hyper-personalized AI health coach [Details].
Marc Andreessen gave $50,000 in Bitcoin to a semi-autonomous AI agent, which runs the account ‘truth_terminal’ on X, after it asked him for the grant [Link].

🔦 Weekly Spotlight

LivePortrait Demo: Add facial expressions and lip sync to a static portrait, driven by a video [Link].
Whisper Timestamped: In-browser speech recognition with word-level timestamps [Link].
LlamaCloud: Built for enterprise LLM app builders [Link].
LLM evaluation doesn't need to be complicated [Link].
Demo of NuminaMath-7B-TIR, winner of the first progress prize of the AI Math Olympiad (AIMO) [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Octolens: an AI-powered social listening tool for B2B SaaS that monitors the web and social platforms and sends alerts when someone mentions a keyword that matters to your business.
Leo: AI phone assistants for non-technical people. Set up AI phone assistants for making and receiving calls without coding.
Reworkd: AI agents that understand web pages and automatically generate code to extract exactly the data you need.

Last week’s issue
Real-time speech-to-speech model, Magic Insert, CriticGPT, Meta 3D Gen, Multimodal Canvas, InternLM 2.5, AI Voice Isolator, llama-agents and more
July 5, 2024