New unified reasoning and intuitive language model, Video Ads Foundation Models, Agent Leaderboard, 1.6B open-source expressive TTS, Mobile App development in Replit and Bolt, and more

Feb 14, 2025

Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.

In today’s issue (Issue #93):

AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week

🗞️🗞️ AI Pulse: Weekly News at a Glance

Nous Research released a new open-source LLM - DeepHermes-3 Preview, which is one of the first models to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. DeepHermes 3 is built from the Hermes 3 datamix, with new reasoning data, creating a model that can toggle long chains of thought on and off for improved accuracy at the cost of more test-time compute [Details].
Zyphra released Zonos-v0.1 beta, featuring two expressive and real-time text-to-speech (TTS) models, trained on more than 200k hours of varied multilingual speech, that can accurately perform speech cloning when given a reference clip spanning just a few seconds. Both the 1.6B transformer and 1.6B hybrid models have been released under an Apache 2.0 license [Details].
AlphaGeometry2, an upgraded AI by Google DeepMind, has surpassed the average gold medallist level in the International Mathematical Olympiad (IMO). Its predecessor, AlphaGeometry, matched silver medallist performance a year ago [Details].
Bytedance introduced Goku, a new family of joint image-and-video generation models based on rectified flow Transformers, and Goku+, a new family of video foundation models built on top of Goku and designed specifically for advertising scenarios [Details].
YouTube is integrating Veo 2, Google's latest video generation model, into its Dream Screen feature for Shorts, allowing creators to generate AI-driven video backgrounds as well as standalone clips using text prompts [Details].
Deepgram introduced Nova-3, the first voice AI model to offer real-time multilingual transcription. It maintains high accuracy even in environments with significant speaker-to-microphone distance, overlapping speech, and background noise, such as air traffic control, drive-thrus, and call centers [Details].
Replit announced Native Mobile App support on Replit, available in early access, for building iOS and Android apps powered by Replit Assistant [Link].
Bolt has also added Native Mobile App support for building mobile apps from prompts [Link].
OpenAI o1 and o3-mini now support both file and image uploads in ChatGPT. The o3-mini-high usage limit for Plus users has been increased to up to 50 per day [Link].
Galileo Labs launched a new Agent Leaderboard on Hugging Face to evaluate how top LLMs perform in real-world agentic scenarios [Details].
Luma AI’s Image to Video model with Ray2 is available now [Link].
Perplexity now offers file and image uploads with an expanded context window of 1 million tokens. Free for all signed-in users in “Auto” mode [Link].
OpenAI shared a report on competitive programming with large reasoning models. o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors [Link].
Adobe’s commercially safe Firefly video model is now in public beta [Details].

Anthropic launched the Anthropic Economic Index, an initiative aimed at understanding AI's effects on labor markets and the economy over time. The Index’s initial report provides first-of-its-kind data and analysis based on millions of anonymized conversations on Claude.ai [Details].
Twelve Labs updated its video understanding model to Pegasus 1.2, which can now analyze videos up to 1 hour in length with superior ability to understand time-based events [Details].
Codeium’s updates for the Windsurf editor: MCP server support in Cascade, drag-and-drop images, a ‘Turbo’ mode that lets Cascade run all terminal commands for you, and more [Details].
Perplexity introduced a new version of Sonar, Perplexity's in-house model that is optimized for answer quality and user experience. Built on top of Llama 3.3 70B, Sonar has been further trained to enhance answer factuality and readability for Perplexity’s default search mode [Details].
OpenAI plans to simplify its product offerings by releasing GPT-4.5 as the final non-chain-of-thought model, followed by GPT-5, which will unify its GPT-series and o-series models [Details].
Snap has unveiled an AI text-to-image research model for mobile devices that will power some of Snapchat’s features. It can produce high-resolution images in around 1.4 seconds on an iPhone 16 Pro Max [Details].

🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects

Hugging Face’s AI Agents Course [Link].
Three Observations by Sam Altman [Link].
Cursor-tools: Give Cursor Agent an AI team and advanced skills. It’s optimized for Cursor Composer Agent but it can be used by any coding agent that can execute commands [Link].
Model Context Protocol servers: a collection of reference implementations for the Model Context Protocol (MCP), as well as references to community-built servers and additional resources [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Talo: Real-Time AI translator for video calls.
Tough Tongue AI: Use an AI agent to generate, share, and rehearse difficult conversation scenarios.
Krea Chat: Powered by DeepSeek, this new tool brings the power of every Krea feature into a chat interface.
Lovable Visual Edits: On top of Lovable's chat-based AI app builder, you can now easily edit the size, color, content, and other styling of any element on the page with a Figma-like experience. This feature is in early access. See also ‘Launched by Lovable’ - projects built using Lovable.
