Eleven v3, OpenAudio S1, Higgsfield Speak, Self-improving coding agent, FLUX.1 Kontext, Modify Video, Runner H, Perplexity Labs, Mistral Code, Chatterbox and more

Jun 06, 2025

Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.

In today’s issue (Issue #104):

AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week

🗞️🗞️ Weekly News at a Glance

ElevenLabs introduced Eleven v3 (alpha), their most expressive text-to-speech model, with deeper text understanding. It supports a wide variety of audio tags that are somewhat voice- and context-dependent, and delivers multi-speaker conversations with natural pacing and interruptions [Details].
Hanabi AI launched OpenAudio S1, a text-to-speech (TTS) model with sophisticated understanding and rendition of human emotion and vocal nuances. It ranked #1 in Human Subjective Evaluation on HuggingFace TTS-Arena-V2. The model supports a rich set of emotional and tone markers to precisely control the synthesized speech, ranging from (angry), (happy), and (sad) to more nuanced (emphasize), (whispering), (empathetic), and more [Details].
Google introduced an upgraded preview of Gemini 2.5 Pro, calling it their most intelligent model yet, with improved coding, reasoning, and creative writing [Details].
Higgsfield launched Higgsfield Speak, a tool to make motion-driven talking videos from an avatar and a script.
H Company launched Runner H (public beta), an AI agent that executes entire workflows across web apps, documents, spreadsheets, and more with a single prompt. It’s available for free for now. They’ve also open-sourced Holo-1 (3B & 7B), an Action Vision-Language Model (VLM) designed to interact with web interfaces like a human user [Details].
Black Forest Labs released FLUX.1 Kontext and the BFL Playground. The FLUX.1 Kontext family performs in-context image generation, allowing you to prompt with both text and images. Try it for free on Replicate’s Kontext Chat app [Details].
PlayAI open-sourced PlayDiffusion, a diffusion-based inpainting model. PlayDiffusion uses a non-AR diffusion model to inpaint audio, cleanly filling in masked regions while preserving context [Details].
Luma AI introduced Modify Video, a tool powered by Ray 2 to reimagine any video [Details].
HeyGen’s Avatar IV got a major upgrade. It can now generate 60-second, 1080p videos of AI actors with dynamic gestures based on your script, prompt-based gesture control, and micro-expressions. HeyGen also launched the AI Studio video platform, which gives you fine-grained control over every aspect of your video creation process, from voice customization to gestures and avatar movement [Details].
OpenAI is adding new features in ChatGPT, including integrations with different cloud services, meeting recordings, and MCP connection support for connecting to tools for deep research [Details].
SakanaAI introduced the Darwin Gödel Machine (DGM), a self-improving coding agent that rewrites its own code to improve performance on programming tasks [Details].
Perplexity launched Perplexity Labs, a new mode on Perplexity for much more complex tasks such as building trading strategies, dashboards, mini web apps, and storyboards, or running headless browsing tasks for real estate research, along with a directory of generated assets [Details].
Cursor released version 1.0, which brings BugBot for code review, memories, one-click MCP setup, Jupyter support, and general availability of Background Agent [Details].
Microsoft is now offering free video generation, powered by OpenAI’s Sora model, through the Bing mobile app [Details].
OpenAI is fighting a court order to preserve all ChatGPT user logs—including deleted chats and sensitive chats logged through its API business offering—after news organizations suing over copyright claims accused the AI company of destroying evidence [Details].
Perplexity has added SEC/EDGAR integration for all users across Search, Research, and Labs. This provides direct access to comprehensive financial data for all investors, making technical documents instantly understandable [Details].
Bland AI introduced Bland TTS, which turns text into realistic sound effects or AI voice tracks with precise control over style and emotion [Details].
Mistral AI launched Mistral Code (in Private beta), an AI-powered coding assistant that bundles powerful models, an in-IDE assistant, local deployment options, and enterprise tooling into one fully supported package [Details].
ElevenLabs launched Conversational AI 2.0, a significant update to its AI voice agent platform. The update includes a state-of-the-art turn-taking model, multimodality, automatic language detection, integrated RAG, and batch calls [Details].
Langfuse is open-sourcing all of its remaining product features under the MIT license [Details].
Google’s NotebookLM now supports sharing notebooks publicly [Details].
OpenAI is rolling out Codex to ChatGPT Plus users. You can now give Codex access to the internet during task execution to install base dependencies, run tests that need external resources, upgrade or install packages needed to build new features, and more [Details].
Anthropic’s Claude Code, a command line tool that gives you access to Claude models directly in your terminal, is now also available to users on the Pro plan [Details].
Exa launched /research - agentic search for automating web research that requires making many searches, then returning insights as structured outputs [Details].
Manus introduced Manus video generation. With a single prompt, Manus plans each scene, crafts the visuals, and animates your vision [Details].
GitHub introduced Copilot Spaces. Spaces let you ground Copilot’s knowledge in a curated set of specific code, documents, notes, and more. With this extra context, Copilot becomes an expert in the task at hand [Details].
Firecrawl launched /search - an endpoint that lets you search and scrape with a single API call [Details].

🔦 🔍 Weekly Spotlight

Articles/Courses/Videos:

MCP: Build Rich-Context AI Apps with Anthropic - short course on DeepLearning.AI
The battle to AI-enable the web: NLweb and what enterprises need to know.
How Generative Engine Optimization (GEO) Rewrites the Rules of Search - Andreessen Horowitz
Gemini Magic Mirror with the Raspberry Pi
Selecting a Model Based on Stripe Conversion: A Practical Eval for Startups - OpenAI Cookbook
GitHub MCP Exploited: Accessing private repositories via MCP - Invariant Labs

Open-Source Projects:

Chatterbox: Resemble AI's first production-grade open source TTS model
circuit-tracer: Open-sourcing circuit tracing tools - Anthropic
Vibetest Use by Browser Use: An MCP server that launches multiple Browser-Use agents to test a vibe-coded website for UI bugs, broken links, accessibility issues, and other technical problems.
Google AI Edge Gallery: an experimental app for running AI models locally on your Android (available now) and iOS (coming soon) devices
fast-agent: Define, Prompt and Test MCP enabled Agents and Workflows. It is the first framework with complete, end-to-end tested MCP Feature support including Sampling.

🔍 🛠️ Product Picks of the Week

Mirage Studio by Captions: Powered by Captions’ proprietary omni-modal foundation model, it generates expressive videos at scale, with actors that can laugh, flinch, sing, rap, and more.
Underlord by Descript: an AI video editor for vibe editing.
Pine: An AI agent that takes hassles off your plate by handling customer service calls, refunds, billing issues, subscription cancellations, and more.
Factory AI: Delegate software development tasks to agents called Droids. Droids take commands and deliver: pull requests, tickets, docs, and more.
Shaken: An AI agent for learning. Create voice-powered learning content in minutes.
