Qwen3-Next, Seedream 4.0, Kling AI Avatar, HunyuanImage 2.1, Stable Audio 2.5, Replit's Agent 3, MiniMax AI Music 1.5 Model and more
Sep 12, 2025
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #112 ):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
ByteDance introduced Seedream 4.0, a unified image generation and image editing model that can handle complex multimodal tasks including knowledge-based generation, complex reasoning and reference consistency. It can produce images of up to 4K resolution. Seedream 4.0 is the new leading image model in both the Artificial Analysis Text to Image and Image Editing Arenas, surpassing Google's Gemini 2.5 Flash (Nano-Banana) in both [Details].
Alibaba Qwen Team:
Designed a new model architecture called Qwen3-Next and trained the Qwen3-Next-80B-A3B-Base model based on it — an 80-billion-parameter model that activates only 3 billion parameters during inference and achieves performance comparable to the dense Qwen3-32B model, while using less than 10% of its training cost. Two post-trained versions were released: Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking. The Qwen3-Next-80B-A3B-Instruct performs comparably to their flagship model Qwen3-235B-A22B-Instruct-2507, while Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks, outperforming the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks, and approaching the performance of their top-tier model Qwen3-235B-A22B-Thinking-2507 [Details].
Introduced Qwen3-Max-Preview (Instruct), their biggest model yet, with over 1 trillion parameters, available via Qwen Chat and the Alibaba Cloud API [Details].
Released Qwen3-ASR, a speech recognition model supporting 11 languages and multiple accents. It accurately transcribes songs even with background music [Details].
Claude can now create and edit Excel spreadsheets, documents, PowerPoint slide decks, and PDFs directly in Claude.ai and the desktop app [Details].
Kling AI launched Kling AI Avatar, letting users animate avatars from custom images and audio, with prompted emotions and expressions. Limited access available at launch [Details].
Moonshot AI released Kimi K2-0905, an updated version of Kimi K2 that shows improvements on public benchmarks and real-world coding agent tasks and supports 256K context [Details].
Google:
Veo 3 and Veo 3 Fast: support for vertical format outputs (9:16 aspect ratio), 1080p HD output, and new, lower pricing (Veo 3: $0.40/s, down from $0.75/s; Veo 3 Fast: $0.15/s, down from $0.40/s) [Details].
A2A Extensions: A2A protocol provides a standardized framework for agent-to-agent communication. Extensions allow developers to add custom, domain-specific functionalities and methods to their A2A servers, going beyond the core protocol [Details].
Google AI Edge Gallery, an open-source app for running AI models locally, is now available in open beta on the Google Play Store. It includes a new "Audio Scribe" feature that lets you upload or record audio and get on-device transcriptions via Gemma 3n [Details].
You can now upload audio files to chat in the Gemini app [Details].
Tencent Hunyuan released:
HunyuanImage 2.1, an open-source text-to-image model with native 2K image generation. It supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image [Details].
HunyuanWorld-Voyager, an open-source ultra-long-range world model with native 3D reconstruction. Built on HunyuanWorld 1.0, Voyager can generate 3D-consistent scene videos for world exploration following custom camera trajectories. It achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines [Details].
Replit launched Agent 3, an autonomous AI agent capable of running for up to 200 minutes while tackling human-level development tasks. The agent can independently test and debug code, as well as build custom agents and workflows to automate complex or repetitive processes. According to Replit, their proprietary testing system is 3x faster and 10x more cost-effective than state-of-the-art computer-use models [Details].
MiniMax (Hailuo) AI Music 1.5 Model is live, with API access available. It can generate full songs (up to 4 mins) with natural vocals, global musical styles and authentic cultural expression. You can control the style, mood and scenario [Details].
OpenRouter has two new stealth models, Sonoma Dusk Alpha and Sonoma Sky Alpha, each with a 2M-token context window. Both are available for free during the alpha period [Details].
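OpenRouter serves all its models through an OpenAI-compatible chat completions endpoint, so the stealth models can be tried with a plain HTTP request. A minimal Python sketch using only the standard library; the model slug `openrouter/sonoma-dusk-alpha` is an assumption here — check the model page on openrouter.ai for the exact identifier:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for OpenRouter."""
    payload = {
        "model": model,  # slug is an assumption; verify on openrouter.ai
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "openrouter/sonoma-dusk-alpha",
    "Summarize mixture-of-experts routing in one sentence.",
    "sk-or-...",  # your OpenRouter API key
)
# resp = urllib.request.urlopen(req)  # uncomment with a real key to send
```

Because the request format is OpenAI-compatible, the official OpenAI SDK also works by pointing its `base_url` at OpenRouter.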
Stability AI launched Stable Audio 2.5, an audio generation model designed for enterprise-grade sound production and optimized for music. In addition to text-to-audio and audio-to-audio workflows, Stable Audio 2.5 supports audio inpainting: users can input their own audio, select where they want it to start, and the model will use the context to generate the rest of the track. It can generate tracks of up to three minutes with an inference speed of under two seconds on a GPU [Details].
OpenAI:
added full support for MCP tools in ChatGPT. In developer mode, developers can create connectors and use them in chat for write actions (not just search/fetch) [Details].
You can now branch conversations in ChatGPT, letting you more easily explore different directions without losing your original thread. Available now to logged-in users on web [Details].
Baidu launched ERNIE X1.1, a reasoning model that surpasses DeepSeek R1-0528 and performs on par with GPT-5 and Gemini 2.5 Pro [Details].
An RSS co-creator launched Real Simple Licensing (RSL), a new protocol for AI data licensing [Details].
ElevenLabs launched a new feature, Voice Remixing, in alpha that lets you modify the core attributes of any voice you own while keeping its unique identity. With simple prompts, you can adjust gender, accent, style, pacing, and audio quality to create new variations of your voices for different contexts, creations and characters [Details].
Meituan LongCat released LongCat-Flash, an open-source agentic Mixture-of-Experts (MoE) model with 560 billion total parameters that activates 18.6 to 31.3 billion of them based on contextual demands. Across multiple benchmarks, LongCat-Flash, a non-thinking foundational model, performs comparably to leading mainstream models while activating only a small fraction of its parameters, with significantly faster inference [Details].
Mistral AI’s Le Chat AI assistant app now integrates with 20+ enterprise platforms powered by MCP, and remembers what matters with Memories. Both Connectors and Memories are available to all Le Chat users [Details].
Vercel released an open-source end-to-end vibe coding platform where the user can enter text prompts, and the agent will create a full stack application [Details].
Anthropic adds memory to Claude Team and Enterprise, plus an incognito mode for all users [Details].
YouTube’s multi-language audio feature for dubbing videos rolls out to all creators [Details].
Codebuff: Generate code from the terminal. Codebuff beats Claude Code across 175+ coding tasks over multiple open-source repos.
🔍 🛠️ Product Picks of the Week
Tripo AI 3.0: turn text or images into production-ready 3D assets. Tripo 3.0 delivers sharper geometry, cleaner topology, and richer textures for higher-quality results.
Solid: Build production-ready web apps (with full backend logic, auth, databases, APIs), dashboards, portfolios, ecommerce, landing pages, internal tools, and more.
AgentOS: Agno’s production-ready runtime and control plane that runs entirely within your own infrastructure, ensuring complete data privacy and control.
Last Issue
Thanks for reading and have a nice weekend! 🎉 Mariam.