Mercury chat diffusion LLM, Gemini CLI, Magenta RT, Hunyuan-A13B, Multimodal Reflection, OmniGen2, Matrix-Game, Warp 2.0, Anthropic's Desktop Extensions, Hailuo & HeyGen's video agents and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #105):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
Inception Labs launched Mercury, the first commercial-scale diffusion LLM tailored for chat applications. When benchmarked by Artificial Analysis, Mercury matches the performance of speed-optimized frontier models like GPT-4.1 Nano and Claude 3.5 Haiku while running over 7x faster [Details].
Tencent released Hunyuan-A13B, an open-source large language model built on a fine-grained Mixture-of-Experts (MoE) architecture. With only 13 billion active parameters (out of a total of 80 billion), the model achieves performance on par with models like o1 across multiple mainstream benchmarks [Details].
Google:
Full release of the multimodal Gemma 3n model for edge devices. Gemma 3n, announced as a preview during Google I/O, natively supports image, audio, video, and text inputs and text outputs. Available in Hugging Face Transformers, llama.cpp, Google AI Edge, Ollama, MLX, and others. Google also launched The Gemma 3n Impact Challenge, with $150K in prizes [Details].
Gemini CLI: an open-source AI agent that brings Gemini 2.5 Pro directly to your terminal. 60 model requests per minute and 1,000 requests per day are available at no charge [Details].
Magenta RealTime (Magenta RT), an open-weights live music model, in research preview, that allows you to interactively create, control and perform music in the moment [Details].
AlphaGenome, an AI that more comprehensively and accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes regulating genes. It’s available in preview via AlphaGenome API for non-commercial research [Details].
Gemini Robotics On-Device, a VLA (vision language action) model optimized to run locally and efficiently on robotic devices. It shows strong general-purpose dexterity and task generalization [Details].
Imagen 4 text-to-image model is now available in the Gemini API and Google AI Studio [Details].
Doppl: a new experimental app from Google Labs to help you visualize how an outfit might look on you and explore your style [Details].
Black Forest Labs released FLUX.1 Kontext [dev], an open-weights model that delivers proprietary-level image editing performance and can run on consumer chips. It was previously available only in private beta [Details].
Claude users can build, host, and share interactive AI-powered apps directly in the Claude app. When someone uses your Claude-powered app, they authenticate with their existing Claude account, their API usage counts against their subscription rather than yours, and there’s no need to manage API keys [Details].
Beijing Academy of Artificial Intelligence released OmniGen2, an open-source generative model that provides a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. It achieves state-of-the-art performance among open-source models in terms of consistency. A distinctive feature of OmniGen2 is its built-in reflection mechanism, allowing it to evaluate its own outputs, identify shortcomings, and generate improved results through iterative refinement [Details].
Skywork AI released Matrix-Game, an open-weights interactive world foundation model for controllable game world generation. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence [Details].
Anthropic introduced Desktop Extensions, a new packaging format that makes installing MCP servers simple [Details].
Moonshot AI introduced Kimi-Researcher, an autonomous agent that excels at multi-turn search and reasoning. It performs an average of 23 reasoning steps and explores over 200 URLs per task. Built on an internal version of the Kimi k-series model and trained entirely through end-to-end agentic reinforcement learning (RL), it achieved a state-of-the-art result on Humanity's Last Exam [Details].
Researchers introduced JarvisArt, a multi-modal large language model (MLLM)-driven agent that understands user intent, mimics the reasoning process of professional artists, and intelligently coordinates over 200 retouching tools within Lightroom [Details].
Microsoft introduced Mu, a new on-device small language model that powers the agent in Settings by mapping natural language input queries to Settings function calls [Details].
Airtable has been relaunched as an AI-native app platform offering a new app building agent, Omni [Details].
Hailuo AI (MiniMax) introduced Hailuo Video Agent in Beta, unveiling a three-stage roadmap that starts with the now-available Stage 1: pre-built templates for one-click polished videos [Details].
HeyGen announced HeyGen Video Agent, which turns prompts into finished videos for ads, TikToks, short films, product demos, and more. You can join the waitlist here.
🔦 🔍 Weekly Spotlight
Articles/Courses/Videos:
Software Is Changing (Again) - Andrej Karpathy's keynote at AI Startup School in San Francisco
The lethal trifecta for AI agents: private data, untrusted content, and external communication - Simon Willison
How we built our multi-agent research system - Anthropic
Agentic Misalignment: How LLMs could be insider threats - Anthropic
Does MCP Kill Vector Search? - Jerry Liu
Open-Source Projects:
any-agent by Mozilla.ai: A single interface to use and evaluate different agent frameworks
Self-Forcing Video Generation: Real-time video generation with distilled Wan2.1 1.3B
Fire Enrich by Firecrawl: An open-source Clay alternative. Upload a CSV with emails and AI agents automatically fill in missing data like decision makers, company size, and more.
Customer Service Agents Demo by OpenAI: Demo of a customer service use case implemented with the OpenAI Agents SDK
🔍 🛠️ Product Picks of the Week
11ai by ElevenLabs: Personal AI voice assistant with MCP support. Plan your day, research customers with Perplexity, manage Linear tickets, and message your Slack team, all with just your voice.
Warp 2.0: the first Agentic Development Environment. Warp 2.0 has four capabilities in a single app: Code, Agents, Terminal, and Drive, a shared knowledge store for your team and your agents.
Yupp: Check out the best answers from all the latest AIs for free, including Claude Opus 4 and o3-pro. See this launch post.
Genspark AI Browser: a fully agentic browser with Autopilot Mode and access to 700+ tools via a built-in MCP Store.
MiniMax Agent: a general intelligent agent built to tackle long-horizon, complex tasks.
Voice Design by MiniMax: Design entirely new voices from a text description.
Thanks for reading and have a nice weekend! 🎉 Mariam.