Claude Opus 4 & Claude Sonnet 4, Gemini Diffusion, Veo 3, Imagen 4, Jules, NLWeb, BAGEL, Devstral, safe vibe coding, Matrix-Game, Lyria RealTime API and more

May 23, 2025

Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.

In today’s issue (Issue #103):

AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week

🗞️🗞️ Weekly News at a Glance

Anthropic:
1. Next generation of Claude models available: Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents. Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. Also, Claude Code is now generally available [Details].
2. Four new capabilities on the Anthropic API that enable developers to build more powerful AI agents: the code execution tool, MCP connector, Files API, and the ability to cache prompts for up to one hour [Details].
3. System Card: Claude Opus 4 & Claude Sonnet 4
4. Claude Code SDK: allows developers to programmatically integrate Claude Code into their applications. It enables running Claude Code as a subprocess, providing a way to build AI-powered coding assistants and tools [Details].
Google I/O 2025:
1. Gemini Diffusion: a state-of-the-art text diffusion model that learns to generate outputs by converting random noise into coherent text or code, similar to how image and video generation models work.
2. Veo 3: a new state-of-the-art video generation model that not only improves on the quality of Veo 2, but can also generate videos with audio, including sound effects, ambient noise, and lip-synced dialogue.
3. New capabilities to 2.5 Pro and 2.5 Flash:
  1. Native audio output for a more natural conversational experience and improvements to Live API
  2. 2.5 Pro Deep Think, an experimental, enhanced reasoning mode for highly complex math and coding. It uses new research techniques enabling the model to consider multiple hypotheses before responding. Gemini 2.5 Pro Deep Think has the highest score across many of the hardest benchmarks for math, coding, and multimodal reasoning.
  3. Project Mariner's computer use capabilities will be available in the Gemini API and Vertex AI
  4. Native SDK support for Model Context Protocol (MCP) definitions in the Gemini API
4. Gemma 3n preview: a fast and efficient open multimodal model that runs on as little as 2GB of RAM. It uses a cutting-edge architecture optimized for mobile on-device usage and handles audio, text, image, and video.
5. Jules: an asynchronous, agentic coding assistant that integrates directly with your existing repositories. It is now in public beta.
6. Flow: a new AI filmmaking tool powered by Veo, Imagen and Gemini.
7. Imagen 4: creates richer, more detailed images with superior typography & color. Offers 2k resolution & improved prompt adherence.
8. Google Beam: an AI-first 3D video communication platform; formerly known as Project Starline. It transforms standard 2D video streams into realistic 3D experiences, allowing you to connect in a more natural and intuitive way.
9. Lyria RealTime: an interactive music generation model which powers MusicFX DJ, available via an API and in AI Studio. Lyria RealTime lets you interactively create, control, and perform generative music in real time.
10. Lyria 2 music generation model is now available for creators through YouTube Shorts and enterprises in Vertex AI.
11. Video Overviews coming to NotebookLM. Official iOS and Android apps launched for NotebookLM. [Video Overviews generated in NotebookLM for the keynote as well as for the developer updates]
12. Google Meet now offers near real-time speech translation. The feature translates your spoken words into your listener’s preferred language with low latency while preserving your voice, tone, and expression.
13. More: Dive deeper into the news with NotebookLM.
ByteDance released BAGEL, an open-source foundation model that natively supports multimodal understanding and generation. It demonstrates advanced in-context multimodal abilities like free-form image editing, future frame prediction, 3D manipulation, world navigation, and sequential reasoning [Details | Demo].
OpenAI:
1. Research preview of Codex, a cloud-based software engineering agent that can work on many tasks in parallel; each task runs in its own cloud sandbox environment, preloaded with your repository. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering [Details].
2. Remote MCP server support, image generation, Code Interpreter, and more in the Responses API [Details]
3. OpenAI to Z Challenge: use OpenAI o3, o4-mini, or GPT-4.1 to find previously unknown archaeological sites in the Amazon [Details].
Mistral AI released Devstral, an open-source agentic LLM for software engineering tasks (Apache 2.0 license). Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA models by more than 6 percentage points [Details].
Replit now provides ‘safe vibe coding’ with built-in safeguards like Replit Auth for authentication, pre-deployment security scans with Semgrep, sandboxed environments that protect critical system files from AI-induced errors and more [Details]
Synyi AI, a Chinese startup, has opened a clinic in Saudi Arabia where a virtual AI doctor diagnoses patients and prescribes treatment. The treatment plan is reviewed and signed off by a human doctor who does not see the patients [Details].
Shopify launched new AI tools including AI store builder, Shopify Catalog for AI shopping agents, Shopify Knowledge Base App, Storefront MCP and more [Details].
Microsoft Build 2025 [Details]:
1. A new coding agent for GitHub Copilot embedded directly into GitHub. The agent starts its work when you assign a GitHub issue to Copilot or prompt it in VS Code, and it spins up a secure and fully customizable development environment powered by GitHub Actions.
2. NLWeb: an open project designed to simplify the creation of natural language interfaces for websites—making it easy to turn any site into an AI-powered app.
3. Broad first-party support for Model Context Protocol (MCP) in GitHub, Copilot Studio, Dynamics 365, Azure AI Foundry, Semantic Kernel and Windows 11
4. Open-sourcing of GitHub Copilot Chat in VS Code
5. Microsoft Discovery: a platform that uses agentic AI to help researchers speed up innovation and bring new products to market faster
6. Windows AI Foundry: a unified platform supporting the AI developer lifecycle from model selection, optimization, fine-tuning and deployment across client and cloud.
7. Magentic-UI: a new open-source research prototype of a human-centered interface powered by a multi-agent system that can browse and perform actions on the web, generate and execute code, and generate and analyze files.
Skywork AI released Matrix-Game, an open-source interactive world foundation model for controllable game world generation. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence [Details]
xAI Live Search API is now free in beta for a limited time. With Live Search, Grok can search through real-time data from 𝕏, the internet, trending news, and more [Details].

🔦 🔍 Weekly Spotlight

Articles/Courses:

Open-source Projects:

The MCP boilerplate for vibe coders: A simple MCP starter kit to help vibe coders quickly create and monetize MCP servers.
BrowserBee: a privacy-first open source Chrome extension that lets you control your browser using natural language

🔍 🛠️ Product Picks of the Week

Mosaic: AI Agents for video editing.
Stitch: a new experiment from Google Labs that allows you to turn simple prompt and image inputs into complex UI designs and frontend code
Sparkle: organize your Mac automatically with AI. Sparkle creates a smart folder system using AI, organizing any folder you choose.
