GPT‑5.2, GLM-4.6V, Runway's GWM Worlds, GWM Avatars and GWM Robotics, Nomos 1, Devstral 2, Wan-Move, SimGym, Disco, Stripe's Agentic Commerce Suite and more
Dec 12, 2025
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #117):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
OpenAI introduced GPT‑5.2 model series (Instant, Thinking, and Pro) that sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations. It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects than any previous model [Details].
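If GPT‑5.2 follows the access pattern of earlier GPT‑5 releases, trying it should be a one-line model swap in the OpenAI Python SDK. A minimal sketch, assuming the model id "gpt-5.2" (not a confirmed identifier, and the Instant/Thinking/Pro variants may use different ones):

```python
# Minimal sketch, assuming GPT-5.2 is served through the standard
# OpenAI Responses API; the model id "gpt-5.2" is an assumption
# based on the announcement, not a confirmed identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.2",  # hypothetical id; variant suffixes may differ
    input="Draft a one-page project plan for migrating a Postgres database to a new region.",
)
print(response.output_text)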
GWM-1 (early access) - Runway’s first general world model family. GWM-1 is an autoregressive model built on top of Gen-4.5. It generates frame by frame, runs in real time, and can be controlled interactively with actions—camera pose, robot commands, audio. GWM-1 comes in three variants: GWM Worlds for explorable environments, GWM Avatars for conversational characters, and GWM Robotics for robotic manipulation.
Also from Runway, updates to the Gen-4.5 video generation model: soon you’ll be able to both generate and edit native audio with Gen-4.5 and edit videos of arbitrary length with multi-shot editing.
Z.ai:
GLM-4.6V series of open-source multimodal large language models: GLM-4.6V (106B-A12B) and GLM-4.6V-Flash (9B). GLM-4.6V supports a 128k training context window and adds native Function Calling with direct image, screenshot, and document inputs. It can visually comprehend results returned by tools, such as search results, statistical charts, rendered web screenshots, or retrieved product images, and incorporate them into the subsequent reasoning chain as well as the final output (a rough sketch of this loop follows after this list). GLM-4.6V delivers SOTA performance among similarly sized open models across 20+ multimodal benchmarks, particularly in multimodal comprehension, logical reasoning, and long-context tasks [Details].
Open-sourced AutoGLM, a vision-language model that understands phone screens and acts as an autonomous mobile agent [Details].
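The image-aware tool loop is the interesting part of GLM-4.6V. Here is a rough sketch of the pattern, assuming Z.ai exposes an OpenAI-compatible chat endpoint; the base_url, model id, and the search_products tool are all illustrative assumptions:

```python
# Sketch of an image-aware tool loop with GLM-4.6V, assuming an
# OpenAI-compatible endpoint (base_url and model id are assumptions).
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "search_products",  # hypothetical tool for illustration
        "description": "Search a catalog and return a product image URL",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Find a minimalist desk lamp and describe its look."}]
first = client.chat.completions.create(model="glm-4.6v", messages=messages, tools=tools)

# Assume the model chose to call the tool; feed the tool's result back
# both as a tool message and as an image part the model can inspect visually.
tool_call = first.choices[0].message.tool_calls[0]
messages += [
    first.choices[0].message,
    {"role": "tool", "tool_call_id": tool_call.id,
     "content": "https://example.com/lamp.jpg"},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/lamp.jpg"}},
        {"type": "text", "text": "Does this result match the request? Describe it."},
    ]},
]
final = client.chat.completions.create(model="glm-4.6v", messages=messages, tools=tools)
print(final.choices[0].message.content)
```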
Mistral AI:
Devstral 2 coding model family, available in two sizes: Devstral 2 (123B) and Devstral Small 2 (24B). Devstral 2 ships under a modified MIT license, while Devstral Small 2 uses Apache 2.0. Devstral 2 reaches 72.2% on SWE-bench Verified, establishing it as one of the best open-weight models while remaining highly cost efficient; Devstral Small 2 scores 68.0% on SWE-bench Verified and can run locally on consumer hardware (a minimal API sketch follows after this list).
Mistral Vibe CLI: an open-source command-line coding assistant powered by Devstral. It explores, modifies, and executes changes across your codebase using natural language in your terminal or integrated into your preferred IDE via the Agent Communication Protocol. Released under the Apache 2.0 license.
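A minimal sketch of calling Devstral 2 through the official mistralai Python SDK; the model id "devstral-2" is a placeholder assumption, so check Mistral's model docs for the real identifier:

```python
# Minimal sketch using the official mistralai Python SDK; the model id
# "devstral-2" is an assumed placeholder, not a confirmed name.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

resp = client.chat.complete(
    model="devstral-2",  # hypothetical id for the 123B model
    messages=[{
        "role": "user",
        "content": "Write a pytest suite for a function that parses ISO-8601 dates.",
    }],
)
print(resp.choices[0].message.content)
```

Devstral Small 2, being Apache 2.0 and 24B, is the one to reach for if you want to serve the weights yourself on consumer hardware rather than go through the API.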
Google:
Disco: a Google Labs app to discover new generative AI features on the web. The first feature, GenTabs, uses Gemini 3 to remix your open tabs into totally custom apps [Details].
Interactions API: a unified interface for interacting with Gemini models and agents. It simplifies state management, tool orchestration, and long-running tasks [Details].
A more powerful Gemini Deep Research agent, available via the Interactions API, that autonomously plans, executes, and synthesizes multi-step research tasks. This release features vastly improved web search, letting the agent navigate deep into sites for specific data, and achieves state-of-the-art results on Humanity’s Last Exam (HLE) and DeepSearchQA, a new benchmark for evaluating agents on intricate, multi-step information-seeking tasks [Details].
Titans architecture and the MIRAS framework, which allow AI models to work much faster and handle massive contexts by updating their core memory while the model is actively running [Details].
Upgrades to Gemini 2.5 Flash and Pro Text-to-Speech models that deliver richer expressivity, smarter context-aware pacing, and more natural multi-speaker dialogue [Details].
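For orientation, multi-speaker dialogue already works with the current Gemini TTS preview models through the google-genai Python SDK. The sketch below uses the existing preview model id and documented config types, which may differ for the upgraded models described above:

```python
# Multi-speaker TTS sketch with the google-genai SDK; the model id is the
# current preview one and may change for the upgraded Flash/Pro TTS models.
import wave
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

script = """TTS the following conversation:
Ana: Did you see the new pacing controls?
Ben: Yes, the pauses finally sound natural."""

resp = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=script,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Ana",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore"))),
                    types.SpeakerVoiceConfig(
                        speaker="Ben",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck"))),
                ]
            )
        ),
    ),
)

# The API returns raw 24 kHz 16-bit mono PCM; wrap it in a WAV container.
pcm = resp.candidates[0].content.parts[0].inline_data.data
with wave.open("dialogue.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```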
Nous Research released Nomos 1, a specialization of Qwen/Qwen3-30B-A3B-Thinking-2507 for mathematical problem-solving and proof-writing in natural language. At just 30B parameters, it scored 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would have ranked second out of 3,988 participants in the 2024 competition. Under the same conditions, the base model scored 24/120. [Details].
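Since Nomos 1 is a fine-tune of an open Qwen3 checkpoint, it should load like any Hugging Face causal LM. A sketch, with the caveats that the repo id below is an assumption (check Nous Research's Hugging Face page) and that a 30B MoE still needs serious GPU memory even in bf16:

```python
# Local inference sketch with transformers; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nomos-1"  # hypothetical repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Prove that the sum of two odd integers is even."
inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Thinking-style models emit a long reasoning trace before the proof.
out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```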
Adobe launched Adobe Photoshop, Adobe Express and Adobe Acrobat for ChatGPT. Users can now edit images, design graphics or edit PDFs with simple natural-language commands from within the chat interface [Details].
Tongyi Lab at Alibaba Group released Wan-Move, a simple yet scalable framework that adds motion control to video generative models. It generates 5-second, 480p videos with motion controllability that user studies rate as comparable to the commercial Motion Brush feature in Kling 1.5 Pro. It’s implemented on top of the Wan-I2V-14B image-to-video (I2V) generation model [Details].
Shopify launched SimGym, a Shopify app to simulate buyer behavior with AI shoppers. It provides insights on add-to-cart rates, cart value, and shopper navigation patterns before changes go live [Details].
Stripe introduced the Agentic Commerce Suite: a new solution that enables you to sell on AI agents more easily by making your products discoverable, simplifying checkout, and allowing you to accept agentic payments via a single integration [Details].
Cursor launched a visual editor for the Cursor Browser. It brings together your web app, codebase, and powerful visual editing tools, all in the same window. You can drag elements around, inspect components and props directly, and describe changes while pointing and clicking [Details].
OpenAI launched AI certification courses to provide practical AI skills: AI Foundations focuses on providing workers with hands-on, real-world training on how to use today’s AI tools, while the course for teachers helps educators build AI expertise and put it to work in the classroom [Details].
The Linux Foundation announced the formation of the Agentic AI Foundation (AAIF) with founding contributions of leading technical projects including Anthropic’s Model Context Protocol (MCP), Block’s goose, and OpenAI’s AGENTS.md [Details].
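For a sense of what the AAIF is now stewarding: a tool server in Anthropic's Model Context Protocol takes only a few lines with the official Python SDK (the mcp package). This minimal sketch exposes one tool over stdio that MCP clients such as Claude Desktop or goose can discover and call:

```python
# Minimal MCP server using the official Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```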
Anthropic added the ability to delegate tasks to Claude Code directly from Slack. Tagging Claude in Slack automatically spins up a Claude Code session using the surrounding context [Details].
🌟 Noteworthy Reads and Open-source Projects
Android Use: Open-source library for AI agents to control native Android apps
Saber: a scalable zero-shot framework for reference-to-video (R2V) generation
AnyTalker: an audio-driven framework for generating multi-person talking videos
Handy: A free, open source, and extensible speech-to-text application that works completely offline. Press a shortcut, speak, and have your words appear in any text field—all without sending your voice to the cloud.
🔍 🛠️ Product Picks of the Week
Orchids: The Vibe Coding IDE that natively supports Supabase and Stripe for auth, database, and payments. Ranks #1 on App Bench and also on UI Bench.
Okara: Private AI Chat with 20+ open-source models with integrated search tools. Switch between Llama, Qwen, DeepSeek and more without losing context.
Mintlify Agent: AI agent for auto-updating documentation. Monitor Git repositories for changes and receive suggested documentation updates.
Agent Opus: AI video agent for end-to-end video creation from research, script, motion graphics, to avatar, voiceover, and editing.
Last Issue
Thanks for reading and have a nice weekend! 🎉 Mariam.
Outstanding curation this week! The GLM-4.6V's native function calling capability that can visually comprehend tool outputs is a huge leap forward. I've been tinkering with agentic systems, and the visual reasoning over search results or charts directly in the reasoning chain solves a lot of the context-loss problems we've been dealing with. It's like the model finally gets to see what it just asked for instead of just getting text back. The 128k context window helps, but the visual-tool integration is what really makes this stand out.