Grok 4, 4KAgent, Moonvalley’s Marey, Devstral Medium, open-source AI robot, SmolLM3, FlexOlmo, Phi-4-mini-flash-reasoning, Trae Agent, Genspark AI Docs + AI Pods, Comet and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #107):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
xAI launched two new models, Grok 4 and Grok 4 Heavy, that show frontier-level performance on several benchmarks. Grok 4 achieved a state-of-the-art score on the ARC-AGI-2 test (a difficult benchmark of puzzle-like problems in which an AI has to identify visual patterns), scoring nearly twice as high as the next best commercial AI model, Claude Opus 4. Grok 4 also scored 25.4% on Humanity’s Last Exam without “tools,” outperforming Google’s Gemini 2.5 Pro, which scored 21.6%, and OpenAI’s o3 (high), which scored 21%. Grok 4 Heavy spawns multiple agents to work on a problem simultaneously; the agents then compare their work “like a study group” to find the best answer [Details].
Mistral introduced Devstral Medium, as well as an upgrade to Devstral Small with improved performance and cost efficiency. Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, and sets a new state-of-the-art for open models without test-time scaling. Devstral Medium, with a score of 61.6% on SWE-Bench Verified, is available via API [Details].
Moonvalley’s Marey, a commercially safe video model, trained only on licensed, high-resolution footage and built for professional production, is now publicly available [Details].
Hugging Face released SmolLM3, a fully open 3B model that outperforms Llama-3.2-3B and Qwen2.5-3B while staying competitive with larger 4B alternatives (Qwen3 & Gemma3) [Details].
Ai2 introduced FlexOlmo, a new paradigm for language model training that enables co-development of AI through data collaboration. With FlexOlmo, data owners can contribute to the development of a language model without giving up control of their data. There’s no need to share raw data directly, and data contributors can decide when their data is active in the model (i.e., who can make use of it and when), deactivate data at any time, and receive attributions whenever data is used for inference [Details].
Microsoft released Phi-4-mini-flash-reasoning. This new model follows Phi-4-mini but is built on a new hybrid architecture that achieves up to 10 times higher throughput and a 2 to 3 times average reduction in latency, enabling significantly faster inference without sacrificing reasoning performance [Details].
Anthropic launched a free educational platform with 6 courses that include dozens of lectures, self-guided quizzes and certificates upon completion [Details].
Hugging Face introduced Reachy Mini, an expressive, open-source robot designed for human-robot interaction, creative coding, and AI experimentation. Fully programmable in Python (and soon JavaScript, Scratch), starting from $299 [Details].
Perplexity launched Comet, an AI-native browser with a built-in assistant capable of agentic tasks like scheduling meetings, sending emails and automating workflows [Details].
Genspark launched AI Docs and AI Pods. Genspark AI Docs is an agentic AI document creator that natively supports both rich text and markdown. AI Pods lets you generate professional podcasts from news articles, YouTube videos, research papers, etc. from a single prompt [Details].
Character.AI introduced TalkingMachines, a new autoregressive diffusion model that enables real-time, audio-driven, FaceTime-style video generation. With just an image and a voice signal, the model can generate an interactive, real-time video of characters conversing across different styles, genres, and identities [Details].
Researchers introduced 4KAgent, an agentic image super-resolution generalist designed to universally upscale any image to 4K resolution, regardless of input type, degradation level, or domain. Code will be released soon [Details].
Liquid AI released LFM2, the second generation of its Liquid foundation models, specifically designed for edge AI and on-device deployment. LFM2 outperforms similarly sized models across multiple benchmark categories [Details].
Google announced new multimodal models in the MedGemma collection of open models for health AI development. MedGemma is useful for medical text or imaging tasks that require generating free text, like report generation or visual question answering. MedSigLIP is recommended for imaging tasks that involve structured outputs like classification or retrieval [Details].
Tencent introduced Hunyuan3D-PolyGen, an art-grade 3D generative model [Details].
ByteDance released Trae Agent, an open-source LLM-based agent for general-purpose software engineering tasks. It provides a CLI that can understand natural language instructions and execute complex software engineering workflows using various tools and LLM providers [Details].
🔦 🔍 Weekly Spotlight
Articles/Courses/Videos:
Compare AI video models - Replit blog
Introduction to deep research in the OpenAI API - OpenAI Cookbook
Grok 4 seems to consult Elon Musk to answer controversial questions - TechCrunch
Open-Source Projects:
MemOS: an operating system for Large Language Models (LLMs) that enhances them with long-term memory capabilities.
NotebookLlaMa: A fully open-source, LlamaCloud-backed alternative to NotebookLM.
Cactus: Cross-platform framework for deploying LLM/VLM/TTS models locally on smartphones.
Wasm-agents by Mozilla AI: AI agents running in your browser.
opencode: an AI coding agent built for the terminal. Supports 75+ LLM providers through Models.dev, including local models.
MCP Toolbox for Databases by Google: an open source MCP server for databases.
🔍 🛠️ Product Picks of the Week
Dora Studio: AI-powered motion graphics generator.
Songscription: Upload your single-instrument audio file or YouTube link and let the AI convert it into professional sheet music and MIDI.
Simili: Add real-time video avatars to your app or website in minutes.
Context: The AI Office Suite.
rtrvr.ai: Retrieve structured data, research across tabs, and automate complex tasks.
Vogent VoiceLab: A fast, stable, and scalable API for top text-to-speech models, including Sesame CSM-1B and Dia.
Last Issue
AI-Native UGC Game Engine, Reasoning VLMs, Kyutai TTS, Collective Intelligence for Frontier AI, pay per crawl, String by Pipedream, Qwen VLo, Ovis-U1-3B, Context Engineering and more
Thanks for reading and have a nice weekend! 🎉 Mariam.