Manus AI, Grounded language model, Tavus' Conversational Video Interface, Jamba 1.6, QwQ 32B, Mistral OCR, Character-3, audio-to-video model, Sesame's voice model, Aya Vision, Browser Operator & more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #96):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News at a Glance
Mistral introduced Mistral OCR, a model that excels at understanding complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting. The model enables deeper understanding of rich documents such as scientific papers with charts, graphs, equations, and figures [Details].
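For a sense of the developer surface, here is a minimal sketch of calling the model through Mistral's Python client; the `client.ocr.process` call, the `mistral-ocr-latest` model id, and the document payload shape follow the launch examples, but treat them as assumptions rather than a definitive reference.

```python
import os
from mistralai import Mistral  # pip install mistralai

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Submit a publicly reachable PDF for OCR; the response is organized per page.
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",  # model id as given in the launch examples
    document={"type": "document_url", "document_url": "https://arxiv.org/pdf/2201.04234"},
)

# Each page carries a markdown rendering that preserves tables and equations.
print(ocr_response.pages[0].markdown)
```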
Manus AI launched Manus, a general AI agent with multi-modal capabilities that achieves state-of-the-art (SOTA) performance on the GAIA benchmark, which evaluates AI agents on reasoning, multi-modal processing, and tool usage. Manus can handle a variety of tasks, from automating workflows to executing complex decision-making processes, without requiring constant human intervention [Details].
Contextual AI introduced Grounded Language Model (GLM) that achieves state-of-the-art performance on the Public Set of FACTS, the leading groundedness benchmark, outperforming all foundation models. “Groundedness” refers to the degree to which an LLM’s generated output is supported by and accurately reflects the retrieved information provided to it [Details].
Qwen released QwQ-32B, an open-weight reasoning model with only 32 billion parameters that achieves performance comparable to DeepSeek-R1, which has 671 billion parameters (37 billion activated). It’s available in Qwen Chat [Details].
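Since the weights are open, the model can also be run locally. A minimal sketch with Hugging Face Transformers, assuming the published model id is `Qwen/QwQ-32B` and that the usual chat-template flow applies:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face model id from the release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Reasoning models benefit from a generous token budget for the thinking trace.
messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```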
Sesame AI introduced the Conversational Speech Model (CSM) for speech generation with emotional intelligence, natural timing, pauses, interruptions, emphasis and contextual awareness. You can try the demo with Maya and Miles AI voices. The model will be available under an Apache 2.0 license [Details].
Hedra launched Hedra Studio with Character-3, the first omnimodal model in production, built to jointly reason across image, text, and audio for more intelligent video generation [Details].
Tavus introduced the next evolution of its Conversational Video Interface (CVI) platform – a complete operating system that is emotionally intelligent. It lets you build AI Agents that see, listen, understand, and engage in real-time, face-to-face interactions, powered by a new family of AI models: Phoenix-3, Raven-0, and Sparrow-0 [Details | Demo].
Opera launched Browser Operator, an AI agent that can get stuff done for you in the Opera browser [Details].
AI21 released Jamba 1.6 open model family for private enterprise deployment. Jamba Large 1.6 outperforms Mistral Large 2, Llama 3.3 70B, and Command R+ on quality, and Jamba Mini 1.6 outperforms Ministral 8B, Llama 3.1 8B, and Command R7B. With a context window of 256K and hybrid SSM-Transformer architecture, Jamba 1.6 excels at RAG and long context grounded question answering tasks [Details].
Cohere For AI released Aya Vision, a state-of-the-art open-weights vision model excelling across multiple languages and modalities. Aya Vision outperforms the leading open-weight models in multilingual text generation and image understanding. Aya Vision 8B outperforms models 10x its size, such as Llama-3.2 90B Vision; Aya Vision 32B outperforms models more than 2x its size, such as Llama-3.2 90B Vision, Molmo 72B, and Qwen2.5-VL 72B [Details].
Microsoft announced Microsoft Dragon Copilot, an AI system for healthcare that can, among other things, listen to and create notes based on clinical visits [Details].
Researchers at Northwestern Polytechnical University, China, introduced DiffRhythm, the first latent-diffusion-based song generation model capable of synthesizing complete songs, with both vocals and accompaniment, for durations of up to 4m45s in only ten seconds [Demo | Details].
Codeium has updated the Windsurf editor with Previews, linter integration, MCP discoverability, suggested actions, and more. Previews lets you iterate on your apps rapidly by easily sending elements and console errors back to Cascade as context [Details].
Captions showed a demo of Mirage, a new audio-to-video foundation model that generates expressive AI humans from just an audio input, without a reference image [Details].
Tencent released the Hunyuan image-to-video model, along with the inference code and model weights [Details].
Convergence AI launched Template Hub, a centralised repository of workflow-specific agents, or 'templates', that can be deployed in one click [Details].
Google is adding ‘AI Mode’, a new Search mode powered by a custom version of Gemini 2.0. AI Mode is particularly helpful for questions that need further exploration, comparisons, and reasoning, the kind that might previously have taken multiple searches, and returns a helpful AI-powered response with links to learn more [Details].
ChatGPT for macOS can now edit code directly in IDEs. Available to Plus, Pro, and Team users [Details].
LM Studio released SDK for Python and TypeScript to interact with LLMs, embeddings models, and agentic flows [Details].
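As a quick taste of the Python side, here is a hedged sketch along the lines of the SDK's announcement examples; the `lms.llm` loader and `respond` call are taken from those examples, and the model identifier is a placeholder for whatever model you have loaded locally.

```python
import lmstudio as lms  # pip install lmstudio

# Talks to a locally running LM Studio instance over its local API.
model = lms.llm("qwen2.5-7b-instruct")  # placeholder local model identifier

# Single-turn completion against the local model.
result = model.respond("Explain what an SSM-Transformer hybrid is in two sentences.")
print(result)
```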
Anthropic has updated Anthropic Console, a tool for developers to build, test, and iterate on AI deployment with Claude [Details].
Hume AI launched Expressive TTS Arena to evaluate how TTS systems handle nuanced, creative, and emotionally rich content and prompts [Details].
OpenAI announced NextGenAI, a consortium with 15 leading research institutions to advance research and education with AI [Details].
Stability AI has partnered with Arm to bring generative audio to mobile devices, enabling high-quality sound effects and audio sample generation directly on-device with no internet connection required [Details].
Anthropic’s Claude Code, an agentic coding tool, is now available without a waitlist [Details].
Salesforce launched Agentforce 2dx, letting AI run autonomously across enterprise systems [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic (a minimal MCP server sketch follows this list)
DeepSeek Open Infra: projects from DeepSeek’s Open-Source Week
Data Science Agent in Colab: The future of data analysis with Gemini
ACU - Awesome Agents for Computer Use: A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.
AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser.
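To make the MCP workshop above more concrete, here is a minimal server sketch, assuming the official `mcp` Python SDK and its `FastMCP` helper; the tool itself is a made-up example.

```python
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    # Serve over stdio so an MCP-capable client (e.g. Claude Desktop) can attach.
    mcp.run()
```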
🔍 🛠️ AI Toolbox: Product Picks of the Week
Chanel42: One timeline for all your AI video tools.
Composio MCP: Connect Cursor to 100+ fully managed MCP servers with built-in auth.
PodGenie: The all-in-one podcast audio intelligence platform.
VoicePanel: Use AI to collect feedback on your products.
Quadratic: AI spreadsheet with code and connections.
Last week’s issue
Diffusion large language model, GPT‑4.5, 3.7 Sonnet, Wan2.1 open-source video model, Phi-4-multimodal, Proxy Lite, Omni-capable text and voice engine, Poe Apps and App Creator, FastRTC, Scribe and more
Thanks for reading and have a nice weekend! 🎉 Mariam.