Claude Integrations, Qwen3, Chai by Langbase, agentic commerce, Phi-4 Reasoning, LlamaFirewall, Kimi-Audio, Gen-4 References, DeepWiki by Cognition, F Lite, Dia, Suno v4.5, Xiaomi MiMo-7B and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #101):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News at a Glance
Anthropic announced Integrations, a new way to connect apps and tools to Claude. Until now, support for Model Context Protocol (MCP) was limited to Claude Desktop via local servers. Integrations allow Claude to work with remote MCP servers across the web and desktop apps. At launch, it supports 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. Claude's Research tool has also been updated with an advanced mode that searches the web, your Google Workspace, and now your Integrations too. Claude can research for up to 45 minutes before delivering a comprehensive report, complete with citations [Details].
Alibaba released the Qwen3 family of large language models. It includes two open-weights MoE models: Qwen3-235B-A22B, a large model with 235 billion total parameters and 22 billion activated parameters, and Qwen3-30B-A3B, a smaller MoE model with 30 billion total parameters and 3 billion activated parameters. Six dense models are also released with open weights under the Apache 2.0 license: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B. The flagship model, Qwen3-235B-A22B, achieves competitive results on coding, math, and general-capability benchmarks when compared to DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro [Details].
Microsoft released three open-weights small language models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. Despite their significantly smaller size (14 billion parameters), both Phi-4-reasoning and Phi-4-reasoning-plus achieve better performance than OpenAI o1-mini and DeepSeek-R1-Distill-Llama-70B on most benchmarks, including mathematical reasoning and Ph.D.-level science questions. Phi-4-mini-reasoning is a compact 3.8B-parameter reasoning model that outperforms other reasoning models nearly twice its size, such as DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B [Details].
Visa is partnering with Microsoft, OpenAI & Anthropic to launch Visa Intelligent Commerce, a platform that will enable AI agents to shop and make purchases on behalf of consumers, based on preselected preferences. Mastercard is also rolling out Agent Pay and PayPal has announced its own entry into agentic commerce [Details].
A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals [Details].
MoonshotAI released Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation. It achieves SOTA results on numerous audio benchmarks. Kimi-Audio can handle diverse tasks like speech recognition (ASR), audio question answering (AQA), audio captioning (AAC), speech emotion recognition (SER), sound event/scene classification (SEC/ASC), and end-to-end speech conversation [Details].
Xiaomi released MiMo-7B, a series of models trained from scratch and optimized for reasoning tasks. MiMo-7B-RL demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini [Details].
Freepik and Fal released F Lite, a 10B-parameter diffusion model. It was trained on Freepik's internal dataset of approximately 80 million copyright-safe images, making it the first publicly available model of this scale trained exclusively on legally compliant and SFW content [Details].
DeepSeek released DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3 [Details].
Runway launched Gen-4 References for paid users. It lets you visualise and create consistent characters and locations, new shots and angles, and more [Details].
Meta:
Meta AI app launched: a standalone iOS and Android application powered by Llama 4, offering a personalized AI assistant with natural voice interaction and a Discover feed for exploring shared AI content [Details].
New open-source Llama protection tools released: LlamaFirewall, a security guardrail tool to help build secure AI systems; Llama Guard 4; and Llama Prompt Guard 2, an update to the Llama Prompt Guard classifier model that improves its performance in jailbreak and prompt injection detection [Details].
Llama API: an upcoming developer platform for Llama application development, which is available as a limited free preview. Llama API provides easy one-click API key creation and interactive playgrounds to explore different Llama models [Details].
OpenAI is updating ChatGPT search, its web search tool in ChatGPT, with shopping features [Details].
Suno v4.5 released for Pro & Premier subscribers: a wider range of genres, richer vocals, & enhanced prompt understanding for songs. It can now create songs up to 8 minutes long without using Extend [Details].
Lovable 2.0 released with major updates, including a new Chat Mode Agent, security scan, visual and Dev Mode editing, custom domains, and more [Details].
Nari Labs released Dia, a 1.6-billion-parameter open-source TTS model under the Apache 2.0 license. Dia is designed to generate ultra-realistic dialogue from text transcripts. You can condition the output on audio, enabling emotion and tone control [Details].
JetBrains open-sourced Mellum, its 4-billion-parameter model trained from scratch to power cloud-based code completion in JetBrains IDEs. Trained on over 4 trillion tokens across multiple programming languages, with a context window of 8192 tokens, Mellum-4b-base is tailored specifically for code completion [Details].
NotebookLM Audio Overviews are now available in over 50 languages [Details].
OpenAI is rolling out a lightweight version of deep research, powered by a version of OpenAI o4-mini, to all free users [Details].
Kyutai released Helium 1, a 2B-parameter open-source language model trained on the 24 official languages of the European Union. It achieves state-of-the-art performance among models of similar scale when evaluated across a diverse set of tasks in European languages [Details].
Luma AI has made its Ray2 Camera Concepts API available, giving developers access to camera controls for cinematic framing and dynamic movement [Details].
Researchers from the University of Zurich deployed a slew of AI bots on the popular Reddit forum r/changemyview, where posts often ask users to challenge their views on contentious topics. The bots posed as real people and engaged with users without their knowledge or consent to try to change their minds. Reddit is considering legal action; the researchers say they will no longer publish their study [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
The 9 best vibe coding tools developers are using to improve their workflows
The Urgency of Interpretability by Dario Amodei, CEO of Anthropic
Claude Code: Best practices for agentic coding - by Anthropic
How to use Gradio to build an MCP Server in 5 lines of Python
Securing America's Compute Advantage: Anthropic’s Position on the Diffusion Rule
AgenticSeek: A local alternative to Manus AI, this voice-enabled AI assistant autonomously browses the web, writes code, and plans tasks while keeping all data on your device.
🔍 🛠️ AI Toolbox: Product Picks of the Week
Chai by Langbase: Vibe code any AI agent. Chai turns prompts into prod-ready agents.
DeepWiki by Cognition: Deep Research for GitHub – powered by Devin.
Simular for macOS: macOS-native agent, performing digital actions on your behalf
Genspark AI Slides: a full agentic tool that makes creating slides fast and simple.
Iconic Scenes by Higgsfield: step inside legendary movie moments with just a selfie and become the main character
Previous issue:
o3 & o4-mini, Bytedance's Seaweed & UI-TARS-1.5, GPT‑4.1, Gemini 2.5 Flash, first open-source native 1-bit LLM, DataDecide, Convex Chef, Grok Studio, Kling 2.0 Master & Kolors 2.0, Codex CLI and more
Thanks for reading and have a nice weekend! 🎉 Mariam.