Llama 4, Nova Sonic and Nova Reel 1.1, Cogito v1, HiDream-I1, Agent2Agent (A2A) Protocol, Deep Research for arXiv, fully Open-Source 14B Coder at o1 Level, AutoRAG, MCP security issues, and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #99):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News at a Glance
Google Cloud Next 25:
Agent2Agent (A2A) Protocol: an open protocol that provides a standard way for agents to collaborate with each other, regardless of the underlying framework or vendor. A2A is built with support and contributions from more than 50 technology partners [Details].
Firebase Studio: a cloud-based, agentic development environment powered by Gemini to create as well as publish AI apps quickly. Available to everyone in preview [Details].
Deep Research in Gemini is now powered by Gemini 2.5 Pro, available to Gemini Advanced users [Details].
Ironwood: the seventh-generation Tensor Processing Unit (TPU) designed specifically for inference [Details].
Agent Development Kit (ADK): a new open-source framework for building agents and multi-agent systems while maintaining precise control over agent behavior. Examples are available in the Agent Garden.
Agent Engine: a fully managed runtime in Vertex AI to deploy custom agents to production with built-in testing.
New AI capabilities in Workspace, including Help me analyze in Google Sheets, Audio overviews in Google Docs, and Google Workspace Flows to automate repetitive tasks [Details].
Updates to Google Agentspace, the enterprise platform that connects work apps to multimodal search and AI agents [Details].
Veo 2 is available today in the Gemini API in Google AI Studio ($0.35 per second of video generated). The Live API for Gemini models is now in Preview with 30 new languages and 2 new voices [Details].
New video, image, speech and music generative AI tools are coming to Vertex AI [Details].
Meta AI unveiled the Llama 4 family of models, which includes Scout, Maverick and Behemoth. Llama 4 Scout is a 17 billion active parameter model with 16 experts (context window of 10M), and Llama 4 Maverick is a 17 billion active parameter model with 128 experts. Llama 4 Behemoth, a 288 billion active parameter model with 16 experts, is the most powerful model in the family and outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still in training [Details].
Amazon introduced the Amazon Nova Reel 1.1 model, which generates multi-shot videos up to 2 minutes in length with consistent style across shots. You can either provide a single prompt for up to a 2-minute video composed of 6-second shots, or design each shot individually with custom prompts [Details].
Deep Cogito released the Cogito v1 Preview family of models in sizes 3B, 8B, 14B, 32B and 70B under an open license. Deep Cogito claims that each model outperforms the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen, across most standard benchmarks. In particular, the 70B model also outperforms the newly released Llama 4 109B MoE model. The LLMs are trained using Iterated Distillation and Amplification (IDA) - a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement. Each model can function in a standard mode as well as a reasoning mode [Details].
VivagoAI has open-sourced their HiDream-I1 family of models under the MIT License. It comes in three variants: Full, Dev, and Fast. HiDream-I1-Dev is the new leading open-weights image generation model, overtaking FLUX1.1 [pro] in the Artificial Analysis Image Arena [Details].
Amazon unveiled Amazon Nova Sonic, a new speech-to-speech model that unifies speech understanding and generation into a single model. It picks up on tone, inflection, and pacing, for a deeper understanding of human conversation [Details].
Microsoft added several new features in Copilot including podcast generation, deep research, memory, action, shopping tools and others [Details].
OpenAI is rolling out an update to ChatGPT’s memory feature. It can now reference all of your past chats to provide more personalized responses [Details].
NVIDIA released Llama-3.1-Nemotron-Ultra-253B-v1, an open-weights reasoning model derived from Meta's Llama-3.1-405B-Instruct. It beats Llama 4 Behemoth and Maverick and is competitive with DeepSeek R1 [Details].
Together AI and Agentica Project jointly released DeepCoder-14B, a fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math [Details].
Moonshot AI released Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B) [Details].
Cloudflare launched AutoRAG, a fully managed Retrieval-Augmented Generation (RAG) pipeline designed to easily integrate context-aware AI into applications [Details].
OpenAI launched the Evals API to programmatically define tests, automate evaluation runs, and quickly iterate on prompts [Details].
Grok 3 API is now available [Details].
OpenAI open-sourced BrowseComp (“Browsing Competition”), a new benchmark of 1,266 challenging problems that tests how well AI agents can locate hard-to-find information on the web [Details].
Agent mode is rolling out to all VS Code users and supports MCP [Details].
Cloudflare launched a new toolkit for AI agents with new Agents SDK support for MCP (Model Context Protocol) clients, authentication/authorization/hibernation for MCP servers, and Durable Objects free tier [Details].
Supabase launched the official Supabase MCP server which connects AI tools like Cursor or Claude to Supabase so that they can perform tasks like launching databases, managing tables, fetching config, and querying data on your behalf [Details].
Midjourney’s new image model v7 Alpha is available now. It is the first model to have model personalization turned on by default [Details].
Anthropic has introduced a new Max plan for Claude. In addition to more usage, Max plan users will also have priority access to the newest features and models [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
Model Context Protocol has prompt injection security problems [Details].
FantasyTalking - Realistic Talking Portrait Generation via Coherent Motion Synthesis - By Alibaba Group [Details].
One-Minute Video Generation with Test-Time Training [Details].
OmniSVG: A Unified Scalable Vector Graphics Generation Model [Details].
Image to 3D with TripoSG [Details].
5 ways to use Gemini Live with camera and screen sharing [Details].
AI Agents for Beginners - A Course by Microsoft [Details].
The official ElevenLabs MCP server [Details].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Genspark Super Agent: An AI assistant that can autonomously think, plan, act, and use tools to handle tasks.
Deep Research for arXiv: Ask questions like 'What are the latest breakthroughs in RL fine-tuning?' and get comprehensive literature reviews with trending papers automatically included.
Higgsfield Motion Controls: New text-to-video model with 50+ preset motion controls for complex camera movements.
VivagoAI: AI-powered platform for professional-grade creative visual design.
Last issue:
Gemini 2.5 Pro, Qwen2.5-Omni, GPT-4o with native image generation, Reve Image 1.0, Anthropic's AI microscope, first real-time speech-to-speech VSM, Ideogram 3.0 and more
Thanks for reading and have a nice weekend! 🎉 Mariam.