All-in-one model for video creation & editing, DeepWork in Proxy, Gemini Robotics, Gemma 3, Native image generation in Gemini 2.0 Flash, Reka Flash 3, Command A, Figma to Bolt, AgentExchange and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #97):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
From our sponsors:
Get Rid of your SaaS Subscriptions
Stop paying the No-Code Tax to SaaS companies. Use a self-hosted solopreneur operating system with preinstalled open-source apps and AI agents, and save $1k+ per month.
🗞️🗞️ AI Pulse: Weekly News at a Glance
Google:
DeepMind introduced two new AI models, based on Gemini 2.0, designed for robotics: Gemini Robotics, a vision-language-action model that adds physical actions as a new modality, and Gemini Robotics-ER (short for “embodied reasoning”), which focuses especially on spatial reasoning. Gemini Robotics can solve multi-step tasks that require significant dexterity, such as folding origami and packing a lunch box. The models also accomplished tasks not seen in training, showing the ability to generalize to new scenarios [Details].
Released Gemma 3, an update to their Gemma family of open-weights models in sizes from 1B to 27B parameters. Gemma 3 outperforms Llama-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena’s leaderboard. It supports over 140 languages and offers advanced text and visual reasoning capabilities [Details].
Native image generation with Gemini 2.0 Flash is now available in Google AI Studio and via the Gemini API. Gemini 2.0 Flash was first introduced in December 2024, but the native image-generation capability wasn’t available to all users until now [Details].
Deep Research has been updated with Gemini 2.0 Flash Thinking Experimental and is now available on the free plan as well. Gems are also now available to everyone [Details].
Gemini with personalization is a new experimental capability in the Gemini app, powered by the experimental Gemini 2.0 Flash Thinking model. It will be able to use your Google apps, starting with your Search history, to give contextually relevant responses [Details].
A new experimental Gemini Embedding text model (gemini-embedding-exp-03-07) is now available in the Gemini API. The new embedding model achieves the top rank on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard and comes with new features such as a longer input token length [Details].
Google’s new AI button in Gmail automatically adds events to Google Calendar [Details].
You can now paste YouTube links directly into Google AI Studio to use Gemini’s audio-video understanding [Details].
Convergence launched DeepWork in its Proxy AI agent. Users simply specify desired outcomes and toggle it on; DeepWork then autonomously coordinates multiple AI agents to handle much more complex, multi-step workflows [Details].
Reka open-sourced Reka Flash 3, a new reasoning model with 21B parameters that was trained from scratch and performs competitively with OpenAI o1-mini. The model weights are available under an Apache 2.0 license [Details].
Cohere introduced Command A, a new state-of-the-art open-weights 111-billion-parameter model optimized for enterprises, delivering maximum performance across agentic tasks with minimal compute requirements [Details].
Alibaba’s Tongyi Lab introduced VACE (All-in-One Video Creation and Editing), an all-in-one model designed for video creation and editing. It encompasses various tasks, including reference-to-video generation (R2V), video-to-video editing (V2V), and masked video-to-video editing (MV2V), allowing users to compose these tasks freely [Details].
Ai2 released OLMo 2 32B, the first fully open model to beat GPT-3.5 and GPT-4o mini on a suite of popular, multi-skill benchmarks [Details].
Sakana AI's ‘AI Scientist’ autonomously generated a research paper that passed peer review at a leading machine learning workshop without any human modifications. AI Scientist-v2 formulated the hypothesis, designed and conducted experiments, analyzed data, created visualizations, and wrote the entire manuscript, including formatting. Sakana AI claims this is the first fully AI-generated paper to meet the same peer-review standards as papers written by human scientists [Details].
Sesame AI, the startup behind the viral AI voice demos, has now released its base model, Conversational Speech Model (CSM) 1B, under an Apache 2.0 license [Details | Demo].
Move AI introduced Gen 2 spatial motion models for 3D motion understanding [Details].
OpenAI launched a new set of APIs and tools to simplify the development of agentic applications. This includes a new Agents SDK, integrated observability tools, and a new Responses API, which combines the simplicity of the Chat Completions API with the tool-use capabilities of the Assistants API for building agents, along with built-in tools including web search, file search, and computer use [Details].
OpenAI calls DeepSeek ‘state-controlled,’ calls for bans on ‘PRC-produced’ models [Details].
Bolt launched Figma to Bolt to turn any Figma design into a pixel-perfect full stack app. Simply select a frame and put bolt.new in front of the Figma URL to start building the app [Details].
Tencent’s Hunyuan team introduced Hunyuan-TurboS – the first ultra-large Hybrid-Transformer-Mamba MoE model. It combines Mamba's efficient long-sequence processing with Transformer's strong contextual understanding [Details].
LangChain introduced Agent Chat UI, an OSS web app for interacting with any LangGraph app via a chat interface [Details].
Salesforce launched AgentExchange, a marketplace and community for Agentforce [Details].
Adobe's new AI feature lets you edit stock images on the fly [Details].
Luma Labs introduced a new method, Inductive Moment Matching (IMM), a pre-training technique that delivers superior sample quality compared to diffusion models while offering over a tenfold increase in sampling efficiency. Luma has released the code and checkpoints [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
Everyone in AI is talking about Manus. We put it to the test - MIT Technology Review
Top 100 Gen AI Consumer Apps by a16z
Detecting misbehavior in frontier reasoning models - OpenAI
Upsonic: a reliability-focused AI agent framework designed for real-world applications, with MCP support
AgentKit: a starter kit developed by BCG X to build agent apps with Next.js, FastAPI and LangChain
Here’s how I use LLMs to help me write code - Simon Willison
🔍 🛠️ AI Toolbox: Product Picks of the Week
Mirage by Captions: Generate UGC-style ads with prompts.
Duck.ai by DuckDuckGo: Anonymous access to popular AI models, including GPT-4o mini, Claude 3, and open-source Llama 3.3 and Mistral Small 3.
AlfredOS*: a self-hosted solopreneur operating system with preinstalled open-source apps and AI agents.
Memex: The AI builder for your desktop.
Fleur: a macOS desktop app that lets you install MCP (Model Context Protocol) servers for Claude without any technical expertise.
Last week’s issue
Manus AI, Grounded language model, Tavus' Conversational Video Interface, Jamba 1.6, QwQ 32B, Mistral OCR, Character-3, audio-to-video model, Sesame's voice model, Aya Vision, Browser Operator & more
Thanks for reading and have a nice weekend! 🎉 Mariam.