o3 & o4-mini, Bytedance's Seaweed & UI-TARS-1.5, GPT‑4.1, Gemini 2.5 Flash, first open-source native 1-bit LLM, DataDecide, Convex Chef, Grok Studio, Kling 2.0 Master & Kolors 2.0, Codex CLI and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #100):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News at a Glance
OpenAI announced:
o3 and o4-mini: the smartest in the o-series of models trained to think for longer before responding. These models can agentically use and combine tools within ChatGPT, which includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and generating images [Details].
a new series of GPT models in the API: GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano, which outperform GPT‑4o and GPT‑4o mini across the board, with major gains in coding and instruction following. They also have larger context windows of up to 1 million tokens. OpenAI will begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency [Details].
Codex CLI: an open-source lightweight coding agent that runs in your terminal [Details].
Google is rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Gemini 2.5 Flash is a fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency [Details].
Bytedance introduced UI-TARS-1.5, a multimodal agent built upon a powerful vision-language model. It can effectively perform diverse tasks within virtual worlds and beats OpenAI Operator and Claude 3.7 on GUI Agent and Game Agent tasks. A smaller version, UI-TARS-1.5-7B, trained from Qwen2.5-VL-7B, has been open-sourced along with the UI-TARS-desktop app [Details].
Microsoft released BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Its performance is comparable to leading open-weight, full-precision models of similar size, while being more computationally efficient (memory, energy, latency) [Details].
Bytedance introduced Seaweed, a new foundational model for video generation trained using compute equivalent to 1,000 H100 GPUs. It supports natively generating a single shot lasting 20 seconds without any extension technique. With the extension, it can generate videos up to a minute long. Seaweed is also capable of generating both audio and video together [Details].
Anthropic added a new ‘Research’ feature in Claude that enables Claude to search across both your internal work context and the web. Claude now integrates with Gmail and Calendar, in addition to Google Docs [Details].
Kling AI introduced major updates to its video and image models, KLING 2.0 Master and KOLORS 2.0. The Multi-Elements Editor for KLING 1.6 has also launched: it lets you upload a 1-5s video and then add, swap, or delete video content based on text and image inputs [Details].
Bolt has opened registrations for its ‘World’s Largest Hackathon’ [Details].
Ai2 released DataDecide, the most extensive open suite of models pretrained on 25 corpora with differing sources, deduplication, and filtering (up to 100B tokens), across 14 model sizes ranging from 4M to 1B parameters (more than 30k model checkpoints in total) [Details].
Together AI introduced Open Deep Research, an open-source LLM workflow that can answer complex questions that require multi-hop reasoning and generate research reports [Details].
Microsoft introduced a ‘computer use’ feature in Copilot Studio, enabling AI agents to autonomously interact with websites and desktop applications [Details].
Google’s Veo 2 video model is now available to Gemini Advanced users, as well as in Google AI Studio and Whisk Animate [Details].
Cohere released Embed 4, a multimodal embedding model capable of accurately and quickly searching complex multimodal business materials. Embed 4 is multilingual across 100+ languages [Details].
Grok has gained a canvas-like feature for editing and creating documents and basic apps, called Grok Studio, available for both free and paying users on Grok.com [Details].
Cohere is now a supported Inference Provider on Hugging Face Hub, making it the first model creator to share and serve their models directly on the Hub [Details].
Google, in collaboration with researchers at Georgia Tech and Wild Dolphin Project (WDP), introduced DolphinGemma, a foundational AI model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences [Details].
Hugging Face is taking a big leap into robotics with the acquisition of humanoid robotics startup Pollen Robotics [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
A practical guide to building agents - by OpenAI [Details].
OpenAI’s Sam Altman Talks ChatGPT, AI Agents and Superintelligence — Live at TED2025 [video].
Memory in Agents: What, Why and How [Details].
Vibe Check: o3 Is Here—And It’s Great - by Dan Shipper [Details].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Convex Chef: an AI agent that builds web apps such as multiplayer games, social messaging platforms, and AI-powered agentic apps.
WorkflowAI: an open-source platform for product and engineering teams to collaborate to build and iterate on AI features. Test and compare 80+ leading AI models side-by-side and evaluate quality, cost, and speed.
Arcads: transforms text into emotionally resonant video ads.
Operative.sh: vibe-test your web app. Automated UX testing powered by intelligent browser agents that understand your application.
n8nChat: Create, edit, debug, and optimize your n8n workflows in seconds.
Last week’s issue
Llama 4, Nova Sonic and Nova Reel 1.1, Cogito v1, HiDream-I1, Agent2Agent (A2A) Protocol, Deep Research for arXiv, fully Open-Source 14B Coder at o1 Level, AutoRAG, MCP security issues, and more
Thanks for reading and have a nice weekend! 🎉 Mariam.