Mistral Small 3, Open Music Foundation Models, Qwen2.5-Max and VL, FUZZ, Open-R1, Hailuo Director mode, Tülu 3 405B, Postman AI Agent Builder, Goose, LlamaReport, open-source operator, Codev, and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #91):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Alibaba’s Qwen:
Qwen2.5-Max model: a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. It is available via API and in Qwen Chat [Details].
Qwen2.5-VL: a new flagship vision-language model, released in three sizes: 3B, 7B, and 72B. It can act directly as a visual agent that reasons and dynamically directs tools, making it capable of computer and phone use. It can comprehend videos longer than 1 hour and pinpoint the video segments relevant to a given event. It can localize objects in an image by generating bounding boxes or points, and it provides stable, structured JSON outputs for coordinates and attributes [Details].
Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M: open-source Qwen models that handle 1M-token contexts. The team also fully open-sourced their vLLM-based inference framework, which can process 1M-token inputs 3x to 7x faster [Details].
Mistral Small 3, a latency-optimized 24B-parameter model released under the Apache 2.0 license. It’s competitive with larger models such as Llama 3.3 70B or Qwen 32B, and is on par with Llama 3.3 70B instruct, while being more than 3x faster on the same hardware. Mistral AI released both a pretrained and instruction-tuned checkpoint under Apache 2.0 [Details].
Ai2 released Tülu 3 405B, the first application of fully open post-training recipes to the largest open-weight models. Tülu 3 is an instruction-following model family, offering fully open-source data, code, and recipes. Tülu 3 405B achieves competitive or superior performance to both DeepSeek V3 and GPT-4o, while surpassing prior open-weight post-trained models of the same size, including Llama 3.1 405B Instruct and Nous Hermes 3 405B, on many standard benchmarks [Details].
M-A-P released YuE, a series of open-source foundation models for music generation, specifically for transforming lyrics into full songs (lyrics2song). It can generate a complete song, lasting several minutes, that includes both a catchy vocal track and an accompaniment track. YuE is capable of modeling diverse genres, languages, and vocal techniques [Details].
Riffusion launched a free web app powered by their new music generation model, FUZZ, which can generate complete songs from text or audio clips. It learns to generate music with your unique aesthetic over time [Link].
Kimi k1.5 is now available completely free with unlimited usage via Kimi.ai. Last week, MoonshotAI introduced Kimi k1.5, an o1-level multi-modal model trained with reinforcement learning (RL) [Paper].
Proxy, Europe’s answer to OpenAI’s Operator, launched globally with basic access for free [Link].
DeepSeek released Janus-Pro-7B, an open-source multimodal LLM capable of visual understanding and image generation [Details].
Microsoft makes OpenAI’s o1 reasoning model (Think Deeper mode) free for all Copilot users [Details].
Hailuo AI (MiniMax) made a new text-to-video model available: Hailuo T2V-01-Director, which lets you command camera movements with natural language or simple commands [Link].
OpenAI’s Canvas now works with OpenAI o1 and it can render HTML & React code [Link].
Nous Research announced Nous Psyche, a cooperative training network for generative AI built on the Solana blockchain. Psyche coordinates heterogeneous hardware to join a run and train open-source models [Details].
Block, the company led by Twitter founder Jack Dorsey, launched Goose, an on-machine, open-source AI agent built to automate your tasks [Details].
Google is testing a new “Ask for Me” feature that uses AI to call local businesses on your behalf, for information about availability and pricing [Details].
Postman, the API platform for building and using APIs, launched ‘Postman AI Agent Builder’, a suite of tools for AI agent development and testing [Details].
Pika 2.1 video generation model is available now with realistic physics simulation, more precise control over character and object movements, full HD resolution, new animation styles and more [Link].
Hugging Face has integrated serverless inference providers - fal, Replicate, Sambanova, and Together AI - into their platform, enabling users to perform serverless inference directly from model pages and client SDKs [Details].
Hugging Face launched the Open-R1 project, an initiative to systematically reconstruct DeepSeek-R1’s data and training pipeline [Details].
LlamaIndex announced the beta release of LlamaReport, a new report generation tool that can create complex, multi-section reports that adhere to any template you specify, from your documents [Details].
U.S. Copyright Office says AI-generated content can be copyrighted if a human contributes to or edits it [Details].
Luma Labs added a new Upscale to 4K feature to enhance a DreamMachine video up to 4K resolution [Link].
OpenAI announced ChatGPT Gov, a new tailored version of ChatGPT to provide U.S. government agencies with an additional way to access OpenAI’s frontier models with their own security, privacy, and compliance requirements [Details].
DeepSeek R1 is now available in the model catalog on Azure AI Foundry and GitHub [Details].
The mobile app for DeepSeek skyrocketed to the No. 1 spot in app stores around the globe, topping ChatGPT. On iOS, DeepSeek is currently the No. 1 free app in the U.S. App Store and 51 other countries [Details].
Let’s keep this going!
Putting this newsletter together takes significant time and energy. To keep it sustainable, I’m exploring paid subscriptions. Would love to hear what works for you!
🔦 Weekly Spotlight
Interview with DeepSeek founder Liang Wenfeng [Link].
AI Web Operator: An open-source version of Operator. Uses Browserbase and the Vercel AI SDK, plus vision via Claude [Link].
DeepSeek FAQ by Ben Thompson [Link].
Ollama Deep Researcher: a fully local web research and report writing assistant that uses any LLM hosted by Ollama [Details].
Exa & DeepSeek Chat App: a free and open-source chat app that uses Exa's API for web search and the DeepSeek R1 LLM for reasoning [Link].
Stagehand: An open-source project to build browser automations. It is fully compatible with Playwright, offering three simple AI APIs (act, extract, and observe) on top of the base Playwright Page class that provide the building blocks for web automation via natural language [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Codev: an AI-powered platform that converts text descriptions into full-stack Next.js web applications.
Proxy: AI-powered digital assistant that explores the web and executes tasks through simple conversation.
Qwen Chat: an AI chat assistant from Qwen. It supports Artifacts, web search, image and video generation, and document upload. It also features the Qwen2.5-Turbo model, which supports long-context processing with a context length of up to 1M tokens.
Nim: Create images and videos with the latest models, templates, and inspiration feed. Kling Pro, Minimax, Character reference, Hunyuan, and 15+ other models and tools
Browser Use Cloud: Enable AI to control your browser.
Thanks for reading and have a nice weekend! 🎉 Mariam.