Diffusion large language model, GPT‑4.5, 3.7 Sonnet, Wan2.1 open-source video model, Phi-4-multimodal, Proxy Lite, Omni-capable text and voice engine, Poe Apps and App Creator, FastRTC, Scribe and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #95):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
From our sponsors:
Voiset.io: Say Goodbye to Burnout with AI Task Management
Modern life demands constant multitasking. You balance work, personal plans, and future goals, trying to keep everything under control. But relying on memory alone will let you down, leading to stress and missed opportunities.
A notebook might help, but you don’t always have it with you. Your phone, however, is always by your side. That’s where Voiset can help you.
With the power of an AI task manager, you can quickly create tasks or notes on the go just by pressing one button, as easily as sending a voice message. No typing, no distractions. Our system determines the date, content, and assignee automatically.
Later, manage and organize everything comfortably from your desktop, with a clear overview of your workload.
Why Voiset?
Instant task creation – Create tasks on the go with a single tap, no need to type.
AI-powered planning – Smart scheduling based on your workload and availability.
Full control of your day – See your entire schedule at a glance and avoid burnout.
Seamless integration – Work from your phone or desktop, wherever it’s convenient.
Try for free today and experience stress-free productivity!
🗞️🗞️ AI Pulse: Weekly News at a Glance
OpenAI released a research preview of GPT‑4.5, available to Pro users and developers worldwide. GPT‑4.5 is a step forward in scaling up pre-training and post-training. By scaling unsupervised learning, GPT‑4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning [Details].
Anthropic released its first hybrid reasoning model, Claude 3.7 Sonnet, which shows particularly strong improvements in coding and front-end web development. It reached the #1 spot in WebDev Arena with a +100 score jump over Claude 3.5 Sonnet. Along with the model, Claude Code, a command-line tool for agentic coding, is available as a limited research preview [Details].
Inception introduced Mercury, the first commercial-scale diffusion large language model that is up to 10x faster and cheaper than current LLMs. A dLLM is a drop-in replacement for a typical autoregressive LLM, supporting all its use cases, including RAG, tool use, and agentic workflows. A code generation model, Mercury Coder, is available to test in a playground [Details].
Microsoft released Phi-4-multimodal and Phi-4-mini, the newest models in Microsoft’s Phi family of small language models (SLMs). Phi-4-multimodal is a 5.6B-parameter model that integrates speech, vision, and text processing into a single, unified architecture [Details].
Alibaba’s Qwen team unveiled QwQ-Max-Preview, a reasoning model based on Qwen2.5-Max. The model is still in preview and is available as "Thinking (QwQ)" in Qwen Chat. Compared with Qwen2.5-Max, it is much smarter and much more creative. Qwen plans to release open weights for both QwQ-Max and Qwen2.5-Max under the Apache 2.0 license [Details].
Hume launched Octave (Omni-capable text and voice engine), an LLM for text-to-speech. It acts out characters, generates voices from prompts, and takes instructions to modify the emotion and style of a given utterance [Details].
Tongyi Lab of Alibaba Group released the Wan2.1 series of open-source video foundation models. Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio. Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks [Details].
Quora’s Poe launched Poe Apps and App Creator, built on top of Claude 3.7 Sonnet, to make it easy to build visual interfaces on top of any combination of the existing models on Poe and custom logic expressed in JavaScript [Details].
Convergence released Proxy Lite, a mini, open-weights version of their Proxy assistant. WebVoyager results place Proxy Lite among the top performers in web automation tasks, while using just a fraction of the computational resources [Details].
IBM expanded its Granite model family with new multimodal and reasoning models built for the enterprise. All Granite 3.2 models are available under the permissive Apache 2.0 license on Hugging Face [Details].
Hugging Face released FastRTC, the real-time communication library for Python. The library is designed to make it super easy to build real-time audio and video AI applications entirely in Python [Details].
Amazon introduced Alexa+, the next generation of its voice assistant, powered by generative AI. Alexa+ is more conversational, smarter, and more personalized, and it helps you get things done [Details].
Meta released Meta MLGym and MLGym-Bench, a new open-source framework and benchmark for evaluating and developing LLM agents on AI research tasks [Details].
ElevenLabs introduced Scribe, their first speech-to-text model, which transcribes speech in 99 languages and features word-level timestamps, speaker diarization, and audio-event tagging [Details].
Google’s Veo 2 video model is available on Freepik and Fal. It surpassed OpenAI’s Sora and Kling 1.5 Pro as the new leader in Artificial Analysis Video Arena [Details].
Microsoft is rolling out free, unlimited access to Voice and Think Deeper (powered by OpenAI’s o1 model) to all Copilot users [Details].
Exa’s new search product, ‘Websets’, is now available. It retrieves over 20x more correct search results than Google on a benchmark of complex queries [Details].
Cloudflare announced agents-sdk, a new JavaScript framework for building AI agents, along with updates to Workers AI [Details].
‘Gemini Code Assist’, Google’s AI code assistant (Gemini 2.0 fine-tuned for coding), is available for free in popular IDEs, with up to 180K code completions per month [Details].
OpenAI is rolling out a version of Advanced Voice powered by GPT-4o mini to all ChatGPT free users. Also, OpenAI’s Deep Research is now available to all users on all paid plans on all platforms [Details].
Ideogram introduced Ideogram 2a, a new, faster, and more affordable text-to-image model optimized for graphic design and photography [Details].
LMArena introduced Prompt-to-leaderboard (P2L), a real-time LLM leaderboard tailored exactly to your use case. P2L trains an LLM to generate "prompt-specific" leaderboards, so you can input a prompt and get a leaderboard specifically for that prompt. The model is trained on the 2M human preference votes from Chatbot Arena [Details].
Ai2 introduced olmOCR, a high-performance toolkit designed to convert PDFs and document images into clean, structured plain text [Details].
The Pika video generation platform launched the Pika 2.2 model, with 10s generations, 1080p resolution, and Pikaframes. Pikaframes is an image-to-video feature that allows you to upload the first and the last frame as still images to generate a video [Details].
Replit released Replit Agent v2 in an early access program. Agent v2 is fundamentally more autonomous. At each step, it forms a hypothesis, searches for the right files, and only starts making changes when it has enough information to get the job done. Instead of getting caught in loops, it knows when to step back and rethink its approach [Details].
Luma AI adds audio to video in Dream Machine [Link].
Perplexity plans to launch Comet, a web browser for agentic search [Link].
1X Technologies introduced NEO Gamma home humanoids, which include improvements across NEO’s hardware and AI [Details].
🔦 Weekly Spotlight: Noteworthy Reads and Open-source Projects
How I use LLMs - new video by Andrej Karpathy
An extensive example prompt that can help you build applications using Cloudflare Workers and your preferred AI model.
Building an AI Resume Job Matching App With Firecrawl And Claude
Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.
How to Use Cursor Agent and Supabase to Maximize Productivity.
🔍 🛠️ AI Toolbox: Product Picks of the Week
AI Engineer Pack Volume 3 by ElevenLabs: The essential AI development toolkit. Get free credits and discounts to get started with the best AI tools and services.
Voiset: Let AI simplify the management of your plans. Boost productivity with an all-in-one platform featuring AI assistant and automated task planning.
OpenArt Consistent Characters: Create images of consistent characters from just one image or description. Pose, place, and combine them in any scene.
Mastra: The TypeScript Agent Framework.
Flora: All of the top text, image, and video AI models in one infinite canvas.
Last week’s issue
Multi-robot collaboration, Grok 3, smallest video language model, Generative AI Model for Gameplay, AI co-scientist, Mistral Saba, Fiverr Go, Step-Video-T2V and Step-Audio, Pikaswaps & more
Thanks for reading and have a nice weekend! 🎉 Mariam.