Open-Weight alternative to GPT-4o Realtime, Athene-V2, Stripe Agent Toolkit, Qwen2.5-Coder-32B, Prompt Canvas and Promptim, Vidu-1.5, MagicQuill, OpenCoder and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #84):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Alibaba Cloud released Qwen2.5-Coder-32B, an open-source model for programming tasks that matches the coding capabilities of GPT-4o. The release adds four new models, including this flagship, expanding the Qwen2.5-Coder family to six models ranging in size from 0.5B to 32B. An Artifacts app, similar to Claude Artifacts, has also been launched [Details | Demo].
Fixie AI released Ultravox v0.4.1, a family of multimodal, open-source models trained specifically to enable real-time conversations with AI. Ultravox does not rely on a separate automatic speech recognition (ASR) stage; instead, it consumes speech directly in the form of embeddings. Its latency is comparable to that of OpenAI Realtime. Fixie also released Ultravox Realtime, a managed service for integrating real-time AI voice conversations into applications [Details].
Google introduced a new experimental model, Gemini (Exp 1114), available now in Google AI Studio. It has climbed to joint #1 overall on the Chatbot Arena leaderboard after more than 6K community votes in the past week. It matches the performance of 4o-latest, surpasses o1-preview, and ranks #1 on the Vision leaderboard [Details].
Nexusflow released Athene-V2, an open-source 72B model suite fine-tuned from Qwen 2.5 72B. It includes Athene-V2-Chat, which matches GPT-4o across multiple benchmarks, and Athene-V2-Agent, a specialized agent model that surpasses GPT-4o in function calling and agent applications [Details].
Vidu launched Vidu-1.5, a multimodal model with multi-entity consistency. Vidu-1.5 can seamlessly integrate people, objects, and environments to generate a video [Link].
Codeium launched Windsurf Editor, an agentic IDE. It introduces ‘Flow’, which combines the collaborative nature of copilots with an agent's ability to work independently [Details].
Researchers introduced MagicQuill, an intelligent interactive image editing system. It uses a multimodal large language model to anticipate editing intentions in real time, removing the need for explicit prompts [Details | Demo].
DeepSeek released JanusFlow, an open-source unified multimodal model that excels at both image understanding and image generation within a single model. It matches or outperforms specialized models in their respective domains and significantly surpasses existing unified models on standard benchmarks [Details | Demo].
Google DeepMind has open-sourced AlphaFold 3 for academic use. It models interactions between proteins, DNA, RNA, and small molecules, which is vital for drug discovery and disease treatment [Details].
Epoch AI launched FrontierMath, a benchmark for advanced mathematical reasoning in AI. Developed with over 60 top mathematicians, it includes hundreds of challenging problems, of which AI systems currently solve fewer than 2% [Details].
TikTok launched Symphony Creative Studio, an AI-powered video-generation tool for business users. Users can turn product information or a URL into a video, add a digital avatar to narrate the video script, or localize existing videos into new languages using translation and dubbing capabilities [Details].
Nous Research introduced the Forge Reasoning API Beta. It lets you take any model and superpower it with a code interpreter and advanced reasoning capabilities. Hermes 70B x Forge is competitive with much larger models from Google, OpenAI and Anthropic in reasoning benchmarks [Details].
Anthropic added a new prompt improver to the Anthropic Console. Given an existing prompt, Claude automatically refines it using prompt engineering techniques such as chain-of-thought reasoning [Details].
Nvidia presented Add-it, a training-free method for adding objects to images based on text prompts. Add-it works well on both real and generated images. It leverages an existing text-to-image model (FLUX.1-dev) without requiring additional training [Details].
Microsoft released TinyTroupe, an experimental Python library for simulating people with specific personalities, interests, and goals. These artificial agents, called TinyPersons, can listen to us and to one another, reply, and go about their lives in simulated TinyWorld environments. This is achieved by leveraging Large Language Models (LLMs), notably GPT-4, to generate realistic simulated behavior [Details].
Johns Hopkins researchers trained a surgical robot by having it watch videos of skilled surgeons. Using imitation learning, the robot learned complex tasks like suturing and tissue handling, ultimately performing with skill comparable to human doctors [Details].
Stripe launched an SDK built for AI agents that lets LLMs call payment, billing, issuing, and other APIs. It natively supports Vercel’s AI SDK, LangChain, and CrewAI, and works with any LLM provider that supports function calling [Details].
Researchers released OpenCoder, a completely open-source and reproducible family of code LLMs that includes 1.5B and 8B base and chat models. Trained from scratch on 2.5 trillion tokens, OpenCoder is built on a transparent data-processing pipeline and a reproducible dataset. It achieves top-tier performance on multiple code LLM evaluation benchmarks [Details].
Alibaba launched Accio, an AI search engine that helps small businesses find wholesale products, along with analysis of their popularity with consumers and projected profit. Accio is powered by Alibaba’s Tongyi Qianwen large language model [Details].
Anthropic released RapidResponseBench, a benchmark that evaluates how well LLM defenses can adapt to and handle different jailbreak strategies after seeing just a few examples [GitHub | Paper].
LangChain launched Prompt Canvas, an interactive tool designed to simplify prompt creation. With a UX inspired by ChatGPT’s Canvas, Prompt Canvas lets you collaborate with an LLM agent to iteratively build and refine your prompts [Details].
LangChain released Promptim, an experimental open-source library for prompt optimization. Promptim automates the process of improving prompts on specific tasks. You provide an initial prompt, a dataset, and custom evaluators (plus optional human feedback), and Promptim runs an optimization loop to produce a refined prompt that aims to outperform the original [Details].
Apple’s Final Cut Pro 11, with new AI-powered features, is now available [Details].
The ChatGPT app for Mac can now integrate with coding apps like Xcode, VS Code, TextEdit, and Terminal [Details].
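The Ultravox item above says the model skips a separate ASR stage and consumes speech directly as embeddings. The Python sketch below illustrates that general idea under loud assumptions: the audio frames are hypothetical encoder outputs, and the tiny hand-written projection matrix stands in for a learned projector. It is not Ultravox's code. It only shows audio features being mapped into a language model's embedding space and spliced between text embeddings, so no transcript is ever produced.

```python
from typing import List

Vector = List[float]


def project(frame: Vector, weights: List[Vector]) -> Vector:
    """Linearly map one audio feature frame into the LLM embedding space.

    `weights` has one row per output (LLM) dimension. In a real model this
    matrix would be learned; here it is a hypothetical toy value.
    """
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def build_input_sequence(
    text_before: List[Vector],
    audio_frames: List[Vector],
    text_after: List[Vector],
    weights: List[Vector],
) -> List[Vector]:
    """Splice projected audio embeddings between text token embeddings.

    The language model then attends over one mixed sequence; no ASR
    transcript is produced at any point.
    """
    audio_embeds = [project(frame, weights) for frame in audio_frames]
    return text_before + audio_embeds + text_after


if __name__ == "__main__":
    # Hypothetical 2-dim audio features projected to a 3-dim embedding space.
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    prompt = [[0.1, 0.2, 0.3]]          # one toy text-token embedding
    audio = [[2.0, 3.0], [1.0, -1.0]]   # two toy audio frames
    seq = build_input_sequence(prompt, audio, [], W)
    print(len(seq), seq[1])             # 3 [2.0, 3.0, 5.0]
```

The design point is that the projector is the only bridge between the audio encoder and the language model, which is one reason such pipelines can avoid the extra latency of a separate transcription step.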
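The Stripe item above relies on function calling: the LLM emits a structured tool call, and application code executes the matching API. Below is a minimal, provider-agnostic sketch of that dispatch step in Python. The tool name `create_payment_link` and its parameters are hypothetical stubs, not the Stripe Agent Toolkit's API, and no network call is made. The JSON shape `{"name": ..., "arguments": {...}}` is also an assumption, since each LLM provider formats tool calls slightly differently.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def create_payment_link(product: str, amount_cents: int, currency: str = "usd") -> dict:
    """Hypothetical stub: a real integration would call Stripe's API here."""
    return {
        "product": product,
        "amount": amount_cents,
        "currency": currency,
        "url": f"https://example.invalid/pay/{product}",
    }


def dispatch(tool_call_json: str) -> Any:
    """Execute a model-emitted tool call of the assumed form
    {"name": <tool name>, "arguments": {<keyword arguments>}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


if __name__ == "__main__":
    model_output = '{"name": "create_payment_link", "arguments": {"product": "pro-plan", "amount_cents": 2000}}'
    print(dispatch(model_output)["url"])  # https://example.invalid/pay/pro-plan
```

Keeping the registry explicit means the model can only trigger functions you registered; an unknown tool name fails loudly instead of being guessed.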
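The Promptim item above describes an optimization loop over a prompt, a dataset, and evaluators. The Python sketch below shows one simple form of such a loop, greedy hill climbing: propose a candidate prompt, score it on the dataset, and keep it only if it beats the current best. This illustrates the general idea and is not Promptim's implementation; `toy_run` stands in for a real LLM call, and the fixed list of edits stands in for an LLM-generated rewrite. All names here are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)


def score(prompt: str, dataset: Sequence[Example],
          run: Callable[[str, str], str],
          evaluator: Callable[[str, str], float]) -> float:
    """Average evaluator score of `prompt` across the dataset."""
    return sum(evaluator(run(prompt, x), y) for x, y in dataset) / len(dataset)


def optimize(initial: str, dataset: Sequence[Example],
             run: Callable[[str, str], str],
             evaluator: Callable[[str, str], float],
             propose: Callable[[str], str], steps: int = 5) -> str:
    """Greedy hill climbing: keep a candidate only if it strictly improves the score."""
    best, best_score = initial, score(initial, dataset, run, evaluator)
    for _ in range(steps):
        candidate = propose(best)
        cand_score = score(candidate, dataset, run, evaluator)
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best


# --- Toy stand-ins (hypothetical) so the loop runs without an LLM ---
def toy_run(prompt: str, x: str) -> str:
    """Pretend model: obeys an 'uppercase' instruction, otherwise echoes."""
    return x.upper() if "uppercase" in prompt else x


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


def make_proposer(suffixes: Iterable[str]) -> Callable[[str], str]:
    """Return a proposer that appends the next suffix; a no-op once exhausted."""
    it = iter(suffixes)
    return lambda prompt: prompt + next(it, "")


if __name__ == "__main__":
    data: List[Example] = [("hi", "HI"), ("ok", "OK")]
    propose = make_proposer([" Be concise.", " Answer in uppercase."])
    print(optimize("Repeat the input.", data, toy_run, exact_match, propose))
    # Repeat the input. Answer in uppercase.
```

Because the loop accepts only strict improvements, a bad proposal can never make the returned prompt score worse on the dataset, although the result can still overfit to that dataset.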
🔦 Weekly Spotlight
Stable Diffusion 3.5 Prompt Guide by Stability AI [Link].
Cline: an open-source AI assistant that can use your CLI and editor [Link].
Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity - Lex Fridman Podcast [Link].
StackBlitz achieves $4M ARR in 4 weeks for its AI web development platform, Bolt (bolt.new), built with Claude [Link].
Understanding OpenAI Swarm: A Framework for Multi-Agent Systems [Link].
We Studied 200,000 AI Overviews: Here's What We Learned - by Semrush [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Napkin: Transforms your existing text content into visuals like diagrams, charts, scenes, and images.
CapCut Commerce Pro: Shoppable video creation and batch product image generation for e-commerce.
Output Media API by Recall.ai: API to build interactive AI agents that talk in meetings.
Writer: Build generative AI into any business process with the secure enterprise platform.
Last week’s issue
Thanks for reading and have a nice weekend! 🎉 Mariam.