Self-Adaptive LLMs, MatterGen, ChatGPT Reminders, MiniMax-01 with 4M tokens, Tarsier2 by ByteDance, Ray2, Vidu 2.0, Ambient Agents and Agent Inbox, FLUX Pro Finetuning API, Codestral 25.01 and more

Jan 17, 2025

Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.

In today’s issue (Issue #89):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week


🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

Sakana AI presents Transformer² (‘Transformer-squared’), a machine learning system that dynamically adjusts its weights for different tasks. The name reflects its two-step process: first, the model analyzes the incoming task to understand its requirements; then it applies task-specific adaptations to produce its answer. Sakana AI frames Transformer² as a glimpse of a future where AI systems are no longer static entities trained for fixed tasks, but instead embody “living intelligence”: models that continually learn, evolve, and adapt over time [Details].
Microsoft introduced MatterGen, a diffusion model that generates novel, stable materials. Given a prompt describing the design requirements of an application, it can generate materials with desired chemical, mechanical, electronic, or magnetic properties, as well as combinations of these constraints [Details].
MiniMax released the MiniMax-01 series of open-source models, with context lengths of up to 4 million tokens. The series includes two models: the foundational language model MiniMax-Text-01 and the multimodal vision-language model MiniMax-VL-01. MiniMax-Text-01 uses a new hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE), with 456 billion total parameters (45.9 billion activated per token). In addition to the APIs, the models can be accessed via Hailuo AI [Details].
Mistral released Codestral 25.01, a new coding model with a more efficient architecture and an improved tokenizer compared with the original Codestral; it generates and completes code about twice as fast. It is tied for first place on the Copilot Arena leaderboard with DeepSeek V2.5 and Claude 3.5 Sonnet. You can try it for free in the Continue extension for VS Code or JetBrains [Details].
OpenAI is rolling out a new beta feature in ChatGPT called Tasks that lets users schedule future actions and reminders [Details].
GitHub’s AI coding agent ‘Copilot Workspace’ is now available without a waitlist [Details].
Luma Labs launched Ray2, a large-scale generative video model capable of producing fast, coherent motion, ultra-realistic details, and logical event sequences. Text-to-video generation is available in Ray2 now, with image-to-video, video-to-video, and editing capabilities coming soon [Details].
NovaSky (UC Berkeley’s Sky Computing Lab) released Sky-T1-32B-Preview, a fully open-source reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks and was trained for less than $450. It is fine-tuned from Alibaba’s Qwen2.5-32B-Instruct, an open-source model without dedicated reasoning capabilities, using training data generated by QwQ-32B-Preview [Details].
Black Forest Labs launched FLUX Pro Finetuning API [Details].
ByteDance Research introduced Tarsier2, a large vision-language model (LVLM) that generates detailed and accurate video descriptions and also shows strong general video understanding. Tarsier2-7B outperforms leading proprietary models, including GPT-4o and Gemini 1.5 Pro, on detailed video description tasks. It also sets new state-of-the-art results on 15 public benchmarks spanning video question answering, video grounding, hallucination tests, and embodied question answering, demonstrating its versatility as a generalist vision-language model [Paper].
Jina AI released ReaderLM-v2, a 1.5B-parameter small language model for HTML-to-Markdown conversion and HTML-to-JSON extraction, with improved accuracy and better handling of longer contexts [Details].
Google announced the general availability of Vertex AI’s RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods [Details].
MiniMax unveiled T2A-01-HD, an emotionally expressive text-to-audio model available via hailuo.ai/audio and its API platform. It supports voice cloning from just 10 seconds of audio, automatic emotion detection or manual emotion controls, and a library of 300+ pre-built voices [Details].
Microsoft released updates to AutoGen, its open-source programming framework for agentic AI, and to Magentic-One, a generalist multi-agent application for solving open-ended web- and file-based tasks across various domains [Details].
Vidu launched Vidu 2.0, an update to its video model that can generate a video in 10 seconds with improved consistency. Vidu is offering unlimited free generations during off-peak hours [Link].
Shanghai AI Laboratory released InternLM3-8B-Instruct, a model for general-purpose use and advanced reasoning that surpasses models like Llama3.1-8B and Qwen2.5-7B. It was trained on only 4 trillion tokens, saving more than 75% of the training cost compared with other LLMs of similar scale [Details].
Researchers released LlamaV-o1, an open-source 11-billion-parameter visual reasoning model that outperforms existing open-source models, including the recent LLaVA-CoT, across multiple metrics [Details].
LangChain introduced ‘Agent Inbox’, a new UX for interacting with ‘ambient agents’ (‘AI Agents that listen to an event stream and act on it accordingly, potentially acting on multiple events at a time’). It also released open-source ambient agents for email and social media [Details].
Together AI launched Agent Recipes, a site to learn about agent/workflow recipes with code examples [Link].
Microsoft relaunched Copilot for businesses with free AI chat and pay-as-you-go agents [Details].
Contextual AI announced the general availability of the Contextual AI Platform for building specialized RAG agents that can complete highly complex knowledge tasks [Details].
Adobe launched Adobe Firefly Bulk Create, a web app that allows users to edit several photos with AI in one go [Details].
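The Transformer² item describes a two-step process: analyze the task, then adapt the weights. The toy Python sketch below mirrors only that control flow. Sakana AI's published method adjusts the singular values of the model's weight matrices using learned "expert" vectors; here a keyword classifier, three hard-coded scaling vectors, and a four-element weight list are hypothetical stand-ins, so treat it as an illustration of the idea rather than the actual implementation.

```python
# Toy sketch of a two-pass "identify the task, then adapt" loop.
# HYPOTHETICAL: the keyword classifier, the expert vectors, and the
# four-element weight list are stand-ins, not Sakana AI's code. The real
# method rescales singular values of weight matrices; this skips the SVD.

BASE_WEIGHTS = [0.5, 1.0, -0.25, 2.0]

# Hypothetical "expert" vectors: element-wise scale factors per task type.
EXPERTS = {
    "math": [1.2, 1.0, 0.8, 1.0],
    "code": [1.0, 1.5, 1.0, 0.9],
    "general": [1.0, 1.0, 1.0, 1.0],
}


def identify_task(prompt):
    """Pass 1: inspect the prompt and guess the task type."""
    text = prompt.lower()
    if any(word in text for word in ("integral", "solve", "equation")):
        return "math"
    if any(word in text for word in ("function", "bug", "python")):
        return "code"
    return "general"


def adapt_weights(task):
    """Pass 2: scale the base weights by the chosen expert vector."""
    scales = EXPERTS[task]
    return [w * s for w, s in zip(BASE_WEIGHTS, scales)]


def forward(weights, features):
    """Stand-in for the model itself: a single dot product."""
    return sum(w * x for w, x in zip(weights, features))


def two_pass_infer(prompt, features):
    """Run pass 1 to pick a task, then pass 2 with the adapted weights."""
    task = identify_task(prompt)
    return task, forward(adapt_weights(task), features)


task, output = two_pass_infer("Solve 2x = 4", [1.0, 1.0, 1.0, 1.0])
print(task, round(output, 2))  # prints: math 3.4
```

The base weights are never overwritten: each request gets its own adapted copy, which is one simple way to picture a model that reconfigures itself per task without retraining.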
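LangChain's definition of an ambient agent (one that listens to an event stream and may act on several events at a time) maps onto a simple concurrent consumer loop. The asyncio sketch below is an assumption-laden illustration, not LangChain's Agent Inbox code: the event format, the "urgent" rule that escalates an event to a human inbox, and every function name are hypothetical.

```python
import asyncio

# Minimal sketch of an "ambient agent": it listens to an event stream and
# handles several events concurrently. HYPOTHETICAL: the event dicts, the
# "urgent" escalation rule, and all function names are illustrative only;
# this is not LangChain's Agent Inbox code.


async def handle(event, inbox, done):
    """Decide whether to act on an event or escalate it to a human."""
    await asyncio.sleep(0)  # placeholder for an LLM or API call
    if event["kind"] == "email" and "urgent" in event["body"].lower():
        inbox.append(event["id"])  # escalate: a human reviews it in the inbox
    else:
        done.append(event["id"])  # routine: the agent acts on its own


async def ambient_agent(queue, inbox, done, max_concurrent=3):
    """Consume events until a None sentinel, running up to max_concurrent handlers at once."""
    limit = asyncio.Semaphore(max_concurrent)
    tasks = []

    async def worker(event):
        async with limit:
            await handle(event, inbox, done)

    while True:
        event = await queue.get()
        if event is None:  # sentinel: the event stream has closed
            break
        tasks.append(asyncio.create_task(worker(event)))
    await asyncio.gather(*tasks)


async def main():
    queue = asyncio.Queue()
    inbox, done = [], []
    sample = [("email", "URGENT: contract"), ("tweet", "hi"), ("email", "newsletter")]
    for i, (kind, body) in enumerate(sample):
        await queue.put({"id": i, "kind": kind, "body": body})
    await queue.put(None)
    await ambient_agent(queue, inbox, done)
    return inbox, done


inbox, done = asyncio.run(main())
print(inbox, sorted(done))  # prints: [0] [1, 2]
```

The semaphore caps how many events are in flight at once, and the `None` sentinel stands in for whatever signals the end of a real stream (a closed webhook, a shutdown signal, and so on).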

🔦 Weekly Spotlight

'Future of Jobs Report 2025' by World Economic Forum [Link].
OpenAI’s Economic Blueprint: policy proposals for how the US can maximize AI’s benefits, bolster national security, and drive economic growth [Link].
Run ComfyUI workflows for free with Gradio on Hugging Face Spaces [Link].
ChatGPT reveals the system prompt for ChatGPT Tasks [Link].
OpenAI’s updated guide on Function Calling [Link].
Trying out QVQ, Qwen’s new visual reasoning model [Link].
Takes on “Alignment Faking in Large Language Models” [Link].
Building knowledge graph agents with LlamaIndex Workflows [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Humva: custom AI video avatars, free for now
AI SDR-Kit by Composio: AI SDR-Kit lets developers build AI sales agents with customizable workflows, API integrations, and intelligent automation for outreach, engagement, lead qualification, pipeline management & more.
Dreamina by Capcut: the all-in-one AI creative suite for all your artistic work
