DeepSeek-V3.2-Speciale, Kling O1+ AI Avatar 2.0 + Video 2.6 + Image O1, Kamo-1, Runway Gen-4.5, Vidi2 by ByteDance, Mistral 3, Lux by OpenAGI, Norton Neo and more
Dec 05, 2025
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #116):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
DeepSeek released DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, reasoning-first models built for agents. DeepSeek-V3.2-Speciale surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). DeepSeek-V3.2 is the successor to V3.2-Exp and is live on App, Web & API. DeepSeek-V3.2-Speciale is API-only for now [Details].
Runway introduced a new frontier video model, Runway Gen-4.5, codenamed Whisper Thunder (aka David). Gen-4.5 scored 1,247 Elo points on the Artificial Analysis Text to Video leaderboard, surpassing all other AI video models [Details].
Kuaishou:
Kling AI Avatar 2.0 — upgraded, expressive, and built for full 5-minute performances. The Avatar can now assist with explanations through body movements, gestures, expressions, and camera angles, with major optimizations addressing the hand-movement issues from version 1.0 [Details].
Kling Video 2.6: Kling AI’s first model with native audio that simultaneously produces video visuals and complete audio, including voiceovers, sound effects, and ambient sounds [Details].
Kling O1: a unified multimodal video model that generates and edits video from text, images, or video inputs while maintaining strong subject and scene consistency. It integrates a wide range of tasks, including Reference to Video, Text-to-Video, Start & End Frames generation, video content editing, modifications, transformations, restyling, and camera extension, all in one unified model [Details].
Kling Image O1: an image model that precisely responds to edits while preserving the original style, lighting, and texture. It supports extracting features from up to 10 reference images, precisely locking in the subject contours, core elements, and tonal qualities of each image [Details].
Harmonic’s Aristotle AI independently proved Erdős Problem #124, which had been open for nearly 30 years since it was conjectured in the paper “Complete sequences of sets of integer powers” in the journal Acta Arithmetica [Details].
ByteDance:
Vidi2: a large multimodal model for video understanding and creation. In its second release, Vidi2 advances video understanding with fine-grained spatio-temporal grounding (STG) and extends its capability to video question answering (Video QA), enabling comprehensive multimodal reasoning. Given a text query, it can identify not only the corresponding timestamps but also the bounding boxes of target objects within the output time ranges, outperforming Gemini 3, GPT-5 and Qwen3-VL [Details].
Seedream 4.5: an upgraded image generation and editing model that achieves all-round improvements through overall scaling of the model [Details].
An AI voice assistant for smartphones, powered by ByteDance’s Doubao large language model, that can act autonomously on the user’s behalf [Details].
Kinetix unveiled Kamo-1 (Open Beta), a video model that combines video diffusion with 3D understanding, giving creators full control over camera and character motion. Kamo-1 takes a simple acting video performance, a reference frame, and the user’s desired camera movement, and outputs a fully controlled generation [Details].
OpenAGI Foundation released its first computer-use foundation model, Lux, which achieved a score of 83.6 on the Online-Mind2Web benchmark of over 300 real-world, web-based computer-use tasks, outperforming Google’s Gemini CUA (69.0), OpenAI’s Operator (61.3), and Anthropic’s Claude Sonnet 4 (61.0). Lux completes each step in 1 second, compared to roughly 3 seconds for OpenAI’s model, and it’s 10x cheaper [Details].
Microsoft released VibeVoice-Realtime-0.5B, a lightweight real‑time text-to-speech model supporting streaming text input and robust long-form speech generation. It produces initial audible speech in ~300 ms [Details].
Mistral released Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3, a sparse mixture-of-experts trained with 41B active and 675B total parameters. Mistral Large 3 debuts at #2 in the OSS non-reasoning models category (#6 amongst OSS models overall) on the LMArena leaderboard. All models are released under the Apache 2.0 license [Details].
Google launched Google Workspace Studio to create, manage, and share AI agents to automate work in Workspace. End users can simply describe what they want to automate in plain language (e.g., “If an email contains a question for me, label the email as ‘To respond’ and ping me in Chat.”), and Gemini will create it. Agents can be connected to third-party apps and platforms including Asana, Jira, Mailchimp, and Salesforce [Details].
Amazon:
Nova 2 Omni (Preview): an all-in-one model for multimodal reasoning and image generation that supports text, images, video, and speech inputs while generating both text and image outputs. It enables multimodal understanding, image generation and editing using natural language, and speech transcription. The model supports a 1M token context window, 200+ languages for text processing and 10 languages for speech input [Details].
Amazon Nova 2 foundation models: Nova 2 Lite and Nova 2 Pro (Preview), which support extended thinking with step-by-step reasoning and task decomposition, and include three thinking intensity levels (low, medium, and high), giving developers control over the balance of speed, intelligence, and cost. The models also offer built-in tools such as code interpreter and web grounding, support remote MCP tools, and provide a one-million-token context window.
General availability of Amazon Nova 2 Sonic, a speech-to-speech foundation model for natural, real-time voice conversations. The model now handles alphanumeric inputs, short utterances, and 8KHz telephony speech input with improved accuracy and is also more robust when dealing with different accents and background noise [Details].
Amazon Nova Forge: a new service for organisations to build their own frontier models using Nova. Nova Forge customers can start their development from early model checkpoints, blend their datasets with Amazon Nova-curated training data, and host their custom models securely on AWS [Details].
Three new AI agents, called ‘frontier agents’: a new class of AI agents that are autonomous, scalable, and work for hours or days without constant intervention. These include the Kiro autonomous agent, AWS Security Agent, and AWS DevOps Agent [Details].
New features in Amazon Bedrock AgentCore, the platform for building and deploying agents: Policy in AgentCore, to set boundaries on what agents can do with tools, and AgentCore Evaluations, to understand how agents will perform in the real world [Details].
AI Factories: a new offering for enterprises that deploys dedicated AWS AI infrastructure in customers’ own data centers, operated exclusively for them. AWS AI Factories operate like a private AWS Region that gives secure, low-latency access to compute, storage, database, and AI services [Details].
Telegram launched COCOON, a decentralized AI inference platform built on TON blockchain, enabling GPU owners to earn cryptocurrency by serving AI models in trusted execution environments (TEE) [Details].
A new report from Visa and Morning Consult finds that Spain, Singapore, South Africa, the UAE, Brazil and Mexico show the highest rates of AI openness. In the U.S., nearly half of consumers (47 percent) have already used AI for at least one shopping-related task, with gift discovery, price comparison and product research emerging as top holiday use cases across North America [Details].
GELab-Zero: the first complete open-source GUI agent (model + infrastructure) by StepFun. A plug-and-play engineering setup is included, with no cloud dependencies and full privacy control.
Hugging Face Skills: Agent Context Protocol (ACP) definitions for AI/ML tasks like dataset creation, model training, and evaluation. Interoperable with all major coding agent tools like OpenAI Codex, Anthropic’s Claude Code, Google DeepMind’s Gemini CLI, and Cursor.
RAPTOR: an autonomous offensive/defensive security research framework, based on Claude Code. It empowers security research with agentic workflows and automation.
🔍 🛠️ Product Picks of the Week
Norton Neo: an AI-native browser combining built-in privacy and safety (local history and data control, phishing/malware blocking), AI-powered tools (summaries, context-aware search, and typing assistance), and smart tab management.
Incredibly thorough roundup Mariam! The AWS AI Factories concept is particularly interesting because it flips the traditional cloud paradigm on its head. Instead of enterprises shipping data to the cloud, AWS is now shipping the cloud to them, which sidesteps most compliance and latency headaches in one move. The real question is whether this creates genuine sovereign compute or just shifts the control plane while keeping enterprises locked into AWS's ecosystem.