LTX-2 open-source model, Seed3D 1.0, DeepSeek-OCR, HunyuanWorld-Mirror, Surfer 2, Copilot Fall Release, ChatGPT Atlas and Company knowledge, Krea Realtime 14B, Hugging Face AI Sheets and more
LTX-2 open-source model, Seed3D 1.0, DeepSeek-OCR, HunyuanWorld-Mirror, Surfer 2, Copilot Fall Release, ChatGPT Atlas and Company knowledge, Krea Realtime 14B, Hugging Face AI Sheets and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #113 ):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
🗞️🗞️ Weekly News at a Glance
Lightricks introduced LTX-2 an open-source AI model that combines synchronized audio and video generation, 4K fidelity, and real-time performance. Multi-keyframe conditioning, 3D camera logic, and LoRA fine-tuning deliver frame-level precision and style consistency. Accepts text, image, video, and audio inputs, plus depth maps and reference footage for guided conditioning [Details].
Microsoft launched its Copilot Fall Release with 12 major updates focused on making it more personal and collaborative. This includes [Details]:
Groups for real-time collaboration among up to 32 users, an expressive avatar Mico for expressive feedback during voice interactions that even changes colors, Real Talk mode that adapts to communication styles while offering thoughtful pushback, long term memory, connectors, Copilot for Health with credible medical information and doctor matching and Learn Live - a voice-enabled Socratic tutor that uses questions, visual cues, and interactive whiteboards.
Copilot Mode in Edge: Converts Microsoft Edge into an AI browser where Copilot can see and reason over open tabs, summarize and compare information, and take Actions like booking a hotel or filling out forms. Voice-only navigation enables hands-free browsing.
DeepSeek AI released DeepSeek-OCR, an open-source 3B vision-language model designed as a preliminary proof-of-concept for efficient vision-text compression. It compresses long documents into vision tokens, achieving 97% precision at 10x compression ratios [Details].
Tencent released Hunyuan World 1.1 (WorldMirror), an open-source universal feed-forward 3D reconstruction model. While Hunyuan World 1.0 focused on generating 3D worlds from text or single-view images, Hunyuan World 1.1 adds video-to-3D and multi-view-to-3D world creation. It simultaneously generates multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussian Splattings [Details].
OpenAI:
ChatGPT Atlas: a new web browser built with ChatGPT at its core. If you turn on browser memories, ChatGPT will remember key details from content you browse to improve chat responses and offer smarter suggestions. It’s available on macOS including for Free users [Details].
Company knowledge for ChatGPT: Company knowledge lets ChatGPT pull context from connected apps to provide company-specific answers with citations. Powered by GPT-5, it searches multiple sources for comprehensive responses and resolves conflicting information. Available on Business, Enterprise, and Edu plans [Details].
Google:
The Gemini API now includes Grounding with Google Maps to build location-aware AI applications. Developers will be able to use the Google Maps tool in the Gemini API to ground their apps in the up-to-date geospatial data [Details].
Google AI Studio introduced an updated Build Mode for vibe coding which lets you mix and match AI capabilities, automatically wiring up the right models and APIs [Details].
New Earth AI models to Gemini capabilities in Google Earth, letting users instantly find objects and discover patterns from satellite imagery. Geospatial Reasoning, a framework powered by Gemini now lets AI automatically connect different Earth AI models like weather forecasts, population maps and satellite imagery to answer complex questions [Details].
Google Skills: Google’s new learning platform for building essential AI and technical capabilities [Details].
Krea released Krea Realtime 14B, a 14-billion parameter open-source model capable of real-time, long-form video generation. The model enables real-time interactive capabilities: users can modify prompts mid-generation, restyle videos on-the-fly, and see first frames in 1 second. It’s distilled from the Wan 2.1 14B text-to-video model using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models [Details].
Anthropic:
Claude Code on the web (in beta research preview): lets developers delegate coding tasks to Claude directly from their browser using Anthropic-managed cloud infrastructure, with the ability to run multiple tasks in parallel across different GitHub repositories [Details].
Claude for Life Sciences initiative: New connectors designed to make it easier to use Claude for scientific discovery, including Benchling, BioRender, PubMed, Scholar Gateway, Synapse.org, 10x Genomics and specialized Agent Skills for tasks like RNA sequencing quality control etc. [Details].
Bytedance introduced Seed3D 1.0, a foundation model that
generates simulation-ready 3D assets from single images, with accurate geometry, well-aligned textures, and realistic physically-based materials. Seed3D 1.0 is available on Volcano Engine [Details]
Hugging Face released an update to Hugging Face AI Sheets adding vision capabilities that let users extract data from images. AI Sheets is its open-source tool for building, transforming, and enriching data with open AI models [Details].
H Company introduced Surfer 2: a cross-platform computer-use agent that achieves state-of-the-art results on four major agent benchmarks spanning desktop, web, and mobile environments (OSWorld, WebVoyager, WebArena, AndroidWorld). It integrates third-party frontier models and Holo1.5 models in a unified architecture [Details].
Qwen Deep Research received a major upgrade: It now generates not only the report, but also a live webpage and a podcast - powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS [Details].
Lovable added Shopify integration enabling users to build online stores on the platform by chatting with AI [Details].
Eleven Labs Voice Isolator now supports video. Upload any file and remove background noise - chatter, music, feedback, or street noise [Details].
Nof1 launched Alpha Arena a new benchmark designed to measure AI’s investing abilities. Each model is given $10,000 of real money, in real markets, with identical prompts and input data [Details].
Runway:
Apps for Advertising collection: a growing library of easy-to-use apps for generating product shots for social and e-commerce to mocking up designs in real world placements, without the need for complicated prompting or advanced workflows [Details].
Workflows: allows you to create custom node-based workflows that chain together multiple models, modalities, and intermediary steps for greater control over AI generations [Details].
Model Fine-tuning for specific use cases. Model Fine-tuning is available now for select pilot partners and coming soon to everyon [Details].
Amazon is developing smart delivery glasses designed specifically for Delivery Associates, that would help them scan packages, follow turn-by-turn walking directions, and capture proof of delivery without the use of their phone [Details].
Fish Audio publicly launched Fish Audio S1, an expressive and emotionally rich text-to-speech model that is 6x cheaper than ElevenLabs. It generates lifelike voices that capture emotion, rhythm, and recreates a natural voice from 10 seconds of audio preserving accent, tone, and speaking habits [Details]
🔦 🔍 Weekly Spotlight
Articles/Courses/Videos:
Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers
Agent Engineering 101 - The intersection of software, systems and security engineering
🔍 🛠️ Product Picks of the Week
cto.new: A completely free AI coding agent with unlimited access to major frontier AI models
Shortcut: Excel AI agent that’s the first to achieve verified state-of-the-art results on the Spreadsheet Benchmark.
Huxe: Personal audio companion, built to turn what you care about into interactive audio.
Trupeer: Create professional videos and guides for any product in minutes, from just a rough screen recording
Director by Browserbase: Design repeatable browser agents with a single prompt in natural language.
Thanks for reading and have a nice weekend! 🎉 Mariam.


Heeey, long time no AI Brews - good to have you back!
Love this perspective! So many incredible AI advancements. Thnak you for this insightful overview.