Hunyuan-Large, AI model for open-world games, X-Portrait 2 for realistic character animations, FLUX1.1 [pro] Ultra and Raw, Magentic-One, Hume AI App, action model for GUI agents and More
Hi. Welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.
In today’s issue (Issue #83):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Tencent released Hunyuan-Large, the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters. It outperforms Llama 3.1-70B and shows performance comparable to the much larger Llama 3.1-405B model [Details | Demo].
AI lab Decart and Etched released Oasis, the first playable AI model that generates open-world games. Unlike many AI video models, which generate video from text, Oasis generates video frame-by-frame from keyboard and mouse inputs. It generates real-time gameplay, including physics, game rules, and graphics. Weights for Oasis 500M, a downscaled version of the model, along with inference code for action-conditional frame generation, have been released [Details | Demo].
Black Forest Labs added two new modes to FLUX1.1 [pro]: Ultra and Raw. Ultra enables image generation at four times the resolution of standard FLUX1.1 [pro], without sacrificing prompt adherence and with a generation time of only 10 seconds. Raw mode greatly increases diversity in human subjects and makes nature photography look more realistic [Details]. A minimal usage sketch follows below.
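For developers, here is a minimal sketch of requesting an Ultra-mode generation with Raw output via BFL's HTTP API. The endpoint path, header, and field names below are taken from BFL's API documentation at the time of the announcement and should be verified before use; the prompt and API key are placeholders.

```python
import time
import requests

API_KEY = "your-bfl-api-key"  # placeholder: your BFL API key

# Kick off an Ultra-mode generation with Raw mode enabled.
# Endpoint and parameter names follow BFL's docs; double-check before use.
resp = requests.post(
    "https://api.bfl.ml/v1/flux-pro-1.1-ultra",
    headers={"x-key": API_KEY},
    json={
        "prompt": "a hummingbird hovering over a flower, golden hour",
        "aspect_ratio": "16:9",
        "raw": True,  # Raw mode: more natural, less "AI-polished" look
    },
).json()

# Generation is asynchronous: poll for the result using the returned id.
while True:
    result = requests.get(
        "https://api.bfl.ml/v1/get_result",
        headers={"x-key": API_KEY},
        params={"id": resp["id"]},
    ).json()
    if result["status"] == "Ready":
        print(result["result"]["sample"])  # URL of the generated image
        break
    time.sleep(1)
```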
Bytedance introduced X-Portrait 2, a model for expressive portrait animation. Users only need to provide a static portrait image and a driving performance video. The model can transfer fast head movements and subtle, minuscule facial expressions from the actors, as well as challenging expressions including pouting, tongue-out, cheek-puffing, and frowning [Details].
Tencent released Hunyuan3D-1.0, a unified framework for both Text-to-3D and Image-to-3D generation. The lite model takes around 10 seconds to produce a 3D mesh from a single image on an NVIDIA A100 GPU, while the standard model takes roughly 25 seconds while maintaining quality [Details].
Microsoft introduced Magentic-One, a new generalist multi-agent system for solving open-ended web and file-based tasks across a variety of domains. An open-source implementation of Magentic-One on Microsoft AutoGen has been released [Details].
Anthropic released Claude 3.5 Haiku, priced at four times the cost of its predecessor, Claude 3 Haiku [Details].
OpenAI released the Predicted Outputs feature, which significantly decreases latency for gpt-4o and gpt-4o-mini by letting developers provide a reference string for the expected output [Details].
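Here is a minimal sketch of how Predicted Outputs is used via the OpenAI Python SDK, based on the documented prediction parameter; the model, prompt, and snippet below are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Code we expect the model to return mostly unchanged.
existing_code = """class User:
    first_name: str = ""
    last_name: str = ""
"""

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user",
         "content": "Rename first_name to given_name. Return only the full class."},
        {"role": "user", "content": existing_code},
    ],
    # Predicted Outputs: pass the expected output as a prediction so that
    # tokens matching it can be returned much faster than normal decoding.
    prediction={"type": "content", "content": existing_code},
)

print(completion.choices[0].message.content)
```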
Researchers released OS-Atlas, an open-source foundational action model for GUI agents that excels at GUI grounding and at out-of-domain agentic tasks across macOS, Windows, Linux, Android, and the web [Details].
Mistral launched a new API for content moderation. It’s powered by a fine-tuned model (Ministral 8B) trained to classify text, in a range of languages, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information [Details]. A usage sketch follows below.
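A minimal sketch of calling the moderation endpoint through the mistralai Python client; the method and model name are taken from Mistral's announcement, so check the current SDK docs for the exact names and response fields.

```python
from mistralai import Mistral

client = Mistral(api_key="your-mistral-api-key")  # placeholder key

# Classify raw text into the nine moderation categories.
# Model name per the announcement; verify against Mistral's docs.
response = client.classifiers.moderate(
    model="mistral-moderation-latest",
    inputs=[
        "I want to hurt myself.",
        "What is the capital of France?",
    ],
)

# Each result carries per-category flags/scores (sexual,
# hate_and_discrimination, violence_and_threats, self_harm, ...).
for result in response.results:
    print(result.categories)
```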
Runway launched advanced AI camera controls for its Gen-3 Alpha Turbo video generation model [Details].
Claude can now view images within a PDF, in addition to text. This helps Claude 3.5 Sonnet more accurately understand complex documents, such as those laden with charts or graphics. The Anthropic API now also supports PDF inputs in beta [Details].
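A minimal sketch of sending a PDF to Claude through the Anthropic Python SDK; PDF support was in beta at the time of writing, so the beta flag shown here and the exact content-block fields should be checked against Anthropic's docs, and "report.pdf" is a placeholder file.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Read and base64-encode the PDF so it can be passed as a document block.
with open("report.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["pdfs-2024-09-25"],  # beta flag at launch; verify current value
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data,
                },
            },
            {"type": "text", "text": "Summarize the charts in this document."},
        ],
    }],
)

print(message.content[0].text)
```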
Google’s Big Sleep AI model sets world first with discovery of SQLite security flaw [Details].
Nous Research launched Nous Chat, a web-based chat interface powered by its Hermes 3 70B model.
Standard Intelligence released Hertz-dev, an open-source audio-only base model with 8.5 billion parameters for full-duplex conversational audio [Details].
Nvidia launched new AI and simulation tools for robot learning and humanoid development [Details].
🔦 Weekly Spotlight
AdvancedLivePortrait-WebUI: Gradio WebUI for AdvancedLivePortrait [Link].
Pushing the frontiers of audio generation - by Google DeepMind [Link].
ChatGPT Search is not OpenAI’s ‘Google killer’ yet [Link].
What it actually takes to deploy GenAI applications to enterprises [Video Link].
How to create software diagrams with ChatGPT and Claude [Link].
Sparrow: an open-source solution for efficient data extraction and processing from various documents and images [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Hume AI App: New app from Hume AI powered by its speech-language model, EVI 2, that excels in emotional intelligence.
FakeMe Hyper-Personalized AI Movie Trailer Generation: create fully AI-generated movie trailers, with the story, narration, music, and video all generated from your input.
Clodura: GenAI powered sales co-pilot, from prospecting to closing.
Truva: Transforms sales activities into actionable insights.
Last week’s issue
Thanks for reading and have a nice weekend! 🎉 Mariam.