Claude's computer use, Mochi 1 & Allegro open-source video models, Aya Expanse, Stable Diffusion 3.5, HUGS by Hugging Face, Meta Spirit LM, Act-One, Haiper 2.0, Multimodal Embed 3, Playground v3 & More
Hi. Welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.
In today’s issue (Issue #81):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Anthropic announced computer use, a new capability in public beta. Available via the API, it lets developers direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Anthropic also announced a new model, Claude 3.5 Haiku, and an upgraded Claude 3.5 Sonnet that demonstrates significant improvements in coding and tool use. The upgraded Claude 3.5 Sonnet is now available to all users, while Claude 3.5 Haiku will be released later this month [Details].
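For developers who want to try it, here is a minimal sketch of calling the computer use beta through Anthropic's Python SDK. The beta flag and tool parameters below follow the announcement, but verify the exact values against the current API docs:

```python
# Minimal sketch: asking Claude to act on a screen via the computer use beta.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open a browser and search for this week's AI news."}],
)

# Claude replies with tool_use blocks (screenshot, left_click, type, ...);
# your agent loop executes each action and feeds the result back as a tool_result.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```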
Cohere released Aya Expanse, a family of highly performant multilingual models that excels across 23 languages and outperforms other leading open-weights models. Aya Expanse 32B outperforms Gemma 2 27B, Mistral 8x22B, and Llama 3.1 70B, a model more than 2x its size, setting a new state-of-the-art for multilingual performance. Aya Expanse 8B outperforms the leading open-weights models in its parameter class, such as Gemma 2 9B, Llama 3.1 8B, and the recently released Ministral 8B [Details].
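Since the weights are open, here is a minimal sketch of running Aya Expanse 8B locally with Hugging Face Transformers; the repo id follows Cohere's release, and gated access may require accepting the license and logging in with an HF token:

```python
# Minimal sketch: multilingual generation with Aya Expanse 8B via Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"  # repo id from Cohere's release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Aya Expanse is instruction-tuned, so prompts go through the chat template.
messages = [{"role": "user", "content": "Translate 'good morning' into Turkish and Hindi."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```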
Genmo released a research preview of Mochi 1, an open-source video generation model that performs competitively with the leading closed models and is licensed under Apache 2.0 for free personal and commercial use. Users can try it at genmo.ai/play, with weights and architecture available on Hugging Face. The 480p model is live now, with Mochi 1 HD coming later this year [Details].
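If you would rather run the weights locally than use genmo.ai/play, something like the following should work once your Diffusers install includes a Mochi pipeline (Genmo's own repo also ships reference inference code); the pipeline class and generation defaults here are assumptions to check against the model card:

```python
# Hedged sketch: local video generation with Mochi 1, assuming Diffusers support.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # the model is large; offload to fit on one GPU

frames = pipe(
    prompt="A hummingbird hovering over a flower, macro shot, shallow depth of field",
    num_frames=85,
    num_inference_steps=64,
).frames[0]
export_to_video(frames, "mochi_clip.mp4", fps=30)
```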
Rhymes AI released Allegro, a small and efficient open-source text-to-video model that transforms text into 6-second videos at 15 FPS and 720p. It surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Model weights and code are available under Apache 2.0 [Details | Gallery].
Meta AI released new quantized versions of the Llama 3.2 1B and 3B models. These models offer a reduced memory footprint, faster on-device inference, and greater portability, while maintaining accuracy, quality, and safety for deployment on resource-constrained devices [Details].
Stability AI introduced Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. Additionally, Stable Diffusion 3.5 Medium will be released on October 29th. These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License [Details].
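For those wanting to try it locally, here is a minimal text-to-image sketch with Diffusers, assuming a recent Diffusers release and an accepted model license on Hugging Face:

```python
# Minimal sketch: text-to-image with Stable Diffusion 3.5 Large via Diffusers.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a watercolor illustration of a lighthouse at dawn",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("lighthouse.png")
```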
Hugging Face launched Hugging Face Generative AI Services, a.k.a. HUGS. HUGS offers an easy way to build AI applications with open models hosted in your own infrastructure [Details].
Runway is rolling out Act-One, a new tool for generating expressive character performances inside Gen-3 Alpha using just a single driving video and character image [Details].
Anthropic launched the analysis tool, a new built-in feature for Claude.ai that enables Claude to write and run JavaScript code. Claude can now process data, conduct analysis, and produce real-time insights [Details].
IBM released the new Granite 3.0 8B and 2B models under the permissive Apache 2.0 license. They show strong performance across many academic and enterprise benchmarks, outperforming or matching similar-sized models [Details].
Playground AI introduced Playground v3, a new image generation model focused on graphic design [Details].
Meta released several new research artifacts, including Meta Spirit LM, an open-source multimodal language model that freely mixes text and speech. Meta Segment Anything 2.1 (SAM 2.1), an update to Segment Anything Model 2 for images and videos, has also been released. SAM 2.1 includes a new developer suite with the code for model training and the web demo [Details].
Haiper AI launched Haiper 2.0, an upgraded video model with lifelike motion, intricate details and cinematic camera control. The platform now includes templates for quick creation [Link].
Ideogram launched Canvas, a creative board for organizing, generating, editing, and combining images. It features tools like Magic Fill for inpainting and Extend for outpainting [Details].
Perplexity has introduced two new features: Internal Knowledge Search, which lets users search across both public web content and internal knowledge bases, and Spaces, AI-powered collaboration hubs that let teams organize and share relevant information [Details].
Google DeepMind announced updates for: a) Music AI Sandbox, an experimental suite of music AI tools that aims to supercharge the workflows of musicians. b) MusicFX DJ, a digital tool that makes it easier for anyone to generate music, interactively, in real time [Details].
Microsoft released OmniParser, an open-source general screen-parsing tool that interprets UI screenshots and converts them into a structured format, improving existing LLM-based UI agents [Details].
Replicate announced a playground for experimenting with image models on Replicate. It's currently in beta, works with FLUX and related models, and lets you compare different models, prompts, and settings side by side [Link].
Cohere's Embed 3 AI search model is now multimodal, capable of generating embeddings from both text and images [Details].
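A hedged sketch of embedding an image with the Cohere Python SDK follows; the images parameter and base64 data-URL format follow the announcement, so verify the exact field names against the current SDK docs:

```python
# Hedged sketch: image embeddings with Cohere's multimodal Embed 3.
import base64
import cohere

co = cohere.Client()  # reads CO_API_KEY from the environment

# Images are sent as base64 data URLs, per the announcement.
with open("product.png", "rb") as f:
    image_url = "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

resp = co.embed(
    model="embed-english-v3.0",
    input_type="image",
    images=[image_url],
)
print(len(resp.embeddings[0]))  # dimensionality of the image embedding
```

The resulting vectors live in the same space as Embed 3's text embeddings, which is what makes mixed text-and-image retrieval possible.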
DeepSeek released Janus, a 1.3B unified MLLM that decouples visual encoding for multimodal understanding and generation. It's based on DeepSeek-LLM-1.3b-base with SigLIP-L as the vision encoder [Details].
Google DeepMind has open-sourced their SynthID text watermarking tool for identifying AI-generated content [Details].
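Below is a hedged sketch of watermarked generation via the Hugging Face Transformers integration; the config class requires a recent Transformers release, the model choice is illustrative, and the keys shown are placeholders rather than a production key set:

```python
# Hedged sketch: SynthID-Text watermarking during generation via Transformers.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_id = "google/gemma-2-2b-it"  # illustrative choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder keys for illustration only; real deployments keep their key set secret.
wm_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,
)

inputs = tokenizer("Write two sentences about autumn.", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, do_sample=True, max_new_tokens=80, watermarking_config=wm_config
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# Detection is a separate step: the open-sourced repo ships a Bayesian detector
# trained to distinguish watermarked from unwatermarked text.
```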
ElevenLabs launched VoiceDesign, a new tool to generate a unique voice from a text prompt describing the characteristics of the voice you need [Details].
Microsoft announced that the ability to create autonomous agents with Copilot Studio will be in public preview next month. Ten new autonomous agents will be introduced in Microsoft Dynamics 365 for sales, service, finance, and supply chain teams [Details].
xAI, Elon Musk’s AI startup, launched an API allowing developers to build on its Grok model [Details].
Asana announced AI Studio, a no-code builder for designing and deploying AI agents in workflows [Details].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Kick: Accounting software that does the work for you
Krea AI: Access top AI video models from Runway, Hailuo, Pika, Kling, and more in one tool
DreamCut: AI-Powered video editing & screen recording
Brainy Docs: Convert PDFs to explainer videos using AI
Vidify: Turn Shopify product images into AI shoppable videos
Last week’s issue
Thanks for reading and have a nice weekend! 🎉 Mariam.