Open Multimodal Native Model, BeaGo, Mistral advanced Edge Models, Suno Scenes, Supercomplete, Movie Gen, Dash for Business, F5-TTS, Interactive Meeting Avatar and more

Oct 18, 2024

Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.

In today’s issue (Issue #80):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week


🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

Mistral released two new state-of-the-art models for on-device computing and at-the-edge use cases, Ministral 3B and Ministral 8B, which outperform their peers. Both models support up to 128k context length and natively support function calling [Details].
Nvidia released Llama-3.1-Nemotron-70B, an open-weight, commercially permissive model that outperforms GPT-4o and Claude 3.5 Sonnet on multiple benchmarks [Details | Try].
Rhymes AI released Aria, the first open, natively multimodal mixture-of-experts (MoE) model (Apache 2.0 license). Aria processes text, images, video, and code all at once, without needing separate setups for each type. It outperforms Pixtral-12B and Llama3.2-11B across a range of multimodal, language, and coding tasks, surpasses GPT-4o mini in long video understanding, and outperforms Gemini-1.5-Flash in long document understanding. It activates 3.9B parameters per visual token and 3.5B per text token [Details].
Rhymes AI also launched a search app BeaGo, powered by Aria, that provides precise, high-quality answers instantly with up-to-date real-time insights [Details].
Zyphra released Zamba2-7B, a state-of-the-art small language model, outperforming leading 7B models such as Mistral-7B, Gemma-7B, and Llama3-8B. It is extremely inference-efficient, achieving 25% faster time to first token, a 20% improvement in tokens per second, and a significant reduction in memory usage compared to models such as Llama3-8B [Details].
Archetype AI introduced Newton, a physical AI foundation model that learns about the physical world directly from sensor data. It was trained on 0.59 billion samples from open-source datasets covering a wide range of physical behaviors, from electrical currents and river fluid flows to optical sensors. Newton can effectively encode and predict physical behaviors and processes it has never encountered before, without being explicitly taught underlying physical principles [Details].
Google updated NotebookLM, its tool powered by Gemini 1.5: you can now customize the Audio Overview feature, guiding what the AI hosts focus on and their expertise level [Details].
Adobe’s Firefly Video Model is now in limited public beta. The Firefly Video Model (beta) extends Adobe’s family of generative AI models, which already includes an Image Model, Vector Model and Design Model [Details].
Meta introduced Movie Gen, a cast of foundation models that generate high-quality, 1080p HD videos with different aspect ratios and synchronized audio. It enables precise video editing, from styles and transitions to fine-grained edits, and can create personalized videos from an image [Paper | Demo Videos].
OpenAI released Swarm, an experimental open-source framework to explore multi-agent orchestration [Details | GitHub].
Dropbox introduced Dash for Business, an AI-powered universal search tool that makes it easy for teams to search, organize, share, and protect content from across their connected apps, all in one place. Dash was launched last year. Dash for Business goes further by improving the search experience and helping businesses reduce security risks, both with in-depth content access controls and by ensuring sensitive company information isn’t surfaced unintentionally [Details].
Researchers released an open-source text-to-speech model, F5-TTS (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching). Trained on a public 100K-hour multilingual dataset, it can generate highly natural and expressive speech [Details].
Researchers released a new open-source AI video generator, Pyramid Flow. Trained only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation [Details].
OpenAI introduced MLE-bench, a benchmark to evaluate AI agents on machine learning engineering tasks using 75 Kaggle competitions [Details].
Play AI introduced a new text-to-speech model, Play 3.0 mini. It is faster, more accurate, and more cost-efficient, handles multiple languages, and supports streaming from LLMs [Details].
Lacoste is the first fashion brand to adopt a new AI-powered anti-counterfeit tool, Vrai AI, which examines microscopic visual details in a photo of a product, or of a specific detail of an item, to distinguish a genuine piece from a copy [Details].
Suno AI launched Suno Scenes, a new feature available in the iOS app that turns photos and videos into unique songs [Details].
Codeium, the AI code assistant, introduced a new feature, Supercomplete. While Autocomplete passively predicts text, Supercomplete passively predicts intent. With Supercomplete, you can now swiftly accept suggested diffs and edits based on your predicted next intent, regardless of your cursor position [Details].
Adobe MAX 2024 introduced several AI-powered features, including Premiere Pro’s Generative Expand, which seamlessly adds frames to extend video clips, and Photoshop’s Distraction Removal, which automatically detects and removes unwanted elements like wires and people, replacing them with AI-generated content [Details].
YouTube creators can now use the AI-powered Dream Track feature in Shorts to generate custom soundtracks via text prompts [Video].
Amazon's AI generator tool can now create audio ads [Details].

🔦 Weekly Spotlight

Use Ollama with any GGUF Model on Hugging Face Hub without creating a new Modelfile [Link].
Machines of Loving Grace: How AI Could Transform the World for the Better, by Anthropic CEO Dario Amodei [Link].
Open Canvas: an open source web application for collaborating with agents to better write documents. It is inspired by OpenAI's Canvas [Link].
LoRA Lab - Mix-up and combine multiple FLUX LoRAs [Link].
Foyle: An open-source AI Assistant to help developers deploy and operate their applications [Link].
The Open-Source AI Cookbook: a collection of notebooks illustrating practical aspects of building AI applications and solving various machine learning tasks using open-source tools and models [Link].
Open Financial LLM Leaderboard [Link].
Deploy Llama 3.2 Vision on Amazon SageMaker [Link].
How to build a real-time image generator with Flux and Together AI [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Hailuo AI: Text-to-video platform by MiniMax, a Chinese startup funded by Alibaba. It now supports image-to-video.
Interactive Meeting Avatar by HeyGen: AI avatars and digital twins can now join a Zoom meeting and interact.
SuperStudio by Kaiber AI: Creative AI playground powered by Flux Image and Luma Video models.
Glif: An AI sandbox to build, remix and share AI microapps (aka glifs).
Finic: Provides web browser infrastructure for bots, scrapers, automations, and AI agents.
