1T open-source LLM, Llama 3.1 405B, Mistral Large 2, Stable Video 4D, Outfit Anyone, SearchGPT, Llama Guard 3 and more

Jul 26, 2024

Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.

In today’s issue (Issue #72):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week

🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

Meta released Llama 3.1 405B, its first frontier-level open source AI model. It is competitive with leading foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet, across a range of tasks. Meta also released upgraded versions of the 8B and 70B models, which have a longer context length of 128K, tool use, and overall stronger reasoning capabilities [Details].
Mistral AI announced Mistral Large 2, a new flagship 123B-parameter model with a 128k context window. It outperforms the previous Mistral Large and performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3 405B. Mistral Large 2 provides much stronger multilingual support and advanced function calling capabilities. You can use Mistral Large 2 via la Plateforme as well as on le Chat. Weights for the instruct model are available and are also hosted on Hugging Face [Details].
Meta released two new models: Llama Guard 3 and Prompt Guard. Prompt Guard is a small classifier that detects prompt injections and jailbreaks. Llama Guard 3 is a safeguard model that can classify LLM inputs and generations [Details].
BAAI & TeleAI released Tele-FLM-1T, an open-source multilingual large language model with 1T parameters. Built upon the decoder-only transformer architecture, it has been trained on approximately 2T tokens [Details].
Stability AI released Stable Video 4D, an innovative model for dynamic multi-angle video generation. Users upload a single video and specify their desired 3D camera poses. Stable Video 4D then generates eight novel-view videos following the specified camera views, providing a comprehensive, multi-angle perspective of the subject. It is currently available on Hugging Face [Details].
Text-to-video model Kling from China’s Kuaishou Technology is now available globally. It can generate 1080p high-definition videos lasting up to 2 minutes. The model is capable of producing large-scale realistic motions and simulating physical world characteristics [Link].
Cohere introduced Rerank 3 Nimble, a new foundation model in the Cohere Rerank model series for enterprise search and RAG systems. It is ~3x faster than Rerank 3 while maintaining a high level of accuracy. It’s available only on Amazon SageMaker [Details].
Sakana AI released two models: Evo-Ukiyoe, which generates Ukiyoe-style images from Japanese prompts, and Evo-Nishikie, which colorizes monochrome Ukiyoe illustrations [Details].
OpenAI announced SearchGPT, a prototype AI search tool that combines real-time web data with conversational AI. It is currently being tested by a limited group of users and publishers [Details].
Moondream released a new version of moondream2, with significant improvements in OCR and document understanding. moondream2 is a small vision language model designed to run efficiently on edge devices [Details].
Gemini 1.5 Flash is now available to all Gemini users on both web and mobile [Details].
Udio announced Udio v1.5, an update to its music model with improved audio quality, key control, and better results across global languages. The update also includes new features on the Udio platform [Details].
OpenAI now offers fine-tuning for GPT-4o mini to tier 4 & 5 users, with plans to expand access to all tiers. Users get 2M free training tokens per day until Sept 23 [Details].
Stability AI shared the Stable Audio Open research paper that describes the architecture and training process of Stability AI’s new open-weights text-to-audio model trained with Creative Commons data [Details].
Alibaba Group introduced Outfit Anyone, a two-stream conditional diffusion model for high-quality virtual try-ons, adaptable to various body shapes and poses, including realistic and anime images [Details].
ElevenLabs introduced Turbo 2.5, a high-quality, low-latency text-to-speech model in 32 languages. It now also supports Vietnamese, Hungarian and Norwegian, and English processing is 25% faster [Details].
Salesforce AI Research released MINT-1T, the first trillion-token multimodal interleaved open-source dataset [Details].
Apple released DCLM-Baseline-7B, a 7-billion-parameter open-source LLM trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark [Details].

🔦 Weekly Spotlight

Open Source AI Is the Path Forward by Mark Zuckerberg [Link].
Llama Agentic System by Meta AI [Link].
OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole [Link].
MimicMotion - generate high-quality videos of arbitrary length with any motion guidance [Link].
LLM Pricing Comparison Tool - Llama 3.1 405B and the new Mistral Large [Link].
Agentic Workflows: Emerging Architectures and Components [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Deepgram’s Free Transcription Tool: Convert your conversations, audio files, or YouTube videos into text - supports over 30 languages and dialects.
RivalSense: Connects to 80+ public sources and curates competitor insights relevant to your business model.
Flow Studio: Generates 3-minute videos with plots, consistent characters, and automatically matched background music and sound effects.
