Netflix of AI, Perplexity Pages, AI agent platform for financial analysis, Low-latency voice model, AI Prize Fight, Codestral, K2 and MAP-Neo models, AutoCoder, HuggingChat Tools and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #65):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
The Simulation (formerly Fable Studio) launched Showrunner, a platform for users to create TV shows with AI, dubbing it the 'Netflix of AI'. With just a 10-15 word prompt, users can generate scenes and episodes of 2-16 minutes, complete with AI dialogue, voices, editing, shot types, characters, and story development. Fable released a research paper last year on their SHOW-1 model and AI Showrunner Agents that can write, produce, direct, cast, edit, voice and animate episodes of AI TV [Details].
Mistral AI introduced Codestral, a 22B open-weight generative AI model explicitly designed for code generation tasks. With its larger context window of 32k, Codestral outperforms CodeLlama 70B, Llama 3 70B and DeepSeek Coder 33B. Codestral is licensed under the new Mistral AI Non-Production License. It is accessible through Le Chat and La Plateforme, and is integrated into LlamaIndex and LangChain [Details | Hugging Face].
Cartesia introduced Sonic, a low-latency voice model that generates lifelike speech. The co-founders of Cartesia had created the state space model architecture. Sonic creates high quality lifelike speech for any voice with a model latency of 135ms—the fastest for a model of this class. Details on the new architecture will be released in a separate report. Sonic is released with a web playground and a low latency API [Details].
AI4Finance Foundation released FinRobot, a novel open-source AI agent platform supporting multiple financially specialized AI agents, each powered by an LLM [Details].
IEIT-Yuan released Yuan2.0-M32, a Mixture-of-Experts (MoE) language model with 32 experts, of which 2 are active. Yuan2.0-M32 is trained from scratch on 2000B tokens and has surpassed Llama3-70B on the MATH and ARC-Challenge benchmarks [Details].
llama3v: a new SOTA vision model powered by Llama3 8B and siglip-so400m and trained for under $500. It outperforms LLaVA, the current open-source SOTA vision language model. llama3v offers vision abilities comparable to models close to 100x larger, such as GPT4v, Gemini Ultra, and Claude Opus [Details | Hugging Face].
LLM360 released K2, a fully reproducible 65-billion-parameter large language model that outperforms Llama 2 70B using 35% less compute. K2 is fully transparent: LLM360 open-sourced all artifacts, including code, data, model checkpoints, intermediate results, and more [Details].
Perplexity AI released a new tool, Perplexity Pages, that lets users create comprehensive, visually appealing content on any topic. Users can type in a topic and receive a structured draft instantly. A page can be created as a separate entity, similar to writing a document with full internet access, or users can keep asking questions on Perplexity and turn them into the Page format with a one-click convert button [Details].
Open-Sora is now on v1.1.0. This open-source project aims to reproduce Sora, OpenAI’s text-to-video (T2V) model. v1.1.0 significantly enhances video generation quality and text control capabilities [Details].
Multimodal Art Projection (M-A-P) Research released MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T tokens. MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs [Details].
All ChatGPT Free users can now use browse, vision, data analysis, file uploads, and GPTs, which were previously available only to pro subscribers [Details].
Higgsfield introduced NOVA-1, a text-to-video model that gives marketers precise control. Companies can train a custom version of the NOVA-1 model using their product and brand assets [Details].
ByteDance introduced INSTADRAG, a rapid approach enabling high-quality drag-based image editing in ~1 second. Code will be released in 2-4 weeks [Details].
Suno announced v3.5, which is now available to all users. It lets you make 4-minute songs, produces a full song in a single generation, and features improved song structure and vocal flow. A feature for making a song from any sound is coming soon [Details].
6079 announced AI Prize Fight, a first-of-its-kind street fighting esports competition where teams will go head-to-head training AI agents for the championship belt. Registration will begin the week of June 3rd [Details].
Scale released the SEAL Leaderboards, which rank frontier LLMs using curated private datasets that can’t be gamed. The initial domains covered include Coding, Instruction Following, Math and Multilinguality [Details].
Researchers released AutoCoder, a code LLM that outperforms GPT-4 Turbo and GPT-4o on the HumanEval benchmark. Its code interpreter can install external packages rather than being limited to built-in packages. The base model is DeepSeek-Coder [Details].
Microsoft launched Copilot for Telegram - a personal generative AI assistant powered by a GPT model and Bing Search, available within Telegram [Details].
LMSYS Chatbot Arena Leaderboard update: Gemini 1.5 Pro/Advanced at #2, closing in on GPT-4o. Gemini 1.5 Flash at #9, outperforming Llama-3-70b and nearly reaching GPT-4-0125 [Link].
Udio introduced Udio-130, a new music generation model capable of two-minute generations and new features [Details].
Tools are now available in HuggingChat. Tools open up a wide range of new possibilities, allowing the model to determine when a tool is needed, which tool to use, and what arguments to pass (via function calling) [Details].
SambaNova's Samba-1 Turbo has set a new record for large language model inference performance in recent benchmarking by Artificial Analysis. Samba-1 Turbo runs Llama 3 8B at 1000 tokens per second (t/s) on just 16 chips, and can concurrently host up to 1000 Llama3 checkpoints on a single 16-socket SN40L node. This is the fastest speed for serving Llama 3, while maintaining full precision at a lower cost [Details].
GitHub announced the 2024 cohort for its GitHub Accelerator program, featuring 11 open-source AI projects [Details].
Opera browser has integrated Google’s Gemini AI models into its existing Aria AI extension. Aria, released last year, acts like an AI assistant to answer user queries, write code, and perform other tasks [Details].
Tool use, which enables Claude to interact with external tools and APIs, is now generally available across the entire Claude 3 model family on the Anthropic Messages API, Amazon Bedrock, and Google Cloud's Vertex AI [Details].
Google adds new built-in AI-powered features to Chromebook [Details].
Gemini is now available in Chrome DevTools to help devs understand errors and warnings better with AI [Details].
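For readers curious about the function calling behind the HuggingChat Tools and Claude tool-use items above, here is a minimal Python sketch of the general idea: the model replies either with plain text or with a structured request naming a tool and its arguments, and the application runs the matching function. Everything here is an assumption made for illustration, including the `{"tool": ..., "arguments": ...}` JSON shape and the `get_weather`, `add` and `dispatch` names. It is not the actual HuggingChat or Anthropic API.

```python
import json

# Hypothetical tools; names and signatures are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"get_weather": get_weather, "add": add}

def dispatch(model_output: str):
    """Run the tool named in a model's JSON reply, or return plain text as-is."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # not JSON, so treat it as an ordinary answer
    if not isinstance(call, dict) or "tool" not in call:
        return model_output  # valid JSON but not a tool request
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    # Pass the model-chosen arguments to the selected function.
    return TOOLS[name](**call.get("arguments", {}))

if __name__ == "__main__":
    print(dispatch('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
    print(dispatch("No tool needed for this one."))
```

In a real integration the tool result would usually be sent back to the model so it can write the final answer; this sketch stops at the dispatch step.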
🔦 Weekly Spotlight
Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 - Andrej Karpathy [Link].
Understanding the Cost of Generative AI Models in Production [Link].
Meta AI: An introduction to Vision Language Models (VLMs) - what VLMs are, how they are trained, and how to effectively evaluate VLMs [Link].
Finalists from the Mistral AI and Cerebral Valley hackathon in Paris [Link].
A series of Video tutorials by Meta AI on how to build with Llama models across Linux, Windows, Mac and more [Video Links].
OpenDevin x CodeQwen1.5: A Small Finetuned Model for Open Coding Assistant [Video Link].
Building an AI Agent for SEO Research and Content Generation [Link].
Hi, AI: Our Thesis on AI Voice Agents - by a16z [Link].
LlamaFS: A self-organizing file system with Llama 3 [Link].
What We Learned from a Year of Building with LLMs (Part I) [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
timeOS: an AI productivity companion that captures and summarizes your day, organizes information, and proactively surfaces the knowledge you need.
Stable Projectorz: A free tool for generating textures via Automatic1111 StableDiffusion.
IKI.AI: Save any web page, PDF, YouTube video, or note. An assistant, aware of all your knowledge, will fetch information, provide structured answers, brainstorm, extract ideas, or write text. Augmented with web search and a curated index.
Frontly: Build AI-powered apps and internal tools with no code.
Last week’s issue
You can support my work via BuyMeaCoffee.
Thanks for reading and have a nice weekend! 🎉 Mariam.