Emu Video Edit, general game-playing AI agent, fully autonomous AI software engineer, DeepSeek-VL, Robotics Foundation Model, and more
Hi! Welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.
In today’s issue (Issue #55):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
DeepSeek released DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. The DeepSeek-VL family includes 1.3B and 7B base and chat models and achieves state-of-the-art or competitive performance across a wide range of visual-language benchmarks. Free for commercial use [Details | Hugging Face | Demo].
Cohere released Command-R, a 35-billion-parameter generative model with open weights, optimized for long-context tasks such as retrieval-augmented generation (RAG) and tool use via external APIs, aimed at production-scale enterprise AI [Details | Hugging Face].
Google DeepMind introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent for 3D virtual environments, trained on nine different video games. It understands a broad range of gaming worlds and follows natural-language instructions to carry out tasks within them, as a human might. It doesn't need access to a game's source code or APIs; it requires only the on-screen images and the natural-language instructions provided by the user, and uses keyboard and mouse outputs to control the game's central character [Details].
Meta AI introduced Emu Video Edit (EVE), a model that establishes a new state of the art in video editing without relying on any supervised video editing data [Details].
Cognition Labs introduced Devin, the first fully autonomous AI software engineer. Devin can learn unfamiliar technologies, build and deploy apps end to end, and train and fine-tune its own AI models. On the SWE-bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, exceeding the previous state of the art of 1.96% unassisted and 4.80% assisted [Details].
Pika Labs adds sound effects to its AI video tool, Pika, letting users either prompt for desired sounds or have them generated automatically from the video content [Video link].
Anthropic’s Claude 3 Opus ranks #1 on the LMSYS Chatbot Arena Leaderboard, along with GPT-4 [Link].
The European Parliament approved the Artificial Intelligence Act. The new rules ban certain AI applications, including biometric categorisation systems, emotion recognition in the workplace and schools, social scoring, and more [Details].
Huawei Noah's Ark Lab introduced PixArt-Σ, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. It achieves superior image quality and user prompt adherence with a significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters) [Details].
South Korean startup Hyodol AI has launched a $1,800 LLM-powered companion doll specifically designed to offer emotional support and companionship to the rapidly expanding elderly demographic in the country [Details].
Covariant introduced RFM-1 (Robotics Foundation Model 1), a large language model (LLM), but for robot language. Set up as a multimodal any-to-any sequence model, RFM-1 is an 8-billion-parameter transformer trained on text, images, videos, robot actions, and a range of numerical sensor readings [Details].
The Figure 01 robot, integrated with an OpenAI vision-language model, can now hold full conversations with people [Link].
Deepgram announced the general availability of Aura, a text-to-speech model built for responsive, conversational AI agents and applications [Details | Demo].
The Claude 3 Haiku model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for Pro subscribers. Haiku outperforms GPT-3.5 and Gemini 1.0 Pro while costing less, and is three times faster than its peers for the vast majority of workloads (a minimal API sketch follows this news list) [Details].
Paddle announced AI Launchpad, a six-week remote program for AI founders to launch and scale an AI business, with a $20,000 cash prize [Details].
Midjourney adds a feature for generating consistent characters across multiple AI-generated images [Details].
The Special Committee of the OpenAI Board announced the completion of its review; Altman and Brockman will continue to lead OpenAI [Details].
Together.ai introduced Sequoia, a scalable, robust, and hardware-aware speculative decoding framework that improves LLM inference speed on consumer GPUs (with offloading), as well as on high-end GPUs (on-chip), without any approximations [Details].
OpenAI released Transformer Debugger (TDB), a tool developed and used internally by OpenAI's Superalignment team for investigating specific behaviors of small language models [GitHub].
Elon Musk announced that xAI will open source Grok this week [Link].
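For readers who want to try the new Haiku model mentioned above, here is a minimal sketch using Anthropic's Python SDK. The prompt and token cap are illustrative placeholders; only the model string, which matches the Claude 3 Haiku release, comes from the announcement.

```python
# Minimal sketch: calling Claude 3 Haiku through Anthropic's Messages API.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically

message = client.messages.create(
    model="claude-3-haiku-20240307",  # Claude 3 Haiku release snapshot
    max_tokens=256,                   # illustrative limit on reply length
    messages=[
        {"role": "user", "content": "Summarize retrieval augmented generation in two sentences."}
    ],
)
print(message.content[0].text)
```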
🔦 Weekly Spotlight
LlamaParse (by LlamaIndex), the first gen AI-native document parsing solution [Link].
Claude Prompt Library by Anthropic [Link]
A deep dive into Midjourney's new Character References feature and the Kaiber 3.0 update, showcasing Kaiber's improved Video Motion capabilities [Video Link].
The Top 100 Gen AI Consumer Apps by a16z [Link].
Run any ComfyUI workflow with zero setup, directly on Hugging Face [Link].
Phospho: Open-source text analytics platform for LLM apps. Detect issues and extract insights from text messages of your users or your app [Link].
Gemini 1.5 Pro report by Google DeepMind [Link].
LiteLLM: Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs); see the sketch after this list [Link].
MobileCLIP: Official implementation of Apple’s research paper ‘MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training’ [Link].
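To show what LiteLLM's unified interface looks like in practice, here is a minimal sketch. The model strings and prompt are illustrative, and each provider's API key is assumed to be set in the environment.

```python
# Minimal sketch: one OpenAI-style call shape across providers via LiteLLM.
# Assumes `pip install litellm` and the relevant provider keys
# (OPENAI_API_KEY, ANTHROPIC_API_KEY, COHERE_API_KEY, ...) in the environment.
from litellm import completion

messages = [{"role": "user", "content": "Name one use case for long-context models."}]

# Only the model string changes between providers; the response object
# mirrors the OpenAI format in every case.
for model in ["gpt-3.5-turbo", "claude-3-haiku-20240307", "command-r"]:
    response = completion(model=model, messages=messages)
    print(f"{model}: {response.choices[0].message.content}")
```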
🔍 🛠️ AI Toolbox: Product Picks of the Week
Charmed: A suite of AI tools for creating 3D video game art.
Replica API by Tavus: Tavus' Phoenix model generates exceptionally realistic talking-head videos, complete with natural face movements and expressions synchronized with the input. Phoenix is accessible via the Replica API.
Thanks for reading and have a nice weekend! 🎉 Mariam.