Mistral Large, vocal expressive avatar videos, generative virtual worlds, reliable text rendering and Magic Prompt, DJ Mode, AI-powered filmmaking and more
Hi! Welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.
In today’s issue (Issue #53):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
From our sponsors:
Add an AI assistant to your SaaS in minutes.
Rehance's AI assistants help users get things done on your site, reducing churn and keeping them happy. After you integrate an assistant, your users can ask it in plain English to perform tasks in your SaaS, and the assistant will take care of the rest!
Help your users automate their work at rehance.ai.
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Mistral introduced a new model, Mistral Large. It offers top-tier reasoning capabilities, is multilingual by design, has native function-calling capabilities, and comes with a 32K-token context window. The pre-trained model achieves 81.2% accuracy on MMLU. Alongside Mistral Large, Mistral released Mistral Small, a model optimized for latency and cost; it outperforms Mixtral 8x7B with lower latency. Mistral also launched a ChatGPT-like conversational assistant, le Chat Mistral [Details].
Alibaba Group introduced EMO, an expressive audio-driven portrait-video generation framework. Given a single reference image and vocal audio (e.g., talking or singing), it generates vocal avatar videos with expressive facial expressions and various head poses [Details].
Ideogram introduced Ideogram 1.0, a text-to-image model trained from scratch for state-of-the-art text rendering, photorealism, prompt adherence, and a feature called Magic Prompt to help with prompting. Ideogram 1.0 is now available to all users on ideogram.ai [Details].
Google DeepMind introduced Genie (generative interactive environments), a foundation world model trained exclusively from Internet videos that can generate interactive, playable environments from a single image prompt [Details].
Pika Labs launched a Lip Sync feature, powered by audio from Eleven Labs, for its AI-generated videos, enabling users to make characters talk with realistic mouth movements [Video].
UC Berkeley introduced the Berkeley Function Calling Leaderboard (BFCL) to evaluate the function-calling capabilities of different LLMs. Gorilla Open Functions v2, an open-source model that helps users build AI applications with function calling and JSON-compatible output, has also been released [Details].
Qualcomm launched AI Hub, a curated library of 80+ optimized AI models for superior on-device AI performance across Qualcomm and Snapdragon platforms [Details].
BigCode released StarCoder2, a family of open LLMs for code available in three sizes: 3B, 7B, and 15B parameters. StarCoder2-15B is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2 dataset [Details].
Researchers released FuseChat-7B-VaRM, a fusion of three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. It surpasses GPT-3.5 (March) and Claude-2.1, and approaches Mixtral-8x7B-Instruct [Details].
The Swedish fintech Klarna’s AI assistant handles two-thirds of all customer service chats, some 2.3 million conversations so far, equivalent to the work of 700 people [Details].
Lightricks introduces LTX Studio, an AI-powered filmmaking platform aimed at helping creators visualize stories, now open for waitlist sign-ups [Details].
Morph partners with Stability AI to launch Morph Studio, a platform to make films using Stability AI–generated clips [Details].
JFrog's security team found that roughly 100 models hosted on the Hugging Face platform contain malicious functionality [Details].
Playground released Playground v2.5, an open-source text-to-image generative model, with a focus on enhanced color and contrast, improved generation for multi-aspect ratios, and improved human-centric fine detail [Details].
Together AI and the Arc Institute released Evo, a long-context biological foundation model based on the StripedHyena architecture that generalizes across DNA, RNA, and proteins. Evo is capable of both prediction tasks and generative design, from molecular to whole-genome scale (over 650k tokens in length) [Details].
Adobe previews Project Music GenAI Control, a generative AI tool for music creation and editing that lets creators generate music from text prompts and then exercise fine-grained control to edit that audio for their precise needs [Details | video].
Microsoft introduces Copilot for Finance, an AI chatbot for finance workers in Excel and Outlook [Details].
The Intercept, Raw Story, and AlterNet sue OpenAI and Microsoft, claiming OpenAI and Microsoft intentionally removed important copyright information from training data [Details].
Huawei spin-off Honor shows off tech to control a car with your eyes and a chatbot based on Meta’s AI [Details].
Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI [Details].
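A common thread in the Mistral Large and Gorilla items above is function calling: the model emits a structured call (a function name plus JSON arguments) that your application then executes. Below is a minimal application-side sketch with a hypothetical `get_weather` tool; the schema follows the widely used JSON function-calling format, but the names and fields are illustrative, not any vendor's exact API.

```python
import json

# Illustrative tool schema in the common JSON function-calling format.
# The function name and fields are hypothetical examples.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    # Stubbed lookup; a real app would query a weather API here.
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Execute a model-issued tool call of the form
    {"name": ..., "arguments": "<json string>"}."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# Simulate the tool call a model might emit for
# "What's the weather in Paris?"
result = dispatch({"name": "get_weather", "arguments": '{"city": "Paris"}'})
print(result)  # {"city": "Paris", "temp_c": 21}
```

The tool schema is what you would send to the model alongside the conversation; the dispatcher is the piece benchmarks like BFCL implicitly assume works on your side.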
🔦 Weekly Spotlight
Official documentation for Qwen1.5 [Link].
Prompt Engineering with Llama 2 - a new free short course by DeepLearning.AI in collaboration with Meta [Link].
Chat with your data natively on Apple Silicon using MLX Framework [Link].
TTS Arena: Benchmarking TTS Models in the Wild [Link].
MobiLlama: Small Language Model tailored for edge devices [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Blobr: An AI assistant that consolidates and reconciles data from all your connected SaaS tools and provides insights.
MusicFX DJ: Google has added ‘DJ Mode’ in MusicFX, the generative text-to-music tool powered by Google's MusicLM. DJ Mode lets you generate a real-time stream of music by adding and adjusting musical prompts to evolve the music live. You can add up to 10 musical prompts that can include instruments, genres, emotions, etc.
Cursor Copilot++: A more powerful version of Copilot for developers that can suggest mid-line completions and entire diffs. Trained to autocomplete on sequences of edits, it's quick to understand the change you're making.
The EMO demo videos are so crazy. We're *this* close to making real-life versions of the talking portraits from Harry Potter.

Thanks for reading and have a nice weekend! 🎉 Mariam.