Cross-language voice cloning from 3-sec audio, any-to-any multimodal LLM, Stability AI music generator, Chain of Density (CoD) prompt, and more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #31):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Stability AI launched Stable Audio, a generative AI tool for music & sound generation from text. The underlying latent diffusion model architecture uses audio conditioned on text metadata as well as audio file duration and start time [Details].
Coqui released XTTS - a new voice generation model that lets you clone voices in 13 different languages by using just a quick 3-second audio clip [Details].
Microsoft Research released and open-sourced Phi-1.5 - a 1.3 billion parameter transformer-based model with performance on natural language tasks comparable to models 5x larger [Paper].
Project Gutenberg, Microsoft and MIT have worked together to use neural text-to-speech to create and release thousands of human-quality free and open audiobooks [Details].
Researchers present NExT-GPT - an any-to-any multimodal LLM that accepts inputs and generates outputs in arbitrary combinations of text, images, videos, and audio [Details | Demo].
Chain of Density (CoD): a new prompt introduced by researchers from Salesforce, MIT and Columbia University that generates denser and more human-preferable summaries compared to vanilla GPT-4 [Paper].
Adept open-sources Persimmon-8B, releasing it under an Apache license. The model has been trained from scratch using a context size of 16K [Details].
Adobe's Firefly generative AI models, after 176 days in beta, are now commercially available in Creative Cloud, Adobe Express, and Adobe Experience Cloud. Adobe is also launching Firefly as a standalone web app [Details].
Deci released DeciLM 6B, a permissively licensed, open-source foundation LLM that is 15 times faster than Llama 2 while having comparable quality [Details].
Researchers release Scenimefy - a model transforming real-life photos into Shinkai-animation-style images [Details | GitHub].
Microsoft open-sources EvoDiff, a novel protein-generating AI that could be used to create enzymes for new therapeutics and drug delivery methods as well as new enzymes for industrial chemical reactions [Details].
Several companies including Adobe, IBM, Nvidia, Cohere, Palantir, Salesforce, Scale AI, and Stability AI have pledged to the White House to develop safe and trustworthy AI, in a voluntary agreement similar to an earlier one signed by Meta, Google, and OpenAI [Details].
Microsoft will provide legal protection for customers who are sued for copyright infringement over content generated using Copilot, Bing Chat, and other AI services as long as they use built-in guardrails [Details].
NVIDIA released TensorRT-LLM in beta, an open-source library that accelerates and optimizes inference performance on the latest LLMs on NVIDIA Tensor Core GPUs [Details].
Pulitzer Prize-winning novelist Michael Chabon and several other writers sue OpenAI for copyright infringement [Details].
NVIDIA partners with two of India’s largest conglomerates, Reliance Industries Limited and Tata Group, to create an AI computing infrastructure and platforms for developing AI solutions [Details].
Roblox announced a new conversational AI assistant that lets creators build virtual assets and write code with the help of generative AI [Details].
Google researchers introduced MADLAD-400 - a 3T token multilingual, general web-domain, document-level text dataset spanning 419 languages [Paper].
A recent survey by Salesforce shows that 65% of generative AI users are Millennials or Gen Z, and 72% are employed. The survey included 4,000+ people across the United States, UK, Australia, and India [Details].
Meta is reportedly working on an AI model designed to compete with GPT-4 [Details].
🔦 Weekly Spotlight
How Are Consumers Using Generative AI? A detailed report by a16z [Link].
Apple’s iPhone 15 launch focused heavily on AI — even though the tech giant didn’t mention it [Link].
Asking 60+ LLMs a set of 20 questions [Link].
A Twitter thread on companies that are hiring for Generative AI talent [Link].
Agents: an open-source library/framework for building autonomous language agents [GitHub Link].
RestGPT: a large language model-based autonomous agent to control real-world applications, such as a movie database and a music player [GitHub Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Pika Labs: Pika lets you generate videos from text prompts, similar to Runway’s generative AI video tool. Pika Labs has now introduced a camera movement parameter, "-camera", that lets you control the camera with customized intensity and direction.
Trickle: Provides AI-generated insightful summaries for each uploaded screenshot, especially for text-heavy and unstructured diagrams.
Video Translate by HeyGen: Translates your video with a natural voice clone, matching speaking style, and lip-syncing.
v0: a generative user interface system by Vercel. It generates copy-and-paste friendly React code based on Shadcn UI and Tailwind CSS.
📕 📚 AI Skillset: Learn & Build
Google has released new courses and labs for learning about generative AI [Link].
Implementing LLM output validation using Guardrails AI [Link].
Build your own Notion chatbot [Link].
AI Brews is free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.