AudioPaLM by Google, Hyper-realistic AI generated images, Wimbledon's use of AI-generated spoken commentary, Dropbox AI, Self-improving robotic agent and More
Greetings and welcome to this week's AI Brews - your thoughtfully curated guide to AI products, learning resources and a concise roundup of the week's impactful news. Our goal? To provide a balanced selection in the rapidly evolving AI landscape, keeping you well-informed without the information overload. We value your feedback - don't hesitate to reply to this email with suggestions on how we can make this better for you. Thanks!
In today’s issue:
AI Pulse: News, Insights and Social Spotlight of the Week
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: News, Insights and Social Spotlight of the Week
🔥 News & Insights
Stability AI has announced SDXL 0.9, a significant upgrade to their text-to-image model suite that can generate hyper-realistic images. SDXL 0.9 has one of the largest parameter counts in open-source image models (3.5B) and is available on the Clipdrop by Stability AI platform [Details].
Google presents AudioPaLM, a Large Language Model that can speak and listen. AudioPaLM fuses text-based PaLM-2 and speech-based AudioLM models into a unified multimodal architecture that can process and generate text and speech [Examples | Paper].
Google researchers present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions [Details].
Meta introduced Voicebox - the first generative AI model for speech that can accomplish tasks it wasn't specifically trained for. Like generative systems for images and text, Voicebox creates outputs in a vast variety of styles, and it can create outputs from scratch as well as modify a sample it’s given. But instead of creating a picture or a passage of text, Voicebox produces high-quality audio clips [Details | Samples | Paper].
Microsoft launched Azure OpenAI Service on your data in public preview, which enables companies to run supported chat models (ChatGPT and GPT-4) on their connected data without needing to train or fine-tune models [Details].
Google DeepMind introduced RoboCat, a new AI model designed to operate multiple robots. It learns to solve new tasks on different robotic arms, such as building structures, inserting gears and picking up objects, with as few as 100 demonstrations. It can also improve its skills using self-generated training data [Details].
Wimbledon will use IBM Watsonx to produce AI-generated spoken commentary for video highlights packages for this year's Championships. Another new feature for 2023 is the AI Draw Analysis, which utilises the IBM Power Index and Likelihood to Win predictions to assess each player’s potential path to the final [Details].
Dropbox announced Dropbox Dash and Dropbox AI. Dropbox Dash is an AI-powered universal search tool that connects all of your tools, content and apps in a single search bar. Dropbox AI can generate summaries and provide answers from documents as well as from videos [Details].
Wayve presents GAIA-1 - a new generative AI model that creates realistic driving videos using video, text and action inputs, offering fine control over vehicle behavior and scene features [Details].
Opera launched a new 'One' browser with an integrated AI chatbot, ‘Aria’. Aria provides deeper content exploration by being accessible through text highlights or right-clicks, in addition to being available from the sidebar [Details].
ElevenLabs announced ‘Projects’, available for early access, for long-form speech synthesis. This will enable anyone to create an entire audiobook without leaving the platform. ElevenLabs has reached over 1 million registered users [Details].
Vimeo is introducing new AI-powered video tools: a text-based video editor for removing filler words and pauses, a script generator, and an on-screen teleprompter for script display [Details].
Midjourney launched V5.2, which includes zoom-out outpainting, improved aesthetics, coherence, text understanding, sharper images, higher variation modes and a new /shorten command for analyzing your prompt tokens [Details].
Parallel Domain launched a new API, called Data Lab, that lets users build synthetic datasets with generative AI [Details].
OpenAI is considering creating an app store in which customers could sell AI models they customize for their own needs to other businesses [Details].
OpenLM Research released its 1T token version of OpenLLaMA 13B - the permissively licensed open source reproduction of Meta AI's LLaMA large language model [Details].
ByteDance, the TikTok creator, has ordered around $1 billion worth of Nvidia GPUs so far in 2023, which amounts to around 100,000 units [Details].
GPT-Engineer: Specify what you want it to build, and the AI asks for clarification, generates a technical spec and writes all the necessary code [GitHub Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
💎 BeforeSunset: An AI daily planner tool that creates a customized schedule by syncing your calendar and to-do list.
💎 Nekton: A GPT-4 powered tool to automate day-to-day tasks using natural language.
💎 Upword: An AI research tool that makes reading, learning, and organizing new content more efficient. It can process multiple documents at the same time.
💎 QR Craft: A Stable Diffusion + ControlNet powered tool for generating artistic QR codes from a text prompt. You can try it here without any sign-up.
📕 📚 AI Skillset: Learn & Build
Emerging Architectures for LLM Applications - a report by a16z based on conversations with AI startup founders and engineers [Link].
Create an AI digital assistant with Zapier [Link].
The New Language Model Stack: How companies are bringing AI applications to life [Link].
Why GPT-4's new ability to use tools via function calling is a big deal [Link].
Free Course by Activeloop on LangChain & Vector Databases in Production [Link].
AI Brews is completely free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.