Meta's new text-to-video model + new stereo models, Music generation model by Google, Open-source multi-task audio-language model, LLaVA-Plus, Open-source LLM for European languages & more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #40):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Meta AI introduces:
Emu Video: a new text-to-video model that leverages Meta’s Emu image generation model and can respond to text-only, image-only, or combined text-and-image inputs to generate high-quality video [Details].
Emu Edit: This new model is capable of free-form editing through text instructions. Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched [Details].
Researchers present LLaVA-Plus, a general-purpose multimodal assistant that expands the capabilities of large multimodal models. LLaVA-Plus maintains a skill repository that contains a wide range of vision and vision-language pre-trained models (tools), and is able to activate relevant tools, given users’ multimodal inputs, for performing real-world tasks [Details].
Google DeepMind, in collaboration with YouTube, announced [Details]:
Lyria, a model that excels at generating high-quality music with instrumentals and vocals, performing transformation and continuation tasks, and giving users more nuanced control of the output’s style and performance.
Dream Track: an experiment in YouTube Shorts. Users can simply enter a topic and choose an artist from the carousel to generate a 30-second soundtrack for their Short. Using the Lyria model, Dream Track simultaneously generates the lyrics, backing track, and AI-generated voice in the style of the selected participating artist.
Music AI tools: Users can create new music or instrumental sections from scratch, transform audio from one music style or instrument to another, and create instrumental and vocal accompaniments. In one demo, producer and songwriter Louis Bell builds a track from just a hum.
SiloGen announced Poro, an open-source 34-billion-parameter LLM for English, Finnish, and code. Future releases will support other European languages. Poro is freely available for both commercial and research use [Details].
Meta AI released new stereo models for MusicGen. By extending the delay codebook pattern to cover tokens from both left and right channels, these models can generate stereo output with no extra computational cost vs. previous models [Hugging Face | Paper].
Alibaba Cloud introduced Qwen-Audio, an open-source multi-task audio-language model that supports various tasks, languages, and audio types, serving as a universal audio understanding model [Details | Demo].
Researchers present JARVIS-1, an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control in Minecraft [Details].
Microsoft announced:
Microsoft Copilot Studio: a low-code tool to quickly build, test, and publish standalone copilots and custom GPTs [Details].
Windows AI Studio: enables developers to fine-tune, customize, and deploy state-of-the-art small language models for local use in their Windows apps. In the coming weeks, developers can access Windows AI Studio as a VS Code extension [Details].
Microsoft Azure Maia: a custom-designed chip optimized for large language model training and inference [Details].
Text to speech avatar feature in Azure AI Speech to create synthetic videos of a 2D photorealistic avatar speaking [Details].
The addition of 40 new models to the Azure AI model catalog including Mistral, Phi, Jais, Code Llama, NVIDIA Nemotron [Details].
Redwood Research, an AI alignment research lab, has shown that large language models (LLMs) can master “encoded reasoning,” a form of steganography. This allows LLMs to subtly embed intermediate reasoning steps within their generated text in a way that is undecipherable to human readers [Details].
Microsoft Research introduced phi-2. At 2.7B parameters, phi-2 is much more robust than phi-1.5, with improved reasoning capabilities [Details].
Forward Health announced CarePods, a self-contained, AI-powered doctor’s office. CarePod users can get their blood drawn, throat swabbed and blood pressure read, all without a doctor or nurse. Custom AI powers the diagnosis, and behind the scenes, doctors write the appropriate prescription [Details].
You.com launched YOU API to connect LLMs to the web. The API is launching with three dedicated endpoints: Web Search, News and RAG [Details].
Notion announced Q&A, an AI assistant that provides answers using information from a Notion workspace [Details].
OpenAI has paused new ChatGPT Plus sign-ups due to the surge in usage after DevDay [Link].
Together.ai announced the Together Inference Engine, which is up to 2x faster than other serverless APIs (e.g., Perplexity, Anyscale, Fireworks AI, or Mosaic ML) [Details].
Researchers in China have developed an AI-powered robot chemist that might be able to extract oxygen from water on Mars. The robot uses materials found on the red planet to produce catalysts that break down water, releasing oxygen [Details].
Nvidia announced the H200 GPU, which features 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x more bandwidth compared with its predecessor, the NVIDIA A100 [Details].
🔦 Weekly Spotlight
Retool’s 2023 State of AI in Production report, which surveyed 1,500+ tech people [Link].
Exploring GPTs: ChatGPT in a trench coat? by Simon Willison [Link].
draw-a-ui: an open-source app that uses tldraw and the gpt-4-vision API to generate HTML based on a wireframe you draw [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Meshy: Create 3D models with AI. Three modes available: Text to 3D, Image to 3D, and Text to Texture.
VideoGPT by VEED: Create videos directly in ChatGPT using only text.
📕 📚 AI Skillset: Learn & Build
How to create advanced GPTs for your website (Custom Actions w/ Assistants API) [Link].
OpenAI DevDay: Breakout Sessions [Link].
OpenAI DevDay: AI Frontiers Session [Link].
AI Brews is free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.