New Text-to-Video model, DALL·E 3, Long-form spoken audio generation, Training robots using Large Behavioral Models and more

Mariam

Sep 22, 2023

Greetings and welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.

In today’s issue (Issue #32):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build

🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

Genmo releases a new text-to-video model, Genmo Replay v0.1, which generates high-quality videos from text without the need for advanced prompt engineering. Genmo can be used for free to create AI videos [Details | Genmo Replay].
OpenAI unveils DALL·E 3 - a major update to its text-to-image model, which will be integrated into ChatGPT. It will be available to ChatGPT Plus and Enterprise users in October, and via the API and in Labs later this fall. Creators can now also opt their images out of future training [Details].
Toyota Research Institute has developed a technique, powered by generative AI, that enables teaching robots new manipulation abilities in a single afternoon. Using the same robot, same code, and same setup, TRI taught over 60 different dexterous behaviors like peeling vegetables, using hand mixers, preparing snacks, and flipping pancakes [Details].
Microsoft announced [Details]:
1. Copilot for Windows will be available starting September 26. Copilot will incorporate the context and intelligence of the web, your work data, and what you are doing in the moment on your PC to provide better assistance. It will be integrated into Windows 11, Microsoft 365, Edge and Bing.
2. Bing will add support for DALL·E 3 and deliver more personalized answers based on search history.
3. New AI-powered experiences in Paint, Photos and Clipchamp.
4. A new AI-powered shopping experience.
ElevenLabs released Projects - a tool that lets you generate an entire audiobook at the click of a button. Projects now supports .epub, .pdf, and .txt file imports, as well as initializing a project from a URL [Details].
Deci presents DeciDiffusion 1.0 - an open-source text-to-image latent diffusion model that is 3x faster than Stable Diffusion v1.5 at the same quality [Details].
Google researchers present a new approach that produces photo-realistic animations from a single picture. The model is trained on automatically extracted motion trajectories from a large collection of real video sequences [Details].
Google has updated Bard [Details | YouTube]:
1. Bard Extensions: With extensions, Bard can now connect to your Google apps and services like Gmail, Docs, Drive, Google Maps, YouTube, and Google Flights and hotels.
2. Bard's “Google it” button makes it easier to double-check its answers and see whether content across the web substantiates them.
3. You can now continue a Bard chat that someone shared with you via a public link.
YouTube announces new AI tools for creators. Dream Screen will let users create an AI-generated video or image background from text. Aloud, an automatic AI dubbing tool, will be integrated into YouTube Studio. AI-powered insights will help creators generate video ideas and draft outlines, and Assistive Search in Creator Music will suggest suitable music based on a description of the content [Details].
Amazon announced that its voice assistant Alexa is being upgraded with a new, custom-built large language model [Details].
IBM open-sources MoLM - a collection of ModuleFormer-based language models ranging in scale from 4 billion to 8 billion parameters. ModuleFormer is a new neural network architecture, developed by IBM researchers, based on the Sparse Mixture of Experts (SMoE) [GitHub | Paper].
Neuralink, Elon Musk's brain implant startup, is set to begin human trials [Details].
Lexica has released Aperture v3.5 - its latest image model, which creates photorealistic images and follows prompts precisely [Link].
OpenAI has invited domain experts to collaborate in evaluating and improving the safety of OpenAI's models by joining the new OpenAI Red Teaming Network [Link].
GitHub Copilot Chat (beta) is now available to all individual users [Link].
Replit announced a virtual hackathon for projects built using Replit ModelFarm [Twitter Link].
Oracle brings voice-activated AI to healthcare with Clinical Digital Assistant [Details].
Google and the Department of Defense are building an AI-powered microscope to help doctors spot cancer [Details].

🔦 Weekly Spotlight

Generative AI’s Act Two - by Sequoia Capital [Link].
How to Get Hired in the Era of Generative AI - Harvard Business Review [Link].
38TB of data accidentally exposed by Microsoft AI researchers [Link].
DeepMind is using AI to pinpoint the causes of genetic disease [Link].
Tabby - a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Meshy: 3D generative AI toolbox for effortlessly creating 3D assets from text or images
Understand Better: Natural-language listening exercises that help you join conversations with native speakers
Klu: From Notion to Gmail and Trello to GitHub, Klu integrates seamlessly, letting you search, chat, and engage with your data across platforms

📕 📚 AI Skillset: Learn & Build

Andrew Ng: Opportunities in AI - 2023 - a discussion hosted by the Stanford Graduate School of Business [YouTube Link].
Fixing JSON errors in OpenAI functions using more OpenAI Functions [Link].
Fine-tuning GPT with OpenAI, Next.js and Vercel AI SDK [Link].

AI Brews is free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.


Daniel Nest

Lots of news for end-users this week. Have been working on an all-news edition of 10X AI, and it's crazy how much overlap it has with your list. Meshy sounds pretty cool - I've toyed with https://www.csm.ai/ before, so curious to see how they compare!

