Video-to-Text model by Twelve Labs, ChatGPT-powered robot tour guide, AI agent for teaching robots pen-spinning, Poe's Creator monetization, Grant for generative AI non-fiction short films and more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #37):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Twelve Labs announced Pegasus-1 (80B), a video-language foundation model, along with a new suite of Video-to-Text APIs. Pegasus-1 integrates visual, audio, and speech information to generate more holistic text from videos, achieving new state-of-the-art performance on video summarization benchmarks [Details].
Segmind announced open-source SSD-1B, the fastest diffusion-based text-to-image model. SSD-1B is 50% smaller and 60% faster than the SDXL 1.0 model, with minimal impact on image quality. Segmind has licensed it for commercial use [Details].
Boston Dynamics has created a robot tour guide using Spot integrated with ChatGPT and other AI models as a proof of concept for the robotics applications of foundation models [Details].
Jina AI launched jina-embeddings-v2, an open-source text embedding model with an 8K context length, rivaling OpenAI’s proprietary model, text-embedding-ada-002 [Details].
NVIDIA Research developed Eureka, an AI agent that uses LLMs to automatically generate reward algorithms to train robots to accomplish complex tasks. Eureka has taught robots to open drawers and cabinets, perform rapid pen-spinning tricks, toss and catch balls, and manipulate scissors, among other tasks [Details].
Apple ML research introduces Matryoshka Diffusion (MDM), a new class of diffusion models for end-to-end high-resolution image and video synthesis. Distinct from existing works, MDM doesn't need a pre-trained VAE (e.g., SD) or training multiple upscaling modules [Hugging Face].
Generative AI startup 1337 (Leet) is paying users to help create AI-driven influencers [Details].
Meta research released an update of Habitat, an AI simulation platform for training robots on real-world interactions, alongside a 3D dataset, Habitat Synthetic Scenes Dataset. Habitat 3.0 supports both robots and humanoid avatars to enable human-robot collaboration on everyday tasks (e.g., tidying up the living room, preparing a recipe in the kitchen) [Details].
Quora has launched a creator monetization program for its chatbot platform, Poe. It is currently available to US residents and will expand to other countries soon [Details].
Runway Studios, in partnership with Artefacto, announced OpenDocs, a program that provides selected documentary film projects with $2,500, an unlimited Runway plan, and mentorship [Details].
Google expands its bug bounty program to target generative AI attacks [Details].
Amazon rolls out AI-powered image generation to help advertisers deliver a better ad experience for customers [Details].
Google Search rolls out ‘About this Image’ feature, allowing access to image metadata including fields that may indicate that it has been generated or enhanced by AI [Details].
OpenAI announced the AI Preparedness Challenge for ‘catastrophic misuse prevention’. Responses will be accepted on a rolling basis through December 31, 2023 [Details].
🔦 Weekly Spotlight
AI products in Time’s ‘The 200 Best Inventions of 2023’ list. Stability AI’s Stable Audio and Meta's SeamlessM4T are on the list, among others [Link].
Nightshade, a new data poisoning tool, messes up training data in ways that could cause serious damage to image-generating AI models [Link].
Twitter/X thread on the projects at the Dreamscape Creativity Hackathon [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
TimeToTok: An AI copilot and agent for TikTok account growth. The AI agent analyzes your content and account daily, identifies issues and potential growth opportunities, and provides action suggestions via email.
Galileo AI: Generative AI for user interface design.
Sync: An API for real-time lip-sync. Sync any video to any audio in any language.
📕 📚 AI Skillset: Learn & Build
State of Open Source AI Book - 2023 Edition - An open-source book covering the major categories in the Open Source AI space, from model evaluations to deployment [Link].
New free short course on DeepLearning.AI: Functions, Tools, and Agents with LangChain [Link].
AI Brews is free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.