Meta's AI for recreation of mental imagery, Multimodal model for agents, fastest voice LLM, Microsoft AI bug bounty, new text-to-3D tool and more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #36):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
From our sponsors:
Generative AI: An Executive Guide
Take your lead from top executives and learn how to apply AI in your organisation today with Generative AI: An Executive Guide.
"The most clear, structured and thoughtful guide to generative AI I have read." - Chief Commercial Officer, TeamViewer
“This is easily the best AI guide I have read (and I have given up on a few). Provides really useful structures to organise and understand the various aspects of AI and LLMs.” - Business Development Manager, Brittain Wynard
Use AIBREWS at checkout for a special 20% discount!
Buy now 👉 Generative AI: An Executive Guide
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Adept open-sources Fuyu-8B - a multimodal model designed from the ground up for digital agents, so it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions and more. It has a much simpler architecture and training procedure than other multimodal models: there is no image encoder [Details].
Meta AI researchers present an AI system that can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. It uses magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second [Details].
Scaled Foundations released GRID (General Robot Intelligence Development) - a platform that combines foundation models, simulation and large language models for rapid prototyping of AI capabilities in robotics. GRID can ingest entire sensor/control APIs of any robot, and for a given task, generate code that goes from sensor -> perception -> reasoning -> control commands [Details].
DALL·E 3 is now available in ChatGPT Plus and Enterprise. OpenAI shares the DALL·E 3 research paper [Details | Paper].
PlayHT released PlayHT Turbo - a new version of their conversational voice model, PlayHT 2.0, that generates speech in under 300 ms over the network [Details].
Google announced a new feature of Google Search that helps English learners practice speaking words in context. Responses are analyzed to provide helpful, real-time suggestions and corrections [Details].
Researchers from EleutherAI present Llemma: an open language model for math trained on up to 200B tokens of mathematical text. The performance of Llemma 34B approaches Google's Minerva 62B despite having half the parameters [Details].
Midjourney partnered with Japanese game company Sizigi Studios to launch Niji Journey, an Android and iOS app. Users can generate an entire range of art styles, including non-niji images, by selecting “v5” in the settings. Existing Midjourney subscribers can log into it using their Discord credentials without paying more [Details].
Microsoft Azure AI presents Idea2Img - a multimodal iterative self-refinement system that enhances any T2I model for automatic image design and generation, enabling various new image creation functionalities together with better visual quality [Details].
China’s Baidu unveiled the newest version of its LLM, Ernie 4.0 and several AI-native applications including Baidu Maps for AI-powered navigation, ride-hailing, restaurant recommendations, hotel booking etc. [Details].
Stability AI released stable-audio-tools - a repo for training and inference of generative audio models [Link].
Microsoft announced the new Microsoft AI bug bounty program with awards up to $15,000 to discover vulnerabilities in the AI-powered Bing experience [Details].
Google researchers present PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger [Paper].
Morph Labs released Morph Prover v0 7B, the first open-source model trained as a conversational assistant for Lean users. Morph Prover v0 7B is a chat fine-tune of Mistral 7B that performs better than the original Mistral model on some benchmarks [Details].
Microsoft Research presented HoloAssist: a multimodal dataset for next-gen AI copilots for the physical world [Details].
YouTube gets new AI-powered ads that let brands target special cultural moments [Details].
Anthropic Claude is now available in 95 countries [Link].
Runway AI is launching a 3-month paid Runway Acceleration Program to help software engineers become ML practitioners [Details].
🔦 Weekly Spotlight
Twitter/X thread on the finalists at the TED Multimodal AI Hackathon [Link].
3D to Photo: an open-source package by Dabble that combines three.js and Stable Diffusion to build a virtual photo studio for product photography [Link].
Multi-modal prompt injection image attacks against GPT-4V [Link].
Meet two open source challengers to OpenAI’s ‘multimodal’ GPT-4V [Link].
From physics to generative AI: An AI model for advanced pattern generation [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Riffusion: An open-source generative AI music app that generates lyrics, music and the related image from a text prompt.
Masterpiece X – Generate: a new text-to-3D AI web tool by Masterpiece Studio in collaboration with NVIDIA that lets you create 3D models of humans, animals, and objects using words. Generated models are compatible with Blender, Unity, and Unreal Engine.
📕 📚 AI Skillset: Learn & Build
How to design an Agent for Production - A tutorial that explains the underlying tech and logic used to deploy a scheduling agent, Cal.ai, in production using LangChain [Link].
A deep dive into the world's smartest email AI - A blog post by Shortwave on how their AI assistant works at a deep technical level [Link].
AI Brews is free, and your sharing it with a friend helps us grow. Thanks for your support and have a nice weekend! 🎉 Mariam.
Okay, the Riffusion link was amazing, what a great tool.