Foundation Model for Efficient Enterprise Search, fully open-source Text-to-speech model, Native Audio understanding in Gemini 1.5 Pro, AI film competition, Physical AI model, Mixtral 8×22B & More
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #58):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Cohere introduced Rerank 3, a new foundation model purpose-built for efficient enterprise search and Retrieval Augmented Generation (RAG) systems. It enables search over multi-aspect and semi-structured data such as emails, invoices, JSON documents, code, and tables in 100+ languages [Details].
Google DeepMind used deep reinforcement learning (deep RL) to train humanoid robots to play a simplified one-versus-one soccer game. The agents learnt by trial and error and could cope with unexpected interference in the real world. They were able to walk, turn, kick and stand up faster than manually programmed skills on this type of robot. They could also combine movements to score goals, anticipate ball movements and block opponent shots, thereby developing a basic understanding of the game [Details].
Hugging Face researchers released Parler TTS, a fully open-source, Apache 2.0 licensed text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more [Details | Demo].
Mistral AI released Mixtral 8×22B, a Sparse Mixture of Experts model with 141B total parameters (about 39B active per token) and a context length of 65k tokens, under the Apache 2.0 license [Link | Hugging Face].
Google:
The input modalities for Gemini 1.5 Pro have now expanded to include audio (speech) understanding in both the Gemini API and Google AI Studio. You can upload an audio recording of a lecture, for example, and Gemini 1.5 Pro can turn it into a quiz with an answer key. Additionally, Gemini 1.5 Pro is now able to reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio [Details].
Gemini 1.5 Pro is now available in 180+ countries via the Gemini API in public preview [Details].
Two new variants joined the Gemma family of lightweight, open models: CodeGemma for code completion and generation tasks as well as instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation [Details + Hugging Face blog].
Google Vids, a new AI-powered video creation app for work with real-time collaboration, was announced. It can generate a storyboard that you can easily edit, and after you choose a style, it pieces together a first draft with suggested scenes from stock videos, images, background music and voiceover. Vids is being released to Workspace Labs in June [Details].
Vertex AI Agent Builder launched. It lets developers easily build and deploy enterprise-ready gen AI experiences using natural language or a code-first approach [Details].
New Gemini-powered security updates to Chronicle and Workspace [Details].
Gemini 1.0 Pro added to Android Studio as an AI coding assistant [Details].
Cohere released Command R+, a RAG-optimized multilingual model designed to tackle enterprise-grade workloads. It supports Multi-Step Tool Use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. Command R+ is available on HuggingChat [Details | Hugging Face].
Archetype AI introduced Newton, a physical AI foundational model that is capable of perceiving, understanding and reasoning about the world. It fuses real-time sensor data – such as from radars, cameras, accelerometers, temperature sensors, and more – with natural language, so you can ask open-ended questions about the world around you [Details].
Intercom launched Fin AI Copilot, a personal AI assistant for customer service agents. It uses RAG + semantic search to generate answers for support agents via internal knowledge bases, public URLs etc. Fin AI Copilot retains the context from a conversation with a support agent, so the agent can ask Fin follow-up questions later [Details].
Meta AI released Open-Vocabulary Embodied Question Answering (OpenEQA) framework—a new benchmark which measures an AI agent’s understanding of physical spaces via questions like “Where did I leave my badge?” [Details].
OpenAI’s new GPT-4 Turbo model, with improved capabilities in writing, math, logical reasoning, and coding, is now available to paid ChatGPT users and generally available via the API. Vision requests can now also use JSON mode and function calling [Details].
Poe introduced a new way for model developers and bot creators to generate revenue on the Poe platform. Creators can now set a per-message price for their bots and generate revenue every time a user messages them [Details].
Oracle Financial Services introduced Oracle Financial Services Compliance Agent that helps banks mitigate anti-money-laundering risks [Details].
Apple researchers presented Ferret-UI, a new multimodal large language model (MLLM) tailored for enhanced understanding of mobile UI screens. Ferret-UI is able to perform referring tasks (e.g., widget classification, icon recognition, OCR) with flexible input formats (point, box, scribble) and grounding tasks (e.g., find widget, find icon, find text, widget listing) on mobile UI screens [Paper].
Stability AI released Stable LM 2 12B, a pair of powerful 12 billion parameter language models trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch, featuring a base and instruction-tuned model [Details].
Anthropic announced the Build with Claude contest, running from April 9th to April 16th, 2024. The top 5 winners will win $1,000 in API credits [Details].
Meta AI introduced the next generation of the Meta Training and Inference Accelerator (MTIA), the family of custom-made chips designed for Meta’s AI workloads. This new MTIA chip has improved performance by 3x over the first generation chip across four key model evaluations [Details].
Pika Labs and ElevenLabs are launching a 72-hour AI short film competition, FilmFAST, from April 12-14 [Details].
Intel introduced the Gaudi 3 AI accelerator, claiming it delivers, on average, 50% better inference performance and 40% better power efficiency than the Nvidia H100 at a lower cost [Details].
Stability AI released Cos Stable Diffusion XL 1.0 and Cos Stable Diffusion XL 1.0 Edit, fine-tuned SDXL models that can produce full color range images [Hugging Face | Unofficial Demo].
Replit announced Code Repair, a low-latency code repair AI agent that fixes code automatically without prompting and outperforms GPT-4 and Claude 3 Opus. Replit also announced early access to a new AI-powered Replit Teams product [Details].
Meta confirmed that its Llama 3 open source LLM is coming within the next month [Details].
Apple researchers have developed an AI system called ReALM (Reference Resolution As Language Modeling) that can ‘see’ and understand screen context [Details | Paper].
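For readers new to the reranking step that models like Cohere's Rerank 3 are built for, here is a minimal sketch of the general retrieve-then-rerank pattern used in search and RAG pipelines. It does not call any Cohere API: the `toy_relevance` function is a hypothetical stand-in for a real rerank model, and the documents, query, and function names are all made up for illustration.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenizer; real systems use far richer text processing."""
    return text.lower().split()


def first_stage(query, docs, k):
    """Cheap recall step: keep up to k documents sharing the most distinct words with the query."""
    q = set(tokens(query))
    scored = [(len(q & set(tokens(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # most overlap first, ties by position
    return [i for overlap, i in scored[:k] if overlap > 0]


def toy_relevance(query, doc):
    """Hypothetical stand-in for a rerank model: cosine similarity of word counts."""
    a, b = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query, docs, candidates, top_n):
    """Precision step: re-score only the shortlisted candidates and keep the best top_n."""
    ordered = sorted(candidates, key=lambda i: (-toy_relevance(query, docs[i]), i))
    return ordered[:top_n]


# Made-up semi-structured snippets (invoice, email, JSON schema).
DOCS = [
    "invoice 1042 for cloud hosting due in april",
    "email thread about the april team offsite",
    "json schema for invoice line items",
    "invoice invoice invoice reminder",
]
QUERY = "april invoice"

if __name__ == "__main__":
    shortlist = first_stage(QUERY, DOCS, k=3)
    print("shortlist:", shortlist)
    print("reranked:", rerank(QUERY, DOCS, shortlist, top_n=2))
```

The point of the two stages is cost: the first pass is cheap enough to run over everything, while the more expensive scorer only sees a short list. In a RAG setup, the reranked top results are what would be passed to the language model as context.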
🔦 Weekly Spotlight
AIDE, an AI-powered data science assistant that can autonomously understand task requirements, design, and implement solutions [Link].
Anthropic cookbook for tool use. Learn how to integrate Claude with external tools and functions to extend its capabilities [Link].
Extracting data from unstructured text and images with Datasette and GPT-4 Turbo [Link].
SWE-agent: A Deep Dive - the open-source agent by Princeton researchers that autonomously turns GitHub issues into pull requests using GPT-4 [Video | GitHub].
OpenUI by Weights & Biases - an open-source tool that lets you describe a UI and then renders it live. You can ask for changes and convert HTML to React, Svelte, Web Components, etc. [GitHub].
Free short course ‘Red Teaming LLM Applications’ by DeepLearning.AI - Learn to identify and evaluate vulnerabilities in large language model (LLM) applications [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Udio: AI-powered app for creating music from text prompts, developed by former Google DeepMind researchers. It’s free in beta with up to 1,200 song generations per month.
AI App Generator by UIBakery: Generate internal tools, CRUD apps and admin panels on top of data with only a text prompt.
Sound AiSleep: an iOS app for kids' bedtime stories spoken in your voice.
VoiceNotes: Record new ideas, family moments, meetings, podcast takeaways, etc. Ask the AI to review past notes or brainstorm new ideas.
Thanks for reading and have a nice weekend! 🎉 Mariam.