Real-time speech-to-speech model, Magic Insert, CriticGPT, Meta 3D Gen, Multimodal Canvas, InternLM 2.5, AI Voice Isolator, llama-agents and more
Hi! Welcome to this week's AI Brews, a concise roundup of the week's major developments in AI.
In today’s issue (Issue #69):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
From our sponsors:
200+ hours of research on AI tools & hacks packed in 3 hours (Extended July 4th Sale) 🇺🇸
The only AI Crash Course you need to master 20+ AI tools, multiple hacks & prompting techniques to work faster & more efficiently.
In just 3 hours, you can become a pro at automating your workflow and save up to 16 hours a week.
Get the crash course here for free (valid for the next 24 hours only!)
This AI course has been taken by 1 million people across the globe, who have been able to:
Build no-code apps using UI-ZARD in minutes
Write & launch ads for your business (no experience needed + you save on cost)
Create solid content for 7 platforms with voice command & level up your social media
And 10 more insane hacks & tools that you’re going to LOVE!
Register & save your seat now (100 free seats only) 🎁
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Kyutai, a non-profit lab based in Paris, unveiled Moshi, a real-time speech-to-speech model that can listen and speak continuously, with no need to explicitly model speaker turns or interruptions. Moshi can make small talk, explain various concepts, and engage in roleplay across many emotions and speaking styles. It can be freely tested online here. Code and model weights will be released soon [Details].
Salesforce AI Research introduced APIGen, an automated data generation pipeline that produces verifiable, high-quality datasets for function-calling applications. Salesforce used APIGen to train two function-calling models, with 1.3B and 6.7B parameters. The 6.7B model ranked 6th on the Berkeley Function-Calling Leaderboard, surpassing GPT-4o and Gemini-1.5-Pro, while the 1.3B model outperforms GPT-3.5-Turbo and Claude-3 Haiku [Details].
InternLM released the InternLM 2.5 7B open model series with a 1M-token context window and improved tool-use capabilities. It achieves state-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B [Details].
Resemble AI introduced DETECT-2B, a new foundation model for multilingual audio deepfake detection with a 94%+ accuracy rate, a 200 ms prediction time, and support for 30+ languages [Details].
Google presents Magic Insert, a generative AI method that lets you drag and drop a subject into an image with a vastly different style, producing a style-harmonized, realistic insertion of the subject [Demo | Details].
LMSYS released RouteLLM, an open-source framework for cost-effective routing between large language models [Details].
OpenAI trained CriticGPT, a model based on GPT-4, to catch errors in ChatGPT's code output. It found that people who get help from CriticGPT to review ChatGPT code outperform those without help 60% of the time [Details].
MIT CSAIL introduced a soft robotic system for delicate grocery packing that integrates vision, tactile sensing, and soft fingers [Details].
Apple, in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL), released a public demo of their 4M AI model on the Hugging Face Spaces platform [Details].
Meta introduced Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation in under a minute. It combines two components, one for text-to-3D generation and one for text-to-texture generation [Details].
Perplexity improved Pro Search to tackle more complex queries and perform advanced math and programming computations. Pro Search can analyze search results and take intelligent actions based on its findings, including initiating follow-up searches that build on previous results. Pro Search is free to use five times every four hours [Details].
Meta AI published a paper in April on a new training approach for better and faster LLMs using multi-token prediction. Meta has now released pre-trained code completion models trained with this approach [Details].
LlamaIndex released llama-agents (alpha), a new open-source framework designed to simplify building, iterating on, and deploying multi-agent AI systems [Details].
Runway’s Gen-3 Alpha Text to Video is now available to everyone [Details].
Anthropic announced the Build with Claude June 2024 contest, inviting developers to create projects using Claude through the Anthropic API [Details].
Figma temporarily disables its Make Designs AI feature after it was accused of ripping off the design of Apple’s Weather app [Details].
YouTube now lets you request removal of AI-generated content that simulates your face or voice [Details].
ElevenLabs has brought the voices of the legendary stars Judy Garland, James Dean, Burt Reynolds, and Sir Laurence Olivier to its Reader App for narrating digital texts [Details].
Cloudflare launched a new one-click feature to block all AI bots [Details].
Synthesia announced major updates to its platform: full-body avatars that will be fully controllable (users will be able to specify avatar appearance with images and videos and create animations with skeleton sequences); AI Screen Recorder, a new product that turns screen recordings into video presentations powered by AI avatars; and bulk creation and brand templates, coming soon to AI Video Assistant [Details].
Meta, Hugging Face and Scaleway announced a new AI accelerator program for European startups looking to integrate open foundation models into their products [Details].
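The RouteLLM item in the news list above is about sending each query either to a strong, expensive model or to a weak, cheap one. Below is a minimal sketch of threshold-based routing, assuming a toy scorer: `difficulty_score`, `ThresholdRouter` and the model names are hypothetical stand-ins, not the RouteLLM API, and a real router would use a trained model rather than keywords to estimate whether the strong model is needed.

```python
from dataclasses import dataclass

# Placeholder model names (hypothetical, not real endpoints).
STRONG_MODEL = "strong-model"
WEAK_MODEL = "weak-model"

# Toy signal words suggesting a harder query (an assumption for illustration).
HARD_KEYWORDS = ("prove", "derive", "optimize", "debug", "step by step")


def difficulty_score(query: str) -> float:
    """Toy stand-in for a learned router; returns a score in [0, 1].

    Half of the score comes from query length (capped at 50 words),
    the other half from the presence of any "hard" keyword.
    """
    text = query.lower()
    length_part = min(len(text.split()) / 50, 1.0) * 0.5
    keyword_part = 0.5 if any(k in text for k in HARD_KEYWORDS) else 0.0
    return length_part + keyword_part


@dataclass
class ThresholdRouter:
    # Raising the threshold sends more traffic to the cheaper weak model.
    threshold: float = 0.5

    def route(self, query: str) -> str:
        """Use the strong model only if the query's score clears the threshold."""
        if difficulty_score(query) >= self.threshold:
            return STRONG_MODEL
        return WEAK_MODEL


if __name__ == "__main__":
    router = ThresholdRouter(threshold=0.5)
    for q in ("What is the capital of France?",
              "Prove that the square root of two is irrational"):
        print(f"{router.route(q):>12}  <- {q}")
```

The single `threshold` value is the cost/quality knob: lowering it buys quality with more strong-model calls, raising it saves money by keeping more traffic on the cheap model.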
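The Meta multi-token prediction item refers to training a model to predict several future tokens at once: a shared model trunk feeds n independent output heads, head k predicts the token k positions ahead, and the training loss adds up the cross-entropy of every head. The sketch below shows only that loss arithmetic in plain Python, assuming the per-head logits have already been computed; `multi_token_loss` is a hypothetical helper, not Meta's released code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_token_loss(head_logits: Sequence[Sequence[Sequence[float]]],
                     tokens: Sequence[int]) -> float:
    """Mean over positions of the summed cross-entropy of all heads.

    head_logits[k][t] holds the logits of head k at position t, which is
    trained to predict tokens[t + k + 1]. With one head this reduces to
    ordinary next-token prediction. Positions that lack all n future
    targets are skipped (a simplification).
    """
    n_heads = len(head_logits)
    total, positions = 0.0, 0
    for t in range(len(tokens) - n_heads):
        for k in range(n_heads):
            target = tokens[t + k + 1]
            total -= log_softmax(head_logits[k][t])[target]
        positions += 1
    if positions == 0:
        raise ValueError("sequence too short for the number of heads")
    return total / positions
```

A quick sanity check: with uniform logits over a vocabulary of size V, each head contributes log V, so n heads give a loss of n·log V, and a single head gives the familiar next-token value log V.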
🔦 Weekly Spotlight
Hyperspace: world's largest peer-to-peer AI network [Link].
GraphRAG by Microsoft: A modular graph-based Retrieval-Augmented Generation (RAG) system to extract meaningful, structured data from unstructured text using LLMs [Link].
Automated evaluation of RAG pipelines with exam generation [Link].
Beyond Benchmarks 2024 by Emergence Capital: a report based on data collected in April 2024 from over 600 B2B software companies [Link].
LLM Pricing Comparison Tool [Link].
Why we no longer use LangChain for building our AI agents by Octomind [Link].
Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper [Link].
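GraphRAG, listed in the spotlight above, retrieves context from a knowledge graph extracted from text instead of relying only on similar-looking text chunks. The real system adds LLM-based entity and relationship extraction, community detection and community summaries. The minimal sketch below assumes the (subject, relation, object) triples have already been extracted and only shows the core idea of pulling a query's graph neighborhood into the prompt; the toy triples, `build_index`, `retrieve` and `to_context` are hypothetical and not Microsoft's API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Toy triples standing in for what an LLM extraction step might produce.
TRIPLES: List[Triple] = [
    ("GraphRAG", "developed by", "Microsoft"),
    ("GraphRAG", "uses", "LLMs"),
    ("Microsoft", "headquartered in", "Redmond"),
    ("Moshi", "developed by", "Kyutai"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Map every entity to the triples it appears in (as subject or object)."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        subj, _, obj = triple
        index[subj].append(triple)
        index[obj].append(triple)
    return dict(index)


def retrieve(index: Dict[str, List[Triple]], query: str, hops: int = 1) -> List[Triple]:
    """Collect triples within `hops` steps of entities named in the query.

    Seed entities are found by naive case-insensitive substring matching.
    """
    q = query.lower()
    frontier: Set[str] = {e for e in index if e.lower() in q}
    seen_entities: Set[str] = set(frontier)
    found: List[Triple] = []
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for entity in sorted(frontier):  # sorted for deterministic output
            for triple in index[entity]:
                if triple not in found:
                    found.append(triple)
                for neighbor in (triple[0], triple[2]):
                    if neighbor not in seen_entities:
                        seen_entities.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return found


def to_context(triples: List[Triple]) -> str:
    """Render retrieved triples as plain-text lines for an LLM prompt."""
    return "\n".join(f"{s} {r} {o}." for s, r, o in triples)


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(to_context(retrieve(idx, "Who developed GraphRAG?", hops=2)))
```

With `hops=2`, the question about GraphRAG pulls in the Microsoft-Redmond fact through the shared "Microsoft" node, which plain chunk similarity might miss, while the unrelated Moshi triple stays out of the context.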
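The LLM pricing comparison tool in the spotlight comes down to simple arithmetic: LLM APIs typically bill input and output tokens separately, quoted per million tokens, so the cheapest model depends on your input/output mix. A small sketch, assuming hypothetical model names and prices (not real quotes from any provider):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens (hypothetical)
    output_per_million: float  # USD per 1M output tokens (hypothetical)


def request_cost(price: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def cheapest(prices: Dict[str, Pricing], input_tokens: int, output_tokens: int) -> str:
    """Name of the model with the lowest cost for this workload."""
    return min(prices, key=lambda name: request_cost(prices[name], input_tokens, output_tokens))


# Hypothetical price list: one model is cheap on input, the other on output.
PRICES: Dict[str, Pricing] = {
    "model-a": Pricing(input_per_million=1.00, output_per_million=10.00),
    "model-b": Pricing(input_per_million=3.00, output_per_million=2.00),
}

if __name__ == "__main__":
    workloads = {"chat": (1_000, 1_000), "long-doc summary": (10_000, 100)}
    for label, (inp, out) in workloads.items():
        name = cheapest(PRICES, inp, out)
        print(f"{label}: {name} (${request_cost(PRICES[name], inp, out):.4f} per request)")
```

With these made-up prices, model-b wins the balanced chat workload ($0.0050 vs $0.0110 per request), while model-a wins the input-heavy summary ($0.0110 vs $0.0302). The ranking flips with the workload, which is why a comparison tool is more useful than a single headline price.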
🔍 🛠️ AI Toolbox: Product Picks of the Week
AI-Flow: Connect multiple AI models easily. An open-source platform for creating custom AI tools through a simple drag-and-drop interface.
AI Voice Isolator by ElevenLabs: Simply upload a file and remove street noise, mic feedback, and any other unwanted background noise.
GPT4All 3.0: Open-source local LLM desktop app with support for popular models like Llama, Mistral, Nous-Hermes, and hundreds more. Fully customize your chatbot experience with your own system prompts, temperature, context length, batch size, and more.
Multimodal Canvas: an experimental test console for developers by Google, built with the Gemini API. Using Gemini 1.5 Flash, you can rapidly test multimodal prompts using drawing, camera, images, and more.
Last week’s issue
You can support my work via BuyMeaCoffee.
Thanks for reading and have a nice weekend! 🎉 Mariam.