Recraft V3, new best open-source compact language models, Wonder Animation, X to Voice, Meta's MarDini model, GitHub Spark and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #82):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Recraft AI introduced Recraft V3, a new frontier text-to-image model that recently topped the Artificial Analysis leaderboard under the pseudonym 'red_panda' with an Elo rating of 1172, surpassing Flux 1.1 [Pro], Ideogram v2, Midjourney 6.1, Stable Diffusion 3.5 Large and others. Recraft V3 excels at generating images with long texts, maintaining correct anatomy, and rendering complex scenes with accurate object count, color, and positioning. It also lets users control text size and placement. It is now available to both free and paid users on desktop (Canvas), mobile (iOS & Android), and via API [Details].
Meta AI released MobileLLM, a family of state-of-the-art language models designed for mobile devices (125M, 350M, 600M, 1B and 1.5B sizes), with model weights and training code now available. It shows significant improvements over previous sub-billion-parameter models on chat benchmarks, and on API-calling tasks MobileLLM-350M even achieves an exact-match score comparable to that of the much larger LLaMA-v2 7B model [Details].
Hugging Face's research team released SmolLM2, a family of open-source compact language models available in three sizes: 135M, 360M, and 1.7B parameters. SmolLM2 1.7B outperforms Qwen 2.5 1.5B and Llama 3.2 1B. The models can handle text rewriting, summarization and function calling [Details].
GitHub Copilot now offers developers a choice of AI models, including Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview and o1-mini [Details].
GitHub introduced GitHub Spark, an AI-powered tool to build applications entirely in natural language. Sparks are fully functional micro apps that can integrate AI features and external data sources without requiring any management of cloud resources [Details].
Google Research introduced InkSight, an approach to convert photos of handwriting into digital ink. The model was trained to build an understanding of “reading”, so it can recognize written words, and of “writing”, so it can output strokes that resemble handwriting. The model and inference code have been released [Details].
The Open Source Initiative (OSI), a long-running institution aiming to define and “steward” all things open source, released version 1.0 of its Open Source AI Definition (OSAID) [Details].
OpenAI’s search engine is now live in ChatGPT. It is available to paid subscribers and will expand to free, enterprise, and education users in the coming weeks [Details].
Apple introduced Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, webpages, and Apple TV. Apple also released weights for Ferret-UI [Paper | Hugging Face].
Wonder Dynamics announced the beta launch of Wonder Studio’s newest feature: Wonder Animation. It enables artists to shoot a scene with any camera, in any location, and turn the sequence into an animated scene with CG characters in a 3D environment [Details].
Meta released several new research artifacts for touch perception, robot dexterity, and human-robot interaction. The release includes Meta Sparsh, the first general-purpose touch representation that works across many sensors and many tasks; Meta Digit 360, a breakthrough tactile fingertip with human-level multimodal sensing capabilities; and Meta Digit Plexus, a standardized hardware-software platform to integrate various fingertip and skin tactile sensors onto a single robot hand [Details].
Artificial Analysis has shared initial Video Arena results (20k votes). Hailuo AI tops the leaderboard, followed by the open-source Mochi 1 model (released last week) in second place and Runway Gen-3 Alpha in third [Details].
Meta AI presented MarDini, a new family of video diffusion models that combine the strengths of diffusion and masked auto-regressive approaches for large-scale video generation [Details].
OpenAI is rolling out chat history search on ChatGPT web, allowing you to quickly search or resume past chats. Advanced Voice Mode is now available in the ChatGPT desktop apps for macOS and Windows. The Realtime API now offers new voices and lower prices [Link].
Google introduced Grounding with Google Search in its Gemini API and Google AI Studio, allowing developers to enhance responses from AI models with real-time information sourced directly from Google Search [Details].
OpenAI shared the GPT-4o System Card, providing details on GPT-4o's capabilities, limitations, and safety evaluations [Details].
Agentforce by Salesforce is now generally available. It lets companies build and deploy AI agents that can autonomously take action across any business function [Details].
OpenAI released SimpleQA, a factuality benchmark that measures the ability of language models to answer short, fact-seeking questions [Details].
Botto, a fully autonomous 'AI artist', made $351,600 in sales at the auction house Sotheby's, setting a new milestone in the history of AI art [Details].
The first set of Apple Intelligence features is now available with iOS 18.1, iPadOS 18.1, and macOS Sequoia 15.1. Apple Intelligence uses on-device processing, meaning that many of the models that power it run entirely on device. For more demanding tasks, Private Cloud Compute processes data in the cloud without storing or sharing it with Apple. Independent experts can inspect the code that runs on Apple silicon servers to continuously verify this privacy promise [Details].
Google shared that more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers [Source].
🔦 Weekly Spotlight
NotebookLlama by Meta: An open-source version of NotebookLM [Link].
Agent.exe: a simple Electron app that lets Claude 3.5 Sonnet control your local computer directly, via the new Claude computer use capability [Link].
Integuru: An AI agent that generates integration code by reverse-engineering platforms' internal APIs [Link].
Microsoft’s agentic AI tool OmniParser rockets up the open source charts [Link].
Stable Diffusion 3.5 Large fine-tuning tutorial [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Loomos: Convert rough Loom recordings to professional videos.
Bolt.new: Prompt, run, edit & deploy full-stack web apps.
Dream Lab: Canva’s new image generator, powered by Leonardo’s Phoenix model. It transforms text descriptions into images in various styles, including 3D render and illustration.
X to Voice by ElevenLabs: An open-source app built using the new Voice Design API. It uses data from your X/Twitter profile to create a prompt describing what your voice might sound like, then generates an AI avatar with a unique voice.
Google Learn About: Grasp new topics and deepen your understanding with a conversational learning companion that adapts to your unique curiosity and learning goals.
Last week’s issue
Thanks for reading and have a nice weekend! 🎉 Mariam.