Ultra 1.0, new multilingual model, open-source conversational and empathic AI Voice Assistant, InteractiveVideo and more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #51 ):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Google launches Ultra 1.0, its largest and most capable AI model, in its ChatGPT-like assistant which has now been rebranded as Gemini (earlier called Bard). Gemini Advanced is available, in 150 countries, as a premium plan for $19.99/month, starting with a two-month trial at no cost. Google is also rolling out Android and iOS apps for Gemini [Details].
Alibaba Group released Qwen1.5 series, open-sourcing models of 6 sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. Qwen1.5-72B outperforms Llama2-70B across all benchmarks. The Qwen1.5 series is available on Ollama and LMStudio. Additionally, API on together.ai [Details | Hugging Face].
NVIDIA released Canary 1B, a multilingual model for speech-to-text recognition and translation. Canary transcribes speech in English, Spanish, German, and French and also generates text with punctuation and capitalization. It supports bi-directional translation, between English and three other supported languages. Canary outperforms similarly-sized Whisper-large-v3, and SeamlessM4T-Medium-v1 on both transcription and translation tasks and achieves the first place on HuggingFace Open ASR leaderboard with an average word error rate of 6.67%, outperforming all other open source models [Details].
Researchers released Lag-Llama, the first open-source foundation model for time series forecasting [Details].
LAION released BUD-E, an open-source conversational and empathic AI Voice Assistant that uses natural voices, empathy & emotional intelligence and can handle multi-speaker conversations [Details].
MetaVoice released MetaVoice-1B, a 1.2B parameter base model trained on 100K hours of speech, for TTS (text-to-speech). It supports emotional speech in English and voice cloning. MetaVoice-1B has been released under the Apache 2.0 license [Details].
Bria AI released RMBG v1.4, an an open-source background removal model trained on fully licensed images [Details].
Researchers introduce InteractiveVideo, a user-centric framework for video generation that is designed for dynamic interaction, allowing users to instruct the generative model during the generation process [Details |GitHub ].
Microsoft announced a redesigned look for its Copilot AI search and chatbot experience on the web (formerly known as Bing Chat), new built-in AI image creation and editing functionality, and Deucalion, a fine tuned model that makes Balanced mode for Copilot richer and faster [Details].
Roblox introduced AI-powered real-time chat translations in 16 languages [Details].
Hugging Face launched Assistants feature on HuggingChat. Assistants are custom chatbots similar to OpenAI’s GPTs that can be built for free using open source LLMs like Mistral, Llama and others [Link].
DeepSeek AI released DeepSeekMath 7B model, a 7B open-source model that approaches the mathematical reasoning capability of GPT-4. DeepSeekMath-Base is initialized with DeepSeek-Coder-Base-v1.5 7B [Details].
Microsoft is launching several collaborations with news organizations to adopt generative AI [Details].
LG Electronics signed a partnership with Korean generative AI startup Upstage to develop small language models (SLMs) for LG’s on-device AI features and AI services on LG notebooks [Details].
Stability AI released SVD 1.1, an updated model of Stable Video Diffusion model, optimized to generate short AI videos with better motion and more consistency [Details | Hugging Face] .
OpenAI and Meta announced to label AI generated images [Details].
Google saves your conversations with Gemini for years by default [Details].
🔦 Weekly Spotlight
A completely open-source AI Wearable device like Avi’s Tab, Rewind’s pendant, and Humane’s Pin [Link].
The Abundance Agenda by a16z: How AI will transform consumer technology [Link].
localllm by Google Cloud: Run LLMs locally on Cloud Workstations [Link].
In-browser background removal. Runs locally in your browser, powered by the RMBG V1.4 model [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Dorik AI: Generate websites with a prompt and edit without code
Galileo AI: a UI generation platform for easy and fast design ideation. It’s now open to all
Wondera: Karaoke and transform any songs in your AI voice
📕 📚 AI Skillset: Learn & Build
Mistral Getting Started guide by Mistral AI - covers prompting, RAG and embeddings [Link].
A guide to upscaling images with AI by Replicate [Link].
Thanks for reading and have a nice weekend! 🎉 Mariam.