OpenAI's new reasoning model, Empathic Voice Interface 2, Covers by Suno, Pixtral Multimodal model, DataGemma, Notes to Podcast and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #76):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
OpenAI has released o1, a new series of AI models designed to spend more time thinking before they respond. o1-preview is a new AI reasoning model designed to solve complex problems in science, coding, and math. Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. OpenAI has also released o1-mini, a faster reasoning model that is particularly effective at coding and is 80% cheaper than o1-preview [Details].
Hume AI introduced Empathic Voice Interface 2 (EVI 2), a new voice-to-voice foundation model. EVI 2 merges language and voice into a single model trained specifically for emotional intelligence. It can emulate a wide range of personalities, accents, and speaking styles and possesses emergent multilingual capabilities [Details | video].
Mistral released Pixtral 12B, a multimodal (image and text) vision LLM with a 128,000-token context window. It is Apache 2.0 licensed and can handle 1024x1024 pixel images [Details].
Jina AI released reader-lm-0.5b and reader-lm-1.5b, two Small Language Models (SLMs) specifically trained to generate clean markdown directly from noisy raw HTML. Both models are multilingual and support a context length of up to 256K tokens. Despite their compact size, these models achieve state-of-the-art performance on this task, outperforming larger LLM counterparts while being only 1/50th of their size [Details].
Suno added a new feature, Covers, to its AI music tool. Covers can transform anything, from a simple voice recording to a fully produced track, into an entirely new style while keeping the original melody. It lets users experiment with different genres, add lyrics to instrumentals, and modify their own singing voice [Details].
Google’s NotebookLM now offers a new feature, Audio Overview, that turns your documents into engaging audio discussions. With one click, two AI hosts start up a lively “deep dive” discussion based on your sources. They summarize your material, make connections between topics, and banter back and forth [Details].
DeepSeek released DeepSeek-V2.5, an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions and outperforms both DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724 on most benchmarks [Details].
Fish Audio released Fish Speech 1.4, an open-source text-to-speech (TTS) model with ultra-low latency, trained on 700k hours of audio data in multiple languages [Hugging Face | Playground].
Adobe shares a peek at their upcoming Firefly Video Model [Details].
Phind introduced Phind-405B, a new flagship model based on Meta Llama 3.1 405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet [Details].
Google released DataGemma, a set of open models that utilize Data Commons through Retrieval Interleaved Generation (RIG) & Retrieval Augmented Generation (RAG) to address AI hallucinations by grounding LLMs in real-world data for fact-checking. Data Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables [Details].
Oracle has included 50+ role-based AI agents within the Oracle Fusion Cloud Applications Suite, such as Shift scheduling assistant, Employee hiring advisor, and Benefits analyst [Details].
SambaNova launched SambaNova Cloud, which it bills as the fastest AI inference service. SambaNova Cloud runs Llama 3.1 70B at 461 tokens per second (t/s) and 405B at 132 t/s at full precision [Details].
Salesforce unveiled Agentforce, a low-code platform for building autonomous AI agents that can handle tasks in service, sales, marketing, and commerce [Details].
🔦 Weekly Spotlight
Mem0: an open-source memory layer for AI applications [Link].
What is Apple Intelligence, when is it coming and who will get it? [Link].
Reflection 70B model maker breaks silence amid fraud accusations [Link].
Stable Diffusion 3 Medium Fine-tuning Tutorial [Link].
Notes on OpenAI’s new o1 chain-of-thought models [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Meshy: 3D generative AI toolbox for effortlessly creating 3D assets from text or images
Hypernatural: Turn your ideas, scripts, podcasts and more into incredible short-form videos in minutes.
Thunderbit: a Chrome Extension that automates your web tasks using AI and No-Code
Last week’s issue
You can support my work via BuyMeaCoffee.
Thanks for reading and have a nice weekend! 🎉 Mariam.
No update about the Reflection 70B scam? I think it is part of your responsibility, after announcing something that ended up being a scam, to announce that too, so people stay informed ;)