OpenAI's new reasoning model, Empathic Voice Interface 2, Covers by Suno, Pixtral Multimodal model, DataGemma, Notes to Podcast and more

Sep 13, 2024

Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.

In today’s issue (Issue #76):

AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week


🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance

🔥 News

OpenAI has released o1, a new series of AI models designed to spend more time thinking before they respond. o1-preview is a new AI reasoning model designed to solve complex problems in science, coding, and math. Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. OpenAI has also released o1-mini, a faster reasoning model that is particularly effective at coding and is 80% cheaper than o1-preview [Details].
Hume AI introduced Empathic Voice Interface 2 (EVI 2), a new voice-to-voice foundation model. EVI 2 merges language and voice into a single model trained specifically for emotional intelligence. It can emulate a wide range of personalities, accents, and speaking styles and possesses emergent multilingual capabilities [Details | video].
Mistral released Pixtral 12B, a multimodal (image and text) LLM with a 128,000-token context window. It is Apache 2.0 licensed and can handle 1024x1024 pixel images [Details].
Jina AI released reader-lm-0.5b and reader-lm-1.5b, two Small Language Models (SLMs) specifically trained to generate clean markdown directly from noisy raw HTML. Both models are multilingual and support a context length of up to 256K tokens. Despite their compact size, these models achieve state-of-the-art performance on this task, outperforming larger LLM counterparts while being only 1/50th of their size [Details].
Suno added a new feature, Covers, to its AI music tool. Covers can transform anything, from a simple voice recording to a fully produced track, into an entirely new style, all while keeping the original melody. It lets users experiment with different genres, add lyrics to instrumentals, and modify their own singing voice [Details].
Google’s NotebookLM now offers a new feature, Audio Overview, that turns your documents into engaging audio discussions. With one click, two AI hosts start up a lively “deep dive” discussion based on your sources. They summarize your material, make connections between topics, and banter back and forth [Details].
DeepSeek released DeepSeek-V2.5, an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions and outperforms both DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724 on most benchmarks [Details].
Fish Audio released Fish Speech 1.4, an open-source text-to-speech (TTS) model with ultra-low latency, trained on 700k hours of audio data in multiple languages [Hugging Face | Playground].
Adobe shared a preview of its upcoming Firefly Video Model [Details].
Phind introduced Phind-405B, a new flagship model based on Meta Llama 3.1 405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet [Details].
Google released DataGemma, a set of open models that utilize Data Commons through Retrieval Interleaved Generation (RIG) & Retrieval Augmented Generation (RAG) to address AI hallucinations by grounding LLMs in real-world data for fact-checking. Data Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables [Details].
Oracle has included 50+ role-based AI agents within the Oracle Fusion Cloud Applications Suite, such as a shift scheduling assistant, an employee hiring advisor, and a benefits analyst [Details].
SambaNova launched SambaNova Cloud, the fastest AI inference service. SambaNova Cloud runs Llama 3.1 70B at 461 tokens per second (t/s) and 405B at 132 t/s at full precision [Details].
Salesforce unveiled Agentforce, a low-code platform for building autonomous AI agents that can handle tasks in service, sales, marketing, and commerce [Details].

🔦 Weekly Spotlight

Mem0: an open-source memory layer for AI applications [Link].
What is Apple Intelligence, when is it coming and who will get it? [Link].
Reflection 70B model maker breaks silence amid fraud accusations [Link].
Stable Diffusion 3 Medium Fine-tuning Tutorial [Link].
Notes on OpenAI’s new o1 chain-of-thought models [Link].

🔍 🛠️ AI Toolbox: Product Picks of the Week

Meshy: 3D generative AI toolbox for effortlessly creating 3D assets from text or images.
Hypernatural: Turn your ideas, scripts, podcasts and more into incredible short-form videos in minutes.
Thunderbit: a Chrome extension that automates your web tasks using AI and no-code.
