Sitemap - 2024 - AI Brews

Google's reasoning model, New open-source physics AI engine, Odyssey's Explorer, OmniAudio, AI phone calling, FACTS Leaderboard, Meta Apollo, Global Talent Network, Falcon 3, Veo 2, DeepSeek-VL2 & more

Meta Motivo behavioral foundation model, Multimodal AI Agents, Gemini 2.0 Flash, Real-time Video and screen-sharing in the Multimodal Live API and ChatGPT, Phi-4, Sora, TRELLIS, Deep Research & more

New World Models, World's smallest vision language model, o1 Pro Mode, Luma Photon, Largest Open-Source video model, Amazon Nova, PaliGemma 2, Fish Speech 1.5, LTX Video and more

Open-Weight alternative to GPT-4o Realtime, Athene-V2, Stripe Agent Toolkit, Qwen2.5-Coder-32B, Prompt Canvas and Promptim, Vidu-1.5, MagicQuill, OpenCoder and more

Hunyuan-Large, AI model for open-world games, X-Portrait 2 for realistic character animations, FLUX1.1 [pro] Ultra and Raw, Magentic-One, Hume AI App, action model for GUI agents and More

Recraft V3, new best open-source compact language models, Wonder Animation, X to Voice, Meta's MarDini model, GitHub Spark and more

Claude's computer use, Mochi 1 & Allegro open-source video models, Aya Expanse, Stable Diffusion 3.5, HUGS by Hugging Face, Meta Spirit LM, Act-One, Haiper 2.0, Multimodal Embed 3, Playground v3 & More

Open Multimodal Native Model, BeaGo, Mistral advanced Edge Models, Suno Scenes, Supercomplete, Movie Gen, Dash for Business, F5-TTS, Interactive Meeting Avatar and more

FLUX1.1 [pro], Canvas, Realtime API from OpenAI, open-sourcing of Reverb, Digital Twin Catalog, Copilot Vision, Depth Pro, new Whisper model, Pikaffects and more

Molmo, Meta's Vision Models, Next-Token Prediction Multimodal model, AlphaChip, Hundred Film Fund, HuggingChat macOS, Updated models from OpenAI and Google and more

Qwen 2.5, Seed-Music, StoryMaker, Jina Embeddings V3, Multimodal RAG, Luma Labs and Runway APIs, CogVideoX image-to-video generation model and More

OpenAI's new reasoning model, Empathic Voice Interface 2, Covers by Suno, Pixtral Multimodal model, DataGemma, Notes to Podcast and more

Replit Agent, world’s top open-source model, new real-time audio conversational model, AlphaProteo, style vs substance, fully open-source mixture-of-experts (MoE) language model and more

Ultra-long context, Qwen2-VL outperforms GPT-4o, new open weights Text to Video model, Eagle multimodal large language model, fastest AI inference and more

Jamba 1.5, Ideogram 2.0, Phi-3.5-MoE, Transfusion, Dream Machine 1.5, Mistral-NeMo-Minitron 8B, fine-tuning for GPT-4o and more

1T open-source LLM, Llama 3.1 405B, Mistral Large 2, Stable Video 4D, Outfit Anyone, SearchGPT, Llama Guard 3 and more

Mistral NeMo, GPT-4o mini, AI-powered platform to create controlled videos, SmolLM, Anthology Fund and more

Multilingual speech recognition model with emotion recognition, LivePortrait, Lynx open source hallucination detection model, EchoMimic, In-browser speech recognition and more

Real-time speech-to-speech model, Magic Insert, CriticGPT, Meta 3D Gen, Multimodal Canvas, InternLM 2.5, AI Voice Isolator, llama-agents and more

Claude 3.5 and Artifacts, Florence-2 and Meta's models, Expressive talking and singing characters, Video to Sounds Effects app, DeepSeek-Coder-V2, video-to-audio and more

Dream Machine, Apple Intelligence, AI to understand animal communication, Mixture of Agents, Real-time Expressive Generative Humans, Skybox AI new model and more

Qwen2, Kling video model, Text to Sound Effects, No Language Left Behind model, video gaming AI assistant, Audio uploads and more

Netflix of AI, Perplexity Pages, AI agent platform for financial analysis, Low-latency voice model, AI Prize Fight, Codestral, K2 and MAP-Neo models, AutoCoder, HuggingChat Tools and more

Copilot+ PCs, Phi-3 models, open Multimodal outperforms GPT-4V, Cohere's multilingual Aya 23 model and more

GPT-4o, Google I/O updates, interact with tables and charts in real-time, ZeroGPU, Gemini API dev competition, Chinese-to-image generation and more

Stable Artisan, ElevenLabs Music, Conversational AI Teams, AlphaFold 3, EMOPortraits, Retrieval Augmented Fine Tuning and more

Hyper-SD, Llama3 with 1M+ context length, new Robot that folds clothes & cooks, Qwen1.5-110B, Vidu AI and more

Apple Open Source AI Models, Expressive AI Avatars, phi-3-mini, Snowflake Arctic, Firefly Image 3 and More

Llama 3, Lifelike Audio-Driven Talking Faces, Reka Core, Stable Assistant with Stable Diffusion 3, Meta's real-time image generation, Driving with Natural Language, multi-bot chat and More

Foundation Model for Efficient Enterprise Search, fully open-source Text-to-speech model, Native Audio understanding in Gemini 1.5 Pro, AI film competition, Physical AI model, Mixtral 8×22B & More

Jamba hybrid SSM-Transformer Model, empathic LLM, Databricks MoE model, Animation driven by Audio, Qwen1.5-MoE, generative AI nurses and more

SceneScript, Automating the generation of foundation models, 01 Light, Stable Video 3D, AnimateDiff-Lightning, foundation models for self-driving and humanoid robots, NVIDIA NIM and more

Emu Video Edit, General game-playing AI agent, fully autonomous AI software engineer, DeepSeek-VL, Robotics Foundation Model, and more

Claude 3 Opus, Train a 70b language model at home, Firewall for AI, Fast 3D Object Generation from Single Images, multimodal foundation model for any-to-any search tasks, and more

Mistral Large, vocal expressive avatar videos, Generative virtual worlds, Reliable text rendering and Magic Prompt, DJ Mode, AI-powered film making and more

Meta's V-JEPA vision models, OpenAI's Sora video model, Gemini 1.5 Pro with 1 million tokens context, Reka Flash, Largest text-to-speech AI model and more

Ultra 1.0, new multilingual model, open-source conversational and empathic AI Voice Assistant, InteractiveVideo and more

Truly Open Models, Code Llama 70B, Amazon AI Hackathon, AI Grant, world’s greenest 7B model and more

Fix to ‘lazy’ GPT-4, commercially permissive OSS LLaVA models, new multimodal model for digital agents, Google's new video model and more

Screenshots to Code Dataset, Multi Motion Brush in Gen-2, Open-source AGI, AI system that solves complex geometry problems, AI in drug discovery and more

GPT Store, text-to-3d in under 10 seconds, DeepSeekMoE 16B, jailbreaking advanced LLMs, LLaVA-ϕ, Microsoft's open-source agent framework, and more

Open source AI voice cloning, Meta's full-bodied photorealistic avatars from audio, Mobile-ALOHA and more
