Jamba 1.5, Ideogram 2.0, Phi-3.5-MoE, Transfusion, Dream Machine 1.5, Mistral-NeMo-Minitron 8B, fine-tuning for GPT-4o and more
Hi there. AI Brews is back - apologies for the missed issues! Sincere thanks for your messages and support; it means a lot!
In today’s issue (Issue #73):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
From our sponsors:
Streamline AI Coding Workflow with 16x Prompt
16x Prompt is an app designed to streamline AI coding workflow. It offers a structured way to create prompts for coding tasks with relevant context.
Easily manage code context locally across different projects, save your frequently used prompts and custom instructions for different tasks, and compare responses from different AI models.
16x Prompt is an all-in-one toolbox that allows you to ship code faster with ChatGPT and Claude.
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
AI21 released the Jamba 1.5 family of open models with a 256K effective context window: Jamba 1.5 Mini and Jamba 1.5 Large, built on the novel SSM-Transformer architecture, which combines the quality of the Transformer with the efficiency of Mamba. Jamba 1.5 Mini outperforms Claude 3 Haiku, Mixtral 8x22B, and Command R+. Jamba 1.5 Large similarly outpaces Claude 3 Opus, Llama 3.1 70B, and Llama 3.1 405B, offering better performance per cost in its size class [Details].
Ideogram released Ideogram 2.0, a new frontier text-to-image model trained from scratch, with improved generation of realistic images, graphic design, and typography. You can now choose from a number of distinct styles, including Realistic, Design, 3D, and Anime, and generate images that adhere to a specific color palette. The Ideogram iOS app and the beta version of the Ideogram API have also been launched [Details].
Microsoft released the Phi-3.5 family of models. These include Phi-3.5-mini, an improved mini model with multilingual support; Phi-3.5-MoE, a mixture-of-experts model with 6.6B active parameters that outperforms larger models in reasoning capability and trails only GPT-4o mini; and Phi-3.5-vision, a new vision model supporting multiple images [Details].
Luma AI released Dream Machine 1.5, an upgrade to their AI video model with higher-quality text-to-video, smarter understanding of text prompts, custom text rendering, and improved image-to-video [Source].
Midjourney has opened its web platform to all users, along with the release of a new unified AI image editor on the web. The editor brings existing features, such as inpainting (repainting parts of an image with new AI-generated visuals using text prompts) and outpainting/canvas extension (stretching the boundaries of the image in different directions and filling the new space with AI-generated visuals), together into a single view [Details].
Nvidia released Mistral-NeMo-Minitron 8B, a pruned and distilled version of the recently released Mistral NeMo 12B model that achieves high accuracy across nine popular benchmarks for chatbots, virtual assistants, content generation, coding, and educational tools [Details].
Google released the latest version of Imagen 3, its AI text-to-image generator, to users in the US; it is available now via ImageFX and Vertex AI [Details].
Runway ML has officially released Gen-3 Alpha Turbo, the latest version of the AI video generation model that is seven times faster and half the cost of its predecessor, Gen-3 Alpha [Details].
Agibot, a Chinese robotics startup, introduced five humanoid robots, both wheeled and bipedal, for tasks ranging from household chores to industrial operations. Their flagship bipedal robot, Yuanzheng A2, is equipped with AI-powered sensors that allow it to see, hear, and understand text, audio, and visual information. It can perform tasks as precise as threading a needle [Details].
The University of California, Berkeley School of Law announced that it will offer the first-ever law degree with a focus on artificial intelligence. The new AI-focused Master of Laws (LL.M.) program is scheduled to launch in summer 2025 [Details].
Meta and Waymo present Transfusion, a method for training a single unified model to understand and generate both discrete and continuous modalities. The authors pretrain a transformer on 50% text and 50% image data with a different objective for each modality: next-token prediction for text and diffusion for images. The model outperforms DALL-E 2 and SDXL on image generation and, unlike those image generation models, can also generate text, reaching the same level of performance as Llama 1 on text benchmarks (a minimal training sketch follows after the news items below) [Paper].
Google AI recently developed a bioacoustic AI model called Health Acoustic Representations (HeAR) that can detect diseases by analyzing coughs and other bodily sounds. HeAR is now available to researchers to help accelerate the development of custom bioacoustic models [Details].
OpenAI announced the availability of fine-tuning for GPT-4o, offering 1M training tokens per day for free to every organization through September 23 (see the short example after the news items below) [Details].
Nvidia unveiled a new on-device small language model, Nemotron-4 4B Instruct, that improves the conversational abilities of game characters, allowing them to understand players more intuitively and respond naturally. Nemotron-4 4B Instruct is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence, and animation powered by generative AI [Details].
Stability AI added a Search and Recolor feature to its Stable Assistant tool that lets you easily change the color of any object. Search and Recolor automatically segments an object and applies the colors specified in your prompt [Details].
Nvidia ACE, the company’s AI-powered system for giving voices and conversation skills to in-game characters, is set to debut in Mecha Break, a new multiplayer mech battle game coming to PC, Xbox Series X/S, and PlayStation 5 in 2025 [Details].
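For the Transfusion item above, here is a minimal sketch of the mixed-objective idea: one shared transformer trained with next-token prediction on text tokens and a denoising (diffusion) loss on continuous image latents, with the two losses simply summed. Module names, shapes, and the linear noising schedule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySharedTransformer(nn.Module):
    """One backbone handles both discrete text tokens and continuous image latents."""
    def __init__(self, vocab_size=1000, d_model=256, img_dim=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # discrete text tokens
        self.img_in = nn.Linear(img_dim, d_model)          # continuous latents -> model width
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)      # next-token logits for text
        self.img_out = nn.Linear(d_model, img_dim)         # predicts the added noise for images

    def forward(self, text_ids, noisy_latents):
        h = torch.cat([self.tok_emb(text_ids), self.img_in(noisy_latents)], dim=1)
        h = self.backbone(h)
        n_text = text_ids.size(1)
        return self.lm_head(h[:, :n_text]), self.img_out(h[:, n_text:])

model = TinySharedTransformer()
text = torch.randint(0, 1000, (2, 16))       # toy token ids
latents = torch.randn(2, 8, 64)              # toy image latents (e.g. from a VAE)
noise = torch.randn_like(latents)
t = torch.rand(2, 1, 1)                      # per-example noise level
noisy = (1 - t) * latents + t * noise        # simple linear noising schedule (an assumption)

logits, noise_pred = model(text[:, :-1], noisy)
lm_loss = F.cross_entropy(logits.reshape(-1, 1000), text[:, 1:].reshape(-1))  # text: next-token prediction
diffusion_loss = F.mse_loss(noise_pred, noise)                                # images: denoising objective
(lm_loss + diffusion_loss).backward()                                         # one model, two objectives
```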
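And for the GPT-4o fine-tuning item, a short, hedged sketch of starting a fine-tuning job with the official OpenAI Python SDK; the training file name and the exact fine-tunable model snapshot are illustrative, so check OpenAI's fine-tuning docs for current values.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload chat-formatted JSONL training data (file name is illustrative).
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against a GPT-4o snapshot (snapshot id assumed; verify in the docs).
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```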
🔦 Weekly Spotlight
Getting Started with Real-World Robots - tutorial by Hugging Face [Link].
Top 100 Gen AI Consumer Apps - 3rd edition by a16z [Link].
How to create your own Flux AI Model (Flux LoRA Fine Tuning) [Link].
Anthropic's educational courses [Link].
Self-hosted AI Starter Kit by n8n - an open Docker Compose template that quickly bootstraps a fully featured local AI and low-code development environment [Link].
The 5 Most Under-Rated Tools on Hugging Face [Link].
Demo of Qwen2-Math-72B: you can input either images or text of mathematical or arithmetic problems [Link].
3 AI Use Cases (That Are Not a Chatbot) - Feature engineering, structuring unstructured data, and lead scoring [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
LTX Studio: AI-powered visual storytelling platform. Immediate access is now available to everyone.
Hamming: Test your AI voice agent against 1000s of simulated users in minutes.
16x Prompt: An all-in-one toolbox that allows you to ship code faster with ChatGPT and Claude.
D-ID Video Translate: Instant video translation into multiple languages. The AI tool translates the text, clones the speaker’s voice and adapts the speaker's lip movements to match the dubbed audio precisely.
Hexus: Convert your screen flow recordings into engaging interactive product demos, videos, and docs.
Last Issue
You can support my work via BuyMeaCoffee.
Thanks for reading and have a nice weekend! 🎉 Mariam.