GPT‑5, Qwen-Image, Skywork UniPic, Genie 3, Grok Imagine, Gemini Storybooks and guided learning, Opus 4.1, gpt-oss-120b and gpt-oss-20b, Eleven Music, Seed Diffusion Preview, Lindy 3.0 and more
GPT‑5, Qwen-Image, Skywork UniPic, Genie 3, Grok Imagine, Gemini Storybooks and guided learning, Opus 4.1, gpt-oss-120b and gpt-oss-20b, Eleven Music, Seed Diffusion Preview, Lindy 3.0 and more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
In today’s issue (Issue #109):
AI Pulse: Weekly News at a Glance
Weekly Spotlight: Noteworthy Reads and Open-source Projects
AI Toolbox: Product Picks of the Week
From our sponsors:
Make sense of what matters in AI in minutes —minus the clutter
There’s so much AI hype going around, it’s overwhelming to keep up.
… And that’s assuming you’re following the stuff that actually matters.
Fortunately, there's The Deep View, a newsletter that sifts through all the AI-related noise for you.
It gives you 5-minute insights on what truly matters right now in AI, and it’s trusted by over 452,000 subscribers, including executives at Microsoft, Scale, and Coinbase.
Sign up now for free and start making smarter decisions in the AI space.
🗞️🗞️ Weekly News at a Glance
Open AI:
GPT‑5 launched: It’s a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use based on conversation type, complexity, tool needs, and your explicit intent (for example, if you say “think hard about this” in the prompt). GPT‑5 sets a new state of the art across math (94.6% on AIME 2025 without tools), real-world coding (74.9% on SWE-bench Verified, 88% on Aider Polyglot), multimodal understanding (84.2% on MMMU), and health (46.2% on HealthBench Hard). GPT‑5 is the new default in ChatGPT, replacing GPT‑4o, o3, o4-mini, GPT‑4.1, and GPT‑4.5 for signed-in users. GPT‑5 is available to all users, with Plus subscribers getting more usage, and Pro subscribers getting access to GPT‑5 pro, a version with extended reasoning for even more comprehensive and accurate answers [Details|.
gpt-oss-120b and gpt-oss-20b open-weight language models released under the Apache 2.0 license. The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with 16 GB of memory [Details].
OpenAI Harmony: OpenAI's response format for its open-weight model series gpt-oss
Anthropic released Claude Opus 4.1, an upgrade to Claude Opus 4, achieving 74.5% on SWE-bench Verified up from 72.5% . It also improves Claude’s in-depth research and data analysis skills, especially around detail tracking and agentic search [Details].
Google:
Previewed Genie 3: a general purpose world model that can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p. Genie 3’s consistency is an emergent capability. Other methods such as NeRFs and Gaussian Splatting also allow consistent navigable 3D environments, but depend on the provision of an explicit 3D representation. By contrast, worlds generated by Genie 3 are far more dynamic and rich because they’re created frame by frame based on the world description and actions by the user [Details].
Google Deep Think: available in the Gemini app to Google AI Ultra subscribers. It is a variation of the model that recently achieved the gold-medal standard at this year’s International Mathematical Olympiad (IMO).
Gemini Storybooks: create personalized, illustrated storybooks with read-aloud narration. It's free and available globally now.
Guided Learning: a new mode in Gemini that acts as a learning companion guiding you with questions and step-by-step support instead of just giving you the answer. Guided Learning provides rich, multimodal responses — including images, diagrams, videos and interactive quizzes
An updated open source version of Perch - an AI model for analyzing bioacoustic data to aid conservationists. Available on Kaggle [Details].
Gemini CLI GitHub Actions: a no-cost, powerful AI coding teammate for your repository. It acts both as an autonomous agent for critical routine coding tasks, and an on-demand collaborator you can quickly delegate work to.
Game Arena: a new open-source platform where AI models and agents compete head-to-head in a variety of strategic games [Details].
xAI has rolled out Grok Imagine, an image and video generator that can turn text or image prompts into a 15-second video with native audio. Grok Imagine is available free in the Grok app [Details].
Alibaba Qwen introduced Qwen-Image, an open-source image generation foundation model that excels at complex text rendering and precise image editing [Details].
ElevenLabs launched Eleven Music, an AI audio platform that creates studio quality music from natural language prompts. It gives users complete control over genre, style, and structure, supports vocals or instrumentals and allows precise editing of sound and lyrics for individual sections or the entire song [Details].
ByteDance introduced Seed Diffusion Preview, a large scale language model based on discrete-state diffusion, specializing in code generation. It achieves an inference speed of 2,146 token/s, a 5.4x improvement over autoregressive models of comparable size [Details].
Skywork AI released Skywork UniPic, a 1.5 billion-parameter open-source autoregressive model that unifies image understanding, text-to-image generation, and image editing within a single architecture [Details].
Anthropic added automated security reviews in Claude Code. Using the GitHub Actions integration and a new /security-review command, developers can ask Claude to identify security concerns, and then have it fix them [Details].
Alibaba Qwen released the APIs of their Flash series, which support Qwen3-Coder and Qwen3-2507 now. Both APIs support the context length of 1M tokens. Qwen-Plus-Latest now also supports the context length of 1M tokens [Details].
Shopify announced Agentic commerce - enables AI agents to help buyers find products, add them to cart, and checkout [Details].
🔦 🔍 Weekly Spotlight
Articles/Courses/Videos:
Open-Source Projects:
Kitten TTS: an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.
LangExtract by Google: a Python library that uses LLMs to extract structured information from unstructured text documents based on user-defined instructions
DroidRun: a powerful framework for controlling Android and iOS devices through LLM agents.
🔍 🛠️ Product Picks of the Week
Lindy 3.0: Write prompts that build custom agents instantly, without writing code. Lindy 3.0 introduces Autopilot, giving Lindy agents their own cloud-based computers that they can see and use.
Mocha: an AI-powered application builder to build custom software without coding with built-in database, auth, email and storage.
Suna by Kortix: an open-source generalist AI Agent that acts on your behalf.
https://www.sokosumi.com
Last Issue
Cogito v2, GLM-4.5 , first open-source MoE Video Model, fully autonomous ML agent, FLUX.1 Krea [dev], Qwen3-Coder-Flash, Manus Wide Research, Runway Aleph, Intern-S1, Step3, Action Agent & more
Hey there! Welcome back to AI Brews - a concise roundup of this week's major developments in AI.
Thanks for reading and have a nice weekend! 🎉 Mariam.



![Cogito v2, GLM-4.5 , first open-source MoE Video Model, fully autonomous ML agent, FLUX.1 Krea [dev], Qwen3-Coder-Flash, Manus Wide Research, Runway Aleph, Intern-S1, Step3, Action Agent & more](https://substackcdn.com/image/fetch/$s_!Ov4W!,w_1300,h_650,c_fill,f_auto,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9266ee6d-0aa4-4515-b863-d39f2b07cc63_1686x1620.heic)