Hyper-SD, Llama3 with 1M+ context length, new Robot that folds clothes & cooks, Qwen1.5-110B, Vidu AI and more
Hi. Welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #61):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Alibaba Research released Qwen1.5-110B, the largest model in the Qwen1.5 series and the first in the series with over 100 billion parameters. It demonstrates competitive performance against Llama-3-70B. The model supports a context length of 32K tokens and is multilingual [Details].
Gradient released a model, Llama-3 8B Gradient Instruct 1048k, that extends Llama-3 8B's context length from 8k to 1M+ [Details].
Abacus.AI released the Llama-3-Giraffe-70B model, which extends the context length of Llama 3 70B to approximately 128k [Details].
ByteDance released Hyper-SD, offering hyper-fast and hyper-quality text-to-image generation. The model achieves single-step inference on both SD1.5 and SDXL architectures without evident loss of aesthetics, style, or structure [Details | Scribble Demo | T2I Demo].
BigCode released StarCoder2-15B-Instruct-v0.1, the very first entirely self-aligned code LLM trained with a fully permissive and transparent pipeline. The open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which are then used to fine-tune StarCoder2-15B itself without any human annotations or distilled data from huge and proprietary LLMs [Details].
Stardust introduced an AI robot, the Astribot S1, that can perform complex tasks such as folding clothes, sorting items, flipping pots for cooking, vacuuming, and competitive cup stacking. The S1 robot is expected to be commercialized in 2024 [Details | video].
Amazon Q, a generative AI-powered assistant for businesses and developers by AWS, is now generally available. Amazon Q includes Amazon Q Developer (a generative AI-powered conversational assistant for building and operating AWS applications), Amazon Q Business (an AI assistant that can generate content and securely complete tasks based on data and information in enterprise systems), and Amazon Q Apps (which lets employees build generative AI-powered apps from their company’s data without any prior coding experience) [Details | Q Apps video].
Atlassian introduced Rovo, an AI tool that accelerates finding, learning, and acting on information dispersed across a range of internal tools and third-party apps. It also lets you add specialized agents to workflows [Details].
China's Shengshu Technology and Tsinghua University have unveiled Vidu AI, a text-to-video model capable of generating 16-second clips at 1080p resolution [video].
PyTorch released ExecuTorch alpha, a framework focused on deploying large language models across mobile and edge devices including wearables, embedded devices and microcontrollers [Details].
Memory is now available to all ChatGPT Plus users. Tell ChatGPT anything you’d like it to remember and it can use this information as context when generating a future related answer. Memory can be turned on or off in settings and is not currently available in Europe or Korea. You can also start a Temporary Chat for one-off conversations, which won’t appear in your history or in memory [Details].
GitHub announced GitHub Copilot Workspace: the Copilot-native developer environment. Within Copilot Workspace, developers can brainstorm, plan, build, test, and run code in natural language. This new task-centric experience leverages different Copilot-powered agents from start to finish, while giving developers full control over every step of the process [Details].
Researchers from KAIST AI and others released Prometheus 2, an open-source language model specialized in evaluating other language models. Compared to the Prometheus 1 models, the Prometheus 2 models support both absolute grading and relative grading [Details].
Google DeepMind introduced Med-Gemini, a family of multimodal medical models built upon Gemini that establish new state-of-the-art (SoTA) performance on 10 out of 14 medical benchmarks [Paper].
Nous Research released Hermes 2 Pro - Llama-3 8B. Hermes Pro comes with Function Calling and Structured Output capabilities, and the Llama-3 version now uses dedicated tokens for tool call parsing tags, to make handling streaming function calls easier [Details].
Anthropic introduced the Claude Team plan and iOS app [Details].
The Empathic Voice Interface (EVI) API, announced last month by Hume AI, is now publicly available. Powered by an empathic LLM (eLLM) that processes your tone of voice, EVI unlocks new capabilities like knowing when to speak, generating more empathic language, and intelligently modulating its own tune, rhythm, and timbre [Details].
Google’s new ‘Speaking practice’ feature uses AI to help users improve their English skills [Details].
Meta’s ‘set it and forget it’ AI ad tools are misfiring and blowing through cash [Details].
Google introduced a new shortcut in the Chrome desktop address bar for quick access to the Gemini chatbot [Link].
🔦 Weekly Spotlight
2024 AI Readiness Report by Scale AI - a survey of over 1,800 ML practitioners to understand the state of AI development and adoption today [Link].
The Possibilities of AI - Talk by Sam Altman at Stanford [Link].
Code Interpreter SDK: Python & JS/TS SDK for adding code interpreting to your AI app [Link].
Implementing FrugalGPT: Reducing LLM Costs & Improving Performance [Link].
Perplexica: an AI-powered search engine and open-source alternative to Perplexity AI [Link].
Self-Learning Llama-3 Voice Agent with Function Calling and Automatic RAG - video tutorial by NVIDIA Jetson AI Lab [Link].
Phi-3 running in your browser, 100% local, powered by WebGPU & Rust [Link].
Official Meta Llama 3 Hackathon, hosted by Meta in collaboration with Cerebral Valley and SHACK15 (In-person) [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Loom AI Workflows: Turn any Loom video into share-ready docs. AI Workflows follows the steps outlined in your video to create pull request descriptions, SOPs, and more.
Mindtrip: AI-powered travel tool that delivers personalized travel experiences.
Brainy Docs: Converts PDFs into explainer videos using AI.
Hunch: Chain together AI tasks in a visual, no-code workspace.
You can support my work via BuyMeaCoffee.
Thanks for reading and have a nice weekend! 🎉 Mariam.