Screenshots to Code Dataset, Multi Motion Brush in Gen-2, Open-source AGI, AI system that solves complex geometry problems, AI in drug discovery and more
Greetings and welcome to this week's AI Brews for a concise roundup of the week's major developments in AI.
In today’s issue (Issue #48):
AI Pulse: Weekly News & Insights at a Glance
AI Toolbox: Product Picks of the Week
AI Skillset: Learn & Build
🗞️🗞️ AI Pulse: Weekly News & Insights at a Glance
🔥 News
Google DeepMind introduced AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold-medalist. It was trained solely on synthetic data. The AlphaGeometry code and model have been open-sourced [Details | GitHub].
Codium AI released AlphaCodium, an open-source code generation tool that significantly improves the performance of LLMs on code problems. Instead of using a single prompt, AlphaCodium is based on a test-based, multi-stage, code-oriented iterative flow [Details | GitHub].
Apple presented AIM, a set of large-scale vision models pre-trained solely using an autoregressive objective. The code and model checkpoints have been released [Paper | GitHub].
Alibaba presents Motionshop, a framework to replace the characters in video with 3D avatars [Details].
Hugging Face released WebSight, a dataset of 823,000 pairs of website screenshots and HTML/CSS code. WebSight is designed to train Vision Language Models (VLMs) to convert images into code. The dataset was created using Mistral-7B-v0.1 and Deepseek-Coder-33b-Instruct [Details | Demo].
Runway ML introduced a new feature, Multi Motion Brush, in Gen-2. It lets users control multiple areas of a video generation with independent motion [Link].
LMSYS introduced SGLang, Structured Generation Language for LLMs, an interface and runtime for LLM inference that greatly improves the execution and programming efficiency of complex LLM programs by co-designing the front-end language and back-end runtime [Details].
Meta CEO Mark Zuckerberg said that the company is developing open source artificial general intelligence (AGI) [Details].
MAGNeT, the text-to-music and text-to-sound model by Meta AI, is now on Hugging Face [Link].
The Global Health Drug Discovery Institute (GHDDI) and Microsoft Research achieved significant progress in discovering new drugs to treat global infectious diseases by using generative AI and foundation models. The team designed several small molecule inhibitors for essential target proteins of Mycobacterium tuberculosis and coronaviruses that show outstanding bioactivities. Normally, this could take up to several years, but the new results were achieved in just five months [Details].
US FDA provides clearance to DermaSensor's AI-powered real-time, non-invasive skin cancer detecting device [Details].
Deci AI announced two new models: DeciCoder-6B and DeciDiffusion 2.0. DeciCoder-6B, released under Apache 2.0, is a multi-language code LLM that supports 8 programming languages and focuses on memory and computational efficiency. DeciDiffusion 2.0 is a 732M-parameter text-to-image model that’s 2.6x faster and 61% cheaper than Stable Diffusion 1.5 with on-par image quality when running on Qualcomm’s Cloud AI 100 [Details].
Figure, a company developing autonomous humanoid robots, signed a commercial agreement with BMW to deploy general purpose robots in automotive manufacturing environments [Details].
ByteDance introduced LEGO, an end-to-end multimodal grounding model that accurately comprehends inputs and possesses robust grounding capabilities across multiple modalities, including images, audio, and video [Details].
Google Research developed Articulate Medical Intelligence Explorer (AMIE), a research AI system based on an LLM and optimized for diagnostic reasoning and conversations [Details].
Stability AI released Stable Code 3B, a 3 billion parameter Large Language Model, for code completion. Stable Code 3B outperforms code models of a similar size and matches CodeLLaMA 7b performance despite being 40% of the size [Details].
Nous Research released Nous Hermes 2 Mixtral 8x7B SFT, the supervised-finetune-only version of their new flagship model, trained over the Mixtral 8x7B MoE LLM. They also released an SFT+DPO version as well as a QLoRA adapter for the DPO. The new models are available on Together's playground [Details].
Google Research presented ASPIRE, a framework that enhances the selective prediction capabilities of large language models, enabling them to output an answer paired with a confidence score [Details].
Microsoft launched Copilot Pro, a premium subscription to their chatbot that provides access to Copilot in Microsoft 365 apps, access to GPT-4 Turbo even during peak times, Image Creator from Designer, and the ability to build your own Copilot GPT [Details].
Samsung’s Galaxy S24 will feature Google Gemini-powered AI features [Details].
Adobe introduced new AI features in Adobe Premiere Pro, including automatic audio category tagging, interactive fade handles, and an Enhance Speech tool that instantly removes unwanted noise and improves poorly recorded dialogue [Details].
Anthropic shared research on Sleeper Agents, in which researchers trained LLMs to act secretly malicious and found that, despite their best efforts at alignment training, deception still slipped through [Details].
Microsoft Copilot is now using the previously-paywalled GPT-4 Turbo, saving you $20 a month [Details].
Perplexity's pplx-online LLM APIs will power Rabbit R1, providing live, up-to-date answers without any knowledge cutoff. The first 100K Rabbit R1 purchases will also get 1 year of Perplexity Pro [Link].
OpenAI provided grants to 10 teams who developed innovative prototypes for using democratic input to help define AI system behavior. OpenAI shares their learnings and implementation plans [Details].
🔦 Weekly Spotlight
Official implementation of ‘PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding’ [Link].
Māori Speech AI Model helps preserve and promote New Zealand indigenous language [Link].
Open TTS Tracker: Track all open-access/source TTS models [Link].
Mentat: an open-source AI tool that assists you with any coding task, right from your command line [Link].
How OpenAI is approaching 2024 worldwide elections [Link].
🔍 🛠️ AI Toolbox: Product Picks of the Week
Marblism: Describe and generate a React front-end and NodeJS back-end in minutes.
Sama: a personalized AI companion that helps you reflect, introspect, and become the best version of yourself. iOS beta available with hardware launching soon.
Book and Bot: an AI-led publishing project where each children’s book comes with a companion chatbot for continued storyplay.
📕 📚 AI Skillset: Learn & Build
Building AI Agents with Microsoft AutoGen Studio, an open-source app with a user-friendly interface built on top of the AutoGen framework [Link].
Model Prompting Guides covering some of the recent language models [Link].
LLMOps: new short course by DeepLearning.AI [Link].
Thanks for reading and have a nice weekend! 🎉 Mariam.