This AI newsletter is all you need #94
What happened this week in AI by Louie
For the past few weeks, we have been following an increased pace of voice and music AI model releases. In particular, Suno AI’s v3 music generation model was released two weeks ago and has gained momentum this week, with some referring to it as the “ChatGPT moment” of generative music. Suno can make full two-minute songs from a prompt, with lyrics in any genre, in many languages, and with many accents. Some of the results are very impressive; here’s a fun one we asked it to make this week about “AI.” Technical details and disclosure of training data were limited, raising questions about legal risk if it has been trained on any copyrighted music. The music industry, after all, is particularly experienced and well organized in protecting its copyright!
We also saw a new music model from Stability AI — Stable Audio 2.0, a latent diffusion model employing a diffusion transformer. It is exclusively trained on a licensed dataset from the AudioSparx music library, can generate high-quality audio tracks of up to 3 minutes, and supports audio-to-audio generation. Many large tech companies have released music models over the past year, but we expect to see more impressive results from the next generation of these models in the coming months.
Why should you care?
The pace at which Generative AI is being applied to new domains is incredible. These models generally use similar architectures, with some combination and variation of diffusion and transformer models. Music generation is the latest domain, and it now appears to have reached a capability threshold that could start to have a real impact. Whether these models are used to generate full radio hits from scratch or by artists as a source of inspiration, it is unclear what this means for artists, record labels, and Spotify. We hope artists will be compensated and allowed to opt out if their music is being used to train these models, but we expect it will take some time to establish exactly how existing copyright laws apply. We think this could lead to more value being attributed to an artist’s brand and live music, while songwriting and session music could become more commoditized. However, we think that for some time to come, the best music made using these AI tools will be generated in collaboration with talented human musicians who have great taste and mix in their own inspiration!
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. OpenAI’s Alleged Use of YouTube Data for AI Training Comes Under Scrutiny
A recent report from The New York Times highlights allegations that OpenAI and Google may have infringed on YouTube creators’ copyrights by using transcriptions of YouTube videos to train their AI models. OpenAI’s usage of its Whisper tool to transcribe video content for GPT-4 training, as well as Google’s own training practices, are under scrutiny, despite Google’s assertions that they only use content from consenting creators.
2. Assembly AI Claims Its New Universal-1 Model Has 30% Fewer Hallucinations Than Whisper
Assembly AI has a new speech recognition model called Universal-1. Trained on more than 12.5 million hours of multilingual audio data, the company says it does well with speech-to-text accuracy across English, Spanish, French, and German. It claims Universal-1 can reduce hallucinations by 30% on speech data and 90% on ambient noise compared to OpenAI’s Whisper Large-v3 model.
3. Stability AI Released Stable Audio 2.0
Stable Audio 2.0 introduces significant advancements in music generation AI. It can generate high-quality audio tracks of up to 3 minutes and supports audio-to-audio generation, where users upload a sample they want to use as a prompt. Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.
4. Lambda Announces $500M GPU-Backed Facility To Expand Cloud for AI
Lambda has secured a special-purpose GPU financing vehicle of up to $500 million to expand its on-demand cloud offering. This asset-based structure is secured by the GPUs and supported by their cash flow generation, and it represents a significant milestone within the AI compute market. It allows Lambda to fund on-demand cloud deployments for thousands of users without requiring them to sign long-term contracts.
5. Introducing Command R+: A Scalable LLM Built for Business
Cohere’s Command R+ is a new medium-size LLM focused on business-oriented features. It is an upgrade from Cohere’s previous model in the same category, Command R, and further improves advanced RAG and tool use. According to the company, Command R+ outperforms similar models in the scalable market category and is competitive with more expensive models on key business-critical capabilities.
Five 5-minute reads/videos to keep you learning
1. Build Autonomous AI Agents with Function Calling
This comprehensive tutorial on Function Calling focuses on practical implementation, building a fully autonomous AI agent, and integrating it with Streamlit for a ChatGPT-like interface. Although it uses OpenAI, the tutorial can be easily adapted for other LLMs that support Function Calling, such as Google’s Gemini and Anthropic’s Claude.
This blog post discusses the advantages and disadvantages of Mamba. It also covers what Mamba means for Interpretability, AI Safety, and Applications.
3. Introduction to State Space Models (SSM)
State Space Models (SSM) are increasingly influential in deep learning for dynamic systems, gaining attention with the “Efficiently Modeling Long Sequences with Structured State Spaces” paper in October 2021. This article focuses on the S4 model, an essential theoretical framework that, while not widely used in practical applications, underscores the evolution of alternatives to transformer architectures in AI.
4. Going Beyond Zero/Few-Shot: Chain of Thought Prompting for Complex LLM Tasks
This prompting guide highlights various prompting techniques such as zero-shot, few-shot, and chain-of-thought prompting, as well as advanced techniques like recursive prompting, Tree of Thoughts, Automatic Reasoning and Tool-Use, and more. It focuses on Chain of Thought prompting, covering its benefits and limitations.
5. Cosmopedia: How To Create Large-Scale Synthetic Data for Pre-Training
This blog post outlines the challenges and solutions involved in generating a synthetic dataset with billions of tokens, leading to the creation of Cosmopedia, which aims to reproduce the training data used for Phi-1.5. The post shares initial findings and discusses plans to improve the current dataset.
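For readers curious about the state space models mentioned in item 3 above, here is a minimal sketch of the underlying idea: a linear recurrence that can also be computed as a convolution. It assumes a scalar state and already-discretized parameters. The names ssm_scan, ssm_kernel, and ssm_conv are illustrative only, and this is not the parameterization or initialization used by S4 itself.

```python
def ssm_scan(a, b, c, inputs):
    """Recurrent view of a scalar linear state space model.

    x_k = a * x_{k-1} + b * u_k   (state update, starting from x = 0)
    y_k = c * x_k                 (readout)
    """
    x = 0.0
    outputs = []
    for u in inputs:
        x = a * x + b * u
        outputs.append(c * x)
    return outputs


def ssm_kernel(a, b, c, length):
    """Convolution kernel of the same model: K_j = c * a**j * b."""
    return [c * (a ** j) * b for j in range(length)]


def ssm_conv(a, b, c, inputs):
    """Convolutional view: y_k = sum_j K_j * u_{k-j}."""
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 1.0]
    print(ssm_scan(0.5, 1.0, 2.0, signal))
    print(ssm_conv(0.5, 1.0, 2.0, signal))
```

Both functions produce the same outputs for the same inputs. The recurrent form processes one step at a time, while the convolutional form can be computed over the whole sequence at once.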
Repositories & Tools
1. SWE-Agent is a tool for autonomously resolving bugs in GitHub repositories.
2. AutoWebGLM is a project to build a more efficient language model-driven automated web navigation agent.
3. Plandex is an AI coding engine for complex tasks.
4. Keywords AI is a unified DevOps platform for LLM applications.
5. Lemonfox is the fast, easy, and cheap OpenAI alternative.
Top Papers of The Week!
1. Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
One drawback of modern transformers is that each token uses the same amount of predictive compute. The research demonstrates that transformers can learn to dynamically allocate compute to specific positions in a sequence, optimizing the allocation along the sequence for different layers across the model depth. The result is 50% fewer FLOPs at generation time for equivalent performance.
This study unveils a technique called “many-shot jailbreaking,” highlighting how crafting multiple deceptive dialogues can trick large language models into providing banned responses. It exposes a link between this vulnerability and the models’ in-context learning capabilities.
3. Long-context LLMs Struggle with Long In-context Learning
This paper introduces a specialized benchmark, LongICLBench, focusing on long in-context learning within extreme-label classification. It has six datasets with a label range spanning 28 to 174 classes, covering different input (few-shot demonstration) lengths from 2K to 50K tokens. It finds that long-context LLMs perform relatively well on less challenging tasks with shorter demonstration lengths by effectively utilizing the long context window.
4. ReFT: Representation Finetuning for Language Models
PEFT methods adapt large models via updates to a small number of weights. However, prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. This research develops a family of Representation Finetuning (ReFT) methods, which operate on a frozen base model and learn task-specific interventions on hidden representations.
5. Training LLMs over Neurally Compressed Text
This paper investigates training large language models (LLMs) on text that has been highly compressed by neural text compressors, with the goal of improving training and serving efficiency and better managing long text sequences. Although the method results in higher perplexity than traditional subword tokenizers, it benefits from shorter sequence lengths, leading to fewer generation steps and reduced latency.
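To make the per-token compute allocation from the Mixture-of-Depths item above more concrete, here is a toy sketch of top-k routing in a single layer. It assumes scalar "tokens", precomputed router scores, and a stand-in block function. The names route_top_k and mod_layer are hypothetical, and the real method uses learned routers over transformer blocks, so treat this as an illustration rather than the paper's implementation.

```python
def route_top_k(scores, capacity):
    """Return the indices of the `capacity` highest-scoring tokens, in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:capacity])


def mod_layer(tokens, scores, capacity, block):
    """Apply `block` only to routed tokens; the rest skip it via the residual path.

    Routed tokens:  x + score * block(x)
    Skipped tokens: x (unchanged, no block compute spent)
    """
    selected = set(route_top_k(scores, capacity))
    outputs = []
    for i, x in enumerate(tokens):
        if i in selected:
            outputs.append(x + scores[i] * block(x))
        else:
            outputs.append(x)
    return outputs


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    scores = [0.125, 0.75, 0.5, 0.25]  # assumed router outputs
    # Only 2 of 4 tokens pass through the (toy) block in this layer.
    print(mod_layer(tokens, scores, capacity=2, block=lambda x: 10.0 * x))
```

With a capacity of half the sequence length, the block runs on only half of the tokens in this layer, which is where the compute savings would come from in this simplified picture.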
Quick Links
1. OpenAI CEO Sam Altman and former Apple design chief Jony Ive aim to raise $1 billion for their mysterious AI device startup. The product they’re developing is set to redefine personal AI interaction, moving beyond the traditional smartphone framework.
2. Tesla is increasing compensation for AI teams, says Elon Musk. The move is meant to retain employees who are being poached by OpenAI, which has reportedly been attempting to recruit Tesla engineers with attractive compensation plans.
3. Google is reportedly considering charging for premium content generated by AI. The company is said to be revamping its business model. It would be the first time Google charged for any of its content.
4. OpenAI-backed startup Ghost Autonomy has shut down. The shutdown comes just five months after the startup partnered with OpenAI through the OpenAI Startup Fund to gain early access to OpenAI systems and Azure resources from Microsoft.
Who’s Hiring in AI!
Software Engineer, AI/Computer Vision @Snail Games (Remote/USA)
Data Developer @Gate.io (Remote/APAC)
Software Development Engineer, ML_AI @Amazon (Seattle, WA, USA)
Python — Backend Developer @Simetrik (Remote)
Machine Learning Intern @CommerceIQ (Sunnyvale, CA, USA)
Technical Support Engineer @Salesforce (Remote/India)
Gen AI Data Scientist @Deloitte (Houston, TX, USA)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!
This AI newsletter is all you need #94 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.