This AI newsletter is all you need #96
What happened this week in AI by Louie
We are glad to say this was a week for Open-Source AI and small LLMs, with the release of Llama 3 by Meta and Microsoft’s announcement of Phi-3. Llama 3 is a big win for open-source and for cheap, fast smaller models, but it has some limitations. Meta chose to focus the model on text, the English language, and a shorter context window (8k).
Llama 3 has a very similar model architecture to Llama 2; the key differences in v3 are a more intelligent and aggressive training data filter (including the use of Llama 2 as a data classifier), 7x more data (now a massive 15 trillion tokens), and improved and scaled use of human feedback in fine-tuning. The result is a huge jump in capabilities and benchmark scores for small model sizes (8B and 70B parameters) and in the capabilities of the best open-source models. The speed advantage of these smaller models will be particularly important for agent workflows, where latency per call can stack up. Llama 3 8B and 70B models can be run at home or fine-tuned to specific use cases. They can also be accessed in the cloud, such as on Together.ai, for $0.2 and $0.9 per million tokens, respectively, compared with GPT-3.5-Turbo and GPT-4-Turbo at an average (using a 3:1 input-to-output token ratio) of $0.75 and $15. Groq also offers Llama 3, with 70B at an average cost of $0.64 per million tokens and faster inference speed.
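The blended GPT averages quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming the weighting means three input tokens for every output token, and assuming list prices of $0.50/$1.50 (GPT-3.5-Turbo) and $10/$30 (GPT-4-Turbo) per million input/output tokens. Those per-direction prices are not stated in this newsletter, and the function name is illustrative.

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per million tokens for a given input:output mix."""
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Assumed list prices in USD per million tokens (not taken from the newsletter).
gpt35_blended = blended_price(0.50, 1.50)
gpt4_turbo_blended = blended_price(10.00, 30.00)

print(f"GPT-3.5-Turbo blended: ${gpt35_blended:.2f} per million tokens")
print(f"GPT-4-Turbo blended:   ${gpt4_turbo_blended:.2f} per million tokens")
```

Changing the weights shows how sensitive the comparison is to the workload: output-heavy applications shift the average toward the higher output price.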
With Llama 3, we think the biggest gains relative to existing models are likely coming from better training data filtering. Meta also chose to push hard on training data quantity relative to model parameter size. This is a sub-optimal choice for training cost vs. intelligence (very far from Chinchilla optimal, and more intelligence per unit of training compute would have come from extra parameters rather than extra training tokens). However, the choice is geared towards improved inference costs, creating a smarter, smaller model that will be cheaper to run.
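To get a feel for how far this is from Chinchilla optimal, one can compare training tokens per parameter. The following is a rough sketch, assuming the commonly cited rule of thumb of about 20 training tokens per parameter (an approximation, not a figure from this newsletter) and the 15 trillion token count quoted above. The function names are illustrative.

```python
# Assumed rule of thumb: roughly 20 training tokens per model parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20

TRAIN_TOKENS = 15e12  # 15 trillion tokens, as quoted for Llama 3


def tokens_per_param(train_tokens, n_params):
    """Training tokens seen per model parameter."""
    return train_tokens / n_params


def overtraining_factor(train_tokens, n_params,
                        optimal_ratio=CHINCHILLA_TOKENS_PER_PARAM):
    """How many times beyond the assumed optimal ratio the model was trained."""
    return tokens_per_param(train_tokens, n_params) / optimal_ratio


for name, n_params in [("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9)]:
    ratio = tokens_per_param(TRAIN_TOKENS, n_params)
    factor = overtraining_factor(TRAIN_TOKENS, n_params)
    print(f"{name}: {ratio:,.0f} tokens/param, {factor:.1f}x the assumed ratio")
```

Under these assumptions, the 8B model sees about 1,875 tokens per parameter, far beyond the rule-of-thumb ratio, which is consistent with trading extra training compute for a cheaper model at inference time.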
Microsoft’s release of Phi-3 3.8B, 7B, and 14B has even more impressive benchmark scores relative to model size. The models were trained on highly filtered web data and synthetic data (3.3T to 4.8T tokens) and traveled further along the path of data quality prioritization. We await more details on the model release, real-world testing, and whether it is fully open source.
Current costs and key KPIs of leading LLMs
Why should you care?
When choosing the best LLM for your application, there are many trade-offs and priorities to choose between. Superior model affordability and response speed generally come together with smaller models. At the same time, intelligence, coding skills, multi-modality, and larger context lengths are usually things you pay more for with larger models. We think Llama 3 and Phi-3 will change the game for smaller, faster, cheaper models and will be a great choice for many LLM use cases. This is particularly true for Llama 3, which is open-source and flexible and can be fine-tuned and tailored to specific use cases.
It is incredible how far we have come with LLMs in less than two years! In August 2022, the best model available was text-davinci-002 from OpenAI at $60 per million tokens, scoring 60% on the MMLU test (16k questions across 57 tasks, with human experts at 89.8%). Now, Llama 3 8B costs an average of $0.2 per million tokens, or 300x cheaper, while scoring 68.4% on MMLU. The most capable models (GPT-4 & Opus) are now at 86.8% on MMLU while also being multimodal and offering 50–100x larger context lengths. There are now a large number of models that are competitive for certain use cases. We expect this to accelerate innovation and adoption of LLMs even further.
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. FineWeb: 15 Trillion Tokens of High-Quality Web Data
The FineWeb dataset consists of over 15 Trillion tokens of cleaned and deduplicated English web data from CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile, and SlimPajama. It is accessible on HuggingFace.
2. Meta Introduced Meta Llama 3
Meta has launched Llama 3, the newest addition to its Llama series, accessible on Hugging Face. It is available in 8B and 70B versions, each with base and instruction-tuned variants featuring enhanced multilingual tokenization. Llama 3 is designed for easy deployment on platforms like Google Cloud and Amazon SageMaker.
3. Mistral AI Launched Mixtral 8x22B
Mistral unveiled Mixtral 8x22B, an efficient sparse Mixture-of-Experts model with 39B active out of 141B total parameters. It specializes in multilingual communication, coding, and mathematics and excels in reasoning and knowledge tasks. The model has a 64K token context window, is compatible with multiple platforms, and is available under the open-source Apache 2.0 license.
4. Adobe To Add AI Video Generators Sora, Runway, and Pika to Premiere Pro
Adobe announced that it aims to update Premiere Pro with plug-ins for emerging third-party AI video generator models, including OpenAI’s Sora, Runway ML’s Gen-2, and Pika 1.0. With this addition, Premiere Pro users would be able to edit live-action video captured on traditional cameras alongside, and intermixed with, AI-generated footage.
5. Google’s New Chips Look To Challenge Nvidia, Microsoft and Amazon
Google has unveiled the Cloud TPU v5p, an AI chip that delivers nearly triple the training speed of its predecessor, the TPU v4, reinforcing its position in AI services and hardware. Additionally, Google introduced the Google Axion CPU, an Arm-based processor that competes with similar offerings from Microsoft and Amazon, boasting a 30% performance improvement and better energy efficiency.
Five 5-minute reads/videos to keep you learning
1. OpenAI or DIY? Unveiling the True Cost of Self-Hosting LLMs
The article examines the financial considerations of leveraging OpenAI’s API versus self-hosting LLMs. It highlights the trade-off between the greater control over data achieved through self-hosting, which comes with higher costs for fine-tuning and maintenance, and the potential cost savings of OpenAI’s usage-based pricing model.
2. CUDA Is Still a Giant Moat for NVIDIA
Despite everyone’s focus on hardware, AI software is what protects NVIDIA. This blog dives into the role and importance of the CUDA software ecosystem in maintaining NVIDIA’s leading position in AI.
3. The 2024 AI Index Report from Stanford
The 2024 AI Index Report from Stanford presents key trends in AI, including technical progress, rising costs of advanced models, and AI-enhanced workforce productivity. It also notes the uptick in AI-focused regulations and investments, particularly in generative AI. This is set against increased public consciousness and concern regarding AI’s societal implications.
4. Getting Started With Gemini 1.5 Pro and Google AI Studio
In this article, the author explores the capabilities of Gemini 1.5 Pro and Google AI Studio. The tutorial provides an overview of Google AI Studio, including its fundamentals, various modes, how to utilize the available multimodal features, and when to use Google AI Studio vs. Gemini.
5. You Can’t Build a Moat With AI
This article explores why it is difficult to build a moat with AI, especially with LLMs, and presents ideas for a potentially successful approach. Success in AI applications increasingly depends on leveraging unique, customer-specific data for training rather than just innovations in models like LLMs. Data engineering is key to creating competitive AI solutions.
Repositories & Tools
1. LLM Transparency Tool is an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models.
2. Llama Factory unifies the fine-tuning of 100+ LLMs.
3. Reader converts any URL to an LLM-friendly input with a simple prefix.
4. Open Agent Studio is a no-code agent editor.
5. AgentRun is a Python library that makes it easy to run Python code safely from LLMs with a single line of code.
Top Papers of The Week!
1. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Megalodon, a new model architecture designed for efficient sequence modeling with unlimited context length, addresses the scalability limitations of Transformers due to their quadratic complexity and poor performance with long sequences. Building upon the Mega architecture, it incorporates advancements such as complex exponential moving average (CEMA), timestep normalization, and a normalized attention mechanism.
2. VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Microsoft has developed VASA, a framework that can create realistic talking faces with expressive visual affective skills from a single image and audio input, featuring lip movements synchronized with the audio and dynamic facial expressions for enhanced authenticity.
3. Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
This paper introduces Mini-Gemini, a simple framework enhancing multi-modality Vision Language Models (VLMs). Mini-Gemini mines the potential of VLMs and simultaneously empowers current frameworks with image understanding, reasoning, and generation. It supports a series of dense and MoE LLMs from 2B to 34B.
4. RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems
This paper introduces RecAI, a practical toolkit designed to augment recommender systems with the advanced capabilities of LLMs. RecAI provides a suite of tools, including Recommender AI Agent, Recommendation-oriented Language Models, Knowledge Plugin, RecExplainer, and Evaluator, to facilitate the integration of LLMs into recommender systems.
5. Learn Your Reference Model for Real Good Alignment
Researchers address instability in LLM alignment methods such as RLHF and DPO by proposing Trust Region DPO (TR-DPO), which actively updates the reference policy during training. TR-DPO outperforms DPO by up to 19%, per GPT-4 automatic evaluations.
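The summary of TR-DPO above does not spell out how the reference policy is updated, so the following is only a toy sketch of two plausible schemes: a "soft" update that blends the reference toward the current policy, and a "hard" update that replaces it outright. The parameter lists, the function names, and the `alpha` mixing weight are hypothetical illustrations, not the authors' implementation.

```python
def soft_update(ref_params, policy_params, alpha):
    """Move each reference parameter a fraction alpha toward the policy."""
    return [(1 - alpha) * r + alpha * p
            for r, p in zip(ref_params, policy_params)]


def hard_update(policy_params):
    """Replace the reference with a copy of the current policy."""
    return list(policy_params)


# Toy parameter vectors standing in for model weights (hypothetical values).
ref = [0.0, 1.0]
policy = [1.0, 3.0]

ref_soft = soft_update(ref, policy, alpha=0.5)  # halfway between ref and policy
ref_hard = hard_update(policy)                  # identical to the policy

print(ref_soft)
print(ref_hard)
```

In a real training loop the same idea would apply to model weights, with the update applied at some chosen interval; standard DPO corresponds to never updating the reference at all.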
Quick Links
1. Poe introduces multi-bot chat and plans enterprise tier to dominate the AI chatbot market. With a recent $75 million funding round, Poe is betting big on the potential of a thriving ecosystem around AI-powered chatbots.
2. OpenAI seeks to dismiss Elon Musk’s lawsuit, calling contract claims ‘revisionist.’ The company has stated that Musk’s claim that it violated its contractual commitments to create an open-source, nonprofit entity is an attempt to promote his own competing AI firm.
3. After months of leaks, OpenAI has reportedly fired two researchers linked to company secrets going public. According to reports from The Information, the firm has fired researchers Leopold Aschenbrenner and Pavel Izmailov.
Who’s Hiring in AI!
Machine Learning Summer Internship @NDAX Canada Inc. (Calgary, Canada)
Staff Technical Product Manager — Text Generation @OctoAI (Remote)
Senior Software Engineer (Java) @Activate Interactive (Singapore)
Graphics AI Software Engineer @Plain Concepts (Remote)
Summer Intern, AI (Artificial Intelligence) @Nextech (USA/Remote)
Senior Developer Gen — AI @Capco (Sao Paulo, Brazil)
Junior Data Scientist @Moonpig (London, UK)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!
This AI newsletter is all you need #96 was originally published in Towards AI on Medium.