This AI newsletter is all you need #92
What happened this week in AI by Louie 
This week, we watched developments in the next generation of AI supercomputers with Nvidia GTC and Broadcom AI in Infrastructure events. It was also another eventful week for leadership drama in the AI startup ecosystem, as two of Inflection AI’s co-founders and much of the team left to join Microsoft, and Emad Mostaque stepped down as Stability AI CEO.
Anticipation is high for the next generation of LLMs, such as GPT-4.5/5 or Gemini Ultra 2.0. To some extent, how far they progress will depend on how much AI compute can be devoted to a single training run. GPT-4 is widely believed to have been trained on 25,000 Nvidia A100s. GPT-5 will likely be trained on H100s, which offer roughly 3x more compute per GPU. However, it remains unclear whether OpenAI/Microsoft Azure has the capacity for a single training cluster of 50,000 or 150,000 GPUs. Similarly, Google accelerated the deployment of TPUv5 chips (designed in partnership with Broadcom) in late 2023, and its Gemini 2.0 model is likely to take advantage of them. Nvidia’s newly announced B100 series of GPUs will not begin production until late this year, but it will push capabilities even further: potentially 4x the training capacity of the H100 and over 20x for inference in some situations. In this context, Broadcom’s presentation of its custom AI chip and AI supercomputer infrastructure capabilities was also interesting: “Now, we know we can do 10,000, 20,000, 30,000, 60,000, 100,000 today, but there was also this consortium founded by Broadcom and a couple of others about two years ago. And the idea is, let’s actually take this to a million plus nodes.”
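To put these cluster sizes in perspective, here is a back-of-envelope sketch. It uses only the figures quoted above: 25,000 A100s for GPT-4, roughly 3x compute per H100, and potentially 4x per B100 vs. H100. It also assumes that cluster compute scales linearly with GPU count, ignoring utilization, interconnect overheads, and run length. The resulting ratios are idealized comparisons, not forecasts.

```python
# Back-of-envelope comparison of single-training-run cluster compute.
# Assumptions (the figures quoted above, not measured data):
#   - GPT-4 cluster: 25,000 A100s
#   - one H100 ~ 3x the training compute of one A100
#   - one B100 ~ 4x the training compute of one H100
#   - cluster compute scales linearly with GPU count (ignores utilization,
#     interconnect overheads, and how long the run lasts)

A100_EQUIVALENT = {"A100": 1.0, "H100": 3.0, "B100": 12.0}  # B100 = 4 x 3
GPT4_CLUSTER_A100S = 25_000


def relative_compute(num_gpus: int, gpu: str) -> float:
    """Cluster compute relative to the assumed 25,000-A100 GPT-4 cluster."""
    return num_gpus * A100_EQUIVALENT[gpu] / GPT4_CLUSTER_A100S


if __name__ == "__main__":
    for n, gpu in [(50_000, "H100"), (150_000, "H100"), (100_000, "B100")]:
        print(f"{n:>7,} x {gpu}: ~{relative_compute(n, gpu):.0f}x GPT-4 cluster")
```

Under these assumptions, a 50,000-H100 cluster works out to about 6x the GPT-4 cluster and a 150,000-H100 cluster to about 18x.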
We think both custom AI chip and GPU training clusters are likely to scale to 1 million chips in the coming years, and indeed, SemiAnalysis revealed that Microsoft and Google already have plans for “larger than Gigawatt class” training clusters.
Separate from scaling compute, the next generation of models also needs to keep scaling data and implementing algorithmic improvements and breakthroughs. On this topic, Jensen Huang’s comments at GTC on the next generation of models were also fascinating. He said they would include “insanely large context windows, state space vectors, synthetic data generation, essentially models talking to themselves, reinforcement learning, essentially AlphaGo of large language models, Tree Search.” Debate remains strong in the AI community on exactly how intelligent LLMs really are and what the limitations of the transformer architecture behind them will be. In the next year or two, we should see whether scaling compute hits a dead end and what reasoning capabilities the new methods above can introduce!
Why should you care?
The amount now being invested into LLMs is truly staggering, and this may be one factor behind the drama at AI startups. On paper, the decision by many at Inflection AI to leave for Microsoft seems a strange one: they had already raised over $1bn for the startup and launched their new model only weeks earlier. However, Inflection 2.5 came up short relative to GPT-4, and a new generation of AI chips will soon be needed to remain competitive. Perhaps competing with big tech budgets in foundation model development looked too daunting. In open source AI, Emad Mostaque’s departure as Stability AI CEO followed Mistral’s surprise release of a closed model last month. It may also be a sign that AI venture capital investors are putting more pressure on monetization, and of fears over the ability to compete with big tech’s growing AI budgets. In our view, increasing centralization and higher barriers to entry would be a concerning development. However, there is still plenty of room for startups and individuals to build vertical products on top of the latest foundation models and to work with smaller open-source models for specific applications.
- Louie Peters — Towards AI Co-founder and CEO
Building AI for Production (E-book): Available for Pre-orders!
Our book, ‘Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG,’ is now available on Amazon for pre-orders. It is a roadmap to the future tech stack, offering advanced techniques in Prompt Engineering, Fine-Tuning, and RAG, curated by experts from Towards AI, LlamaIndex, Activeloop, Mila, and more.
This e-book focuses on adapting large language models (LLMs) to specific use cases by leveraging Prompt Engineering, Fine-Tuning, and Retrieval Augmented Generation (RAG), tailored for readers with an intermediate knowledge of Python. It is an end-to-end resource for anyone looking to enhance their skills or dive into the world of AI for the first time as a programmer or software student, with over 500 pages, several Colab notebooks, hands-on projects, community access, and our AI Tutor.
The e-book is a journey through creating production-ready LLM products and leveraging the potential of AI across various industries. Pre-order your copy now and take the first step on your AI journey!
Hottest News
1. OpenAI Is Expected To Release a ‘Materially Better’ GPT-5 for Its Chatbot Mid-Year, Sources Say
OpenAI is preparing to release GPT-5 around mid-year, offering significant improvements over GPT-4, particularly in enhanced performance for business applications. Although the launch date is not fixed due to continued training and safety evaluations, preliminary demonstrations to enterprise clients suggest new features and capabilities, raising anticipation for GPT-5’s impact on the generative AI landscape.
2. ‘We Created a Processor for the Generative AI Era,’ NVIDIA CEO Says
At the GTC conference, NVIDIA CEO Jensen Huang announced the NVIDIA Blackwell computing platform. The platform aims to advance generative AI with superior training and inference capabilities. It includes enhanced interconnects for better performance and scalability. NVIDIA also launched NIM microservices for tailored AI deployment and Omniverse Cloud APIs for sophisticated simulation, signaling a transformative impact on sectors like healthcare and robotics.
3. Stability AI CEO Resigns To “Pursue Decentralized AI”
Emad Mostaque has stepped down as CEO of Stability AI to concentrate on decentralized AI. The company will be temporarily co-led by COO Shan Shan Wong and CTO Christian Laforte as it continues its generative AI work. The leadership change comes amid a broader wave of talent movement across the industry, highlighted by Microsoft hiring most of Inflection AI’s team, including co-founder Mustafa Suleyman, who previously co-founded DeepMind.
4. Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation From Single Images
Stability AI has introduced Stable Video 3D (SV3D), a new generative model that improves the quality and consistency of 3D generation. SV3D comes in two versions. SV3D_u generates orbital videos from a single image without a camera path. SV3D_p supports more advanced 3D video creation along specified camera trajectories. Commercial use requires a Stability AI Membership, while non-commercial users can access the model weights through Hugging Face and consult the accompanying research paper.
5. Google Starts Testing AI Overviews From SGE in the Main Google Search Interface
Google is expanding its AI-assisted search to a small subset of U.S. users as it moves toward a more widespread public rollout. The AI search feature, called Search Generative Experience (SGE), was previously only available to people who signed up to test it. The AI-generated responses, marked as “experimental,” are highlighted in a green box at the top of search results.
Five 5-minute reads/videos to keep you learning
1. The Anthropic Prompt Library
The Anthropic Prompt Library provides a suite of task-specific prompts to enhance performance in areas such as business, personal development, and user-generated content. It supports diverse activities, including game development, corporate analysis, web design, coding, and creative storytelling.
2. Artificial Intelligence: An Engagement Guide
This guide presents an investor viewpoint on the necessary company policies regarding AI, including investors’ expectations of companies, responsible AI practices, and management processes to identify, assess, and mitigate AI-related risks.
3. How People Are Really Using GenAI
This is a broad overview of how people use AI and what they use it for, compiled from thousands of comments on sites such as Reddit and Quora. It identifies 100 categories, grouped into six top-level themes: Technical Assistance & Troubleshooting (23%), Content Creation & Editing (22%), Personal & Professional Support (17%), Learning & Education (15%), Creativity & Recreation (13%), and Research, Analysis & Decision Making (10%).
4. LLM Inference Speed of Light
The article presents “calm,” a streamlined CUDA implementation designed for fast LLM inference. It focuses on the “speed of light,” the theoretical maximum speed of LLM inference. Because LLMs generate tokens sequentially, inference on current CPUs and GPUs is limited by memory bandwidth rather than computational power.
5. How Quickly Do Large Language Models Learn Unexpected Skills
This blog explains a new study that suggests emergent abilities actually develop gradually and predictably. The paper, by a trio of researchers at Stanford University, posits that the sudden appearance of these abilities is just a consequence of the way researchers measure the LLM’s performance.
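The core argument of item 4 can be checked with simple arithmetic. When decoding one token at a time (batch size 1), every weight must be read from memory once per token. Weight size divided by memory bandwidth therefore gives a lower bound on time per token. The sketch below is illustrative only: the 7B-parameter model, 16-bit weights, and 1 TB/s bandwidth are assumed round numbers rather than figures from the article, and KV-cache reads are ignored.

```python
# "Speed of light" estimate for memory-bound LLM decoding (batch size 1).
# Assumption: each generated token reads all weights from memory once;
# KV-cache traffic and compute time are ignored, so this is a lower bound
# on latency (an upper bound on tokens/second).


def min_seconds_per_token(num_params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Lower bound on time per token when decoding is bandwidth-limited."""
    weight_bytes = num_params * bytes_per_param
    return weight_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical example: 7B parameters, 16-bit weights (2 bytes), 1 TB/s memory.
    t = min_seconds_per_token(7e9, 2, 1e12)
    print(f"{t * 1e3:.0f} ms/token, at most {1 / t:.0f} tokens/s")
```

With those assumptions the bound is about 14 ms per token, or roughly 71 tokens/s. Halving the bytes per parameter (e.g., 8-bit weights) halves the bound, which is why quantization speeds up bandwidth-bound decoding.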
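The measurement argument in item 5 can be illustrated with a toy model. This is our simplification, not the paper’s code. Suppose per-token accuracy p improves smoothly with scale and token errors are independent. Exact match on an L-token answer then succeeds with probability p^L, which stays near zero and then rises steeply. The accuracy values below are hypothetical.

```python
# Toy illustration: a smoothly improving per-token accuracy can look
# "emergent" under an exact-match metric. Assumes token errors are
# independent, so exact match on an L-token answer has probability p ** L.


def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    return per_token_acc ** answer_len


if __name__ == "__main__":
    answer_len = 10
    # Hypothetical per-token accuracies for increasingly large models.
    for p in [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]:
        print(f"per-token {p:.2f} -> exact match {exact_match_rate(p, answer_len):.3f}")
```

With a 10-token answer, exact match is under 3% at a per-token accuracy of 0.7 but about 90% at 0.99. A smooth underlying improvement can therefore look like a sudden jump.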
Repositories & Tools
1. MusicLang/musiclang_predict leverages the LLaMA 2 architecture for symbolic music generation, offering advanced features such as chord progression control and MIDI export for use in DAWs.
2. Awesome-ai-agents compiles a list of autonomous agents covering open-source and closed-source projects.
3. OpenDevin is an open-source project that aims to replicate, enhance, and innovate upon Devin, an autonomous AI software engineer capable of executing complex engineering tasks.
4. Lobe Chat is an open-source LLM/AI chat framework that supports multiple AI providers, vision and TTS modes, and a plugin system.
Top Papers of The Week! 
1. Evolutionary Optimization of Model Merging Recipes
The paper presents an evolutionary approach that automates combining open-source models into new foundation models, reducing reliance on human expertise and large-scale resources. The approach optimizes merges in both parameter space and data flow space. It produced a high-performing Japanese-language LLM with mathematical capabilities and a culturally aware Vision Language Model (VLM).
2. RAFT: Adapting Language Model to Domain-Specific RAG
This paper introduces RAFT (Retrieval Augmented Fine-Tuning), a post-training method that improves LLMs on domain-specific tasks. RAFT trains models to use relevant documents and ignore irrelevant ones, which improves citation and reasoning in “open-book” settings. Its effectiveness is validated on datasets such as PubMed, HotpotQA, and Gorilla.
3. Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Mora is a new open-source, multi-agent video generation framework introduced to provide an alternative to OpenAI’s proprietary Sora model. It supports various tasks like text-to-video, image-to-video conversion, video extension, editing, and digital world simulation, with performance close to Sora in certain areas. However, it does not yet match Sora’s overall capabilities.
4. Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
This paper explores the potential of enhancing the scalability and performance of deep reinforcement learning (RL) by adopting a classification approach for training value functions instead of the traditional regression method. This method effectively addresses common challenges in value-based RL, such as noisy targets and non-stationarity, while improving scalability with minimal additional cost.
5. Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Agent-FLAN fine-tunes language models for agent tasks. It enables Llama2-7B to outperform the prior best works by 3.5% across various agent evaluation datasets. It greatly alleviates hallucination issues and consistently improves the agent capabilities of LLMs as model size scales, while slightly enhancing their general capabilities.
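To make paper 1’s parameter-space idea concrete, here is a toy sketch of evolutionary search over merge weights. It is not the paper’s implementation. The paper builds on established merging methods, uses a more sophisticated evolutionary optimizer, and also searches data flow space. In this sketch, the “models” are short hypothetical parameter lists and a merge is a weighted average. fitness() is a stand-in for a validation score, and the optimizer is a simple evolution strategy with truncation selection and Gaussian mutation.

```python
import random

# Toy evolutionary model merging in parameter space.
# Hypothetical setup: each "model" is a flat list of parameters, a merge is a
# weighted average of the source models, and fitness() stands in for a
# validation score on the target task (here: closeness to a hidden target).

SOURCE_MODELS = [
    [1.0, 0.0, 2.0, -1.0],
    [0.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, -1.0, 0.0],
]
TARGET = [1.0, 1.0, 0.5, 0.0]  # hypothetical "ideal" parameters


def merge(weights):
    """Weighted average of the source models (weights are normalized)."""
    total = sum(weights)
    return [
        sum(w * model[i] for w, model in zip(weights, SOURCE_MODELS)) / total
        for i in range(len(SOURCE_MODELS[0]))
    ]


def fitness(weights):
    """Higher is better: negative squared distance to TARGET."""
    merged = merge(weights)
    return -sum((m - t) ** 2 for m, t in zip(merged, TARGET))


def evolve(generations=100, population=20, elite=5, sigma=0.1, seed=0):
    """Truncation-selection evolution strategy over merge weights."""
    rng = random.Random(seed)
    pop = [[rng.random() + 1e-3 for _ in SOURCE_MODELS] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]  # keep the best recipes unchanged (elitism)
        # Children: Gaussian mutations of random parents, kept strictly positive.
        children = [
            [max(1e-3, w + rng.gauss(0, sigma)) for w in rng.choice(parents)]
            for _ in range(population - elite)
        ]
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best merge weights:", [round(w / sum(best), 3) for w in best])
    print("fitness:", round(fitness(best), 4))
```

Because merge() normalizes the weights, only their ratios matter. In a real setting, each fitness() call would mean evaluating a merged model on a benchmark, so the number of evaluations becomes the main cost.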
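For paper 2, here is a minimal sketch of how RAFT-style training examples might be assembled. It assumes the scheme as we understand it from the paper. For a fraction of questions, the context contains the relevant (“oracle”) document plus distractors; for the rest, it contains only distractors. The target answer would ideally be chain-of-thought reasoning that quotes the oracle document. The class, function, and parameter names and the default values are hypothetical.

```python
import random
from dataclasses import dataclass

# Minimal sketch of RAFT-style training-example construction.
# Assumed scheme: for a fraction p_oracle of questions the context holds the
# relevant ("oracle") document plus distractors; for the rest it holds only
# distractors. Field and function names here are hypothetical.


@dataclass
class RaftExample:
    question: str
    context: list[str]
    answer: str  # ideally a chain-of-thought answer quoting the oracle doc


def make_example(question, oracle_doc, answer, corpus, num_distractors=3,
                 p_oracle=0.8, rng=random):
    candidates = [d for d in corpus if d != oracle_doc]
    distractors = rng.sample(candidates, num_distractors)
    if rng.random() < p_oracle:
        context = distractors + [oracle_doc]
    else:
        context = distractors  # oracle deliberately withheld
    rng.shuffle(context)  # avoid a fixed position for the oracle doc
    return RaftExample(question, context, answer)


if __name__ == "__main__":
    corpus = [f"doc {i}" for i in range(10)]
    ex = make_example("What does doc 7 say?", "doc 7",
                      "(chain-of-thought answer citing doc 7)",
                      corpus, rng=random.Random(42))
    print(ex)
```

As we understand the paper’s motivation, withholding the oracle document for some examples pushes the model to also internalize domain knowledge. The distractors teach it to ignore irrelevant documents.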
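For paper 4, the sketch below shows HL-Gauss, one of the classification targets the paper studies. A scalar value target y is converted into a histogram over fixed value bins by integrating a Gaussian centered at y over each bin. The network’s softmax over bins is then trained with cross-entropy instead of mean squared error. The value range, bin count, and default sigma of 0.75 bin widths are illustrative assumptions.

```python
import math

# Sketch of the HL-Gauss idea: turn a scalar regression target into a
# categorical distribution over value bins, then train with cross-entropy
# instead of mean squared error. Bin range, bin count and sigma are
# illustrative assumptions.


def hl_gauss_target(y, v_min=-10.0, v_max=10.0, num_bins=51, sigma=None):
    """Probability mass of N(y, sigma^2) falling in each bin, renormalized."""
    width = (v_max - v_min) / num_bins
    sigma = 0.75 * width if sigma is None else sigma  # assumed default
    edges = [v_min + i * width for i in range(num_bins + 1)]
    # Gaussian CDF at each bin edge; bin mass = difference of adjacent CDFs.
    cdf = [0.5 * (1 + math.erf((e - y) / (sigma * math.sqrt(2)))) for e in edges]
    probs = [cdf[i + 1] - cdf[i] for i in range(num_bins)]
    total = cdf[-1] - cdf[0]  # mass inside [v_min, v_max]
    return [p / total for p in probs]


def expected_value(probs, v_min=-10.0, v_max=10.0):
    """Scalar value prediction = probability-weighted mean of bin centers."""
    width = (v_max - v_min) / len(probs)
    return sum(p * (v_min + (i + 0.5) * width) for i, p in enumerate(probs))


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Loss between the HL-Gauss target and the model's softmax over bins."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


if __name__ == "__main__":
    target = hl_gauss_target(2.3)
    print(round(sum(target), 6), round(expected_value(target), 3))
```

The scalar value estimate is recovered as the probability-weighted mean of the bin centers. For a target of 2.3, expected_value() on the target histogram returns approximately 2.3.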
Quick Links
1. Tech giant Microsoft will be paying AI startup Inflection $650 million to license its software a week after it roped in the company’s two cofounders and most of its staff, as per a Bloomberg report. In June 2023, Inflection announced it had raised $1.3 billion to build what it called “more personal AI.” The lead investor was Microsoft.
2. GitHub’s latest AI tool can automatically fix code vulnerabilities. This new feature combines the real-time capabilities of GitHub’s Copilot with CodeQL, the company’s semantic code analysis engine.
3. KL3M, a new LLM from a two-year-old startup co-founded by Daniel Martin Katz, is challenging the notion that it’s impossible to create a useful model without relying on copyrighted data. It has earned the distinction of being the first LLM to receive a “Licensed Model (L) Certification” from independent auditing company Fairly Trained.
Who’s Hiring in AI! 
AI Technical Writer and Developer for Large Language Models @Towards AI Inc (Remote/Freelance)
Senior ML Platform @Factored (Remote)
Data Engineer @Octopus Energy Group (London, UK)
Senior Artificial Intelligence Developer @FullStack Labs (Remote)
Tech Lead, AI/ML @Forge Global (New York, NY, USA)
Software Engineer, Core Machine Learning @Whatnot (Remote)
Senior Machine Learning Engineer @Rent The Runway (Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #92 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.


