TAI #133: Microsoft's $80Bn Bet on AI Compute for 2025; Will Synthetic Data Cause GPU Bottlenecks in 2025?
Also, METAGENE-1's Pathogen Transformer, NVIDIA CES 2025, Memory Layers at Scale, and More!
What happened this week in AI by Louie
After a rapid period of major AI model releases to close out 2024, this week was somewhat slower. Among new models, we were excited to see Prime Intellect apply transformers to the domain of pathogen detection with the 7B-parameter METAGENE-1, trained on over 1.5 trillion DNA and RNA base pairs from wastewater samples. Nvidia also released an array of impressive new models and hardware at CES 2025 (more on this below).
In other news, Microsoft announced an incredible $80 billion investment in AI-enabled data centers for fiscal 2025, highlighting that there is still an enormous appetite for more GPU capacity. This should be a positive sign for anyone working or learning in the AI or LLM industry; there is no sign of investment slowing down! With so many rapidly moving pieces in AI technology and adoption, the outlook, even months ahead, is very difficult to read and predict. Two key data points have been particularly clear to track over the past two years, however: the remarkable growth in GPU deployment and the adoption of ChatGPT, the latter of which increased from 100 million to 300 million weekly active users during 2024. Beyond Microsoft's announcement, in November Nvidia reported data center revenue of $31bn for its quarter ending in October (a $123bn annual run rate), driven by its H100 GPUs, while Broadcom (which partners with Google on its TPU chips) recently reported AI revenue of $12bn for 2024.
Rapid technological progress and the consequent decline in token costs for inference on non-reasoning models (400x in two years on some measures, or 4,000x for cached input) have recently raised questions about whether we are moving into a period of reduced GPU scarcity. There have also been questions in the media about whether scaling laws for pre-training compute are plateauing (reducing the appetite for larger GPU clusters) and whether generative AI end-user applications really justify the huge spend on AI compute capacity. Unfortunately for LLM users and developers hoping for much cheaper cloud GPUs and LLM models, we don't think there will be a GPU oversupply anytime soon. A key reason is reasoning models.
We've been talking for months about training compute vs. inference compute, and last week we broke down the up to 600,000x increase in compute you can use for o3 vs. its base 4o model on the same task. In practice, most reasoning model usage will be far less than this (we think o1 normally uses ~30x more than 4o on average, and the smaller o1/o3 mini models will be popular), but it is still a huge step change and can be very high for some high-value complex tasks. As another example, Sam Altman disclosed this week that OpenAI's $200 per month ChatGPT Pro subscription is actually losing money! Beyond the increased inference requirements of reasoning models, however, another key factor is compute usage for synthetic data generation.
To understand why synthetic data suddenly becomes so significant, it is worth taking a step back. The basic issue reasoning models solve, in my opinion, is that the internet data LLMs have traditionally been trained on very rarely contains the full thinking steps needed to reason through complex problems; people very rarely type out their full inner monologue or "chains of thought" but instead just write down key details and skip to the conclusion. In a sense, this means LLMs were actively trained not to reason: they were forced to skip these essential steps and guess at the answer.
Reasoning models try to fix this with data that walks through the reasoning steps needed to solve complex problems in much more granular detail, either commissioned from human experts or generated synthetically by earlier LLM models. However, human experts are very expensive and not necessarily adept at conveying their thoughts and intuition in words, so most of this data likely needs to be generated synthetically. While the details of o1 and o3 are not disclosed, I expect OpenAI now has a huge dataset (millions) of complex problems for which it has created full reasoning steps. A key issue, however, is that most synthetic attempts at solving new complex problems are likely to be incorrect or suboptimal. Therefore, to create the training dataset, hundreds or thousands of "chain of thought" solutions may need to be generated for each problem before the best outputs are filtered down. You can see how the compute intensity of synthetic data generation escalates! This huge curated dataset of reasoning traces can then be used for the reasoning reinforcement learning step during post-training. The implementation is complex, but it essentially rewards the model for creating good reasoning chains of thought (including learning to correct itself from dead ends) and punishes it for creating bad ones.
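The generate-then-filter loop described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's pipeline: the sampler and verifier below are stand-ins (a real system would call an LLM and a domain-specific checker), and all function names are our own.

```python
import random

rng = random.Random(0)  # deterministic for the illustration

def sample_candidates(problem, n):
    # Stand-in for sampling n chain-of-thought attempts from an LLM:
    # each attempt carries a reasoning trace and a final answer.
    return [
        {"trace": f"reasoning attempt {i} for {problem!r}", "answer": rng.randint(0, 9)}
        for i in range(n)
    ]

def verified(answer, expected):
    # Stand-in verifier; in maths or coding domains this would execute
    # code or compare against a known correct answer.
    return answer == expected

def build_reasoning_dataset(problems, n_samples=100):
    """Keep only problems where at least one sampled trace checks out."""
    dataset = []
    for problem, expected in problems:
        candidates = sample_candidates(problem, n_samples)
        good = [c for c in candidates if verified(c["answer"], expected)]
        if good:  # keep one verified trace per problem for training
            dataset.append({"problem": problem, "trace": good[0]["trace"]})
    return dataset
```

With `n_samples` in the hundreds or thousands, nearly all generated tokens are discarded after filtering, which is exactly why the compute bill for synthetic reasoning data is so large.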
Part of the magic in the speed of iteration we have seen here from o1 to o3 might be that the reasoning traces created by o1 were much stronger (and maybe many more were used) than the ones used for training o1 in the first place. So, you have this self-improving cycle of data created by earlier models being used to train the next generation. The reasoning steps created by o3 are far better again (according to benchmarks), so when enough of these have been created (huge amounts spent here on generating tens or hundreds of trillions of tokens), we may well see o3.5/o4.
Why should you care?
As we head into 2025, we do not see any letup in the pace of AI progress. We see huge potential for further gains in model capabilities this year yet again, particularly for reasoning models. We don’t expect a slowdown in GPU purchases either, and we don’t expect an AI winter. This year, AI adoption throughout the economy should progress to the next level as both foundation model capabilities and customized LLM pipeline and agent performance improve. However, GPU availability may be a bottleneck for both foundation model developers and end users in getting the most out of these models in 2025. They could be in very short supply despite the launch of Nvidia’s new Blackwell GPU. Even OpenAI might have been surprised at the success of reasoning models in the last few months. It feels very unlikely the industry has prepared for the step change increase in compute now needed for both synthetic reasoning data generation and test time compute scaling.
We now have three core scaling laws for converting increased AI compute into increased capabilities: 1) training compute (invested in parameters, training data, or pre-training FLOPs per parameter and token); 2) synthetic data generation (much of this will now be reasoning traces, but it can also be other text data, or synthetic video for video generation models like Sora and applications like Tesla FSD); and 3) inference or test-time compute (which we discussed in detail last week). In 2025, we expect all leading AI labs to continue scaling larger models, but we also think greater returns on investment (or compute) and easier capability gains will come from synthetic reasoning data and test-time compute. For this reason, all leading AI labs will be scrambling to create huge datasets of synthetic reasoning data, and this may de-prioritize building larger training compute clusters in the near term.
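To see why synthetic data generation is a compute sink on the same scale as training, a back-of-the-envelope estimate helps. Every figure below is an illustrative assumption (dataset size, samples per problem, trace length, and generator model size are not disclosed anywhere), using the standard approximation of ~2 FLOPs per parameter per generated token:

```python
def generation_flops(n_problems, samples_per_problem, tokens_per_sample, model_params):
    """Rough inference cost: about 2 * params FLOPs per generated token."""
    total_tokens = n_problems * samples_per_problem * tokens_per_sample
    return total_tokens, 2 * model_params * total_tokens

tokens, flops = generation_flops(
    n_problems=1_000_000,       # assumed pool of complex problems
    samples_per_problem=1_000,  # candidate chains of thought per problem
    tokens_per_sample=10_000,   # assumed length of one reasoning trace
    model_params=2e11,          # assumed 200B-parameter generator model
)
# 10 trillion generated tokens, on the order of 4e24 FLOPs,
# before any filtering and before the RL training step itself.
```

Under these (hypothetical) numbers, generating the candidate traces alone costs FLOPs comparable to a large pre-training run, which is why synthetic data competes with cluster build-out for GPU capacity.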
One consequence of scaling laws for pre-training has been the increasing difficulty for startups to compete with LLM training budgets; some companies with in-house LLMs have instead pivoted to lower-compute post-training (adapting open-source models such as Llama). An interesting consequence of reasoning models is that AI startups may again have larger compute requirements if they aim to create their own synthetic reasoning data for reinforcement learning in post-training (particularly with open-source reasoning models from Qwen and DeepSeek, and hopefully someday Meta).
One key limitation of reasoning models is that synthetic chain-of-thought data generation works much more easily in domains where we can automatically check the solution (with yet more LLMs, or with code execution). For example, o1 and o3's benchmark gains have been focused on mathematics, scientific reasoning, and coding. This is because it is much easier to identify the best synthetic reasoning data in these domains: generated code can be tested by executing it, and mathematical proofs or complex science questions can be checked against evaluation sets where the correct answer is known. This also creates huge new potential for startups and LLM developers. Millions of LLM developers can build on top of reasoning foundation models and handcraft and curate human expert reasoning solutions (hundreds or thousands of examples) for reinforcement learning fine-tuning and evaluations in their industry or domain. This could amount to a brute-forced, or crowd-sourced, equivalent to AGI, where many thousands of industry-customized LLM pipelines are built on top of foundation models by integrating expertise from top employees in each industry niche. Let's see how 2025 plays out!
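A toy example makes it concrete why coding is the easiest domain to verify automatically. The sketch below simply executes a candidate solution against known input/output pairs; a real pipeline would sandbox execution and enforce timeouts, and the `solve` function name is an assumption of this illustration:

```python
def passes_tests(code: str, test_cases) -> bool:
    """Execute candidate code and check it against known input/output pairs."""
    namespace = {}
    try:
        exec(code, namespace)  # assumes the candidate defines solve(x)
        return all(namespace["solve"](x) == y for x, y in test_cases)
    except Exception:
        return False  # crashing or incorrect candidates are simply discarded

good_candidate = "def solve(n):\n    return n * n"
bad_candidate = "def solve(n):\n    return n + n"
cases = [(2, 4), (5, 25)]
# passes_tests keeps the good candidate and rejects the bad one
```

No human is in the loop: the checker itself filters thousands of generated attempts down to verified traces, which is exactly what makes maths and coding the domains where reasoning models improve fastest.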
— Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Microsoft To Spend $80 Billion in FY’25 on Data Centers for AI
Microsoft plans to invest $80 billion in 2025 to build data centers capable of supporting AI tasks. More than half of this investment will occur in the U.S., stated Microsoft Vice Chair and President Brad Smith, reflecting the company’s commitment to expanding its AI infrastructure. In their blog, Microsoft outlines their view that the next four years present a golden opportunity for the United States to harness AI to invigorate the economy and maintain global leadership. This requires government support for AI R&D, AI infrastructure build-out, private sector innovation, and workforce development programs that equip Americans with the necessary skills.
2. Nvidia Revealed Several New Models and Products at CES 2025
NVIDIA introduced the Cosmos platform to power “physical AI,” featuring generative models and advanced video processing pipelines for robotics, industrial AI, and autonomous vehicles. They also unveiled Blackwell-based GeForce RTX 50 Series GPUs, headlined by the RTX 5090 with 92 billion transistors and delivering 3,352 TOPS, alongside DLSS 4 for up to 8x performance boosts and Reflex 2 for up to 75% latency reduction. New Automotive solutions were also announced with the NVIDIA DRIVE Hyperion platform, featuring the AGX Thor SoC and integrated synthetic data pipelines powered by Omniverse and Cosmos. Finally, Project DIGITS was revealed as a compact personal AI supercomputer powered by the GB10 Grace Blackwell Superchip, bringing NVIDIA’s full AI stack to developers’ desktops.
3. Prime Intellect Released METAGENE-1
METAGENE-1 is a 7B parameter autoregressive transformer model trained on over 1.5 trillion DNA and RNA base pairs derived from wastewater samples. It is open-sourced to support tasks in biosurveillance, pandemic monitoring, and pathogen detection, leveraging deep metagenomic sequencing methods. Developed in collaboration with USC researchers, METAGENE-1 demonstrates strong performance on human-pathogen detection benchmarks and broader metagenomic applications.
4. Nvidia Completes Its Acquisition of Run:ai
Nvidia has completed its acquisition of Run:ai, a software company for orchestrating GPU clouds for AI, and said that it would open-source the software. The purchase price wasn’t disclosed but was pegged by reports at $700 million when Nvidia first reported its intent to close the deal in April. Run:ai’s software remotely schedules Nvidia GPU resources for AI in the cloud.
5. Anthropic Reaches Deal With Music Publishers Over Lyric Dispute
Anthropic has made a deal to settle parts of a copyright infringement lawsuit for allegedly distributing protected song lyrics. The agreement requires Anthropic to apply existing guardrails in training future AI models and establish a procedure for music publishers to intervene when copyright infringement is suspected.
6. Meta’s AI-Generated Bot Profiles Are Not Being Received Well
Meta’s AI-generated bot profiles, including characters like “Jane Austen” and “Liv,” have drawn criticism. Users recently noticed these profiles, first made in 2023, amid reports of Meta’s vision for social media bots. Meta confirmed the removal of these profiles to fix blocking issues.
Five 5-minute reads/videos to keep you learning
1. Fast LLM Inference From Scratch
The author built an LLM inference engine from scratch using C++ and CUDA to optimize single-GPU performance without libraries. Inspired by Arseny Kapoulkine’s calm and Andrej Karpathy’s llama2.c, the project explores optimizations for single-batch inference on consumer devices, surpassing llama.cpp in token throughput.
2. Introducing Smolagents, a Simple Library To Build Agents
This article introduces Smolagents, a very simple library that unlocks agentic capabilities for language models. It emphasizes the concept of agents, when to use them, what code agents are, and how to build one.
3. When NOT To Use Large Language Models
This video dives into where LLMs truly shine and, more importantly, where they might fall short, along with the trade-offs you need to consider. This should give you a clear idea of whether or not LLMs are the right fit for your problem.
4. AIs Will Increasingly Attempt Shenanigans
Recent research shows that AI models increasingly engage in scheming behaviors like lying, deception, and sabotage. Frontier models, including o1, were tested for in-context scheming strategies, revealing varying levels of deceptive actions under specific prompts. This article sheds light on this and highlights the growing need to address AI behaviors as models become more capable and autonomous.
5. The 2025 AI Revolution: 10 Breakthroughs That Will Change Your Life
In 2024, researchers have unlocked mind-blowing LLM capabilities that could shift your career, business, or even personal life into overdrive. This author analyzed research papers, articles, and blogs and compiled an in-depth list of 10 game-changing innovations in AI and how to use them to advance in 2025.
Repositories & Tools
1. Agentarium is an open-source framework for creating and managing simulations populated with AI-powered agents.
2. VITA-1.5 is an open-source interactive omni-multimodal LLM.
3. TorchTitan is a native PyTorch library for large-scale model training.
4. RetroLLM merges retrieval and generation by integrating them into a single auto-regressive decoding process.
Top Papers of The Week
1. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
This paper introduces DeepSeek LLM, a project dedicated to advancing open-source language models guided by the scaling laws. The model is pre-trained on a dataset with 2 trillion tokens. The research also conducted supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models.
2. An Automatic Graph Construction Framework based on Large Language Models for Recommendation
This paper introduces AutoGraph, an automatic graph construction framework based on LLMs for recommendation. It uses LLMs to infer user preference and item knowledge and employs vector quantization to extract the latent factors from the semantic vectors. The latent factors are then incorporated as extra nodes to link the user/item nodes, resulting in a graph with in-depth global-view semantics.
3. Process Reinforcement through Implicit Rewards
This paper presents PRIME (Process Reinforcement through Implicit Rewards), an open-source solution for online RL with process rewards, to advance the reasoning abilities of language models beyond imitation or distillation. The model trained on PRIME achieves 26.7% pass@1 on AIME 2024, surpassing GPT-4o and Qwen2.5-Math-7B-Instruct.
4. Implicit Grid Convolution for Multi-Scale Image Super-Resolution
The paper proposes a single model for multi-scale image super-resolution, replacing multiple scale-specific models. It introduces Implicit Grid Convolution (IGConv) to integrate SPConv at all scales, reducing training resources by one-third while maintaining performance. IGConv+ further improves results with spectral bias reduction, achieving a 0.25dB PSNR improvement at Urban100×4 with reduced costs.
5. Memory layers at scale from Meta
Memory layers employ a trainable key-value lookup mechanism to add parameters without increasing FLOPs, complementing dense feed-forward layers by providing a cheap way to store and retrieve information. Models augmented with these memory layers outperform both larger dense models (despite having double the compute) and mixture-of-expert models when matched for compute and parameters, especially on factual tasks. A fully parallelizable implementation scales up to 128B memory parameters for training on 1 trillion tokens, surpassing base models of up to 8B parameters.
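The key-value lookup idea can be sketched in a few lines of NumPy. This is a minimal illustration of the mechanism, not Meta's implementation: the shapes are arbitrary, the plain top-k search stands in for the paper's more efficient product-key lookup, and in a real layer `keys` and `values` would be trainable parameters.

```python
import numpy as np

def memory_lookup(query, keys, values, k=4):
    """Score all keys, but read only the top-k value rows.

    keys: (n_keys, d), values: (n_keys, d_v). Only k of n_keys value
    rows are touched per query, so parameter count grows without a
    matching growth in FLOPs, which is the core trick of memory layers.
    """
    scores = keys @ query                 # similarity of query to every key
    top = np.argsort(scores)[-k:]         # indices of the k best-matching keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                          # softmax over the selected keys only
    return w @ values[top]                # weighted mix of k value vectors

rng = np.random.default_rng(0)
keys = rng.normal(size=(1024, 16))
values = rng.normal(size=(1024, 8))
out = memory_lookup(rng.normal(size=16), keys, values)
# out has shape (8,): one retrieved vector per query token
```

Because the sparse read touches only `k` rows regardless of the table size, the value table can be scaled to billions of parameters while per-token compute stays nearly flat.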
6. Byte Latent Transformer: Patches Scale Better Than Tokens
The Byte Latent Transformer (BLT) changes LLM architecture by encoding bytes into dynamically sized patches, enhancing scalability, inference efficiency, and robustness. This approach improves model performance without a fixed vocabulary through FLOP-controlled scaling. BLT allocates more resources where data complexity increases, outpacing tokenization-based models’ performance and efficiency at fixed inference costs.
Quick Links
1. Rubik AI releases Sonus AI, a new suite of models designed to meet your diverse needs. Sonus-1 achieves high scores on benchmarks like MMLU, MATH-500, and HumanEval. The Pro models, in particular, excel in reasoning and coding, seemingly outdoing other top-tier platforms.
2. In a post on his personal blog, OpenAI CEO Sam Altman said they are beginning to turn OpenAI’s aim toward “superintelligence.” Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we can do on our own and, in turn, massively increase abundance and prosperity.
3. In a tweet, Sam Altman also disclosed that OpenAI’s $200 per month ChatGPT Pro subscription is currently loss-making due to higher-than-expected usage.
4. Deepseek: The Quiet Giant Leading China’s AI Race. More background on DeepSeek, the Chinese AI startup we have recently discussed for its impressive R1 and v3 model launches. Funded by High-Flyer, DeepSeek focuses on foundational AI technology, has open-sourced its models, and has sparked price wars by significantly reducing inference costs.
Who’s Hiring in AI
AI Engineer @Rev.io (USA/Remote)
Senior LLM/RAG Engineer @Data Meaning (LATAM/Remote)
AI Engineer / Coder @Animoca Brands Limited (Hong Kong)
Data Engineering Intern @Super.com (Toronto, Canada/Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.