TAI #143: New Scaling Laws Incoming? Ilya's SSI Raises at $30bn, Manus Takes AI Agents Mainstream
Also, Alibaba QwQ-32B, Mistral OCR, and Microsoft's in-house LLM efforts
What happened this week in AI by Louie
As Ilya Sutskever’s Safe Superintelligence (SSI) secures another $2bn round at a hefty $30bn valuation, speculation has grown around what he is working on and whether he will discover yet more groundbreaking scaling laws for AI. While another scaling breakthrough would be exciting, an alternative, pragmatic pathway to progressing AI capabilities continues to emerge — building advanced agents on top of existing foundation models. China-based startup Monica is proving precisely this point with Manus, its invite-only multi-agent product, which has rapidly captured attention despite the company not developing its own base LLM. Instead, Manus stitches together Claude 3.5 Sonnet and custom fine-tuned open-source Qwen models, paired with specialized tools and sandboxes, to autonomously tackle complex real-world tasks.
Manus’s architecture divides neatly into two highly specialized layers: the “planner,” powered by fine-tuned Qwen models optimized for strategic reasoning and task decomposition, and the “executor,” driven by Claude 3.5 Sonnet alongside a diverse set of 29 dedicated sub-agents. This system demonstrates remarkable capabilities by seamlessly integrating code execution, web browsing, multi-file code management, and interactive frontend generation — features reminiscent of recent advanced tools like Cursor, OpenAI’s Operator and Deep Research agents, and Claude’s Artifact UI. Manus’s success comes from coherently assembling these previously separate functionalities into a unified agent framework, unlocking greater autonomy and practical utility. Its GAIA benchmark performance reflects this clearly: it scores an impressive 86.5% on simpler “Level 1” questions, easily surpassing OpenAI Deep Research’s 74.3%, and even on more complex “Level 3” multi-step tasks it leads notably, achieving 57.7% versus OpenAI Deep Research’s 47.6%.
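For intuition, here is a minimal, hypothetical Python sketch of this planner/executor pattern. The plan format, the stubbed helper functions, and the tool names are illustrative assumptions, not Manus's actual implementation; in a real system the planner stub would wrap the fine-tuned Qwen models and the executor stub would call Claude 3.5 Sonnet plus its tool sandboxes.

```python
# Hypothetical planner/executor loop in the spirit of the reported Manus design.
# The helpers below are stubs: plan_task would wrap a fine-tuned Qwen planner,
# and execute_step would dispatch to a Claude-driven executor with tool sandboxes.
from dataclasses import dataclass


@dataclass
class Step:
    description: str  # natural-language instruction for the executor
    tool: str         # e.g. "browser", "code_interpreter", "file_editor"


def plan_task(goal: str) -> list[Step]:
    """Planner: decompose the user goal into ordered steps (stubbed here)."""
    return [
        Step("Search the web for background on: " + goal, "browser"),
        Step("Write and run analysis code", "code_interpreter"),
        Step("Draft the final report", "file_editor"),
    ]


def execute_step(step: Step, context: list[str]) -> str:
    """Executor: run one step with the appropriate tool/sub-agent (stubbed here)."""
    return f"[{step.tool}] completed: {step.description}"


def run_agent(goal: str) -> list[str]:
    context: list[str] = []
    for step in plan_task(goal):
        result = execute_step(step, context)
        context.append(result)  # feed intermediate results back for later steps
    return context


if __name__ == "__main__":
    for line in run_agent("Compare three laptops and write a summary"):
        print(line)
```

A production version would also need error handling, verification of intermediate results, and a way for the planner to revise remaining steps as new results come in.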
Yet, despite what Monica has achieved with existing models, even more could be unlocked by improvements to base model intelligence. Ilya Sutskever, previously at Google and OpenAI, has been intimately involved in many of the major deep learning and LLM breakthroughs of the past 10–15 years.
SSI’s 5x valuation increase to $30bn in less than six months has fueled speculation about what he has been working on (in heavy secrecy, reportedly requiring job candidates to leave their phones in a Faraday cage before entering its offices). Ilya has consistently been central to major breakthroughs in deep learning scaling laws and training objectives for LLMs, making it plausible he’s discovered yet another one. Indeed, clues from recent interviews suggest precisely this. In September, Ilya himself mentioned discovering a “different mountain to climb,” hinting at a new scaling law. “Everyone just says ‘scaling hypothesis’,” he noted pointedly. “But scaling what?”
Ilya first demonstrated GPU-driven neural network scaling with AlexNet in 2012 alongside Geoffrey Hinton and Alex Krizhevsky, paving the way for rapid growth in model depth, performance, and computational intensity. While he didn’t invent the next-token prediction objective (a much earlier technique) or the transformer architecture introduced in 2017, he laid essential groundwork for transformers with sequence-to-sequence (seq2seq) models. He also crucially pushed OpenAI’s strategic decision to massively scale next-token prediction using GPUs and transformers, pushing data scaling bottlenecks (and the corresponding useful compute scaling) to the scale of the entire internet. Most recently, Ilya’s foundational contributions to “test-time compute” reportedly laid the groundwork for the development of Q* and o1 by Jakub Pachocki and Szymon Sidor. This approach led to a new training objective — predicting full solutions to verifiable problems — and introduced both a new training scaling regime (reinforcement learning with verifiable rewards, or RLVR) and new inference-time scaling laws.
If Ilya is indeed onto yet another new scaling mechanism — and SSI’s rapid valuation jump suggests investors believe so — this would mark quite a break from the many years spent focused solely on the next-token prediction objective and on scaling just pre-training data and parameters. Scaling the new RLVR training method and the corresponding inference-time tokens might alone be sufficient to approach AGI-like capabilities across many standalone human tasks (particularly together with agent pipelines and LLM developers using reinforcement fine-tuning to customize models for different tasks). New training objectives, on the other hand, could accelerate this and also unlock entirely new types of intelligence and categories of AI capability.
Why should you care?
The convergence of new scaling paradigms and advanced agent architectures suggests an approaching tipping point. Companies like Monica with Manus demonstrate how effectively existing models can be recombined to produce substantial leaps in real-world task performance. At the same time, breakthroughs from Ilya and SSI, or indeed any of the AI labs or even individual researchers, may fundamentally alter what we even think of as scalable AI, setting the stage for a far broader spectrum of intelligence capabilities. For developers and entrepreneurs alike, this dual innovation track — practical agent integration versus groundbreaking foundational shifts — offers compelling paths forward. While waiting for the next great leap, significant competitive advantages can still be gained today by intelligently leveraging and refining existing tools into specialized agents. But make no mistake: if Ilya is indeed pioneering another new scaling law, AI’s landscape may soon be reshaped once again.
— Louie Peters — Towards AI Co-founder and CEO
This issue is brought to you thanks to NVIDIA GTC:
Join Us at NVIDIA GTC — The AI Event of the Year!
NVIDIA GTC is back, and it’s shaping up to be one of the biggest AI events of the year! Running from March 17 to 21 in San Jose, CA, GTC will bring together developers, researchers, and business leaders to explore cutting-edge advancements in AI, accelerated computing, and data science.
There’s a packed agenda, including:
Keynote by NVIDIA CEO Jensen Huang — covering AI agents, robotics, and the future of accelerated computing
The Rise of Humanoid Robots — exploring how AI is pushing robotics forward
AI & Computing Frontiers with Yann LeCun and Bill Dally — a deep dive into where AI is headed
Industrial AI & Digitalization — how AI is transforming industries in the physical world
Hands-on Workshops & Training Labs — practical sessions on AI, GPU programming, and more
Our CTO, Louis-François Bouchard, will be attending, so if you’re around, let’s connect!
📅 March 17–21
📍 San Jose, CA & Online
Hottest News
1. Alibaba Released Its QwQ-32B Model Based on Large-Scale Reinforcement Learning Techniques
Alibaba’s Qwen team has introduced QwQ-32B, a 32-billion-parameter AI model designed for advanced reasoning, coding, and math problem-solving. Thanks to large-scale reinforcement learning, it performs on par with much larger models like DeepSeek-R1. QwQ-32B is open-source under Apache 2.0 and available on Hugging Face and ModelScope.
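Since the weights are openly available, a quick way to try the model is the standard Hugging Face transformers chat workflow. This is a minimal sketch assuming the published model id Qwen/QwQ-32B and a machine with enough GPU memory for the bf16 weights (roughly 70 GB, or use a quantized variant).

```python
# Minimal sketch: loading and prompting QwQ-32B via Hugging Face transformers.
# Assumes the published model id "Qwen/QwQ-32B" and sufficient GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow plenty of new tokens.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```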
2. Reinforcement Learning Pioneers Andrew Barto and Richard Sutton Win the 2024 Turing Award
Andrew Barto and Richard Sutton, pioneers of reinforcement learning, have won the 2024 Turing Award for their groundbreaking contributions to AI. Their work laid the foundation for modern AI systems like chatbots, autonomous vehicles, and personalized recommendations, and it also bridged AI and neuroscience, revealing insights into dopamine’s role in human and machine learning.
3. Microsoft Reportedly Ramps Up AI Efforts To Compete With OpenAI
Microsoft is developing its own family of AI reasoning models, called MAI, to reduce reliance on OpenAI and enhance its AI offerings. It is reportedly training much larger models than its better-known, synthetic-data-focused Phi series. These new models have been tested as potential replacements for OpenAI’s technology in Microsoft’s 365 Copilot system. Additionally, Microsoft plans to unveil future developments for its Copilot AI companion at a special event on April 4th, marking its 50th anniversary.
4. China’s Second DeepSeek Moment? Meet Manus, the First General AI Agent
Manus, developed by Chinese startup Monica, is an autonomous AI agent designed to handle complex tasks independently. Since its beta launch on March 6, 2025, it has generated significant buzz, with some comparing its impact to DeepSeek. Available by invitation only, it has sparked excitement among users eager to test its capabilities.
5. Mistral AI Introduced Mistral OCR
Mistral launched Mistral OCR, a multimodal OCR API that converts PDFs into AI-ready Markdown files, making them easier for AI models to ingest and to integrate into RAG systems. It outperforms competitors on complex and non-English documents. Mistral OCR is available on Mistral’s API platform and through cloud partners, with on-premises deployment offered for sensitive data handling.
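For readers who want to try it, below is a rough sketch of calling the OCR endpoint from the Python SDK. The client method, the "mistral-ocr-latest" model name, and the response fields follow Mistral's launch materials but should be treated as assumptions; check the current SDK documentation before relying on them.

```python
# Sketch of calling Mistral OCR to turn a PDF into Markdown.
# The client method, model name, and response fields are assumptions based on
# Mistral's launch announcement; verify against the current SDK docs.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url", "document_url": "https://arxiv.org/pdf/1706.03762"},
)

# Each page comes back as Markdown, ready to chunk and embed for a RAG pipeline.
markdown = "\n\n".join(page.markdown for page in response.pages)
print(markdown[:500])
```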
6. Google Search’s New ‘AI Mode’ Lets Users Ask Complex, Multi-Part Questions
Google is enhancing its search experience with expanded AI-generated overviews and a new “AI Mode.” The AI overviews will now cover a broader range of topics and be accessible to more users, including those not logged into Google. The experimental “AI Mode,” currently available to Google One AI Premium subscribers, offers a search-centric AI chatbot experience, providing generated answers based on Google’s search index.
7. Microsoft Dragon Copilot Provides the Healthcare Industry’s First Unified Voice AI Assistant
Microsoft launched Dragon Copilot, a unified AI voice assistant for healthcare. Designed to alleviate clinician burnout and streamline documentation, Dragon Copilot aims to improve efficiency and patient experiences while supporting healthcare workers across various settings with advanced speech and task automation capabilities. It is rolling out in select regions.
Six 5-minute reads/videos to keep you learning
1. Starter Guide for Running Large Language Models (LLMs)
This article is a practical guide to running LLMs, covering key considerations like balancing model size and dataset requirements using scaling laws such as Chinchilla. It also highlights the importance of proper dataset preprocessing — like tokenization and cleaning — to improve efficiency.
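As a quick worked example of the Chinchilla heuristic mentioned above (compute-optimal training uses roughly 20 tokens per parameter, with training FLOPs approximated as 6ND), here is a small back-of-the-envelope calculation; the 20x ratio is an approximation, not an exact rule.

```python
# Back-of-the-envelope Chinchilla check: compute-optimal training data is roughly
# 20 tokens per model parameter, and training FLOPs are approximately 6 * N * D.
def chinchilla_estimate(n_params: float, tokens_per_param: float = 20.0) -> dict:
    tokens = n_params * tokens_per_param
    flops = 6 * n_params * tokens
    return {"optimal_tokens": tokens, "train_flops": flops}

# Example: a 7B-parameter model wants ~140B training tokens under this heuristic.
print(chinchilla_estimate(7e9))
```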
2. What Changed in the Transformer Architecture
This article explores key improvements in Transformer architecture since 2017, focusing on efficiency and scalability. It covers the shift from sinusoidal positional encodings to Rotary Positional Embeddings (RoPE) for better handling of long sequences, the adoption of pre-layer normalization for more stable training, and the introduction of Grouped-Query Attention (GQA) to reduce computational costs.
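To make one of those changes concrete, here is a minimal NumPy sketch of RoPE: each pair of query/key dimensions is rotated by a position-dependent angle, so relative position information ends up directly in the attention dot products. This is a simplified illustration, not the exact implementation used by any particular model.

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary positional embeddings to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    # One rotation frequency per pair of dimensions, as in the RoPE paper.
    freqs = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = positions[:, None] * freqs[None, :]       # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin          # rotate each 2D pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.randn(8, 64)                             # 8 tokens, head dim 64
q_rot = rope(q, positions=np.arange(8))
print(q_rot.shape)  # (8, 64)
```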
3. AI’s Butterfly Effect: Early Decisions Matter More Than You Think
Based on insights from Pólya’s Urn Model, this article shows how an initial random bias can have lasting effects on an AI system’s learning trajectory. These insights deepen our understanding of the interplay between chance and choice and encourage a more thoughtful approach to managing data biases and long-term trends in complex systems.
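To see the effect for yourself, here is a tiny, self-contained Pólya’s urn simulation (not code from the article): every draw adds a ball of the drawn color, so small early imbalances compound into very different long-run proportions across runs.

```python
import random

def polya_urn(draws: int = 10_000, seed: int | None = None) -> float:
    """Start with 1 red and 1 blue ball; each draw adds a ball of the drawn color.
    Returns the final fraction of red balls."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(draws):
        if rng.random() < red / (red + blue):
            red += 1   # drawing red reinforces red
        else:
            blue += 1
    return red / (red + blue)

# Different seeds converge to very different proportions: early randomness persists.
print([round(polya_urn(seed=s), 3) for s in range(5)])
```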
4. Diffusion-Based LLMs
This article explores diffusion-based LLMs, a novel approach to text generation that refines noisy data into structured outputs. It discusses how these models differ from traditional autoregressive LLMs, their potential benefits in reducing biases and improving efficiency, and their challenges in real-world applications.
5. AI Is Killing Some Companies, yet Others Are Thriving — Let’s Look at the Data
This article explores how AI-powered search and chatbots are reshaping the digital landscape, hitting some companies hard while leaving others untouched. It looks at why platforms like WebMD, G2, and Chegg are losing traffic as AI delivers instant answers, while sites like Reddit and Wikipedia remain strong. It also argues that user-generated content and community-driven platforms may have a built-in advantage in an AI-dominated world.
6. DeepSeek-V3/R1 Inference System Overview
The article provides an overview of DeepSeek’s inference system for its V3 and R1 models, focusing on optimizing throughput and reducing latency. It also discusses challenges such as the increased system complexity introduced by cross-node communication and the need for effective load balancing across Data Parallelism (DP) instances, along with strategies to address them.
Repositories & Tools
1. MetaGPT is an AI framework that acts like a software team, breaking down a simple request into detailed project plans, code, and documentation.
2. Light R1 introduces Light-R1-32B, a 32-billion-parameter language model optimized for mathematical problem-solving.
Top Papers of The Week
1. START: Self-taught Reasoner with Tools
The paper introduces START, a self-taught reasoning LLM that integrates external tools. This integration allows START to perform complex computations, self-checking, and debugging, addressing limitations like hallucinations found in traditional reasoning models. It uses Hint-infer (prompting tool use) and Hint-RFT (fine-tuning with filtered reasoning steps) to enhance accuracy. START, built on QwQ-32B, outperforms its base model and rivals top-tier models on math, science, and coding benchmarks.
2. Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Researchers have introduced Predictive Data Selection (PreSelect), a method enhancing language model pretraining by using fastText-based scoring for efficient data selection. Models trained on 30 billion tokens selected with PreSelect outperform those trained on 300 billion vanilla tokens, reducing compute needs tenfold. PreSelect also surpasses other methods, like DCLM and FineWeb-Edu, in 3 billion parameter models.
3. Unified Reward Model for Multimodal Understanding and Generation
UnifiedReward, a novel model for multimodal understanding and generation assessment, improves image and video preference alignment. By training on a large-scale human preference dataset, UnifiedReward facilitates pairwise ranking and pointwise scoring.
4. Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
Babel introduces an open multilingual large language model covering the top 25 languages, which together are spoken by over 90% of the global population. Babel employs a layer extension technique and comes in two variants: Babel-9B for efficient use and Babel-83B, which sets new standards. Both demonstrate superior multilingual task performance compared to open LLMs of similar size.
5. Large-Scale Data Selection for Instruction Tuning
The paper examines large-scale data selection for instruction tuning, testing methods on datasets of up to 2.5M samples. It finds that many selection techniques underperform random selection at scale, while a simple representation-based method (RDS+) is both effective and efficient.
Quick Links
1. Google debuts a new Gemini-based text embedding model. Google claims that Gemini Embedding surpasses the performance of its previous embedding model, text-embedding-004, and achieves competitive performance on popular embedding benchmarks. Compared to the previous model, Gemini Embedding can accept larger chunks of text and code simultaneously and supports over 100 languages.
2. Cohere released a multimodal “open” AI model called Aya Vision. It can perform tasks such as writing image captions, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere is also making Aya Vision available for free through WhatsApp.
3. Anthropic has launched an upgraded Anthropic Console that lets everyone in your company collaborate on AI. The updated platform also introduces “extended thinking controls” for Claude 3.7 Sonnet, allowing developers to specify when the AI should use deeper reasoning while setting budget limits to control costs.
Who’s Hiring in AI
Data Scientist — Python @Motion Recruitment Partners (Florida, USA)
ML Engineer @Numerator (Remote/India)
Software Engineer, AI Decisioning @Hightouch (Remote/North America)
Gen AI Consultant @Capco (Pune, India)
Natural Language Processing (NLP) Intern @IMO Health (Hybrid/Texas, USA)
Junior Data Scientist Intern @INTEL (Hybrid/Singapore)
Software Engineer, GenAI Enablement @Principal Financial Group (Multiple US Locations)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.