TAI #128: Scaled Self-Driving Arriving via Different Routes? Waymo vs FSD v13
Also, the QwQ-32B reasoning model, progress on distributed training, and more!
What happened this week in AI by Louie
This week, in China, Alibaba quickly followed DeepSeek with an impressive competitor to o1-style reasoning models, QwQ-32B-Preview. We also saw great progress on distributed and decentralized LLM training: a 15B parameter model trained by Nous Research over the internet, and a 10B parameter, 1 trillion token training run from Prime Intellect across 30 compute contributors in 5 countries. Distributed training has many challenges, but if it can be made efficient, it offers hope both for pooling resources to help open-source projects compete and for reducing the need for ever larger single-location training clusters (which come with logistics and infrastructure challenges such as energy supply).
Also this week, Tesla launched its FSD v13 supervised self-driving software to its first external customers, now allowing cars to drive themselves from a parked location to a parked location anywhere in the US (with driver supervision and intervention) at the press of a single button. The v13 model came with a new architecture, 4.2x data scaling, 5x training compute scaling (a 50k GPU cluster), and a 2x reduction in photon-to-control latency. According to Elon, it also delivers a 5x reduction in necessary driver interventions relative to the most recent FSD v12.5.5.3 software, and the company’s commentary suggests it is now at a ~500x improvement year to date.
Self-driving cars have had many false starts, but both Waymo and Tesla FSD v13, with very different architectures, methods, and strategies, have made huge leaps recently. Arguably, both may now be on the home stretch of “the march of the 9s” towards achieving scalable, human-level robotaxi driving safety and convenience. This “march of the 9s” means incrementally solving decimal points of reliability, e.g., 99.99% vs. 99.9% probability of no accident per mile (we think LLMs and customized RAG/fine-tuning/agent pipelines are on a similar march towards greater reliability on subtasks). But is the final stretch 4 months or 4 years away, and will data, compute, sensor, human developer, economic, or architecture bottlenecks get in the way?
Waymo now delivers 100,000 paid driverless trips per week across large regions of 3 US cities from a fleet of ~700 cars and is now driving more than 1 million miles per week. The company achieves accident rates significantly better than the average human driver (more than 2x safer), but trips do take longer, and the cars can’t always take the optimum route (some edge cases are avoided rather than solved). The company has driven tens of millions of miles in the real world and billions in simulation. It recently announced its 6th-generation design, which comes with 13 cameras, four lidars, and six radar units, and it just raised $6bn for expansion.
Tesla, on the other hand, does not yet offer driverless trips, but it has access to the full sensor and driver data from ~7 million cars carrying its self-driving hardware suite. The latest generation contains nine 5-megapixel cameras, an unused radar, and Tesla’s custom-designed inference computer. Well over 2 billion miles have been driven by its cars’ AI brain in driver-supervised FSD mode, but its customer fleet is likely driving over 80 billion miles annually, all of which can be used to collect data for model training. The vast majority of this driving data is uninteresting, so the key is that the cars can recognize interesting scenarios or priority problem cases and only then send sensor and driver action data back to HQ. Tesla can then solve new edge cases simply by requesting fleet data collection for these scenarios and using it to retrain its model.
Waymo and Tesla AI both have very impressive engineering teams, but the companies’ very divergent strategies often lead to confusion. The core difference between the two strategies often gets mischaracterized as lidar sensors (Waymo) vs. cameras only, no lidar (Tesla). The reality is instead a bet on needing a real-world data-heavy solution (Tesla: millions of cars needed to collect enough data) vs. betting that a real-world data-light solution (Waymo: hundreds of cars needed to collect data) will suffice. The other core differences follow from this: 1) solve using a customer-owned fleet and customer data (Tesla) vs. a company-owned fleet and company-employed driver data (Waymo); 2) solve via incremental improvements to a driver-assistance product (Tesla) vs. a jump straight to full autonomy (Waymo); and 3) solve via a single end-to-end deep learning model (Tesla) vs. a mix of multiple deep learning models, complex sensors, and human-written rules and code (Waymo).
The key rationale for betting that you need billions of miles of real-world data is that this is what it takes to find the 1-in-50-million-miles edge cases you don’t know you don’t know, and to feed deep learning scaling laws. Once you have bet on general driving being a very data-heavy problem, you are constrained to an affordable sensor suite that can be integrated into consumer-owned cars ($30–60k car price), so that this fleet collects the data you need at zero capital cost per car (versus the $100bns+ of R&D investment and huge test-driver costs if ordinary customers can’t afford the cars and you have to buy the fleet yourself).
If you instead think you only need millions of miles of real-world data to solve driving (combined with more sensors and the imagination of software engineers to think up, simulate, and solve every edge case), then you can afford to invest the capital to build your own, much smaller car fleet with a far larger sensor budget per vehicle ($100–200k total cost per car).
Where does each company lead? Broadly, I think Waymo leads in driverless intervention and accident rates, regulatory approval, rollout of a full robotaxi service in 3 cities, and sensor-suite redundancy. Tesla leads in data access, its data mining pipeline, car fleet size, geographic reach, speed of progress, driving convenience, scalability, economics, and end-to-end deep learning.
Will Waymo’s approach of steady incremental scaling of its own fleet city by city win the race? Did Tesla make the right choice to solve scalability and economics first and driving interventions second?
Waymo’s main roadblocks are likely to be the manual work needed to scale to new cities, routes, and conditions, economic adoption of a more expensive solution, and access to a large enough dataset of diverse real-world edge cases. Tesla already has a huge fleet of affordable cars and is aiming to reach human-level driving interventions across the US in Q2 2025, but it has hit dead ends with previous architectures several times before. What could get in the way of Tesla FSD’s recent momentum? Several things:
A) Tesla car inference compute (FLOPS, or memory for the weight file/model size). Its AI4 chips have 3–5x the FLOPS and 2x the RAM of the previous AI3/HW3 hardware, which Tesla is only now starting to utilize, and the next AI5 computer due in 2H25 should deliver another 10x more FLOPS. If larger models are needed, this extra compute may be necessary.
B) Falling signal-to-noise in intervention data. It will get increasingly hard to tell signal from noise in fleet intervention data, where most interventions come from drivers taking excess caution rather than waiting to see how the car handles a difficult scenario. Tesla will have a harder job collecting data on the real priority unsolved problems from its fleet.
C) Progress in model distillation. Larger neural networks are more capable but more expensive and slower to run. Many LLMs now use model distillation, where larger “teacher” models teach smaller and more affordable “student” models, imparting some of their capabilities. I think Tesla is very likely using a similar approach, given its huge 50k+ GPU training clusters capable of training very large models versus the tightly constrained inference compute budget of the final models deployed in the cars (a minimal sketch of the idea follows this list). Model distillation has made a lot of progress, but further breakthroughs here could be needed to get the most out of larger training clusters.
D) Architecture. Tesla has rewritten its architecture largely from scratch 3–4 times since 2016. Is another rewrite needed? Are incremental improvements enough? Or is it now just a question of more data and compute?
E) Regulation. Even once driving is actually solved, regulatory approval can add further delays and will vary by region. If there is extremely strong evidence from billions of miles that a significant number of lives could be saved, however, I see a lot of pressure for approval to be accelerated in slow-moving jurisdictions.
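To make the distillation idea in (C) concrete, here is a minimal, generic sketch of response-based knowledge distillation in PyTorch. It is not Tesla's actual pipeline (which is not public); the teacher/student modules, temperature, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Blend the usual hard-label loss with a soft-label loss that
    pushes the student's output distribution towards the teacher's."""
    # Standard cross-entropy against ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)

    # KL divergence between temperature-softened teacher and student
    # distributions (scaled by T^2 to keep gradient magnitudes stable).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets,
                         reduction="batchmean") * temperature ** 2

    return alpha * hard_loss + (1.0 - alpha) * soft_loss

# Illustrative training step: a large frozen teacher guides a small student.
# `teacher`, `student`, `optimizer`, and the batch are assumed to exist.
def train_step(teacher, student, optimizer, inputs, targets):
    with torch.no_grad():
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The appeal for a compute-constrained car is that the expensive teacher only runs in the data center, while the student that ships in the vehicle inherits some of its behavior at a fraction of the inference cost.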
Whoever wins the race to millions of fully driverless cars, the fact that two companies are now getting much closer, via very different routes, increases the odds that one of them will finally reach a scalable solution!
Why should you care?
I often see the importance of self-driving cars underappreciated and the technology dismissed as a luxury novelty. But there is much more to it. Traffic deaths are the most obvious problem addressed: 1.2 million people die in traffic accidents each year, and over 10 million suffer life-changing impacts. Relative to most diseases, traffic accidents also disproportionately affect young people. This is a huge scale of suffering that can eventually be avoided.
The second key benefit is accelerating electric vehicles, and this matters not just for climate change but also for national energy security and trade balances. Robotaxi EVs will have 5x+ utilization versus passenger-owned cars and are an economic no-brainer to scale quickly (the much longer vehicle life and lower operating costs make EVs far more favorable than petrol cars for robotaxis). This all means robotaxis will very likely accelerate the replacement of fossil-fuel miles with electric miles across the global car fleet, while fewer cars need to be built to deliver it.
The third key benefit is transport freedom; many people’s lives are limited by being unable to afford a car or unable to drive. Once scaled, robotaxi EVs should cost 2–4x less per mile than owning your own car and will give many more people access to flexible transport.
— Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Alibaba Releases an ‘Open’ Challenger to OpenAI’s o1 Reasoning Model
Alibaba’s Qwen team has released QwQ-32B-Preview, an open-source, 32-billion-parameter AI model designed specifically to tackle advanced reasoning tasks. It excels on benchmarks like GPQA, AIME, MATH-500, and LiveCodeBench. QwQ-32B-Preview is “openly” available under an Apache 2.0 license, meaning it can be used for commercial applications. The model followed quickly after DeepSeek’s R1 reasoning model, which is also due to be open-sourced.
2. Nous Research Is Training an AI Model Using Machines Distributed Across the Internet
Nous Research announced the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heterogeneous hardware. Nous is livestreaming the pre-training process on a dedicated website, showing performance on evaluation benchmarks as training progresses, along with a simple map of the locations of the training hardware behind the exercise, including several sites in the U.S. and Europe.
3. PRIME Intellect Releases INTELLECT-1 (Instruct + Base)
PRIME Intellect has released INTELLECT-1 (Instruct + Base), the first 10-billion-parameter language model collaboratively trained across the globe. This model demonstrates the feasibility of using decentralized, community-driven resources for training advanced LLMs. PRIME Intellect utilized their PRIME framework, specifically designed to overcome the challenges of decentralized training, including network unreliability and the dynamic addition or removal of compute nodes.
4. ElevenLabs’ New Feature Is a NotebookLM Competitor for Creating GenAI Podcasts
ElevenLabs introduced GenFM, a feature enabling AI-generated multispeaker podcasts, on the ElevenLabs Reader iOS app. Supporting 32 languages, GenFM uses YouTube videos or texts to create podcasts with natural human elements.
5. OpenAI’s Sora Video Generator Appears To Have Leaked
A group of testers leaked access to OpenAI’s Sora video generator, protesting alleged pressure on testers and inadequate compensation. They set up a Hugging Face project to generate short, watermarked videos. OpenAI temporarily suspended access, stating that Sora remains a “research preview” and emphasizing voluntary participation and safety.
6. Ai2 Releases New Language Models Competitive With Meta’s Llama
Ai2 introduced OLMo 2, a new family of 7B and 13B models trained on up to 5T tokens. By refining training stability, adopting staged training processes, and incorporating diverse datasets, the researchers bridged the performance gap with proprietary systems like Llama 3.1. OLMo 2 leverages improvements in layer normalization, rotary positional embeddings, and Z-loss regularization to enhance model robustness (a short sketch of Z-loss follows this list).
7. Andrew Ng’s Team Releases ‘Aisuite’: A New Open Source Python Library for Generative AI
Andrew Ng’s team has released aisuite, a new open-source Python library for generative AI. The library addresses the interoperability issue and simplifies building applications that use large language models from different providers. It introduces a standard interface that lets users choose a “provider:model” combination, enabling an easy switch between different language models without needing to rewrite significant parts of the code (see the usage sketch after this list).
8. Uber Is Using Gig Workers To Get Into the AI Labeling Business
The company’s new “Scaled Solutions” division claims it can connect businesses to “nuanced analysts, testers, and independent data operators” using its platform. It’s an extension of an internal team with members based in the US and India that does new feature testing and other tasks, such as converting restaurant menus into Uber Eats selections.
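For readers unfamiliar with the Z-loss regularizer mentioned in the OLMo 2 item above, here is a minimal sketch of the standard formulation (an auxiliary penalty on the log of the softmax normalizer, as popularized by PaLM). The coefficient value is an illustrative assumption, not necessarily OLMo 2’s exact setting.

```python
import torch

def z_loss(logits: torch.Tensor, coef: float = 1e-4) -> torch.Tensor:
    """Auxiliary loss that penalizes large softmax normalizers.

    log_z is log(sum(exp(logits))) over the vocabulary; squaring it and
    scaling by a small coefficient keeps output logits from drifting,
    which improves training stability.
    """
    log_z = torch.logsumexp(logits, dim=-1)  # shape: (batch, seq_len)
    return coef * (log_z ** 2).mean()

# Illustrative use: add it to the usual cross-entropy language-modeling loss.
# total_loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1)) + z_loss(logits)
```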
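And here is a short usage sketch of the aisuite interface described in item 7, based on the project’s published README; the model identifiers are placeholders, and provider API keys are assumed to be configured in your environment.

```python
# pip install "aisuite[all]"
import aisuite as ai

client = ai.Client()

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize this week's AI news in one sentence."},
]

# Swap providers by changing only the "provider:model" string.
for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```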
Five 5-minute reads/videos to keep you learning
1. Build a Chat-With-Document Application Using Python
In this article, you will build your own chat-with-document application in Python using a handful of packages. The author walks you through installing the packages, extracting text from the PDF file, splitting the text into documents, compiling the data, creating the retrieval model, configuring the language and retrieval models, creating the RAG module, and more (a minimal end-to-end sketch follows this list).
2. I Built an OpenAI-Style Swarm That Runs Entirely on My Laptop. Here’s How.
The article explains how to create a multi-agent AI system inspired by OpenAI’s swarm concept. The system uses modular AI agents to handle specific tasks collaboratively, running entirely on local hardware for efficiency. It highlights a practical approach to decentralized AI, enabling advanced capabilities without requiring extensive resources.
3. Faster Text Generation With Self-Speculative Decoding
This blog post explores the concept of self-speculative decoding, its implementation, and practical applications using the Hugging Face transformers library. You’ll learn about the technical underpinnings, including early-exit layers, unembedding, and training modifications. It also offers code examples, benchmark comparisons with traditional speculative decoding, and insights into performance trade-offs (see the hedged example after this list).
4. From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
Microsoft developed Medprompt last year, a novel approach to maximizing model performance on specialized domains and tasks without fine-tuning. By leveraging multiphase prompting, Medprompt optimizes inference by identifying the most effective chain-of-thought (CoT) examples at run time and drawing on multiple calls to refine the output. This blog discusses prompting strategies to make the most of o1-preview models, other factors to consider, and directions for run-time strategies (a sketch of run-time example selection follows this list).
5. Generative AI vs. Predictive AI: What’s the Difference?
Generative AI is not predictive AI. Predictive AI is its own class of artificial intelligence, and while it might be a lesser-known approach, it’s still a powerful tool for businesses. This article examines the two technologies and the key differences between each. It also shares specific use cases for each approach.
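To accompany item 1, here is a minimal, hedged sketch of a chat-with-document pipeline. It is not the article’s exact code: the package choices (pypdf, sentence-transformers, the OpenAI client), the file name, model names, and chunking parameters are all illustrative assumptions.

```python
# pip install pypdf sentence-transformers openai numpy
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# 1. Extract text from the PDF and split it into overlapping chunks.
text = " ".join(page.extract_text() or "" for page in PdfReader("report.pdf").pages)
chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]

# 2. Embed the chunks once, then embed each question at query time.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, top_k: int = 3) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best = np.argsort(chunk_vecs @ q_vec)[-top_k:]  # indices of nearest chunks
    context = "\n\n".join(chunks[i] for i in best)

    # 3. Ask the LLM to answer using only the retrieved context.
    client = OpenAI()  # assumes OPENAI_API_KEY is set
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply.choices[0].message.content

print(answer("What are the document's main conclusions?"))
```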
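For item 3, here is a hedged sketch of what self-speculative decoding looks like with transformers. The `assistant_early_exit` argument (which drafts tokens with the early layers of the same model and verifies them with the full model) reflects my understanding of the API added for LayerSkip-style checkpoints in recent transformers releases, and the checkpoint name and exit layer are assumptions; check the original blog post for the exact, supported usage.

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A LayerSkip-trained checkpoint is assumed; ordinary checkpoints will not
# benefit because their early layers were not trained to produce good drafts.
checkpoint = "facebook/layerskip-llama3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Self-speculative decoding works by", return_tensors="pt").to(model.device)

# Draft with an early exit at layer 4, then verify drafts with the full model;
# output quality matches normal autoregressive decoding, only faster.
outputs = model.generate(**inputs, assistant_early_exit=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```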
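For item 4, here is a minimal sketch of one Medprompt-style idea: picking the most relevant chain-of-thought exemplars at run time via embedding similarity, then ensembling several calls with a majority vote. The helper names, the embedding model, and the `ask_llm` callable are hypothetical placeholders, not Microsoft’s implementation.

```python
# pip install sentence-transformers numpy
from collections import Counter
import numpy as np
from sentence_transformers import SentenceTransformer

# A small pool of worked chain-of-thought exemplars (question, reasoning, answer).
EXEMPLARS = [
    {"q": "...", "cot": "...", "a": "..."},  # filled with real worked examples in practice
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
exemplar_vecs = embedder.encode([e["q"] for e in EXEMPLARS], normalize_embeddings=True)

def pick_exemplars(question: str, k: int = 5):
    """Dynamic few-shot: choose the k exemplars most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    idx = np.argsort(exemplar_vecs @ q_vec)[-k:]
    return [EXEMPLARS[i] for i in idx]

def medprompt_style_answer(question: str, ask_llm, n_votes: int = 5) -> str:
    """Build a few-shot CoT prompt, sample several answers, and majority-vote."""
    shots = "\n\n".join(f"Q: {e['q']}\n{e['cot']}\nAnswer: {e['a']}"
                        for e in pick_exemplars(question))
    prompt = f"{shots}\n\nQ: {question}\nThink step by step, then answer."
    votes = [ask_llm(prompt) for _ in range(n_votes)]  # ask_llm: hypothetical LLM call
    return Counter(votes).most_common(1)[0][0]
```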
Repositories & Tools
1. aisuite makes it easy for developers to use multiple LLMs through a standardized interface.
2. Langflow is a low-code app builder for RAG and multi-agent AI applications.
3. Rerun is building the multimodal data stack to model, ingest, store, query, and view robotics-style data.
4. Keep is an open-source alert management and AIOps platform.
Top Papers of The Week
1. Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
This paper examines how preference-based learning enhances language models (LMs). It identifies four critical components: preference data, learning algorithms, reward models, and policy training prompts. The study finds that high-quality preference data contributes most to model performance, improving instruction-following and truthfulness by up to 8%. While PPO slightly outperforms DPO on some metrics, such as mathematical tasks and general domains, scaling reward models yields only modest improvements beyond these areas (the standard DPO objective is reproduced after this list for reference).
2. MinerU: An Open-Source Solution for Precise Document Content Extraction
MinerU is an open-source solution for high-precision document content extraction. It leverages the sophisticated PDF-Extract-Kit models to extract content from diverse documents effectively and employs finely-tuned preprocessing and postprocessing rules to ensure accuracy. MinerU consistently achieves high performance across various document types, enhancing the quality and consistency of content extraction.
3. Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Mooncake is the serving platform for Kimi, an LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. It balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs).
4. DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
This paper introduces DrugAgent, a multi-agent framework aimed at automating machine learning programming in drug discovery. It incorporates domain expertise by identifying specific requirements and building domain-specific tools. A preliminary case study shows DrugAgent’s potential to overcome key limitations LLMs face in drug discovery.
5. SWIFT: A Scalable Lightweight Infrastructure for Fine-Tuning
This paper presents SWIFT, a customizable infrastructure for large models with support for over 300 LLMs and 50+ MLLMs. SWIFT is an open-source framework that provides comprehensive support for fine-tuning large models, particularly systematic support for MLLMs. It also integrates post-training processes such as inference, evaluation, and model quantization to facilitate fast adoption.
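For readers unfamiliar with DPO, referenced in paper 1 above, the standard objective from the original DPO paper (Rafailov et al., 2023) is reproduced below for reference; it is not a result of the summarized paper. DPO optimizes the policy directly on preference pairs, with no separately trained reward model or on-policy rollouts, which is the practical contrast with PPO that the paper studies.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
- \beta \log \frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
\right)\right]
```

Here y_w and y_l are the preferred and rejected responses for prompt x, π_ref is the frozen reference model, and β controls how far the policy can drift from the reference.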
Quick Links
1. Elon Musk files for injunction to halt OpenAI’s transition to a for-profit. The motion is the latest salvo in Musk’s legal battle with OpenAI, which, at its core, accuses the company of abandoning its original nonprofit mission to make the fruits of its AI research available to all.
2. Meta introduced SPDL, a new data-loading solution for AI model training. SPDL is a framework-agnostic data-loading solution that uses multi-threading to achieve high throughput in a regular Python interpreter (one built without the free-threading option enabled).
Who’s Hiring in AI
Anthropic AI Safety Fellow @Anthropic (San Francisco, CA, USA)
Sr. Manager, Machine Learning (GenAI) @PayPal Inc. (New York, NY, USA)
Software Engineer @Sand Tech Holdings Limited (Remote)
Frontend Developer @Gusto, Inc. (Multiple Locations/Remote)
Lead React Developer (Capital Markets) @Capco (Poland/Remote)
Senior Software Engineer — Market Data @Alpaca (USA)
AI Coding Tutor (Full-Time or Part-Time) @xAI (Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.