#115: LLM Adoption Taking Off? OpenAI API Use Up 2x in 5 Weeks, Llama at 350m Downloads.
Also, Cohere's Command R, new Qwen2 models, HelixFold3, OpenAI Strawberry news, and more!
What happened this week in AI by Louie
This week, we saw several new LLMs released, including an upgrade to Cohere’s Command R and new Qwen2 models. Outside of LLMs, we also saw HelixFold3 from Baidu, an open-source version of DeepMind’s AlphaFold 3 biomolecular structure prediction model. In the LLM world, our eyes were also on new evidence of accelerating adoption. Amid new fundraising news, OpenAI confirmed more than 200 million weekly ChatGPT users and said that API usage has doubled in just 5 weeks since the launch of its cheaper GPT-4o mini model. We have seen this ourselves as we moved our jobs board for AI professionals onto GPT-4o mini and have already processed over 15,000 new AI jobs with this pipeline. Meta also disclosed that its Llama model family has reached 350 million model weight downloads on Hugging Face, with 20 million in just the last month. Access to Llama models via API at its cloud partners grew 10x from January to July.
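To put these adoption figures in perspective, a quick back-of-the-envelope calculation converts the doubling into an implied weekly growth rate. This is only a sketch: it assumes usage compounded smoothly over the five weeks, which OpenAI has not stated, and the function name is our own.

```python
def implied_weekly_growth(multiple: float, weeks: float) -> float:
    """Constant weekly growth rate that compounds to `multiple` over `weeks`."""
    return multiple ** (1 / weeks) - 1

# Assumption: API usage grew smoothly (compounding) while doubling in 5 weeks.
weekly_rate = implied_weekly_growth(2.0, 5)

# Share of all Llama downloads on Hugging Face that came in the last month
# (20 million out of 350 million, both figures from the text above).
last_month_share = 20 / 350

print(f"Implied weekly growth in API usage: {weekly_rate:.1%}")
print(f"Last month's share of Llama downloads: {last_month_share:.1%}")
```

Under that smooth-growth assumption, doubling in five weeks works out to a little under 15% growth per week.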
Nvidia still stands at the heart of the Generative AI revolution, and this week, it reported $26.3 billion in second-quarter data center revenue, up 154% year on year. It also disclosed several interesting data points, such as 45% of its data center revenue coming from cloud service providers and more than 50% coming from consumer, internet, and enterprise companies. It also estimated that inference workloads have driven more than 40% of NVIDIA’s data center revenue in the past year. Additionally, it expects sovereign AI revenue to reach over $10 billion this year and its software, SaaS, and support revenue to approach a $2 billion annual run rate by the end of the year. It is important to note that not all of NVIDIA’s huge acceleration in GPU sales for AI workloads is due to LLMs and Generative AI. In fact, a large driver has been companies moving recommender system models (e.g., social media feeds) from CPUs onto GPUs. A large amount of GPU usage is also for data processing at large tech companies and not necessarily content generation.
Why should you care?
Two big questions have come up again and again in AI over the past year: Will open-weight LLMs or closed API LLMs get the most adoption? Will end-user revenue or return on investment from cost savings in internal AI-assisted workflows grow enough to justify the huge investment in GPUs? We think the answer to the first question is that both open and closed LLMs have their place, and each offers different strengths and trade-offs. Meta was able to release the ~$100m Llama 3.1 model weights for free. It remains to be seen whether it will still make models with $1–2bn training costs available for free, but its signaling here is currently positive.
Overall, the current pace of adoption of both open and closed LLMs looks very healthy! While we are still waiting for a flagship next-generation model (maybe the first with the next level of compute budget will be OpenAI’s Strawberry or Orion, or maybe Claude 4.0 or Gemini 2.0), progress in LLMs in 2024 has centered on inference cost reductions. This lower cost enables many more LLM use cases and is beginning to grow LLMs’ return on investment. Meanwhile, large tech companies also get great immediate value from recommender systems and data processing.
However, we are still extremely early in widescale enterprise adoption of LLMs. We see huge value for even today’s LLMs, particularly for internal enterprise workflows. External consumer-facing products require some extra guardrails, as it is hard to train your end users on LLM weaknesses, but here, too, we see many high-value use cases. Often, a large barrier is still staff training and staff imagination: people do not yet know how to use LLMs effectively and diligently and how to work with their sometimes unreliable responses. They also do not know exactly how to put LLMs to work in their current daily workflows. It takes a long time for enterprises to explore, test, and develop new technologies. In particular, it takes a lot of LLM pipeline customization to get to the reliability threshold needed for mass rollout. We think it is often going to be best to put these customized LLM products in the hands of employees, where the use cases are more clearly defined and results are more reliable. For this reason, Towards AI is focussing on helping to teach the LLM customization stack of RAG, Prompt Engineering, Fine-Tuning, and Agents, and we have a new, extremely in-depth practical course coming soon!
— Louie Peters — Towards AI Co-founder and CEO
This issue is brought to you thanks to GrowthSchool.
Hottest News
1. New Details on OpenAI’s “Strawberry” Are Out — and There’s a New Model Called Orion
Finally, more details on OpenAI’s much-hyped “Strawberry” model (previously Q*) emerged in an Information article. Strawberry was shown to US national security this summer and is due for release this fall. Strawberry can solve novel math problems and has been trained on programming tasks. It also shows progress in language-related challenges like complex word puzzles. OpenAI is reportedly employing “test-time computation” to enhance Strawberry’s problem-solving abilities, allowing the model more time to process complex queries. A key focus now is on model distillation: OpenAI is working on a smaller, faster, cheaper version of Strawberry for integration into ChatGPT. Strawberry is also being used to generate synthetic training data for “Orion,” apparently their next flagship LLM.
2. Meta’s Llama Models Hit 350 Million Downloads, Leads Open-Source AI
Meta’s Llama series of AI models has become the fastest-growing open-source family of models, securing 350 million downloads globally on Hugging Face, with 20 million made last month alone. In addition to model weight downloads, Llama is also accessed via API through Meta’s cloud partners, such as Amazon Web Services (AWS), Microsoft’s Azure, Databricks, Dell, Google Cloud, Groq, NVIDIA, IBM Watsonx, Scale AI, Snowflake, and more.
3. Baidu Released HelixFold3, an Open-Source Replication of AlphaFold 3
The PaddleHelix research team at Baidu has released HelixFold3, their replication of AlphaFold 3 for biomolecular structure prediction, under an open-source noncommercial license. HelixFold3 achieves an accuracy comparable to AlphaFold 3 in predicting the structures of conventional ligands, nucleic acids, and proteins.
4. Cohere Released an Improved Version of the Command R Series
Cohere released an improved version of Command R and Command R+ models with improvements across coding, math, reasoning, and latency. The improved version of Command R has demonstrated material gains across the board and is now on par with the prior version of the much larger Command R+.
5. Qwen Announces Qwen2-VL and Open-Source Access for Qwen2-VL-2B and Qwen2-VL-7B
Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families. Qwen2-VL can understand videos over 20 minutes long, supports multiple languages, and can be integrated with mobile phones, robots, and other devices. The team also open-sourced Qwen2-VL-2B and Qwen2-VL-7B under an Apache 2.0 license and released an API for Qwen2-VL-72B!
6. Magic Labs Released LTM-2-Mini, a 100 Million Token Context Window Model
Magic has developed a Long-Term Memory (LTM) model capable of handling up to 100 million tokens in context, which vastly outperforms existing models in terms of memory efficiency and processing power. They have also addressed the challenge of enhancing AI models’ ability to process and reason with ultra-long contexts during inference by introducing a new evaluation tool called HashHop. The LTM-2-mini model, trained using this method, shows promising results in handling up to 100 million tokens, demonstrating its ability to reason over large contexts far more efficiently than traditional models. The generative AI coding startup also landed a $320M investment from Eric Schmidt, Atlassian, and others.
7. Apple and Nvidia in Talks To Invest in OpenAI
Apple and Nvidia are in talks to invest in OpenAI as part of a new deal that would value the AI company at $100 billion. Thrive Capital would lead the deal, which might also include Microsoft.
8. Cerebras Introduced Cerebras Inference
Cerebras announced Cerebras Inference, which delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, which the company says is 20x faster than NVIDIA GPU-based hyperscale clouds. Notably, Cerebras stands alone in offering immediate responses at a rate of 450 tokens per second on the 70B model.
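To get a feel for what these throughput numbers mean for a user, here is a rough sketch of how long a response would take to generate. The 1,000-token response length is a hypothetical value of ours, and we assume the claimed 20x speedup applies directly to the 70B figure, which the announcement summary above does not spell out. It also ignores time to first token and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `num_tokens` at a steady decode rate."""
    return num_tokens / tokens_per_second

RESPONSE_TOKENS = 1_000   # hypothetical response length, not from Cerebras
CEREBRAS_70B_TPS = 450    # tokens/second quoted above for Llama 3.1 70B
# Assumption: the claimed 20x speedup applies to the 70B model.
GPU_CLOUD_TPS = CEREBRAS_70B_TPS / 20

fast = generation_seconds(RESPONSE_TOKENS, CEREBRAS_70B_TPS)
slow = generation_seconds(RESPONSE_TOKENS, GPU_CLOUD_TPS)

print(f"At {CEREBRAS_70B_TPS} tokens/s: {fast:.1f} seconds")
print(f"At {GPU_CLOUD_TPS:.1f} tokens/s: {slow:.1f} seconds")
```

Under these assumptions, the same response drops from roughly three quarters of a minute to a couple of seconds, which is the difference between waiting for an answer and reading it as it arrives.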
9. Google Is Rolling Out Gems and Imagen 3 to Gemini Advanced
Google’s Gemini has launched Gems, a feature for custom task-oriented Gemini chatbot versions similar to OpenAI’s GPTs, for Advanced subscribers to boost productivity and creativity. It is also rolling out Imagen 3, an advanced image generation tool.
10. Claude Artifacts Are Now Generally Available
Claude.ai introduces Artifacts, a new feature for enhancing creativity and collaboration in real-time project development. It is now available across all user plans on mobile and desktop. The tool supports a range of tasks, including coding, prototyping, and data visualization, and it has gained popularity, with millions created since its June preview release.
11. OpenAI, Anthropic Sign Deals With US Govt for AI Research and Testing
AI startups OpenAI and Anthropic have signed deals with the United States government for research, testing, and evaluation of their artificial intelligence models. The first-of-their-kind agreements come at a time when companies are facing regulatory scrutiny over the safe and ethical use of AI technologies.
Seven 5-minute reads/videos to keep you learning
1. Andrej Karpathy’s Experience With Cursor and Claude Sonnet
Andrej Karpathy highlights the efficiency gains in coding using VS Code Cursor and Sonnet 3.5 instead of GitHub Copilot, indicating a move towards “half-coding” with AI completions. He notes a substantial reduction in manual coding and says he would find it hard to go back to the unassisted coding that was the only option three years ago.
2. AI Companies Are Pivoting From Creating Gods to Building Products
AI companies are pivoting from theoretical development to delivering market-fit AI products, tackling cost, reliability, and privacy issues to improve commercial potential. There remains a complex path ahead, with significant investment and ongoing efforts needed to overcome technical and societal challenges in integrating AI into consumer markets.
3. Key Insights Into the Law of Vision Representations in MLLMs
This article summarizes the paper “Law of Vision Representation in MLLMs”. The primary idea is to control the variables within the MLLM and, by only changing the vision representation, identify two factors, cross-modal Alignment and Correspondence, closely related to the model’s performance on downstream tasks.
4. What I’ve Learned Building MLOps Systems for Four Years
The author reflects on four years of experience building MLOps systems, discussing the challenges of implementing ML in real-world applications like energy and healthcare and the fusion of software engineering with ML operations. The piece also explores the evolving roles and identities in the tech field, specifically distinguishing between MLOps Engineers and ML Engineers in the industry.
5. Extending Chat Templates to Support Tool Use
Tool use overcomes many of LLMs’ core limitations. However, problems arise when you try to implement tool use. Documentation is often sparse, inconsistent, and even contradictory. One approach to supporting tool use is to extend chat templates to support tools. The authors walk through the challenges of creating a templating system and how they solved them.
6. The Hidden Risks of Relying on Third-Party AI: Why Your Software Stack Deserves a Second Look
While third-party AI services offer powerful capabilities, they also introduce a subtle but significant risk: the potential for unexpected changes that can disrupt your carefully crafted software ecosystem. This article explores why controlling your AI stack is crucial and how private large language models (LLMs) might be the solution you’ve been overlooking.
7. OpenAI’s New Model, Strawberry, Explained
In this essay, the author dives into the anticipated Strawberry model and why it matters. It also explains why models like ChatGPT fail miserably if you ask them how many “r”s are in the word “strawberry.”
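For ordinary code, the letter-counting question is trivial, which is part of why the failure is so striking. One commonly cited explanation is that LLMs operate on multi-character tokens rather than individual letters. The sketch below illustrates the contrast; the token split is a hypothetical example of ours, not the output of any real tokenizer.

```python
word = "strawberry"

# Character-level view: counting letters is a one-liner.
r_count = word.count("r")

# Token-level view. Hypothetical split for illustration only,
# NOT produced by a real tokenizer.
hypothetical_tokens = ["str", "aw", "berry"]
assert "".join(hypothetical_tokens) == word

# The letters are spread across opaque chunks; a model that only sees
# token IDs has no direct view of the characters inside each chunk.
r_per_token = {token: token.count("r") for token in hypothetical_tokens}

print(f"Number of 'r's in {word!r}: {r_count}")
print(f"'r's hidden inside each token: {r_per_token}")
```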
Repositories & Tools
1. GPT Engineer helps build apps for the web 10x faster.
2. MLE-Agent is a pairing LLM agent for machine learning engineers and researchers.
3. Julius AI is an AI data analyst who analyzes datasets, creates visualizations, and even trains ML models with only a prompt.
4. Kotaemon is an open-source RAG-based tool for chatting with your documents.
5. GameNGen is a research game engine powered by a neural model, enabling real-time interactions in complex environments with extended trajectories.
6. Fabric provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
7. PromptMage simplifies the process of creating and managing complex LLM workflows.
Top Papers of The Week
1. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
This paper outlines a technique to include error correction data directly in the pretraining stage to improve reasoning capabilities. The resulting model outperforms alternatives trained on error-free data.
2. Writing in the Margins: Better Inference Pattern for Long Context Retrieval
The paper presents “Writing in the Margins” (WiM), a technique that boosts large language models’ performance on long-sequence retrieval tasks by prefilling a chunked key-value cache, enabling improved segment-wise inference. This method enhances reasoning accuracy by 7.5% and aggregation task F1-scores by 30% without the need for fine-tuning.
3. Diffusion Models Are Real-Time Game Engines
This paper presents GameNGen, a game engine powered by a diffusion model that enables real-time interaction with complex environments over long trajectories. GameNGen can simulate the game DOOM at over 20 frames per second on a single TPU.
4. Efficient LLM Scheduling by Learning to Rank
The paper introduces a novel scheduling method for LLMs using a learning-to-rank approach to predict output lengths, which improves scheduling efficiency. This approach reportedly cuts latency by 2.8x in chatbots and boosts throughput by 6.5x in synthetic data generation tasks.
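The intuition for why ordering by predicted output length helps can be shown with a toy, single-queue example. This is a simplified sketch and not the paper's method: the request lengths are made up, we assume the predicted ranking is perfect, and we pretend requests run one at a time with each output token costing one time unit, whereas real serving systems batch requests.

```python
def mean_completion_time(output_lengths: list[int]) -> float:
    """Mean finish time when requests run sequentially in the given order,
    assuming each output token costs one time unit."""
    elapsed = 0
    total = 0
    for length in output_lengths:
        elapsed += length   # this request finishes at `elapsed`
        total += elapsed
    return total / len(output_lengths)

# Hypothetical requests in arrival order: (request id, output length in tokens).
requests = [("a", 100), ("b", 10), ("c", 50), ("d", 5)]

# First-come, first-served: process in arrival order.
fcfs_order = [length for _, length in requests]
# Ranked: shortest predicted output first (assumes a perfect ranking).
ranked_order = sorted(fcfs_order)

fcfs_mean = mean_completion_time(fcfs_order)
ranked_mean = mean_completion_time(ranked_order)

print(f"FCFS mean completion time:   {fcfs_mean}")
print(f"Ranked mean completion time: {ranked_mean}")
```

Serving short requests first stops them from queuing behind long ones, so average latency falls even though the total work is unchanged. Note that only the relative order matters here, which is why a ranking can be enough without exact length predictions.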
5. xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
This paper presents xGen-VideoSyn-1, a text-to-video (T2V) generation model. It employs a novel approach for generating high-fidelity videos from text descriptions, using compressed representations to boost the synthesis process in terms of efficiency and quality.
6. Memory-Efficient LLM Training with Online Subspace Descent
The paper introduces the Online Subspace Descent optimizer that enhances memory efficiency during LLM training by utilizing online PCA instead of SVD to update projection matrices. It is supported by the first convergence guarantees for this method and is compatible with major optimizers. Experiments on LLaMA models with the C4 dataset show that it surpasses other low-rank methods in perplexity and approaches full-rank baseline performance in downstream tasks.
7. Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
This paper introduces MOHAWK, a new method for distilling knowledge from a large, pre-trained transformer model (teacher) to a smaller, subquadratic model (student) like a state-space model (SSM). It can distill a Mamba-2 variant (Phi-Mamba) using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens. Using less than 1% of the training data typically required, Phi-Mamba significantly outperforms all existing open-source non-transformer models.
Quick Links
1. Amazon’s revamped Alexa will be powered by Anthropic’s Claude, according to Reuters. The report also said that the initial versions of Amazon’s smarter, subscription-based voice assistant that used the company’s own AI proved insufficient, often struggling with words and responding to user prompts.
2. MIT developed a new algorithm that solves complicated partial differential equations by breaking them into simpler problems, potentially guiding computer graphics and geometry processing. This framework can help better analyze shapes and model complex dynamical processes.
3. Elon Musk recently supported California’s controversial AI safety bill, SB 1047. The bill focuses on the ethical and safety aspects of AI technologies and is opposed by other top tech industry players. Musk’s endorsement of the bill has sparked widespread discussions.
4. OpenAI rolled out enhanced controls for File Search in the Assistants API to help improve the relevance of your assistant’s responses. You can now inspect the search results returned by the tool and configure their rankings.
Who’s Hiring in AI
Machine Learning Research ICT4 @Apple (Cupertino, CA, USA)
Data & Analytics Intern @Santander US (Boston, MA, USA)
Generative AI Platform Architect @The Walt Disney Company (Orlando, FL, USA)
Campus Graduate — 2025 Data Science Finance Summer Internship @American Express (New York, NY, USA)
Applied AI/Machine Learning Senior Associate @JPMorgan Chase (Jersey City, NJ, USA)
Jr. Data Engineer @MSys Inc. (Remote)
Presales Consultant — Data & AI @SoftwareONE (Santo Domingo, Dominican Republic)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.