#108: Conflicting Developments in the AI Regulation Debate
Also, FlashAttention-3, imminent new LLMs, OpenAI's "Strawberry" project, CRAG, MambaVision, and more!
What happened this week in AI by Louie
The ongoing debate over AI regulation gained focus again this week, with both positive and negative newsflow for the future of open-source AI. There is an inevitable tension between innovation and regulatory efforts to ensure AI safety. Do open-source AI and open-weight models aid safety by distributing power and allowing more eyes to investigate and address LLM safety risks? Or is it a safety risk as it makes it easier for bad actors to adapt these models to their purposes? Are some lobbying for AI safety primarily concerned with safety or establishing barriers to competition? Proposed regulations like California’s SB 1047 are at the center of this discussion, which has sparked considerable concern among AI researchers and developers. Antitrust concerns over big tech’s dominance in AI were also in focus, with Microsoft giving up its OpenAI board seat.
Andrew Ng added his voice to this debate, expressing deep concerns about California’s SB 1047, which continues to progress through the assembly. Ng argued that the bill’s vague and complex requirements could stifle innovation and disproportionately harm open-source contributors. He advocated for regulations focused on specific AI applications rather than broad, ambiguous mandates.
With unhelpful timing, the U.S. Department of Justice announced a successful operation to disrupt a Russian government-backed AI-enabled propaganda campaign. While LLM-assisted propaganda so far has perhaps been less widespread than feared, and in any case, we haven’t solved human propaganda either, this highlights the potential for AI to be used in harmful ways and adds risks of stricter regulations.
Contrasting with the potential open-source AI regulatory crackdown in California, the Federal Trade Commission (FTC) expressed strong support for open-weight and open-source AI models. The FTC noted that these models can drive innovation, reduce costs, and increase consumer choice, providing significant public benefits. However, the FTC also acknowledged the challenges and potential risks associated with open-source AI, calling for careful consideration of their impact on the market and consumers.
Finally — Donald Trump’s choice of Senator J.D. Vance as his vice-presidential candidate is potentially significant for AI regulation. Vance has voiced strong opinions on the risks associated with AI bias, particularly criticizing what he perceives as a left-wing bias in AI models. His comments highlighted fears that biased AI models could corrupt the information economy and emphasized the importance of open-source models as a solution to counteract these influences.
Why should you care?
The future of AI regulation is a key issue for the future pace of AI innovation and the concentration of power as AI adoption grows. For developers, understanding and navigating these regulations will be essential. The fear of legal repercussions could deter innovation, limit the release of new AI technologies, and particularly impact individuals and smaller startups. For the public, these developments underscore the importance of AI governance in protecting against misuse while fostering innovation that can benefit society.
As AI continues to evolve, the decisions made now regarding its regulation will have long-lasting impacts. Stakeholders across the board must engage in these discussions to ensure a balanced approach that promotes both safety and innovation. The open-source community, in particular, faces a pivotal moment that will determine its role in the future of AI. Let’s stay informed and proactive in shaping a future where AI can be both safe and innovative!
— Louie Peters — Towards AI Co-founder and CEO
If you enjoy reading our newsletter — please forward it to friends and colleagues! We try to add value by filtering and adding our thoughts to the overwhelming AI newsflow each week, and help sharing our content is always appreciated.
We are excited to announce our latest ‘shortcut’ video series on LLMs and GenAI research, an extension of our partnership with O’Reilly.
What’s Inside? Access to 10-minute introductory videos on the latest concepts and research in LLMs.
LLMs [Series 1]: Learn the latest approaches and techniques, including RAG, MoE models, building multimodal models, and improving LLM performance.
Generative AI Research Papers [Series 2]: Explore cutting-edge research in Generative AI with easy-to-understand explanations of the latest papers.
If you are an O’Reilly subscriber, you can watch the video series or read our latest book for free. Or sign up for a 10-day free trial here!
Hottest News
1.FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
FlashAttention-2 is widely used by most libraries to accelerate Transformer training and inference, already making attention faster on GPUs by minimizing memory reads/writes, but it has yet to take advantage of the latest H100s. FlashAttention-3 achieves a 1.5–2x speedup, reaching up to 740 TFLOPS on FP16 and nearly 1.2 PFLOPS on FP8. This increases GPU utilization to 75% of the theoretical maximum on H100s, up from 35% for FlashAttention-2. We think this is an important step in reducing the cost of the next generation of LLMs and may trigger the start of some major training runs.
2. New Leading 8k Output Tokens Available for Sonnet 3.5 in Claude API
Anthropic doubled the ‘maximum output’ token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API — we think this is significant as, despite the huge progress in expanding input token context windows (to 2 million+), output tokens can still be a constraint for many applications (such as translation and conversion tasks). Anthropic also made fine-tuning available for Haiku 3.0 in Amazon Bedrock. The fine-tuning API is currently available in preview.
3. Excitement Growing for Imminent New LLM Releases: LLama 3 405B and New Models in LMSYS Arena
The Information reported that META will release LLama 3 405B on July 23rd. We also saw three more new models appear for testing in LMSYS Arena; ‘upcoming-gpt-mini’, ‘column-u’, and ‘column-r’. This is where GPT4o was first secretly tested shortly before release, but it is unclear which company or companies the new models come from.
4. Microsoft Gives Up Observer Seat on OpenAI Board
Microsoft has stepped down from its observer seat on OpenAI’s board, which OpenAI noted reflected confidence in OpenAI’s trajectory under CEO Sam Altman. The move streamlines Microsoft’s relationship with OpenAI and we think it is likely motivated in part to address and reduce antitrust concerns regarding Microsoft’s influence over the company. OpenAI will not offer future observer roles, preferring direct partnership interactions, as with Microsoft and Apple.
5. OpenAI Secret Project “Strawberry” Aims To Boost AI Reasoning Power
Project Strawberry is OpenAI’s latest effort to improve AI reasoning. While the exact details are kept under wraps, it’s reportedly a significant leap forward in LLM capability. The project aims to enable AI models to plan ahead, understand the world more like humans do, and easily tackle complex multi-step problems. A different source also noted that internal models at OpenAI had scored over 90% on a MATH dataset (championship math problems).
6. OpenAI unveils five-level scale to AGI, aims to reach level 2 soon
OpenAI has created an internal five-level scale to track its large language models’ progress toward AGI. Potentially related to its project “Strawberry” above, it is reportedly on the cusp of achieving Level 2. Level 2 is “Reasoner,” demonstrating human-like problem-solving and characterized by advanced logic and reasoning. Level 3 is AI Agents that work on tasks and actions for days at a time.
7. Meta Researchers Distill System 2 Thinking Into LLMs, Improving Performance on Complex Reasoning
In a new paper, researchers at Meta FAIR present “System 2 distillation,” a technique that teaches LLMs complex tasks without requiring intermediate steps. In this research, Meta integrated System 2’s intricate reasoning methods (such as Chain-of-Thought) into the faster System 1 processes in LLMs.
Five 5-minute reads/videos to keep you learning
1. In-Depth Understanding of Vector Search for RAG and Generative AI Applications
This article focuses on vector search in RAG; it discusses why we need a vector search in RAG applications and how vectors and vector databases work. It also explores what makes Azure AI Search a good retrieval system and how it integrates.
This post shows how to build efficient prompts for your applications. It uses Amazon Bedrock playgrounds and Anthropic’s Claude 3 models to demonstrate how to build efficient prompts by applying simple techniques. It also talks about the anatomy of a prompt and presents an in-depth prompt example for Retrieval Augmented Generation.
3. Principles of Reinforcement Learning: An Introduction With Python
This article introduces fundamental principles and offers a beginner-friendly example of reinforcement learning. It explains the key terms in RL, its steps, algorithms, and implementation in Python.
4. Preventing Prompt Injection in OpenAI: A Case Study With Priceline’s OpenAI Tool “Penny”
This article suggests steps to mitigate prompt injections. It also proposes solutions like testing a better model, fully adapting a list of known patterns, running adversarial finetuning, and more.
5. A Data Leader’s Technical Guide to Scaling Gen AI
This article will cover three actions that data and AI leaders can consider to move from gen AI pilots to scaling data solutions. It focuses on how organizations can strengthen the quality and readiness of their data, examines how organizations can use gen AI to build better data products, and explores key data-management considerations.
Repositories & Tools
1. Storm is an LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
2. MobileLLM optimizes sub-billion parameter language models for on-device use cases.
3. LightRAG is a modular library like PyTorch for building LLM applications like chatbots and code generation, featuring a RAG pipeline.
4. Tabby is a self-hosted AI coding assistant offering an open-source and on-premises alternative to GitHub Copilot.
Top Papers of The Week!
1. CRAG — Comprehensive RAG Benchmark
This paper introduces the Comprehensive RAG Benchmark (CRAG), a factual question-answering benchmark of 4,409 question-answer pairs, and mock APIs to simulate web and Knowledge Graph (KG) search. It contains diverse questions across five domains and eight question categories. It reflects varied entity popularity from popular to long-tail and temporal dynamisms ranging from years to seconds.
2. MambaVision: A Hybrid Mamba-Transformer Vision Backbone
This paper proposes a novel hybrid Mamba-transformer backbone. The work redesigns the Mamba formulation to enhance its capability for efficient modeling of visual features. The MambaVision models achieve a new State-of-the-Art (SOTA) performance in terms of Top-1 accuracy and image throughput.
3. MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
MJ-Bench is a new benchmark for evaluating multimodal reward models that provide feedback on text-to-image generation technologies, such as DALLE-3 and Stable Diffusion. It tests models on criteria such as alignment, safety, image quality, and bias. Notably, the benchmark found that closed-source VLMs like GPT-4o excel in providing effective feedback.
4. Distilling System 2 into System 1
This work examines the integration of System 2’s intricate reasoning methods (such as Chain-of-Thought) into the faster System 1 processes in LLMs. By employing self-supervised learning, the authors have improved System 1 performance and lowered computation costs by embedding System 2’s reasoning capabilities into System 1, suggesting a more efficient approach to handling complex reasoning in AI.
Quick Links
1. Towards AI recently tested Launchpad by Latitude.sh, a container-based GPU cloud for inference and fine-tuning. Launchpad’s notable feature is its advanced, high-level dedicated container-based GPUs, capable of handling the significant computational demands of AI workloads.
2. Our friends at Mira have come out of stealth and announced their $9m seed raise. Mira is building decentralized AI infrastructure. They abstract AI infrastructure into “Flows”, a new AI building block that combines models, data & compute into a specific instruction set. Developers leverage Flows to minimize overhead & contributors publish diverse Flows on Mira, creating an ecosystem of AI resources. Mira already has over a dozen teams leveraging their Flow Market, contributing complex AI products across various sectors. Get early access to their platform here!
3. Amazon’s AI-powered shopping assistant, Rufus, is now available for all U.S. customers in the Amazon mobile app. The AI chatbot has been trained on Amazon’s product catalog, customer reviews, community Q&As, and other public information.
4. AWS launched App Studio to build internal enterprise applications from a written prompt. Amazon defines enterprise apps as having multiple UI pages that can pull from various data sources, perform complex operations like joins and filters, and embed business logic.
5. Patronus AI unveiled Lynx, an open-source model designed to detect and mitigate hallucinations in LLMs. Lynx outperforms industry giants like OpenAI’s GPT-4 and Anthropic’s Claude 3 in hallucination detection tasks, representing a significant leap forward in AI trustworthiness.
6. Intel Capital Backs AI Construction Startup That Could Boost Intel’s Own Manufacturing Prospects Intel Capital is leading a $15 million investment into Buildots, a company that uses AI and computer vision to create a digital twin of construction sites. Buildots, which uses AI and computer vision to create digital twins of construction sites, has now raised $121 million.
Who’s Hiring in AI!
Senior Machine Learning Engineer, Generative AI, AGI Inference Engine @Amazon (Poland)
Senior Software Engineer — Frontend, Generative AI @Scale AI (New York, NY, USA)
Solutions Architect, Generative AI Specialist @NVIDIA (USA/Remote)
Senior Generative AI Developer @Varicent (Canada/Remote)
Senior ML/AI Researcher — Game AI @Regression Games (USA/Remote)
Data Entry Specialist @Capri Healthcare Ltd (UK/Remote)
Technical Consultant @Salesforce (USA/Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are enjoying our latest book, Building LLMs for Production, could you take a moment to drop an honest review?