What happened this week in AI by Louie
The ongoing race between open-source and closed-source AI has been a key theme of debate for some time, as has the increasing concentration of AI research and investment into transformer-based models such as LLMs. As we have discussed, there have been some signs of open-source AI (and AI startups) struggling to compete with the largest LLMs at closed-source AI companies. This is driven by the need to eventually monetize to fund the increasingly huge LLM training costs. The point became even clearer with this week’s news that Microsoft and OpenAI are planning a >$100bn, 5 GW AI data center for 2028. This would be their fifth-generation AI training cluster. They are currently partway through the Gen 3 deployment, while Gen 4 is due in 2026.
Clearly, only a few companies, let alone open-source projects, will be able to compete with budgets like this. But a key question arises: is this amount of training compute really going to be needed, and if so, are LLMs really the right path forward? So far, LLM capability improvements have been relatively predictable with compute and training data scaling, and this likely gives confidence to plan projects on this $100bn scale. However, the AI community has also been making a lot of progress in developing capable, smaller, and cheaper models. This can come from algorithmic improvements and more focus on pretraining data quality, as with the new open-source DBRX model from Databricks. In the closed-source world, many people have been impressed with the performance of Claude Haiku and Gemini 1.5 Pro, which is comparable to that of much larger and more expensive models such as GPT-4.
Another question for the future of LLMs and open-source AI is the relative importance of fine-tuning vs. in-context learning now that we have cheaper closed models with huge context windows. For example, using Claude Haiku with 20+ examples in context can be much more affordable than using a fine-tuned GPT-3.5 (and in some cases even cheaper than some open-source fine-tuned models, depending on deployment costs). So far, open-source LLM projects have focused heavily on fine-tuning. On the other hand, new developments in techniques such as model merging (see the story below from Sakana) can provide a new avenue for affordable development and improvement of open-source models.
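To make the in-context vs. fine-tuning cost comparison concrete, here is a minimal back-of-the-envelope sketch. Everything numeric in it is an assumption for illustration only: the per-million-token prices, the token counts, and the names `Pricing` and `cost_per_request` are hypothetical placeholders, not quotes from any provider. Swap in current price lists and your own workload before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (placeholder values, not provider quotes)."""
    input_per_million: float
    output_per_million: float


def cost_per_request(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under simple per-token pricing."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# ASSUMPTION: illustrative prices for a small few-shot model and a
# fine-tuned model. Replace with real, current prices.
few_shot_model = Pricing(input_per_million=0.25, output_per_million=1.25)
fine_tuned_model = Pricing(input_per_million=3.00, output_per_million=6.00)

# ASSUMPTION: illustrative workload sizes.
EXAMPLES_IN_CONTEXT = 20
TOKENS_PER_EXAMPLE = 150
QUERY_TOKENS = 200
OUTPUT_TOKENS = 100

# The few-shot prompt pays for the examples on every single request.
few_shot_input = QUERY_TOKENS + EXAMPLES_IN_CONTEXT * TOKENS_PER_EXAMPLE
few_shot_cost = cost_per_request(few_shot_model, few_shot_input, OUTPUT_TOKENS)

# The fine-tuned model only sees the query (one-off training cost is ignored here).
fine_tuned_cost = cost_per_request(fine_tuned_model, QUERY_TOKENS, OUTPUT_TOKENS)

print(f"few-shot:   ${few_shot_cost:.6f} per request")
print(f"fine-tuned: ${fine_tuned_cost:.6f} per request")
```

Under these placeholder numbers the few-shot route comes out cheaper per request, but the result flips as examples get longer or more numerous, and the sketch leaves out fine-tuning training costs and self-hosting costs entirely.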
Why should you care?
We think it is very important that a lot more people can contribute to AI development and decision making than just those who work at a handful of closed AI labs. Hence, we are focused on making AI more accessible and releasing AI learning materials and courses! While it may look increasingly difficult to compete with budgets at the cutting edge of LLMs, there remains a huge amount in open-source AI that can continue to develop, from dataset curation to RAG stacks, fine-tuning, agent pipelines, and model merging. We also hope to see breakthrough new model architectures and techniques potentially invented by open-source projects or academia. These could supplant or combine with the current leading LLMs to allow for training cluster infrastructure costs far below $100bn!
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Databricks Launches DBRX, A New Standard for Efficient Open Source Models
Databricks introduced DBRX, an open, general-purpose LLM. It is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input.
2. AI21 Releases Jamba, A Groundbreaking SSM-Transformer Model
AI21 Labs unveiled Jamba, which combines the Mamba SSM with the traditional Transformer architecture. This hybrid model significantly enhances throughput and efficiency and offers a 256K context window. Jamba triples throughput on long contexts compared to Mixtral 8x7B and supports up to 140K context on a single GPU.
3. X’s Grok Chatbot Will Soon Get an Upgraded Model, Grok-1.5
X.ai has announced an upgraded version of its AI model, Grok-1.5. This new model is expected to power the Grok chatbot on X. As per the official blog, Grok-1.5 shows improved performance over the previous Grok-1 model on math and coding benchmarks. Grok-1.5 will soon be available for early testers on X and may introduce new features.
4. Microsoft, OpenAI Plan $100 Billion Data-Center Project, Media Report Says
OpenAI and Microsoft are working on a massive $100 billion data center project called “Stargate,” set to launch in 2028. The Information reported that Microsoft would likely finance the project. The success of Stargate hinges on OpenAI’s next major upgrade, expected early next year.
5. Sakana AI Evolutionary Models
Sakana released its new Evolutionary Model Merge technique, enabling developers and organizations to create and discover new models through cost-effective methods. Sakana has released an LLM and a vision-language model (VLM) created through Evolutionary Model Merge. Instead of relying on human intuition, Evolutionary Model Merge automatically combines the layers and weights of existing models to create and evaluate new architectures.
6. OpenAI’s Voice Cloning AI Model Only Needs a 15-Second Sample To Work
OpenAI has developed a voice cloning technology called Voice Engine, which can create a synthetic voice based on a 15-second clip of someone’s voice. Currently, OpenAI is offering limited access to the platform due to concerns over the potential for misinformation. “These small-scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.
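As a rough illustration of the idea behind item 5 (searching over ways to combine existing models rather than hand-picking a recipe), here is a toy sketch. It is not Sakana's implementation: the "models" are tiny lists of floats, the `fitness` function is a stand-in for a real benchmark score, and the names `merge`, `fitness`, and `evolve` plus all hyperparameters are hypothetical choices made for this example.

```python
import random

Layer = list[float]
Model = list[Layer]


def merge(model_a: Model, model_b: Model, alphas: list[float]) -> Model:
    """Interpolate layer i as alpha_i * A_i + (1 - alpha_i) * B_i."""
    return [
        [a * wa + (1.0 - a) * wb for wa, wb in zip(layer_a, layer_b)]
        for a, layer_a, layer_b in zip(alphas, model_a, model_b)
    ]


def fitness(model: Model, target: Model) -> float:
    """Toy stand-in for a benchmark score: negative squared error to a target."""
    return -sum(
        (w - t) ** 2
        for layer, target_layer in zip(model, target)
        for w, t in zip(layer, target_layer)
    )


def evolve(model_a: Model, model_b: Model, target: Model,
           generations: int = 40, population: int = 20,
           elite: int = 5, sigma: float = 0.1, seed: int = 0) -> list[float]:
    """Evolve per-layer mixing coefficients with elitism and Gaussian mutation."""
    rng = random.Random(seed)

    def score(alphas: list[float]) -> float:
        return fitness(merge(model_a, model_b, alphas), target)

    pop = [[rng.random() for _ in model_a] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:elite]  # the best candidates survive unchanged
        children = []
        while len(parents) + len(children) < population:
            parent = rng.choice(parents)
            # Mutate each coefficient and clamp it back into [0, 1].
            children.append(
                [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in parent]
            )
        pop = parents + children
    return max(pop, key=score)


# Toy setup: model A is "good" at layer 0, model B is "good" at layer 1.
model_a = [[1.0, 1.0], [0.0, 0.0]]
model_b = [[0.0, 0.0], [1.0, 1.0]]
target = [[1.0, 1.0], [1.0, 1.0]]

best_alphas = evolve(model_a, model_b, target)
best_score = fitness(merge(model_a, model_b, best_alphas), target)
print("per-layer mixing coefficients:", [round(a, 2) for a in best_alphas])
print("merged score:", round(best_score, 4),
      "| parent scores:", fitness(model_a, target), fitness(model_b, target))
```

In this toy setup the search should drift toward taking layer 0 mostly from model A and layer 1 mostly from model B, which neither parent nor a uniform average achieves. A real system would evaluate candidate merges on held-out tasks, which is where most of the cost lies.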
Five 5-minute reads/videos to keep you learning
1. Building A Multilingual NER App with HuggingFace
This is a guide to creating an end-to-end NLP project using a RoBERTa-base model with the transformers library. It also includes instructions on how to build an app with the Gradio library and monitor it with Comet. The notebook used in the blog is available as well.
2. Explainable AI: Thinking Like a Machine
XAI, or explainable AI, has a tangible role in promoting trust and transparency and enhancing user experience in data science and AI. This article builds on the work of the XAI community. It brings together conflicting literature on what XAI is. It also explores some techniques for building glass-box or explainable models and how XAI can be implemented in enterprises and projects.
3. OpenAI’s Official Prompting Guide
This is a prompt engineering guide with strategies to get the most out of your prompts. It also has example prompts showing the capabilities of the models.
4. Anatomy of OpenAI’s Developer Community
This is a dataset of all posts and discussions collected from OpenAI’s Developer Community. The forum is a great resource for understanding the general sentiment of developers, identifying common problems users face, and gathering feedback on OpenAI products. This dataset primarily focuses on API, GPT Builders, Prompting, Community, and Documentation.
5. Finetune Mixtral 8x7B with AutoTrain
This blog shows how to fine-tune Mixtral 8x7B on your own dataset using AutoTrain. It requires very little coding and provides steps for fine-tuning the model locally or with custom hardware.
Repositories & Tools
1. AIOS is an LLM Agent Operating System. It optimizes resource allocation and context switch across agents and enables concurrent execution of agents.
2. NanoDL is a low-resource custom LLM development tool that accelerates the development of custom transformer models and LLMs.
3. Llm-answer-engine contains the code and instructions needed to build an answer engine that leverages Groq, Mistral AI’s Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI.
4. Retell is a conversational voice API for LLMs.
5. Typer is a library for building CLI applications, based on Python type hints.
Top Papers of The Week!
1. Tutorial on Diffusion Models for Imaging and Vision
This tutorial explains how diffusion models work, focusing on their application in generating images and videos from text. It breaks down the sampling mechanism that improves these models over earlier approaches. It discusses the essential ideas underlying the diffusion models.
2. AIOS: LLM Agent Operating System
This paper presents AIOS, an LLM agent operating system that embeds large language models into operating systems. It optimizes resource allocation, facilitates context switch across agents, enables concurrent execution, provides tool service for agents, and maintains access control.
3. Long-form Factuality in Large Language Models
This research uses GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics to benchmark a model’s long-form factuality in open domains. It proposes a Search-Augmented Factuality Evaluator (SAFE) that uses LLM agents as automated evaluators for long-form factuality.
4. OpenVoice: Versatile Instant Voice Cloning
This paper introduces OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. This addresses two open challenges in the field: Flexible Voice Style Control and Zero-Shot Cross-Lingual Voice Cloning.
5. Gecko: Versatile Text Embeddings Distilled from Large Language Models
This paper presents Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging distilled knowledge from LLMs into a retriever. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding size.
Quick Links
1. Amazon bets $150 billion on data centers in the coming 15 years. The company plans to expand existing server farm hubs in northern Virginia and Oregon and expand into new precincts, including Mississippi, Saudi Arabia, and Malaysia.
2. Hume AI introduced the Empathic Voice Interface, enabling developers to integrate an emotionally intelligent AI voice into applications across health and wellness, AR/VR, and customer service call centers. It has raised $50M in funding for the debut and continued development of this new flagship product.
3. SambaNova announced its new model, Samba-CoE v0.2, which it says already beats Databricks’ DBRX. It is optimized for general-purpose chat and runs at 330 tokens per second for a batch size of one.
Who’s Hiring in AI!
Machine Learning Engineer, Cohere For AI @Cohere (Remote)
Research Engineer Intern @Jumio (Lenexa, KS, USA)
Tech Lead, AI/ML @Forge Global (New York, NY, USA)
Data Operations Analyst @Firmex (Remote)
Engineering Manager, Machine Learning (App Ads) @Reddit (Remote)
Senior Data Scientist, Product Analytics @Onfido (Remote)
Senior Machine Learning Research Engineer @Lirio (Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #93 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.