TAI #126; New Gemini, Pixtral, and Qwen 2.5 model updates; Towards AI's From Beginner to LLM Developer course!
What happened this week in AI by Louie
This was a relatively quiet week in AI. As we discussed last week, conversations continued around potentially disappointing results from scaling training compute in the next generation of LLMs and, consequently, the increasing importance of inference-time compute scaling. Several new models were released, including an impressive update from Google Gemini, which took the top spot on the Chatbot Arena. Mistral also launched its new open-source 124B multimodal Pixtral model, now best in class for some use cases. Additionally, Alibaba open-sourced its efficient and powerful Qwen2.5-Coder series, featuring models ranging from 0.5B to 32B parameters.
The biggest news of the week, however, was of course the launch of Towards AI’s own new offering: our paid course, From Beginner to LLM Developer. It is the product of a year of work from our team of ~15 course writers and is the most in-depth LLM developer course out there, with 85+ lessons and many, many code notebooks! We brought together everything we learned from building over the past two years to teach the full stack of LLM skills. It is a one-stop conversion course for software developers, machine learning engineers, data scientists, aspiring founders, and AI/Computer Science students, junior and senior alike. We think many millions of LLM developers will be needed to build reliable, customized products on top of foundation LLMs and achieve mass Generative AI adoption in companies. We want to help you or your friends and colleagues lead this new field. We can also customize this course for corporate customers to fit your specific teams and industry, so please share it internally! (You can also reach out to louis@towardsai.net for an affiliate deal.)
We cover the full stack of learning to build on top of foundation LLMs — from choosing a suitable LLM application to collecting data, iterating on many advanced techniques (RAG, fine-tuning, agents, audio, caching, and more), integrating industry expertise, and deploying. Right now, this means working with Python, OpenAI, Llama 3, Gemini, Perplexity, LlamaIndex, Langchain, Hugging Face, Gradio, and many other amazing tools (we are unaffiliated and will introduce all the best LLM tool options). It also means learning many new non-technical skills and habits unique to the world of LLMs. The only skill required for the course is some knowledge of Python (or basic programming). Our students will create a working product, which we certify, and we also provide instructor support in our dedicated Discord channel. This could become the seed of a startup, a new tool at your company, or a portfolio project for landing an LLM Developer job.
As a new discipline, we think the LLM Developer role is still very poorly understood. It is not often appreciated how the role differs from Software Development and Machine Learning, or just how critical non-technical skills are to building great LLM products. We laid out our thesis and definition of this new role, and why it is going to be in huge demand going forward, in our blog post: Why become an LLM Developer? We also explain why we think technical development on top of foundation LLMs is here to stay and will always provide better results for specific tasks in specific industries relative to out-of-the-box foundation models and no-code customization.
You can find all the lesson titles, syllabus, and more information via a free preview on the course page linked below.
— Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Mistral Introduced Mistral Chat and Pixtral Large
Mistral introduced two major updates this week: Pixtral Large and an upgraded Mistral Chat, a platform that can now search the web and provide inline citations. Pixtral Large is a 124-billion-parameter multimodal model built on top of Mistral Large 2. The model, released with open weights, aims to make advanced AI more accessible.
2. Qwen2.5-Coder Series: Powerful, Diverse, Practical
Alibaba has open-sourced the Qwen2.5-Coder series, featuring models from 0.5B to 32B parameters that rival GPT-4o in coding performance. Qwen2.5-Coder-32B-Instruct excels in code generation, repair, and reasoning across 40+ programming languages. It also aligns well with human preferences, as measured on the Code Arena benchmark, providing a robust, scalable tool for developers.
3. ChatGPT Beat Doctors at Diagnosing Medical Conditions
A study asked 50 doctors to diagnose medical conditions; some were given OpenAI’s ChatGPT to help them make their decisions, while others worked without AI. Doctors without AI got an average score of 74%, doctors who used AI got an average score of 76%, and ChatGPT on its own got an average score of 90%. In other words, ChatGPT alone outperformed the doctors, and giving doctors access to it only marginally improved their scores.
4. Nous Research Introduced the Forge Reasoning API Beta and Nous Chat
Nous Research introduced two new projects: the Forge Reasoning API Beta and Nous Chat, a simple chat platform featuring the Hermes language model. The Forge Reasoning API makes deploying advanced reasoning processes in real-time applications more feasible. The Hermes language model has been known for its ability to understand context and generate coherent responses, and the Forge Reasoning API takes these capabilities further.
5. Google Gemini Unexpectedly Surges to №1 Over OpenAI
Google has claimed the top spot in the popular lmarena (Chatbot Arena) benchmark, as well as in many subcategories, with its latest experimental model. The model, dubbed “Gemini-Exp-1114” and now available in Google AI Studio, matched OpenAI’s GPT-4o in overall performance.
6. OpenAI’s Tumultuous Early Years Revealed in Emails From Musk, Altman, and Others
The lawsuit between Elon Musk and OpenAI revealed many emails between Elon Musk, Sam Altman, and others during OpenAI’s early days. Musk, who recently sued OpenAI, argued it deviated from its mission by prioritizing profit. Emails show Musk initially supported for-profit plans and suggested merging OpenAI with Tesla, but co-founders resisted, citing conflicts with OpenAI’s vision. The disagreement over governance and funding led to Musk’s departure and ongoing disputes.
7. Amazon Ready To Use Its Own AI Chips, Reduce Its Dependence on Nvidia
Amazon is set to launch Trainium 2 AI chips, aiming to reduce reliance on Nvidia and cut costs for Amazon Web Services (AWS) customers. These chips promise efficiency and savings, attracting users like Anthropic and Databricks. This move highlights a growing trend of tech giants developing custom chips to drive AI growth.
8. OpenAI Launches ChatGPT Desktop Integrations, Rivaling Copilot
OpenAI has launched ChatGPT desktop integrations for macOS and Windows, targeting daily workflow integration. Users can now work with third-party apps directly, similar to GitHub Copilot’s integrations. The app was initially Mac-only; Windows users now also gain Advanced Voice Mode and screenshot capabilities.
9. Releasing the Largest Multilingual Open Pretraining Dataset
Pleias released Common Corpus, the largest open multilingual dataset for training language models, containing over 2 trillion tokens of permissively licensed data. Available on HuggingFace, the dataset promotes compliance with regulations like the EU AI Act and ensures high quality by filtering out harmful content. Common Corpus includes diverse resources such as books, legal documents, and academic papers, supporting over 30 languages.
Five 5-minute reads/videos to keep you learning
1. Agentic Retrieval-Augmented Generation (RAG) integrates AI agents into RAG pipelines. It enhances retrieval, reasoning, and tool use. Unlike traditional RAG, it offers multi-step retrieval and validation, leveraging tools like web search and APIs. Agentic RAG’s adaptable architecture ranges from single to multi-agent systems.
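As a minimal illustration of the multi-step retrieve-and-validate loop described above, here is a hedged Python sketch. All function names and data are hypothetical stand-ins; a real pipeline would call an LLM, a vector database, and web-search APIs.

```python
# Sketch of an agentic RAG loop: try a tool, validate its output,
# and fall back to the next tool if validation fails.

def vector_search(query):
    # Stand-in for a vector-store lookup over indexed documents.
    docs = {"qwen": "Qwen2.5-Coder spans 0.5B to 32B parameters."}
    return [text for key, text in docs.items() if key in query.lower()]

def web_search(query):
    # Stand-in for a web-search tool the agent can fall back to.
    return [f"Web result for: {query}"]

def validate(passages, query):
    # Agentic validation step: keep results only if they mention a query term.
    tokens = [t.strip("?.,").lower() for t in query.split()]
    return any(tok and tok in p.lower() for p in passages for tok in tokens)

def agentic_rag(query):
    """Multi-step retrieval: try the vector store first, validate the hits,
    and fall back to web search if validation fails."""
    for tool in (vector_search, web_search):
        passages = tool(query)
        if passages and validate(passages, query):
            return {"tool": tool.__name__, "context": passages}
    return {"tool": None, "context": []}

print(agentic_rag("How big is Qwen?")["tool"])  # -> vector_search
```

The key difference from traditional RAG is the loop: the agent decides whether the retrieved context is good enough before answering, rather than passing the first retrieval straight to the LLM.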
2. The Future of Programming: Copilots vs. Agents
With the launch of GitHub Spark, Microsoft intensifies the AI code-generation race, aiming to capture a share of 30 million developers. This article explores the landscape of AI coding systems, focusing on copilots like Codeium, Cursor, and GitHub Copilot.
3. Is AI Progress Hitting a Wall?
A wave of recent articles proclaims the death of deep learning. Leaked reports suggest OpenAI’s new model, Orion, finished training without showing nearly the improvement GPT-4 achieved over GPT-3. So, is AI progress slowing down? This article argues that it is not.
4. OpenAI Reveals New “Operator” AI Agent
This article summarizes the latest developments in AI, and this week covers OpenAI’s new “Operator” AI agent, Google’s Gemini AI App for iPhone, YouTube’s AI Music Remixing Feature, Perplexity’s AI-Generated Ads, and more.
5. A Modern Approach To The Fundamental Problem of Causal Inference
This article analyzes the fundamental problem of causal inference from a statistical perspective. From this point of view, the problem arises from treating all tested hypotheses as independent of one another. That assumption is flawed: when we generate a series of hypotheses, we are effectively testing a composite hypothesis that tends to adapt to the data, producing spurious correlations with each of our data series.
Repositories & Tools
1. Browser Use connects your AI agents with the browser.
2. Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.
Top Papers of The Week
1. LLaVA-o1: Let Vision Language Models Reason Step-by-Step
LLaVA-o1 brings stage-by-stage reasoning, in the style of OpenAI’s o1 model, to vision-language models like LLaVA. It is trained on the LLaVA-o1-100k dataset, which combines diverse visual question answering and reasoning annotations. LLaVA-o1 autonomously works through four stages: summarization, visual interpretation, reasoning, and conclusion.
2. Scaling Laws for Precision
This work devises “precision-aware” scaling laws for both training and inference. It proposes that training in lower precision reduces the model’s “effective parameter count,” allowing us to predict the additional loss incurred from training in low precision and from post-training quantization. It unifies the scaling laws for post-training and pretraining quantization into a single functional form.
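Schematically, the idea can be written as a Chinchilla-style loss in which the parameter count is replaced by a precision-dependent effective count. The expression below is only an illustrative sketch; the exact functional form and fitted constants are the paper’s.

```latex
% Illustrative sketch only: A, B, E, \alpha, \beta, \gamma stand for
% fitted coefficients; the paper derives the exact functional form.
L(N, D, P) \;\approx\; A\, N_{\mathrm{eff}}(P)^{-\alpha} \;+\; B\, D^{-\beta} \;+\; E,
\qquad N_{\mathrm{eff}}(P) \;=\; N\!\left(1 - e^{-P/\gamma}\right)
```

Here $N$ is the raw parameter count, $D$ the number of training tokens, and $P$ the training precision: as $P$ shrinks, the effective parameter count $N_{\mathrm{eff}}$ falls below $N$, raising the predicted loss.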
3. Garak: A Framework for Security Probing Large Language Models
This paper argues that it is time to rethink what constitutes “LLM security” and pursue a holistic approach to LLM security evaluation. To this end, it introduces Garak (Generative AI Red-teaming and Assessment Kit), a framework that structurally probes a target LLM or dialog system to discover potential vulnerabilities.
4. Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Researchers evaluated 17 leading LLMs on complex retrieval tasks and found many models to be “thread-safe,” able to follow multiple information threads concurrently without losing performance. However, performance declines as context length grows, revealing an effective context limit shorter than the advertised one.
5. AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems
The paper presents AutoGen Studio, a no-code developer tool for prototyping, debugging, and evaluating multi-agent workflows built upon the AUTOGEN framework. It offers a web interface and a Python API for representing LLM-enabled agents using a declarative (JSON-based) specification.
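To give a feel for the declarative style the paper describes, here is a hedged Python sketch that builds a JSON workflow specification. The field names below are hypothetical stand-ins, not AutoGen Studio’s actual schema; the point is that agents and their configurations are described as plain data rather than code.

```python
import json

# Hypothetical declarative multi-agent workflow spec (field names are
# illustrative, not AutoGen Studio's real schema): agents, their models,
# and their behavior are plain data that a runtime or web UI can load.
workflow = {
    "name": "research_assistant",
    "agents": [
        {
            "type": "assistant",
            "model": "gpt-4o",
            "system_message": "You answer research questions concisely.",
        },
        {"type": "user_proxy", "human_input_mode": "NEVER"},
    ],
}

# Serialize to JSON so the spec can be saved, shared, or edited in a UI.
spec = json.dumps(workflow, indent=2)
print(spec)
```

Because the specification is data, a no-code tool can expose it in a visual editor and hand the same JSON to a Python runtime for execution and debugging.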
Quick Links
1. Nvidia is open-sourcing the BioNeMo Framework, a toolkit of programming resources, libraries, and AI models designed for drug discovery. This release equips academic labs and biotech companies with advanced tools for protein design, small molecule generation, and even custom model development.
2. Anthropic has launched a Prompt Improver, a tool that applies best practices in prompt engineering to automatically refine existing prompts. This feature is especially valuable for developers working across different AI platforms, as prompt engineering techniques can vary between models.
3. Elon Musk has updated his fraud, breach of contract, and racketeering lawsuit against OpenAI to make antitrust claims against Microsoft, accusing the two companies of attempting to “monopolize the generative AI market.” The amended complaint names Microsoft as a new defendant.
4. The Rabbit R1 can now use AI to redesign the device’s entire interface based on a prompt. Some examples shared by Rabbit CEO Jesse Lyu show a Legend of Zelda-inspired interface, another made in the style of Windows XP, and one using a “dark green scanline background.”
Who’s Hiring in AI
Associate AI Project Specialist @Appen (Remote)
PhD Intern — Machine Learning Engineer @Pacific Northwest National Laboratory (Springfield, IL, USA)
Software Engineer (Python, AWS, SQL) @NBC Universal (Remote)
Principal Software Engineer — AI @Microsoft Corporation (Remote)
Research Expert @SAP (Walldorf, Germany)
GenAI Jr Data Scientist Entry @Nestle (Esplugues de Llobregat, Spain)
Generative AI Software Engineer (Innovation Lab) @Citigroup (London, United Kingdom)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.