This AI newsletter is all you need #95
What happened this week in AI by Louie
This week was busy, with significant model upgrades to GPT-4 Turbo, Grok 1.5, and Google’s Gemma model series, a new open-source model from Mistral, and a wider release of Gemini 1.5 Pro.
Anticipation has been building for some time for a GPT-5 or GPT-4.5 model release from OpenAI, particularly since Anthropic’s Claude 3 Opus arguably took first place on many tasks. In the end, we didn’t get GPT-4.5; instead, we got a relatively understated GPT-4 Turbo update. The model is now out of preview and adds vision capabilities. While the announcement was light on details, the company noted significant improvements in coding and mathematical tasks, and many benchmarks show it has reclaimed the top of the leaderboard. Perhaps this was in fact the hyped 4.5 update, and OpenAI chose to downplay its significance out of concern that it would not pull far enough ahead of Claude 3 Opus to justify the anticipation.
Gemini 1.5 Pro’s wider release was also significant in the closed LLM world, and we expect it to open up experimentation with many more use cases for its 1 million token context window, particularly tasks involving whole code repositories. Elon Musk’s xAI also unveiled its first multimodal model, Grok-1.5V, with impressive performance on many vision benchmarks.
The open-source LLM world was also active this week, with Mistral’s new Mixtral 8x22B, a Mixture of Experts model with a 64k context window. It has already served as the base for many strong open-source fine-tuned models. While it is broadly less capable than Mistral’s recent closed-source Mistral Large, it is reassuring that Mistral is continuing to release open-source models. Google likewise added new models to its open-source Gemma family this week, CodeGemma and RecurrentGemma, along with an upgrade to the core Gemma model. These models are also less capable than the company’s closed-source models, but they nevertheless have strong capabilities and offer plenty of potential for experimentation and for building cheaper production models for specific use cases.
Why should you care?
So far in 2024, we have seen many strong incremental model improvements, which we expect will enable more uses of LLMs in commercial products, particularly as part of RAG or agent pipelines. We have not, however, seen a capability leap on the scale of GPT-4 last year, leading some to speculate that LLMs are hitting a plateau. We think we have yet to see the results of true next-generation models and of the massive AI chip investments made over the past six months. We are glad to see progress in both open and closed-source LLMs, but we increasingly notice a trend of open-source models being released only after their closed-source counterparts, and with lower capability. As budgets for training closed foundation models escalate, a key question for the competitiveness of open-source models will be how well small models trained from scratch compare with small models derived from much larger foundation models. At Baidu’s AI conference this week, Robin Li noted that smaller models derived from ERNIE 4.0 through dimensionality reduction perform significantly better than open-source models of the same size. For now, we still see a great value proposition in fine-tuned open-source models for many applications, and we hope this will be sustained!
- Louie Peters — Towards AI Co-founder and CEO
Upcoming Community Event
In our Learn AI Together Discord community, we are hosting a walkthrough of the research paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Join us for deeper insight into this prominent AI research.
Join us next Saturday, April 27, for the RAG paper walkthrough on the Learn AI Together Discord server!
Hottest News
1. Meta Confirms That Its Llama 3 Open Source LLM Is Coming in the Next Month
Meta confirmed that it will begin releasing Llama 3, the next generation of the large language model it uses to power generative AI assistants, within the next month. Llama 3 is intended to outperform its predecessors and compete with OpenAI’s GPT-4; it will debut with two preliminary versions before a comprehensive multimodal version launches in the summer.
2. GPT-4 Turbo With Vision Is Now Generally Available in the API
OpenAI has made GPT-4 Turbo with Vision generally available in the API and ChatGPT. Vision requests to the new model can now also use JSON mode and function calling, and its knowledge extends to December 2023.
3. Apple Plans To Overhaul Entire Mac Line With AI-Focused M4 Chips
Apple is nearing production of its M4 processors, which will bring AI processing capabilities to its Mac lineup. The company aims to release the updated computers between late this year and early next year.
4. Gemini 1.5 Pro Now Available in 180+ Countries
Gemini 1.5 Pro is now available in more than 180 countries. The release adds native audio understanding, developer features such as a File API, system instructions, and JSON mode, and expanded audio/video capabilities, including video quizzes. It also introduces a new, highly performant text embedding model.
5. Grok 1.5 Gets a Vision Upgrade
xAI has introduced Grok-1.5V, a multimodal AI model with enhanced capabilities for analyzing visual data, including text, charts, and images. Grok-1.5V will be available soon to xAI’s early testers and existing Grok users.
Five 5-minute reads/videos to keep you learning
1. Vision Language Models Explained
Vision language models (VLMs) are multimodal AI systems that interpret images and text. This blog post provides a comprehensive guide to vision language models, including how they function, how to choose the right model, how to use them for inference, and how to fine-tune them easily.
2. AI-Powered Search: Embedding-Based Retrieval and RAG
Replacing a traditional search architecture with AI-powered search requires replacing bag-of-words representations with embeddings and implementing retrieval-augmented generation (RAG). This blog post explains the main ideas of embedding-based retrieval and RAG, including the basics, primary types and techniques, and pitfalls.
3. Building Reliable Systems Out of Unreliable Agents
The article presents methods for building dependable AI systems out of unreliable agents. It covers prompt engineering, performance optimization, eval systems, data-driven fine-tuning, and Retrieval-Augmented Generation (RAG), with a notable strategy of using complementary agents to improve overall reliability.
4. Speech to Text: Leaderboard & Comparison
In this leaderboard and comparative analysis, Artificial Analysis has analyzed speech-to-text models and hosting providers across different characteristics, including their word error rate (lower is better), speed, and price. It lists providers like OpenAI, Azure, Amazon Transcribe, and Google.
5. Measuring the Persuasiveness of Language Models
New research shows that the persuasiveness of Anthropic’s models increases with each model generation, with the latest, Claude 3 Opus, producing arguments about as persuasive as those written by humans. The blog post describes the methods used to study the persuasiveness of AI models in a simple setting.
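The core loop of the embedding-based retrieval described in the second read (rank documents by the similarity of their embedding vectors to the query's, then paste the best matches into the prompt) can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: the three-dimensional vectors are hand-made toy values standing in for a real embedding model, and `retrieve` and `build_rag_prompt` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, corpus, k=2):
    """Return the texts of the k documents whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda doc: cosine_similarity(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question, passages):
    """The augmentation step of RAG: prepend retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy corpus of (text, embedding) pairs. Real systems get these vectors
# from an embedding model; these are hand-made for illustration only.
corpus = [
    ("Mixtral 8x22B is a Mixture of Experts model.", [0.9, 0.1, 0.0]),
    ("Gemini 1.5 Pro has a 1 million token context window.", [0.1, 0.9, 0.1]),
    ("Whisper Web runs speech recognition in the browser.", [0.0, 0.2, 0.9]),
]
query_vec = [0.2, 0.95, 0.05]  # pretend embedding of a question about Gemini's context

passages = retrieve(query_vec, corpus, k=1)
print(build_rag_prompt("How long is Gemini 1.5 Pro's context window?", passages))
```

Swapping the toy vectors for real embeddings changes nothing structurally; at scale, the linear scan in `retrieve` is usually replaced by an approximate nearest-neighbor index.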
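Word error rate, the accuracy metric in the speech-to-text leaderboard above, is conventionally defined as WER = (S + D + I) / N: the substitutions, deletions, and insertions needed to turn a model's transcript into the reference, divided by the number of words N in the reference. A minimal sketch computes it with a word-level edit distance. It only splits on whitespace, so whether it matches the text normalization Artificial Analysis actually applies (casing, punctuation, numbers) is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") plus one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2/6, prints 0.3333333333333333
```

Because insertions count as errors, WER can exceed 1.0 when a transcript is much longer than the reference, which is one reason leaderboards report it alongside speed and price rather than as a percentage "accuracy".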
Repositories & Tools
1. llm.c is a minimalist GPT-2 training implementation written in plain C/CUDA, avoiding heavy dependencies such as PyTorch and CPython.
2. Storm is an LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
3. Whisper Web supports ML-powered speech recognition directly in your browser.
4. Mojo combines Python syntax and ecosystem with systems programming and metaprogramming features.
5. Bitskout helps create plugins that read and extract data from documents and emails and works with tools like Asana, Zapier, or Power Automate.
Top Papers of The Week!
1. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
The paper presents a method for scaling Transformer-based LLMs to infinitely long inputs with bounded memory and computation. It introduces Infini-attention, an attention mechanism that adds a compressive memory to standard attention and combines masked local attention with long-term linear attention in a single Transformer block.
2. Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Apple researchers have developed Ferret-UI, a multimodal large language model (MLLM) designed specifically to better understand and interact with mobile user interface (UI) screens.
3. RULER: What’s the Real Context Size of Your Long-Context Language Models?
Because the needle-in-a-haystack test mainly measures simple retrieval and says little about deeper long-context understanding, researchers developed the RULER benchmark. It offers more intricate evaluations by allowing customizable sequence lengths and task complexity, introducing different types and quantities of needles, and adding more challenging task categories such as multi-hop tracing and aggregation.
4. Rho-1: Not All Tokens Are What You Need
The authors analyze token importance in language model training and find that loss patterns vary widely across tokens. This leads to Rho-1, a new language model trained with Selective Language Modeling (SLM), which focuses training on the tokens most beneficial to the model instead of treating all tokens as equally important.
5. AutoCodeRover: Autonomous Program Improvement
The paper proposes an automated approach to resolving software issues, such as fixing bugs and adding features, so that programs can be improved autonomously. AutoCodeRover combines LLMs with code search capabilities to produce a program modification, or patch.
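To make the Infini-attention description concrete, here is a single-head, pure-Python sketch of its per-segment loop, following the paper’s formulation as we understand it: memory is read as σ(Q)M / σ(Q)z with σ = ELU + 1, written with M ← M + σ(K)ᵀV and z ← z + Σσ(K), and mixed with causal local softmax attention through a sigmoid gate on a learned scalar β. It is an illustration, not the authors’ code: the query/key/value projections, multiple heads, the paper’s delta-rule update variant, and training are omitted, returning zeros from an empty memory is our own simplification, and the class and function names are hypothetical.

```python
import math


def elu_plus_one(x: float) -> float:
    # sigma(x) = ELU(x) + 1, which keeps every feature strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


def sigma(vec):
    return [elu_plus_one(v) for v in vec]


class CompressiveMemory:
    """Fixed-size associative memory: a d_key x d_value matrix M and a normalizer z."""

    def __init__(self, d_key: int, d_value: int):
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def retrieve(self, q):
        # A_mem = sigma(q) M / (sigma(q) . z)
        sq = sigma(q)
        denom = sum(a * b for a, b in zip(sq, self.z))
        d_value = len(self.M[0])
        if denom == 0.0:  # empty memory (first segment): nothing to recall
            return [0.0] * d_value
        return [sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
                for j in range(d_value)]

    def update(self, keys, values):
        # Linear update: M += sigma(K)^T V and z += sum_t sigma(K_t); size never grows.
        for k, v in zip(keys, values):
            for i, ski in enumerate(sigma(k)):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj


def causal_softmax_attention(queries, keys, values):
    """Standard masked dot-product attention restricted to the current segment."""
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)
        weights = [math.exp(sc - m) for sc in scores]
        total = sum(weights)
        out.append([sum(weights[s] * values[s][j] for s in range(t + 1)) / total
                    for j in range(len(values[0]))])
    return out


def infini_attention_segment(memory, queries, keys, values, beta):
    """Process one segment: read old memory, attend locally, mix, then write to memory."""
    gate = 1.0 / (1.0 + math.exp(-beta))  # sigmoid(beta), a learned scalar per head
    local = causal_softmax_attention(queries, keys, values)
    mixed = []
    for q, a_dot in zip(queries, local):
        a_mem = memory.retrieve(q)  # uses memory from previous segments only
        mixed.append([gate * m + (1.0 - gate) * l for m, l in zip(a_mem, a_dot)])
    memory.update(keys, values)  # fold this segment in for future segments
    return mixed


# Stream two toy segments (already-projected q, k, v vectors) through one head.
memory = CompressiveMemory(d_key=2, d_value=2)
segments = [
    ([[1.0, 0.0]], [[1.0, 0.0]], [[0.5, 2.0]]),
    ([[0.0, 1.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
]
for q, k, v in segments:
    print(infini_attention_segment(memory, q, k, v, beta=0.0))
```

However many segments are streamed through, `memory.M` stays d_key × d_value, which is the bounded-memory property the paper is after; only the local attention cost depends on segment length.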
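The kind of configurable synthetic task RULER builds on top of a single needle can be sketched with a small generator that hides several key-value “needles” in filler text and asks for one of them, with sequence length controlled by the amount of filler. This is a hypothetical illustration loosely modeled on a multi-key needle-in-a-haystack task, not RULER’s actual templates or scoring code; the function names, filler sentence, needle wording, and containment-based score are our own assumptions.

```python
import random


def make_multikey_haystack(num_needles: int, num_filler: int, query_index: int, seed: int = 0):
    """Hide several key -> value needles in filler text and ask for one of them.

    num_filler controls sequence length; extra needles act as distractors, so the
    model must find the right needle rather than any needle.
    """
    if not 0 <= query_index < num_needles:
        raise ValueError("query_index must select one of the needles")
    rng = random.Random(seed)  # seeded so the same configuration gives the same sample
    filler = "The grass is green. The sky is blue. The sun is yellow."
    sentences = [filler] * num_filler
    keys = [f"key-{i}" for i in range(num_needles)]
    values = [str(rng.randint(100_000, 999_999)) for _ in keys]
    for key, value in zip(keys, values):
        position = rng.randint(0, len(sentences))  # random depth in the haystack
        sentences.insert(position, f"One of the special magic numbers for {key} is: {value}.")
    context = " ".join(sentences)
    question = f"What is the special magic number for {keys[query_index]}?"
    return context, question, values[query_index]


def contains_answer(prediction: str, answer: str) -> float:
    """Score 1.0 if the expected value appears in the model's reply (assumed metric)."""
    return 1.0 if answer in prediction else 0.0


context, question, answer = make_multikey_haystack(num_needles=4, num_filler=200, query_index=2)
prompt = f"{context}\n\n{question}"
# A long-context model would be called with `prompt` here; we only score a stand-in reply.
print(len(prompt.split()), contains_answer(f"The number is {answer}.", answer))
```

Sweeping `num_filler` probes how accuracy degrades with length, while raising `num_needles` makes the task harder at a fixed length, which is the separation of concerns the RULER summary above describes.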
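Rho-1’s Selective Language Modeling can be summarized as: score each token by its excess loss (the current model’s loss minus a reference model’s loss on that token), keep only the top k% of tokens, and average the training loss over those. The sketch below is our reading of the paper rather than the authors’ implementation. It works on precomputed per-token losses; in practice these come from the model being trained and from a reference model trained on high-quality data, and they would be tensors with gradients rather than Python floats. `select_tokens`, `slm_loss`, and the default `keep_ratio` are hypothetical choices for this sketch.

```python
def select_tokens(current_losses, reference_losses, keep_ratio):
    """Indices of the top keep_ratio fraction of tokens ranked by excess loss."""
    if len(current_losses) != len(reference_losses):
        raise ValueError("loss lists must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Excess loss: how much worse the current model is than the reference on each token.
    # The paper treats high-excess-loss tokens as the ones worth training on.
    excess = [cur - ref for cur, ref in zip(current_losses, reference_losses)]
    k = max(1, round(keep_ratio * len(excess)))
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])


def slm_loss(current_losses, reference_losses, keep_ratio=0.6):
    """Average the current model's loss over the selected tokens only.

    keep_ratio=0.6 is an arbitrary default for this sketch.
    """
    kept = select_tokens(current_losses, reference_losses, keep_ratio)
    return sum(current_losses[i] for i in kept) / len(kept), kept


# Per-token cross-entropy losses for a 4-token sequence (toy numbers).
current = [2.0, 0.5, 3.0, 1.0]
reference = [1.0, 0.4, 1.0, 1.5]
loss, kept = slm_loss(current, reference, keep_ratio=0.5)
print(kept, loss)  # [0, 2] 2.5
```

Tokens 1 and 3 are dropped: the model is already about as good as the reference on token 1, and better than it on token 3, so under SLM they contribute no gradient in this step.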
Quick Links
1. Cohere introduced Rerank 3, a new foundation model for enterprise search and retrieval. Rerank 3 can boost search performance or reduce the cost of running RAG applications, with negligible impact on latency.
2. Google’s new Arm-based CPU, dubbed Axion, will support Google’s AI workloads before it rolls out to business customers of Google Cloud “later this year.” The Axion chips are already powering YouTube ads, the Google Earth Engine, and other Google services.
3. AI startup Symbolica raises $31 million to develop AI systems to compete with OpenAI. Symbolica has created a framework that will enable it to develop alternatives to the Transformer architecture.
Who’s Hiring in AI!
Platform Engineer @Captur (London/Remote)
Azure Cloud AI Engineer (Chatbot) @Per Scholas (Remote)
Junior Full Stack Developer @Infinitive Inc (McLean, VA, USA)
Generative AI Engineer, Senior @Booz Allen (Remote)
Entry Level Data Scientist @Upen Group Inc (Irving, TX, USA)
Junior Data Scientist @Intellipro Group Inc (Remote)
Freelance Data Visualization Specialist @The Motley Fool (Remote)
LLM Application Developer @Curt Landry Ministries (Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #95 was originally published in Towards AI on Medium.