This AI newsletter is all you need #77
What happened this week in AI by Louie
This week in AI, the news was dominated by new Large Language Model releases from Google (Gemini) and Mistral (8x7B). The sharply divergent announcement styles (press events and demo videos versus model-weight torrent links in a single tweet) reinforced the two companies' different ethos and approach to model releases (closed API versus open source). Both models are highly significant: Google announced the first GPT-4-level LLM competitor, while Mistral released a highly capable open-source Mixture of Experts model.
Google’s Gemini model brought some impressive capabilities and benchmark scores, together with some controversy. The model is particularly strong on multimodality, with better results than GPT-4 on most multimodal benchmarks, while relative performance on text and code benchmarks was closer and more mixed. In particular, multimodality is more deeply ingrained in Gemini, whereas GPT-4 often calls out to external models such as DALL-E 3: “The visual encoding of Gemini models is inspired by our own foundational work on Flamingo … with the important distinction that the models are multimodal from the beginning and can natively output images using discrete image tokens.” However, the details of how this was implemented remain unclear.
Gemini’s two smaller models have already rolled out across many Google products, while the most interesting and capable Ultra model is still undergoing further testing. The Gemini release was met with some backlash after it became clear that a video demo of its multimodal video-analysis capabilities was misleading. While we think this was an embarrassing and unnecessary mistake, it does not detract from some great work by Google and DeepMind engineers on what looks like a cutting-edge model.
Mistral’s quiet torrent-link tweet releasing its 8x7B sparse mixture-of-experts (SMoE) model contrasted sharply with the controversy at Google. While this is not the first open-source MoE model, it is the most capable and comprehensive, and early tests are already showing impressive results against much larger models. MoE models are an important development relative to the previously dominant dense transformer architecture, with potential benefits in training, inference efficiency, and capabilities. It is worth noting that GPT-4 is widely believed to be an MoE model, while Gemini’s architecture was not disclosed in detail.
Why should you care?
With the release of Gemini, we are very glad to see a competitor and alternative to GPT-4 that can balance the ecosystem. It is also important to have a highly capable multimodal LLM that can be used as a backup for GPT-4 for product resiliency as LLM models get rolled out further in commercial products.
The Mistral model sits closer to the GPT-3.5 class than to GPT-4. The release is nevertheless significant because of its potential to drive innovation in the open-source space as more people can experiment with fine-tuning and building on Mixture of Experts (MoE) models.
- Louie Peters — Towards AI Co-founder and CEO
Google has introduced Gemini, a new model in three sizes: Ultra, Pro, and Nano. Gemini is natively multimodal and outperforms other models on various academic benchmarks. Notably, Gemini Ultra achieves a groundbreaking score on the MMLU (massive multitask language understanding) benchmark and excels on image benchmarks without relying on OCR systems.
Mixtral 8x7B is a sparse mixture-of-experts (SMoE) model with open weights. The model supports multiple languages and has a context window of 32k tokens, and it can be fine-tuned into an instruction-following model. Mixtral 8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
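The core idea behind a sparse MoE layer is that a small gating network routes each token to only a couple of "expert" feed-forward networks, so compute per token stays low even as total parameters grow. The NumPy sketch below is a toy illustration of top-2 routing in that spirit; the layer sizes, initialization, and ReLU experts are illustrative assumptions, not Mixtral's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoELayer:
    """Toy sparse mixture-of-experts feed-forward layer.

    A learned gate picks the top_k experts for each token, so only a
    fraction of the total parameters is active per token.
    """

    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        self.top_k = top_k
        self.gate = rng.standard_normal((d_model, n_experts)) * 0.1
        # Each expert is a tiny two-layer MLP (hypothetical sizes).
        self.experts = [
            (rng.standard_normal((d_model, d_ff)) * 0.1,
             rng.standard_normal((d_ff, d_model)) * 0.1)
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        # x: (n_tokens, d_model)
        logits = x @ self.gate                              # (n_tokens, n_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]  # top-k expert ids per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            ids = top[t]
            weights = softmax(logits[t, ids])  # renormalize over the chosen experts
            for w, e in zip(weights, ids):
                w1, w2 = self.experts[e]
                h = np.maximum(x[t] @ w1, 0.0)  # ReLU hidden layer
                out[t] += w * (h @ w2)
        return out

layer = SparseMoELayer(d_model=16, d_ff=32)
tokens = rng.standard_normal((4, 16))
print(layer(tokens).shape)  # (4, 16)
```

With 8 experts and top-2 routing, roughly a quarter of the expert parameters are touched per token, which is why Mixtral can match much larger dense models at a fraction of the inference cost.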
European Union officials reached a landmark deal on the world’s most ambitious law to regulate artificial intelligence, paving the way for what could become a global standard for classifying risk, enforcing transparency, and financially penalizing tech companies for noncompliance. Europe’s AI Act ensures that AI advances are accompanied by monitoring and that the highest-risk uses are banned.
Google is facing backlash after admitting its flashy Gemini demo video utilized heavy editing and prompts to make the model seem more impressive. As reported by Parmy Olson for Bloomberg, the researchers fed still images to the model and edited together successful responses, partially misrepresenting the model’s capabilities.
StableLM Zephyr 3B, a new 3 billion parameter chat model, is being released as an extension of the StableLM 3B-4e1t model, drawing inspiration from the Zephyr 7B model. It is designed for efficient text generation, particularly in instruction following and Q&A contexts, and has been fine-tuned on multiple datasets using the Direct Preference Optimization algorithm.
What are your thoughts on the Gemini demo? Share them in the comments!
Five 5-minute reads/videos to keep you learning
In 2023, there have been significant advancements in Large Language Models (LLMs) in AI research. This article provides a glimpse into the transformative research in AI, where language models have been refined, scaled down, and even integrated with external tools to tackle a diverse range of tasks.
The latest version of Claude (Claude 2.1) has a 200K-token context window, allowing it to recall information effectively. However, it can be hesitant to answer questions based on single sentences that are injected or out of place in a document. The experiment in this blog uses a prompting technique to guide Claude toward recalling the most relevant sentence.
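The technique described in the post amounts to pre-filling the start of the assistant's turn so the model commits to quoting a sentence from the context before answering. The sketch below only builds the message list; the field names follow a generic chat-message format, and the prefill wording is the one reported in the blog, so treat the exact strings as assumptions rather than a tested setup.

```python
def build_messages(long_document: str, question: str) -> list[dict]:
    """Build a chat prompt that pre-fills the assistant's reply.

    The pre-filled assistant prefix nudges the model to locate and
    quote the most relevant sentence before it answers.
    """
    return [
        {
            "role": "user",
            "content": f"{long_document}\n\nQuestion: {question}",
        },
        # Pre-filled start of the assistant's turn (the recall trick):
        {
            "role": "assistant",
            "content": "Here is the most relevant sentence in the context:",
        },
    ]

msgs = build_messages("...long 200K-token document...", "What fact was injected?")
print(msgs[-1]["content"])  # Here is the most relevant sentence in the context:
```

The messages would then be passed to the model API of choice; the point is only that the final assistant message is sent partially written, so generation continues from the quote-first prefix.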
In this article, the author carried out a “needle in a haystack” pressure test of RAG against GPT-4-Turbo’s context window across three key metrics: accuracy, cost, and latency. They benchmarked two distinct RAG pipelines: LlamaIndex and OpenAI’s new Assistants API retrieval tool. The results showed that RAG performs better at just 4% of the cost.
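A "needle in a haystack" test works by hiding a known fact at a chosen depth in filler text, asking the system under test to retrieve it, and scoring accuracy across depths. The harness below is a minimal sketch in that spirit, not the article's actual code; `ask` is a placeholder for whichever pipeline is being evaluated (a RAG pipeline or a long-context model call), and the needle, filler, and depths are made up for illustration.

```python
NEEDLE = "The secret launch code is 7381."
FILLER = "The quick brown fox jumps over the lazy dog."

def build_haystack(n_sentences: int, depth: float) -> str:
    """Place the needle at `depth` (0.0 = start, 1.0 = end) of the filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), NEEDLE)
    return " ".join(sentences)

def score(ask, depths=(0.0, 0.25, 0.5, 0.75, 1.0), n_sentences=200) -> float:
    """Fraction of depths at which `ask(document, question)` recovers the needle."""
    hits = 0
    for d in depths:
        answer = ask(build_haystack(n_sentences, d),
                     "What is the secret launch code?")
        hits += "7381" in answer
    return hits / len(depths)

# Trivial stand-in "pipeline" that just searches the text, to show the harness runs:
print(score(lambda doc, q: NEEDLE if NEEDLE in doc else ""))  # 1.0
```

A real run would also record per-call latency and token cost alongside accuracy, which is how the article arrives at its cost comparison between RAG and the full context window.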
Google’s blog on Gemini explains how the researchers did multimodal prompting by showing images to the model along with a prompt for it to give the correct answers. It is a good starter guide to understanding what’s possible with Gemini.
This blog post introduces StripedHyena, a new architecture offering long-context support and improved training and inference performance over the Transformer architecture. It was designed using the team’s latest research on scaling laws for efficient architectures.
Repositories & Tools
1. MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.
2. Taskade Custom AI Agents is a suite of five AI tools designed to automate routine activities like research, task management, and content creation.
3. practical-tutorials/project-based-learning is a curated list of project-based programming tutorials across different primary programming languages.
4. Mamba Chat is a chat language model based on a state-space model architecture. It has substantially better retrieval ability than similarly sized transformers.
Top Papers of The Week!
A study explores the concept of “embedding inversion” to reconstruct complete text from dense text embeddings. The researchers achieve high success rates in recovering the original text using a multi-step method. The study also reveals the potential for extracting sensitive personal data from text embeddings, emphasizing the need for improved privacy measures in machine learning.
The study introduces Mamba, a hardware-aware parallel algorithm that overcomes the inefficiency of Transformers on long sequences in language processing tasks. By implementing selective state spaces, Mamba achieves fast inference, linear scalability, and competitive performance compared to larger Transformer models.
This paper proposes MVDD, a diffusion model that leverages multi-view depth to represent complex 3D shapes in a 2D data format. It can generate high-quality, dense point clouds with 20K+ points and fine-grained details.
DiffuSSM is a new model that aims to accelerate diffusion models for generating high-resolution images without sacrificing detail quality. It replaces attention mechanisms with a scalable state space model backbone, improving performance on ImageNet and LSUN datasets while conserving computing resources.
SparQ Attention is a technique that enhances the efficiency of large language models by reducing memory bandwidth needs. It does not require changes to pre-training or fine-tuning and can significantly decrease attention memory requirements without compromising accuracy.
1. X.AI, Elon Musk’s AI startup, is looking to raise up to $1 billion in an equity offering. According to the SEC filing, the company has already raised nearly $135 million from four investors, with the first sale occurring on Nov. 29.
2. Meta introduced Purple Llama, a new project aiming to level the playing field for building safe and responsible generative AI experiences. It is launching with permissively licensed tools, evaluations, and models for research and commercial use.
3. IBM and Meta have formed the AI Alliance with over 50 founding members and collaborators. This alliance aims to promote AI projects, establish benchmarks, enhance open models, and ensure secure and beneficial AI development.
Who’s Hiring in AI!
Interested in sharing a job opportunity here? Contact email@example.com.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!