This AI newsletter is all you need #85
What happened this week in AI by Louie
This week, attention was on the emerging competition for OpenAI’s GPT-4 and GPTStore in the form of Google’s Gemini Ultra and Hugging Face’s Hugging Chat Assistants, respectively. Meta also made headlines with its latest and largest update to its code generation AI model, Code Llama 70B.
Google announced Gemini in December last year, and while Gemini Pro and Nano were made available immediately, Gemini Ultra is set to be released on Wednesday. Gemini Ultra will include multimodal capabilities, enhanced coding features, and the ability to analyze documents, files, and data. Google’s technical paper shows that Ultra outperforms GPT-4 in 7 of 8 text benchmark tests. It is the first model to surpass human experts in MMLU (massive multitask language understanding), although that result uses CoT@32 (chain-of-thought prompting with 32 samples) rather than the usual 5-shot setup. Additionally, Ultra surpasses GPT-4V (vision) in all ten benchmarks for image interpretation. We are excited to see Gemini Ultra tested in the wild and to find out whether it lives up to its benchmark performance! In other Gemini news, Google brought more Gemini Pro features to Bard, incorporating image generation capability powered by the latest Imagen 2 model.
While Google sticks to OpenAI’s closed-source API model release playbook, there were advances in open-source LLMs this week. Hugging Face released Hugging Chat Assistants, its open-source competitor to the GPTStore. With 4,000 Assistants to view/customize and prompts to improve your own Assistant, it is already larger than the GPT store. Hugging Chat Assistants can be powered by various open-source LLMs, including Mistral’s Mixtral or Meta’s Llama 2. We also saw open-source LLM updates from Mistral (“miqu-1-70b” leaked) and Meta’s CodeLlama 70B. CodeLlama 70B scored 53 percent in accuracy on the HumanEval benchmark, performing better than GPT-3.5’s 48.1 percent and closer to the 67 percent mark OpenAI reported for GPT-4.
Why should you care?
We think competition is important, and individuals and companies should have alternatives to OpenAI. GPT-4 has mainly remained unchallenged as the leading LLM for over a year now. Ultra is the model Google finally claims outperforms OpenAI’s GPT-4, but there was much disappointment last year when the model was announced but not released for external use and testing. Gemini Ultra could provide some much-needed competition to GPT-4, but its performance for practical applications remains uncertain. We also think it is important to see competition to OpenAI’s GPTStore for releasing LLM-based apps and prompts. Hugging Chat Assistants shows how fast the open-source community can catch up to closed-source rivals and joins Poe AI as a GPTStore alternative.
- Louie Peters — Towards AI Co-founder and CEO
Meta has launched Code Llama 70B, a coding AI model comparable to GPT-4, in three variations: the base model, a Python-specific version, and an ‘Instruct’ version for interpreting natural language commands. All editions are available for free for research and commercial applications.
Hugging Face announced the third-party, customizable Hugging Chat Assistants. This allows users to create their own AI chatbots for free, offering a similar service to OpenAI’s custom GPT Builder but without the associated costs. The Hugging Chat Assistants can be powered by various open-source LLMs, such as Mistral’s Mixtral or Meta’s Llama 2.
Mistral has recently confirmed that the “miqu-1-70b” Large Language Model, released on Hugging Face and exhibiting performance close to that of GPT-4, is a leaked quantized version of their technology.
OpenAI CEO Sam Altman adopts a cautious tone when discussing AI, recently describing the anticipated GPT-5 as merely “okay” at Davos. This balanced approach suggests a strategic shift towards tempered communication.
Google has updated Bard with image generation abilities using Google’s Imagen 2 model and an improved version of their Gemini Pro language model that supports over 40 languages. These updates allow Bard to generate images from text descriptions and more closely match ChatGPT’s performance.
Five 5-minute reads/videos to keep you learning
Open-source LLMs like Mixtral now perform well enough to serve as the central reasoning component in intelligent agents, surpassing GPT-3.5 on benchmarks. This article explains the inner workings of ReAct agents and shows how to build them using the ChatHuggingFace class integrated into LangChain.
The Enterprise Scenarios Leaderboard, developed by the Patronus team in partnership with Hugging Face, is a new benchmarking tool designed to assess language model performance across six business-oriented tasks. These tasks include finance, legal issues, creative writing, customer support, toxicity detection, and handling of personally identifiable information (PII), specifically emphasizing enterprise requirements.
To get to LLMs, there are several layers to peel back, starting with the basics of AI and machine learning. This is a foundational article on the evolution of machine learning, covering everything from neural networks to transformers. It primarily focuses on the applications of transformer-based models, with a quick insight into the future.
Almost anyone can poison a machine learning dataset to alter its behavior and output. This article discusses what data poisoning is and why it matters. It also covers key concepts such as data poisoning techniques, detection efforts, and prevention strategies.
The intersection of AI and blockchain has the potential to revolutionize various systems, with AI poised to enhance blockchain’s efficiency and reliability. This post will classify different ways that crypto + AI could intersect, as well as the prospects and challenges of each category.
Repositories & Tools
1. RAGs is a Streamlit app that uses natural language to create a RAG pipeline from a data source.
2. Nomic Embed is an open embedding model with performance similar to OpenAI’s text-embedding-3-small.
3. LLMs-from-scratch is a repository of resources with hands-on experience and foundational knowledge necessary for building LLMs.
4. RawDog is a CLI assistant that responds by generating and auto-executing a Python script.
5. Zerve is a unified developer space for data science and AI teams to explore, collaborate, and build.
Top Papers of The Week!
OLMo is the first entirely open-source LLM whose release includes not only the model weights and inference code but also the training data, training code, and evaluation code. This lets researchers and developers build on a strong, fully open model to collectively advance the science of language models.
Researchers have developed a method to enhance LLM training by using a smaller instruction-tuned LLM to paraphrase web scrapes, creating a cleaner, structured dataset. This approach has been shown to accelerate pre-training, reduce computational costs, and improve performance, achieving a 3x speed increase, 10% perplexity reduction, and better zero-shot learning capabilities on various tasks.
The LLaVA team has introduced MoE-LLaVA, an open-source, sparse Large Vision-Language Model (LVLM) that leverages a mixture of experts (MoE) to maintain constant computational costs despite a substantial parameter increase. By selectively activating only the top-k experts for each token, MoE-LLaVA achieves efficient and cost-effective performance.
CRAG introduces a retrieval evaluator to assess and enhance document quality, triggering tailored retrieval actions. It employs web search and optimized knowledge utilization for automatic self-correction. CRAG significantly improves RAG’s performance across diverse datasets, showing a 36.6% accuracy gain.
MMBench is a novel multi-modality benchmark. It develops a comprehensive evaluation pipeline comprising a meticulously curated dataset and a novel CircularEval strategy, and it uses ChatGPT to map free-form model predictions onto the predefined answer choices.
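For readers curious about what CircularEval involves, here is a minimal Python sketch based on our reading of the MMBench paper: a multiple-choice question with N options is asked N times, each time with the options circularly shifted, and the model only gets credit if it is right on every pass. The function name `circular_eval` and the `ask_model` callback are hypothetical names for illustration and are not taken from the MMBench code.

```python
from typing import Callable, Sequence


def circular_eval(
    question: str,
    choices: Sequence[str],
    correct_index: int,
    ask_model: Callable[[str, Sequence[str]], int],
) -> bool:
    """Return True only if the model is correct under every circular shift.

    `ask_model` is a hypothetical callback: it receives the question and the
    shifted options and returns the index of the option it picks.
    """
    n = len(choices)
    for shift in range(n):
        # Rotate the options left by `shift` positions.
        rotated = list(choices[shift:]) + list(choices[:shift])
        # Position of the correct option after the rotation.
        expected = (correct_index - shift) % n
        if ask_model(question, rotated) != expected:
            return False
    return True


# Two toy "models" to illustrate the idea (assumptions, not real models).
def always_first(question: str, options: Sequence[str]) -> int:
    # Position-biased: always picks the first option.
    return 0


def knows_answer(question: str, options: Sequence[str]) -> int:
    # Finds the right option wherever it appears.
    return list(options).index("Paris")


options = ["Paris", "Rome", "Madrid", "Berlin"]
print(circular_eval("Capital of France?", options, 0, always_first))  # False
print(circular_eval("Capital of France?", options, 0, knows_answer))  # True
```

The position-biased model would pass a single-pass evaluation of this question but fails once the options are rotated, which is the kind of lucky guess this strategy is meant to filter out.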
1. Amazon announces Rufus, a new generative AI-powered conversational shopping assistant. It aims to simplify product discovery, comparison, and recommendations by leveraging Amazon’s extensive product catalog and a wealth of web-based information.
2. Volkswagen has set up its global AI lab to function as a competence center and incubator, concentrating on generating proofs of concept for automotive innovations and incorporating AI advancements into Volkswagen’s vehicles.
3. The Browser Company is integrating an AI agent into the Arc browser to surf the web and return results without using search engines. The company said only some of these features use LLMs, but they all work to “bring the internet to you.”
4. AI2 introduced OLMo, a 7 billion parameter model outperforming Llama 2 in generative tasks, with comprehensive training data, code, and over 500 checkpoints per model, all under the Apache 2.0 License.