This AI newsletter is all you need #64
What happened this week in AI by Louie
This week in AI, we followed more model releases in the open-source LLM space, including the newly unveiled Falcon 180B, as well as further teasers of upcoming models from the tech giants.
Falcon 180B has already topped the Hugging Face leaderboard and, at 180 billion parameters, is the largest openly available language model to date. Its training involved processing a massive 3.5 trillion tokens on up to 4,096 GPUs simultaneously, utilizing Amazon SageMaker and consuming approximately 7,000,000 GPU hours in the process. Developed as part of the Falcon family by the Technology Innovation Institute in Abu Dhabi, the model's dataset is primarily composed of web data from RefinedWeb (accounting for 85% of the data), supplemented with a carefully curated blend of conversations, technical papers, and a small fraction of code (around 3%). In terms of performance, Falcon 180B reportedly beats both Llama 2 70B and OpenAI's GPT-3.5 on Massive Multitask Language Understanding (MMLU) and performs on par with Google's PaLM 2-Large. Falcon 180B is available in the Hugging Face ecosystem, starting with Transformers version 4.33. However, it's essential to note that commercial use of Falcon 180B is currently subject to stringent conditions, with "hosting use" explicitly excluded. While open-source models are clearly still some distance from challenging GPT-4 in terms of performance and compute intensity, we expect to see increasing availability of open-source models that compete with GPT-3.5, and we are excited to see what can be built with the increased flexibility this brings.
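For readers who want to try it, a minimal sketch of loading Falcon 180B via Transformers (4.33+) might look like the code below. The model ID and hardware setup are our assumptions: the repository is gated behind a license acceptance, and the half-precision weights still need several hundred GB of GPU memory, so multi-GPU sharding or quantization is the practical route for most users.

```python
# Minimal sketch (our assumptions: model ID "tiiuae/falcon-180B" on the Hugging Face Hub,
# license accepted on the Hub, and enough GPU memory to shard the weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"  # gated repo; requires accepting the license first

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; weights still occupy hundreds of GB
    device_map="auto",           # shard across available GPUs via accelerate
)

inputs = tokenizer("The three most promising uses of open LLMs are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```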
Alongside this week's open-source developments, we also noted several tech giants signaling that they intend to compete at the cutting edge of LLMs. According to reports, Meta is expanding its infrastructure to facilitate the training of a new AI model, which it reportedly plans to begin training early next year with the aim of competing with GPT-4. There are also reports that Apple is investing heavily in AI to enhance its Ajax model, which it hopes will rival ChatGPT. Clearly, all the large tech companies want to compete in the AI race at this point. In our view, competitiveness will come down to who has secured access to scalable compute infrastructure and attracted leading machine learning talent in recent years.
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
TIME magazine has released its list of the 100 Most Influential People in AI for 2023. The list features prominent figures such as Dario and Daniela Amodei, Sam Altman, Demis Hassabis, Robin Li, Clément Delangue, Lila Ibrahim, Elon Musk, Geoffrey Hinton, Fei-Fei Li, Timnit Gebru, Yann LeCun, and Yoshua Bengio.
TII has recently launched Falcon 180B, a formidable language model with 180 billion parameters, trained on 3.5 trillion tokens. Falcon 180B outperforms Llama 2 70B and GPT-3.5 on MMLU and currently tops the Hugging Face Leaderboard. The model is available for commercial use, but under strict terms that exclude "hosting use."
Adept.ai has introduced Persimmon-8B, an open-source LLM with impressive performance and a compact size. Trained on less data, it achieves comparable results to LLaMA2 and offers a fast C++ implementation combined with flexible Python inference.
Hugging Face has introduced the Training Cluster as a service, enabling users to train their own models with a choice of model size, token count, and accelerator. It also provides cost estimates for training LLMs, ranging from $65k to $14.66M depending on model size and token count.
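As a rough illustration of how such costs scale (a back-of-envelope sketch of our own, not the calculator Hugging Face uses), the common approximation of ~6 FLOPs per parameter per training token gives:

```python
# Back-of-envelope training cost estimate (our assumptions, not Hugging Face's calculator):
# total FLOPs ~= 6 * parameters * tokens, divided by sustained GPU throughput,
# multiplied by an hourly GPU price.
def estimate_training_cost(params, tokens, gpu_tflops=312, utilization=0.4, usd_per_gpu_hour=2.0):
    total_flops = 6 * params * tokens
    effective_flops_per_s = gpu_tflops * 1e12 * utilization
    gpu_hours = total_flops / effective_flops_per_s / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour

# Example: a 7B-parameter model trained on 1 trillion tokens
hours, cost = estimate_training_cost(7e9, 1e12)
print(f"~{hours:,.0f} GPU-hours, ~${cost:,.0f} at $2/GPU-hour")
```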
NVIDIA has been collaborating closely with leading companies to enhance and optimize LLM inference. These optimizations have been incorporated into the open-source NVIDIA TensorRT-LLM library, which focuses on inference speed, and a version tailored specifically for language models on H100s is now available.
Five 5-minute reads/videos to keep you learning
Hugging Face has launched a speech-to-text leaderboard that ranks and evaluates speech recognition models available on its platform. The current top performers are NVIDIA FastConformer and OpenAI Whisper, with an emphasis on English speech recognition. Multilingual evaluation will be included in future updates.
This blog post shows how to use AudioLDM 2 in the Hugging Face Diffusers library, covering optimizations such as half-precision, flash attention, and compilation, along with model-level improvements. It is accompanied by a streamlined Colab notebook that includes all the necessary code.
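For a flavor of the setup the post covers, here is a minimal half-precision sketch, assuming the "cvssp/audioldm2" checkpoint and a CUDA GPU; see the post and notebook for the full set of optimizations:

```python
# Minimal AudioLDM 2 text-to-audio sketch with half precision (assumptions: the
# "cvssp/audioldm2" checkpoint on the Hub and a CUDA-capable GPU).
import torch
import scipy.io.wavfile
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Gentle rain falling on a tin roof"
audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=10.0).audios[0]

# AudioLDM 2 generates audio at a 16 kHz sampling rate
scipy.io.wavfile.write("rain.wav", rate=16000, data=audio)
```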
Hugging Face has integrated GPTQ quantization into Transformers, allowing large language models to be compressed to 2, 3, or 4 bits. This method surpasses previous techniques, preserving accuracy while substantially reducing model size.
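A minimal sketch of 4-bit quantization with this integration, assuming the optional auto-gptq and optimum dependencies are installed (the model ID below is just an illustrative example):

```python
# 4-bit GPTQ quantization via the Transformers integration (requires the optional
# auto-gptq and optimum packages; "facebook/opt-125m" is only an example model).
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ calibrates the quantization on a small dataset ("c4" here)
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)
quantized_model.save_pretrained("opt-125m-gptq")  # quantized weights take far less space on disk
```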
This guide explains the steps to build a self-moderated comment response system using OpenAI and LangChain. It involves two models: the first generates a response, and the second reviews and modifies it before publication.
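The gist of the pattern can be sketched with LangChain's LLMChain and SimpleSequentialChain; the prompts and variable names below are illustrative, not the guide's own code:

```python
# Two-step "self-moderated" response sketch (illustrative prompts; uses LangChain's
# LLMChain/SimpleSequentialChain API as of late 2023; requires OPENAI_API_KEY in the environment).
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = ChatOpenAI(temperature=0)

# Step 1: draft a reply to the user's comment
draft_prompt = PromptTemplate.from_template(
    "Write a reply to this customer comment:\n{comment}"
)
draft_chain = LLMChain(llm=llm, prompt=draft_prompt)

# Step 2: a second model call reviews the draft and rewrites anything rude or off-policy
moderate_prompt = PromptTemplate.from_template(
    "Review the following reply. Remove or rewrite anything offensive or off-topic, "
    "then output only the final reply:\n{draft}"
)
moderate_chain = LLMChain(llm=llm, prompt=moderate_prompt)

pipeline = SimpleSequentialChain(chains=[draft_chain, moderate_chain])
print(pipeline.run("Your product broke after one day and I want my money back!"))
```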
This experiment presents the results of testing 60 models on basic reasoning, instruction following, and creativity. The compilation includes the questions and each model's responses, stored in a SQLite database.
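If you want to explore such a database yourself, a query against a hypothetical schema (the published database's actual table and column names may differ) could look like this:

```python
# Querying a results database like the one described (hypothetical schema;
# the published database's table and column names may differ).
import sqlite3

conn = sqlite3.connect("model_responses.db")
rows = conn.execute(
    """
    SELECT model, question, response
    FROM responses
    WHERE question LIKE '%reasoning%'
    ORDER BY model
    """
).fetchall()

for model, question, response in rows:
    print(f"{model}: {response[:80]}...")
conn.close()
```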
Papers & Repositories
Open Interpreter is an open-source implementation of OpenAI's Code Interpreter that provides a natural language interface similar to ChatGPT. It runs various types of code locally through an interactive terminal chat, giving control of your computer's capabilities without the internet-access restrictions of the hosted version.
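Getting started is straightforward; the sketch below follows the project's README at the time of writing (install with pip install open-interpreter; the API may have changed since):

```python
# Interactive local "Code Interpreter" sketch, based on the project's README at the
# time of writing (pip install open-interpreter); the API may have changed since.
import interpreter

# Ask for confirmation before any generated code is executed on your machine
interpreter.auto_run = False

interpreter.chat("Summarize the CSV files in my Downloads folder and plot the largest one.")
```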
LLMs can be used as optimizers in applications where gradients are not available. Optimization by PROmpting (OPRO) has the LLM generate new solutions from a prompt, which are then evaluated and used to refine the prompt in an iterative optimization cycle. OPRO has shown promising results, outperforming human-designed prompts in prompt optimization tasks.
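The loop itself is simple to sketch. Below is a schematic version, with hypothetical llm_generate and evaluate helpers standing in for the actual model call and the task-specific scorer:

```python
# Schematic OPRO-style loop (hypothetical llm_generate/evaluate helpers stand in
# for the real LLM call and the task-specific scorer).
def opro_optimize(task_description, llm_generate, evaluate, steps=20, per_step=8):
    scored = []  # list of (score, candidate prompt/solution)
    for _ in range(steps):
        # The meta-prompt shows the best candidates so far and asks for better ones
        history = "\n".join(f"score={s:.3f}: {c}" for s, c in sorted(scored, reverse=True)[:10])
        meta_prompt = (
            f"Task: {task_description}\n"
            f"Previous candidates and their scores:\n{history}\n"
            "Propose a new candidate that will score higher."
        )
        for _ in range(per_step):
            candidate = llm_generate(meta_prompt)
            scored.append((evaluate(candidate), candidate))
    return max(scored)[1]  # best candidate found
```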
SLiMe is a novel approach that combines vision-language models and Stable Diffusion (SD) and allows image segmentation at custom granularity using just one annotated sample. It outperforms existing one-shot and few-shot image segmentation methods, as demonstrated in comprehensive experiments.
Researchers have found that the Feed Forward Network (FFN) in Transformers can be optimized, resulting in a 40% reduction in model size while maintaining similar performance. By sharing a single FFN across the encoder layers and removing it from the decoder layers, the parameter count can be reduced with minimal loss in accuracy.
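Conceptually, the change amounts to pointing every encoder layer at one shared FFN module. A toy PyTorch sketch of the idea (illustrative only, not the paper's code):

```python
# Toy sketch of sharing one FFN across all encoder layers (illustrative, not the paper's code).
import torch.nn as nn

d_model, d_ff, n_layers = 512, 2048, 6

shared_ffn = nn.Sequential(
    nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
)

class EncoderLayer(nn.Module):
    def __init__(self, ffn):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.ffn = ffn  # the same module object in every layer -> weights are shared
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x)[0])
        return self.norm2(x + self.ffn(x))

# All six layers reuse one FFN's parameters instead of owning six separate copies
encoder = nn.ModuleList([EncoderLayer(shared_ffn) for _ in range(n_layers)])
```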
The paper introduces Hydra-PPO, a method designed to expedite Reinforcement Learning from Human Feedback (RLHF) by minimizing memory usage. Hydra-PPO reduces the number of models in memory during the PPO stage, allowing for increased training batch size and decreased per-sample latency by up to 65%.
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Meme of the week!
Meme shared by rucha8062
Featured Community post from the Discord
Duckydub is working on a project called Nexel AI, which simplifies AI-powered automation. It aims to make automation accessible to everyone, regardless of their technical background. Furthermore, it streamlines workflows while ensuring data security. Check it out here and support a fellow community member. Share your AI projects and feedback in the thread here!
AI poll of the week!
Join the discussion on Discord.
TAI Curated section
Article of the week
Build Your First Autocorrection without Machine Learning by Thao Vu
Spell correction is without doubt essential for any written communication. When considering building one, we may quickly jump to the one-size-fits-all solution: deep learning. However, deep learning is not always the optimal choice. In this article, I would like to introduce the “noisy channel”, a classic technique for spell correction, and show how you can build your own correction module with zero deep learning background.
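As a taste of the approach, a minimal noisy-channel corrector picks the candidate w that maximizes P(w)·P(x|w). The sketch below (illustrative, not the article's code) uses word counts as the language model P(w) and simple edit-distance-one candidates as a crude stand-in for the error model P(x|w):

```python
# Minimal noisy-channel spell corrector (illustrative, not the article's code):
# pick argmax_w P(w) * P(x|w), with word frequencies as P(w) and edit distance
# as a crude proxy for the error model P(x|w).
from collections import Counter
import re

corpus = open("big.txt").read().lower()          # any large text corpus
WORDS = Counter(re.findall(r"[a-z]+", corpus))
TOTAL = sum(WORDS.values())

def edits1(word):
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    return set(deletes + replaces + inserts + transposes)

def correct(word):
    # Known words are preferred over edit-distance-1 candidates (a crude P(x|w))
    candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
    return max(candidates, key=lambda w: WORDS[w] / TOTAL)  # language model P(w)

print(correct("speling"))  # -> "spelling" with a typical English corpus
```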
Our must-read articles
Reinforcement Learning: SARSA and Q-Learning — Part 3 by Tan Pengshi Alvin
Exploring Large Language Models -Part 1 by Alex Punnen
Explaining Attention in Transformers [From The Encoder Point of View] by Nieves Crasto
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Job offers
Senior Software Engineer, Identity @GitHub (Remote)
Staff Engineer - Personalization @Fanatics (Remote)
Data Analyst & Software Integrator Assistant (Active Pooling) @Support Shepherd (Remote)
Sr. Machine Learning Researcher @Casetext (Remote)
Machine Learning Success Manager @Snorkel AI (Remote)
Lead Machine Learning Engineer @Fullscript (Canada/Remote)
Machine Learning Engineer @Hive (San Francisco, CA, USA)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #64 was originally published in Towards AI on Medium.