This AI newsletter is all you need #59
What happened this week in AI by Louie
This week, changes to Zoom’s terms of service (made in March) came into focus after fears over the company’s use of customer video data went viral. The terms appeared to give Zoom largely free rein to use customers’ data to train its machine learning models. After the controversy, late on Monday night, Zoom updated its terms to specify that “Zoom will not use audio, video or chat customer content to train our artificial intelligence models without your consent.” Since the launch of ChatGPT and the increasing focus on commercializing AI, many companies’ data ownership, copyright, and privacy policies have been in flux. Some companies, such as X/Twitter, have realized they have been giving away valuable data for free or too cheaply and have cut off access to it, made it more difficult to scrape, or launched lawsuits over its use. Other companies have realized they have not been collecting or making the most of the potentially valuable data they have access to. There will always be a fine balance between protecting customers’ privacy and making the most of their data, and we expect these issues to remain high priorities for many CEOs and management teams in the coming months.
- Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. AudioCraft: A Simple One-Stop Shop for Audio Modeling
Meta has released the code and weights for their AudioCraft models, including MusicGen and AudioGen. These models generate music and audio, respectively, based on text-based user inputs. The release also includes the EnCodec decoder, which improves music quality.
2. NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data
NASA and IBM Research have collaborated to release the HLS Geospatial FM, an open-source geospatial AI model for Earth observation data. This model has shown success in various applications, such as flood mapping, burn scar identification, and predicting crop yields.
Jupyter AI integrates generative AI techniques and provides functionalities such as code generation, error fixing, content summarization, file questioning, and notebook creation from language prompts.
4. RT-2: New Model Translates Vision and Language Into Action
Google DeepMind's Robotic Transformer 2 (RT-2) is a vision-language-action model that combines web-scale capabilities with robotic control. It effectively recognizes visual and language patterns, generalizes emergent skills, and successfully leverages web-based data to learn new skills.
5. OpenAI Launches GPTBot With Details on How To Restrict Access
OpenAI has launched a web crawler, GPTBot, to improve its artificial intelligence models. GPTBot will scour the web for data while strictly filtering out any paywall-restricted sources, sources that violate OpenAI's policies, or sources that gather personally identifiable information.
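For readers who want to see what restricting access looks like in practice, here is a minimal sketch. The `GPTBot` user-agent token is the one OpenAI published. The robots.txt contents, the example.com URL, and the `is_allowed` helper are hypothetical and only illustrate how a standards-following crawler would interpret the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: block GPTBot everywhere,
# leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/articles/1"))
```

Replacing `Disallow: /` with specific paths would block only part of a site. Note that robots.txt is a convention crawlers choose to honor, not an enforcement mechanism.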
Five 5-minute reads/videos to keep you learning
1. The History of Open-Source LLMs: Better Base Models
Open-source LLMs have evolved to become competitive with proprietary LLMs through advancements in pre-training and model development. Early challenges were overcome by focusing on the importance of pre-training and creating better base models. Recent trends include using larger pre-training datasets and optimizing models for fast inference.
2. Top 10 Open Source LLMs to USE in Your Next LLM Application
This article highlights the top 10 open-source LLMs for the AI field. These LLMs offer customizable solutions, reasoning abilities, multilingual support, natural language understanding, text generation, question-answering, chatbot interfaces, versatility, and robustness.
3. Understanding LLaMA-2 Architecture and Its Ginormous Impact on GenAI
Meta's 77-page paper on LLaMA-2 reports impressive results, outperforming other open-source models on benchmarks and competing with GPT-3.5. The article explains advancements like Grouped-Query Attention, Ghost Attention, In-Context Temperature re-scaling, and Temporal Perception.
4. AI Researcher Geoffrey Hinton Thinks AI Has or Will Have Emotions
AI researcher Geoffrey Hinton argues that human-like intelligence can only be achieved, and possibly surpassed, through deep learning because it enables machines to narrate hypothetical actions associated with emotions. The view has both supporters and critics in expert circles.
5. Fit Your LLM in a Single GPU With Gradient Checkpointing, LoRA, and Quantization
This article presents three techniques (Gradient Checkpointing, LoRA, and Quantization) that save GPU memory and help avoid out-of-memory errors while fine-tuning language models. They work, respectively, by storing fewer activations during training and recomputing them in the backward pass, adding a small number of new trainable parameters alongside the frozen weights, and reducing numerical precision.
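To give a rough sense of why quantization and LoRA save memory, here is a back-of-the-envelope sketch. It is not taken from the article: the 7B parameter count, the 4096 x 4096 layer shape, and the rank of 8 are illustrative assumptions. It counts weight storage only and ignores activations, gradients, and optimizer state, which is where gradient checkpointing helps.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when a d_out x d_in weight stays frozen and the
    update is factored as B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter model
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(n_params, dtype):.1f} GiB")

    full = 4096 * 4096  # one hypothetical projection matrix
    lora = lora_trainable_params(4096, 4096, rank=8)
    print(f"LoRA trains {lora:,} of {full:,} parameters ({lora / full:.2%})")
```

Halving the precision halves the weight storage, while a low-rank adapter trains only a small fraction of the parameters in each adapted layer.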
Papers & Repositories
1. microsoft/azurechatgpt: Azure ChatGPT, Private and Secure ChatGPT for Internal Enterprise Use
Microsoft has introduced Azure ChatGPT, a private and secure solution for deploying ChatGPT instances on Azure. It offers built-in privacy guarantees, complete control over accessibility, and the ability to integrate internal data sources and plugins. To facilitate adoption, Microsoft has also developed a Solution Accelerator guide.
2. Tool Documentation Enables Zero-Shot Tool Usage With Large Language Models
A recent study has found that, for LLMs, reading tool documentation is more effective than relying solely on demonstrations for learning to use new tools. Researchers demonstrated this through empirical findings on six vision and language tasks, showing that zero-shot prompts with tool documentation perform just as well as few-shot prompts on benchmarks.
3. PanGu-Coder2: Boosting Large Language Models for Code With Ranking Feedback
This paper proposes a novel RRTF (Rank Responses to Align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, the authors present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark.
4. XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
This paper introduces a new test suite called XSTest to identify eXaggerated Safety behaviors in a structured and systematic way. The test results showed that the Llama2 model by Meta displayed excessive safety behavior, refusing prompts that were harmless but resembled unsafe ones or touched on sensitive topics.
5. The Hydra Effect: Emergent Self-Repair in Language Model Computations
A recent study of language model computations describes the Hydra effect, where ablating one attention layer causes another layer to compensate. Additionally, researchers found that late MLP layers downregulate the maximum-likelihood token, even in models trained without dropout.
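A note on the pass@1 figure quoted for PanGu-Coder2 above: pass@k on HumanEval is commonly computed with the unbiased estimator introduced alongside the benchmark, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The sketch below implements that formula. Whether the PanGu-Coder2 authors used exactly this procedure is an assumption, and the sample counts in the usage example are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples for a problem, 3 of them pass.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass. The benchmark score is the mean of this value over all problems.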
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Weekly AI Podcast
In this week's episode of the "What's AI" podcast, Louis Bouchard shares his own journey of pursuing a Ph.D. in AI at Polytechnique Montreal and Mila. Throughout this episode, he provides insights into the admission process, the day-to-day life of a Ph.D. candidate, and the skills you develop along the way. He also delves into the concept of federated learning and how AI can revolutionize the diagnosis of multiple sclerosis. Whether you're considering a Ph.D. in AI or simply curious about the intersection of AI and medicine, this episode is for you. Tune in on Spotify or Apple Podcasts.
Meme of the week!
Meme shared by archiesnake
Featured Community post from the Discord
Weaver159 has launched a new project called MetisFL, a federated learning framework that enables developers to federate their machine learning workflows and train their models across distributed datasets without having to collect the data in a centralized location. The core of the framework is written in C++ and prioritizes scalability, speed, and resiliency. Currently, the project actively encourages developers, researchers, and data scientists to experiment with the framework and contribute to the codebase. Check it out on GitHub and support a fellow community member. Share your thoughts or contributions in the thread here.
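For readers new to the idea, the core aggregation step in federated learning can be sketched in a few lines. This is a generic federated averaging (FedAvg) illustration, not MetisFL's API: the `federated_average` function, the `Weights` type, and the toy numbers are all hypothetical. Each client trains locally and sends back only its weights and sample count, so the raw data never leaves the client.

```python
from typing import Dict, List, Tuple

# A model is represented here as named flat lists of floats (hypothetical format).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total = sum(n_samples for _, n_samples in updates)
    if total <= 0:
        raise ValueError("need at least one training sample across clients")

    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(weights[name][i] * n_samples for weights, n_samples in updates) / total
            for i in range(len(values))
        ]
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients with 1 and 3 local training samples.
    client_a = ({"layer.weight": [1.0, 2.0]}, 1)
    client_b = ({"layer.weight": [3.0, 4.0]}, 3)
    print(federated_average([client_a, client_b]))
```

A production framework adds secure communication, client scheduling, and fault tolerance around this step.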
AI poll of the week
Join the discussion on Discord.
TAI Curated section
Article of the week
Fit Your LLM in a single GPU with Gradient Checkpointing, LoRA, and Quantization by Jeremy Arancio
Fine-tuning an LLM can be long and tedious, and running out of memory during training is both frustrating and costly. This article goes through three techniques that you may already use, or need to know, without understanding how they work: Gradient Checkpointing, Low-Rank Adapters, and Quantization. These techniques will help you avoid running out of memory during training and save you a lot of time.
Our must-read articles
Ensemble Learning: From Decision Tree to Random Forest by Sandeepkumar Racherla
Modern NLP: A Detailed Overview. Part 4: The Latest Developments by Abhijit Roy
Self-Supervised Learning and Transformers? — DINO Paper Explained by Boris Meinardus
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Job offers
Senior Software Engineer, Smart Contracts @Oasis Protocol Foundation (Remote)
Senior Application Engineer @Mozilla (Remote)
Senior/ Principal Support Engineer - Europe @ClickHouse (Remote)
QA Engineer @ShyftLabs (Toronto, Canada)
Senior Software Engineer - AI Search Personalization @Algolia (Remote)
Senior Infrastructure Engineer @Angi (Remote)
Python Data Engineer @Altoida (Boston, UK)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #59 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.