This AI newsletter is all you need #70
What happened this week in AI by Louie
This week in AI, we were particularly interested to see two new agent models released. Nvidia unveiled Eureka, an AI agent designed to teach robots to execute complex tasks autonomously. Powered by GPT-4, the agent independently generates reward functions that surpass those written by human experts on 83% of tasks, delivering an average performance improvement of 52%. In a fascinating demo shared by the company, the agent trains a robotic hand to perform rapid pen-spinning tricks as adeptly as a human. As one of the authors notes in a blog post, the library combines generative AI and reinforcement learning to solve complex tasks.
In other agent news, Adept researchers have introduced Fuyu, a multi-modal architecture for AI agents with 8 billion parameters. The model adopts a decoder-only architecture that processes both images and text, simplifying network design, scaling, and deployment. Unlike most existing models, it also accepts images of varying dimensions, making it well suited for use in agents, and it can generate responses for sizable images in roughly 100 milliseconds. We are excited about the recent progress on AI agents for both physical and online applications. While commercialization is still early, agents capable of independently interacting with their environment and executing complex tasks open up many opportunities for new AI products and applications.
- Louie Peters — Towards AI Co-founder and CEO
OpenAI’s plans for developing the AI model Arrakis to reduce compute expenses for AI applications like ChatGPT have been halted. Despite this setback, OpenAI’s growth momentum continues, with a projected annual revenue of $1.3 billion. However, they may face challenges with Google’s upcoming AI model Gemini and scrutiny at an AI safety summit.
IBM has developed a brain-inspired computer chip (NorthPole) that significantly enhances AI’s speed and efficiency by reducing the need to access external memory. NorthPole is made of 256 computing units, or cores, each of which contains its own memory.
NVIDIA researchers created an AI agent called Eureka, which can automatically generate algorithms to train robots — enabling them to learn complex skills faster. Eureka-generated reward programs outperform expert human-written ones on more than 80% of tasks.
Adept introduced Fuyu-8B, a powerful open-source vision-language model designed to comprehend and answer questions about images, charts, diagrams, and documents. Fuyu-8B improves over Qwen-VL and PaLM-E-12B on 2 out of 3 metrics despite having 2B and 4B fewer parameters, respectively.
Stack Overflow is letting go of 28% of its employees due to advancements in AI technology like ChatGPT. Chatbots like ChatGPT provide efficient coding assistance and heavily rely on content from sites like Stack Overflow. However, an important question arises regarding the sustainability of chatbots that gather data without benefiting their sources.
Five 5-minute reads/videos to keep you learning
This article provides essential numbers and equations for working with large language models (LLMs). It covers topics such as compute requirements, computing optima, minimum dataset size, minimum hardware performance, and memory requirements for inference.
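As a rough illustration of the kind of back-of-envelope estimates the article covers, here is a short sketch using two common rules of thumb. Both formulas are standard approximations, not numbers taken from the article itself: weight memory at inference is roughly parameter count times bytes per parameter, and training compute is roughly 6 FLOPs per parameter per training token.

```python
def inference_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory to hold model weights (fp16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1e9

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

# A 7B-parameter model in fp16 needs roughly 14 GB just for its weights.
print(inference_memory_gb(7e9))    # 14.0
print(training_flops(7e9, 1e12))   # 4.2e+22
```

These estimates ignore activation memory, KV-cache, and optimizer state, which the article discusses in more detail.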
LLaVa-1.5, a smaller yet powerful alternative to OpenAI’s GPT-4 Vision, proves the potential of open-source models for Large Multimodal Models (LMMs). It emphasizes the significance of understanding multimodality in AI, debunking doubts about the feasibility of open-source approaches.
Vision Prompt Injection is a vulnerability that allows attackers to inject harmful data into prompts via images in OpenAI’s GPT-4. This risks system security, as attackers can execute unauthorized actions or extract data. Defending against this vulnerability is complex and may affect the model’s usability.
GPT-4's response speed is improving rapidly, particularly at the 99th percentile, where latencies have decreased. GPT-4 and GPT-3.5 maintain a low latency-to-token ratio, indicating efficient performance.
A team of researchers from Stanford, MIT, and Princeton has developed a transparency index to evaluate the level of transparency in commercial foundation models. The index, known as the Foundation Model Transparency Index (FMTI), assesses 100 different aspects of transparency, and the results indicate that there is significant room for improvement among major foundation model companies.
Papers & Repositories
BitNet is a 1-bit Transformer architecture designed to improve memory efficiency and reduce energy consumption in large language models (LLMs). It outperforms 8-bit and FP16 quantization methods and shows potential for effectively scaling to even larger LLMs while maintaining efficiency and performance advantages.
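To give a feel for the idea, here is a minimal sketch of 1-bit weight quantization in the spirit of BitNet: weights are binarized to {-1, +1} with a single scale factor (the mean absolute value), so a matrix multiply reduces to additions and subtractions plus one rescale. This is a simplified stand-in, not BitNet's actual method, which also quantizes activations, uses per-group scales, and trains the 1-bit weights directly.

```python
import numpy as np

def binarize(W: np.ndarray):
    """Return a sign matrix in {-1, +1} and a scalar scale alpha."""
    alpha = np.abs(W).mean()          # one scale for the whole matrix
    B = np.where(W >= 0, 1.0, -1.0)   # 1-bit representation of W
    return B, alpha

def binary_linear(x: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Approximate x @ W using only the binarized weights and the scale."""
    return alpha * (x @ B)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x = rng.normal(size=(2, 8))
B, alpha = binarize(W)
approx = binary_linear(x, B, alpha)
print(approx.shape)  # (2, 4)
```

The memory win is that each weight needs one bit instead of 16, at the cost of approximation error that BitNet's training procedure is designed to absorb.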
HyperAttention is a novel solution that addresses the computational challenge of longer contexts in language models. It outperforms existing methods using Locality Sensitive Hashing (LSH), considerably improving speed. It excels on long-context datasets, making inference faster while maintaining a reasonable perplexity.
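As a hedged illustration of the general LSH idea (HyperAttention's actual algorithm, sortLSH plus sampling, is more involved), the sketch below hashes queries and keys with random hyperplanes and computes softmax attention only among keys that land in the same bucket, falling back to full attention for a query whose bucket is empty.

```python
import numpy as np

def lsh_hash(X: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """Map each row of X to a bucket id via its sign pattern over hyperplanes."""
    bits = (X @ planes.T) > 0
    return bits.astype(int) @ (1 << np.arange(planes.shape[0]))

def bucketed_attention(Q, K, V, n_planes=4, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, Q.shape[1]))
    qb, kb = lsh_hash(Q, planes), lsh_hash(K, planes)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i, b in enumerate(qb):
        idx = np.where(kb == b)[0]
        if idx.size == 0:                 # empty bucket: fall back to all keys
            idx = np.arange(K.shape[0])
        scores = Q[i] @ K[idx].T / np.sqrt(Q.shape[1])
        w = np.exp(scores - scores.max()) # softmax over the bucket only
        w /= w.sum()
        out[i] = w @ V[idx]
    return out

rng = np.random.default_rng(1)
Q = rng.normal(size=(6, 16))
K = rng.normal(size=(10, 16))
V = rng.normal(size=(10, 3))
out = bucketed_attention(Q, K, V)
print(out.shape)  # (6, 3)
```

Because each query attends to only a fraction of the keys, the cost drops from quadratic toward near-linear when buckets are small, which is the speedup the paper exploits.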
This paper introduces a new framework called Self-RAG. It is an enhanced model that improves Retrieval Augmented Generation (RAG) by allowing language models to reflect on passages using “reflection tokens.” This improvement leads to better responses in knowledge-intensive tasks like QA, reasoning, and fact verification.
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. It utilizes a ViT model trained with contrastive objectives, which allows it to excel in multimodal benchmarks.
DeepSparse is a robust framework that enhances deep learning on CPUs by incorporating sparse kernels, quantization, pruning, and caching of attention keys/values. It achieves GPU-like performance on commonly used CPUs, enabling efficient and robust deployment of models without dedicated accelerators.
Enjoy these papers and news summaries? Get a daily recap in your inbox!
The Learn AI Together Community section!
Meme of the week!
Meme shared by sikewalk
Featured Community post from the Discord
G.huy created a repository containing code examples and resources for parallel computing using CUDA-C. It provides beginners with a starting point to understand parallel computing concepts and how to utilize CUDA-C to leverage the power of GPUs for accelerating computationally intensive tasks. Check it out on GitHub and support a fellow community member. Share your feedback and questions in the thread here.
AI poll of the week!
TAI Curated section
Article of the week
The RAG (Retrieval Augmented Generation) architecture has proven effective at overcoming the LLM input length limit and the knowledge cutoff problem. In today’s LLM technical stack, RAG is among the cornerstones for grounding applications in local knowledge, mitigating hallucinations, and making LLM applications auditable. This article discusses some of the practical details of RAG application development.
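The core retrieval step can be sketched in a few lines. This is a toy illustration, not the article's implementation: the bag-of-words "embedding" and cosine similarity below are stand-ins for a learned embedding model and a vector store, which a real RAG system would use.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list) -> str:
    """Ground the LLM by putting retrieved passages into the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves passages and feeds them to the model as context.",
    "The knowledge cutoff means a model is unaware of recent events.",
    "Paris is the capital of France.",
]
print(build_prompt("What does RAG retrieve?", docs))
```

Because the model answers from retrieved passages rather than parametric memory alone, responses can cite local knowledge and be audited against it, which is exactly the grounding benefit described above.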
Our must-read articles
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Interested in sharing a job opportunity here? Contact email@example.com.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
Thanks for reading the Towards AI Newsletter! Subscribe for free to receive new posts and support my work.