This AI newsletter is all you need #98
What happened this week in AI by Louie
This week, we were again watching developments in AI-powered robots and LLMs. A collaboration between Nvidia and UPenn released DrEureka, a new open-source method that uses an LLM agent to write code for training robots in simulation and then writes further code to transfer the learned skills to real-world deployment. Tesla also released a video update on the progress of its Optimus robot, showing many robots being trained on different tasks via human teleoperation. The neural net runs end-to-end (camera and sensor data in, joint control sequences out), and two bots have started early testing at real factory workstations.
In the world of LLMs, we were impressed with DeepSeek-V2, a new 236bn-parameter open-source Mixture of Experts (MoE) model from the Chinese company DeepSeek. The model was trained on 8.1 trillion tokens and activates 21bn parameters per token. It scores 77.8 on the MMLU benchmark and 81.1 on HumanEval, and it is offered via API at $0.14 per million input tokens and $0.28 per million output tokens. While we are often skeptical about the sustainability of API pricing (given it can be a customer acquisition cost), the model's performance relative to its active parameter count is very impressive. The company credits Multi-head Latent Attention (better attention with more efficient inference) and a novel sparse architecture that reduces training costs. Perhaps the US export ban on AI chips to China is pushing Chinese labs to focus on innovation and efficiency!
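To make the quoted pricing concrete, here is a minimal cost calculator. It assumes simple linear per-token pricing at the rates listed above ($0.14 per million input tokens, $0.28 per million output tokens); the function name and the example workload are hypothetical, and real billing terms may differ.

```python
# Hypothetical helper: default rates are the DeepSeek-V2 API prices quoted above (USD per million tokens).
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float = 0.14, output_rate_per_m: float = 0.28) -> float:
    """Estimated cost of a workload, assuming simple linear per-token pricing."""
    return input_tokens / 1e6 * input_rate_per_m + output_tokens / 1e6 * output_rate_per_m


# Share of the 236bn total parameters activated per token (21bn), per the figures above.
ACTIVE_FRACTION = 21e9 / 236e9

if __name__ == "__main__":
    # Illustrative workload: 10M input tokens and 2M output tokens.
    print(f"${api_cost_usd(10_000_000, 2_000_000):.2f}")  # $1.96
    print(f"{ACTIVE_FRACTION:.1%} of parameters active per token")  # 8.9%
```

Because only about 9% of the weights participate in each token, per-token compute is closer to that of a 21bn dense model, although all 236bn parameters still have to be held in memory when serving.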
Separately in the LLM world this week, eyes have been on mysterious new models being tested at chat.lmsys.org: “gpt2-chatbot”, followed several days later by “im-a-good-gpt2-chatbot” and “im-also-a-good-gpt2-chatbot”. Speculation that these were new OpenAI models was fueled significantly by cryptic tweets from Sam Altman, who is always adept at keeping OpenAI in focus! The original GPT-2 had 1.5bn parameters, so maybe the name hints that OpenAI is testing a new, much smaller model (vs. GPT-4 or GPT-3). The company is known to test new ideas at a smaller scale and use scaling laws to predict performance at higher parameter and training-token counts, so perhaps that is what is happening here.
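To illustrate the scaling-law idea: fit a power law to losses measured on small models, then extrapolate to a larger one. This is a simplified sketch under stated assumptions, not OpenAI's method. Published scaling laws typically also include an irreducible-loss constant and a dependence on training tokens, both omitted here, and the example data are synthetic, generated from a known law purely to show the fit recovering it.

```python
import math


def fit_power_law(sizes: list[float], losses: list[float]) -> tuple[float, float]:
    """Least-squares fit of loss = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a: float, b: float, size: float) -> float:
    """Extrapolated loss for a model with `size` parameters under the fitted law."""
    return a * size ** (-b)


if __name__ == "__main__":
    # Synthetic "small-model" losses from a known law (a=10, b=0.07), for illustration only.
    sizes = [1e7, 1e8, 1e9]
    losses = [predict_loss(10.0, 0.07, n) for n in sizes]
    a, b = fit_power_law(sizes, losses)
    print(f"a={a:.3f}, b={b:.3f}")  # a=10.000, b=0.070
    print(f"predicted loss at 1e11 params: {predict_loss(a, b, 1e11):.3f}")
```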
Why should you care?
AI robotics has accelerated significantly in the past year, and many teams are now pursuing very different strategies and architectures to advance capabilities. While the pace of progress from here remains very hard to predict, we think AI-powered robotics can have a huge impact and become significantly more capable than traditional, human-coded robotics. On this topic, RethinkX released a thoughtful blog post this week on the potential impact of humanoid robots: This time, we are the horses: the disruption of labor by humanoid robots.
- Louie Peters — Towards AI Co-founder and CEO
This issue is brought to you thanks to Latitude.sh:
Introducing Launchpad, the most powerful container GPU platform to date.
Launchpad leverages every piece of hardware to launch your models, delivering unrelenting performance that makes other container GPU tools feel slow. With Latitude.sh, you can fine-tune and deploy your machine learning models with hourly billing starting at $1.32/hr.
Containers are provisioned on ultra-fast NVMe drives and leverage an unrestricted high-throughput network that gives you access to speeds of up to 100 Gbps.
Deploy today:
- NVIDIA’s L40S GPU (48GB) @ $1.32/hour
- NVIDIA’s H100 Tensor Core GPU (80GB) @ $2.10/hour
Scale your ML inference and fine-tuning workloads with Latitude’s Launchpad!
Hottest News
1. OpenAI CEO Sam Altman Says GPT-4 Is the Dumbest AI Model You’ll Ever Have To Use Again
During a recent appearance at Stanford University, OpenAI’s Sam Altman said that GPT-4 is the most rudimentary AI that users will encounter as the company progresses towards more sophisticated models like GPT-5, which is expected to feature enhanced abilities such as video generation. He foresees AI developing into highly efficient assistants that effortlessly perform tasks and provide solutions.
2. GitHub Launches Copilot Workspace
GitHub has launched Copilot Workspace, a developer environment that supports the entire coding process, including planning, coding, testing, and deployment, through natural language commands. It gives developers an integrated way to streamline their workflows.
3. Amazon Q, a Generative AI-Powered Assistant for Businesses and Developers
Amazon is doubling down on enterprise AI with the release of its AI chatbot Q. The chatbot acts as an assistant for Amazon Web Services (AWS) users, learning from a company’s data and workflows so employees can ask questions about their business.
4. A Mysterious “gpt2-Chatbot” AI Model Suddenly Appears on the LMSYS Leaderboard
A mysterious AI model named gpt2-chatbot, reportedly showing GPT-4.5-level capabilities, has appeared on lmsys.org, prompting speculation that it is an unofficial OpenAI test of its next model. Observers pointed to its response quality, OpenAI-specific quirks, and rate limits as clues that OpenAI may be quietly benchmarking a new model.
5. A ChatGPT Search Engine Is Rumored To Be Coming Next Week
OpenAI is rumored to be launching a ChatGPT-based search engine, potentially at “search.chatgpt.com,” aiming to rival Google by combining a chatbot with traditional search results. The move would reflect a broader industry push to use AI to reinvent web search.
AI Job Listing: Startup CTO role
Our friends are recruiting a CTO for a venture-backed stealth startup committed to revitalizing SMEs through digital innovation. The role is remote and open to candidates globally. In this pivotal role, you’ll help architect the technical vision, manage the deployment of AI-powered solutions, and lead a top-tier technology team to transform SMEs. If you’re a tech leader passionate about leveraging AI to drive business success and want to help shape the future of SMEs, please reach out to denis@towardsai.net
Five 5-minute reads/videos to keep you learning
1. Comparison of Llama-3 and Phi-3 using RAG
This guide shows how to build a self-hosted “Chat with your Docs” application that integrates Meta AI’s Llama 3 and Microsoft’s Phi-3 language models into a Retrieval Augmented Generation (RAG) system. The setup includes custom knowledge bases, document chunking strategies, embeddings, and vector databases to improve how users interact with their documents.
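The chunk, embed, and retrieve loop that such a RAG setup relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the guide's code: chunking uses fixed word windows with overlap, and a toy bag-of-words vector stands in for a real embedding model and vector database. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical); a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a vector-database lookup)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks would then be pasted into the prompt sent to Llama 3 or Phi-3. Overlapping windows reduce the chance that an answer is split across a chunk boundary, at the cost of storing some text twice.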
2. Advancing AI’s Cognitive Horizons: 8 Significant Research Papers on LLM Reasoning
Recent research in the artificial intelligence domain has focused on augmenting LLMs’ reasoning capabilities. This article summarizes some of the most prominent approaches developed to improve LLMs’ reasoning, including chain-of-thought prompting, strategic and knowledge enhancements, and integration with computational engines.
3. Is Prompting the Future of Coding?
This article explores how AI is changing the way we interact with computers. It touches upon how “prompt-gramming” — coding by prompting LLMs — is emerging as a new programming language, how it will massively lower the barrier to programming a computer, and more.
4. Build a RAG Discord Chatbot in 10 Minutes
This video shows how to get a Discord chatbot running in under 10 minutes. The bot can answer any question about your data, which can be a class textbook, company data, or anything else you want. It can use either GPT models via your OpenAI API key or any open-source model hosted on Hugging Face.
5. A Comprehensive Guide for Getting Started with Hugging Face
This tutorial explores the Hugging Face platform, including its components, the Open LLM Leaderboard, how to access it, and how to use it for different purposes.
Repositories & Tools
1. Secret Llama is a private, browser-based chatbot that leverages Llama 3 and Mistral models. It is designed to run independently without server dependencies.
2. PLLaVA is a parameter-free method for extending image models to video models, designed to overcome issues like performance saturation and prompt sensitivity.
3. OpenUI lets you describe a UI in plain language and see it rendered live, bringing your ideas to life.
4. Candle is a minimalist ML framework for Rust focusing on performance (including GPU support) and ease of use.
5. Dify is an open-source LLM app development platform that combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more.
Top Papers of The Week!
1. Better & Faster Large Language Models via Multi-token Prediction
This research introduces a training method for large language models that predicts multiple future tokens at once, improving sample efficiency and performance on code and natural language tasks. Models trained this way also run up to three times faster at inference (via self-speculative decoding), with no increase in training time.
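The training objective can be sketched in a few lines. This assumes the setup described in the paper (n output heads on a shared trunk, where head i predicts the token i steps ahead and the per-head cross-entropies are summed) and uses toy probability tables in place of real model outputs; the helper names are hypothetical, and the trunk and heads themselves are omitted.

```python
import math


def multi_token_targets(tokens: list[int], n: int) -> list[tuple[int, ...]]:
    """For each position t, the n future tokens tokens[t+1 .. t+n] the n heads must predict.

    Positions without a full set of n future tokens are dropped.
    """
    return [tuple(tokens[t + 1:t + 1 + n]) for t in range(len(tokens) - n)]


def multi_token_loss(head_probs: list[list[dict[int, float]]],
                     targets: list[tuple[int, ...]]) -> float:
    """Mean over positions of the cross-entropy summed across heads.

    head_probs[t][i] is head i's predicted distribution (token -> probability) at position t.
    """
    total = 0.0
    for probs_at_t, target_t in zip(head_probs, targets):
        total += sum(-math.log(probs_at_t[i][tok]) for i, tok in enumerate(target_t))
    return total / len(targets)
```

For example, `multi_token_targets([10, 11, 12, 13, 14], 2)` gives `[(11, 12), (12, 13), (13, 14)]`: every position supervises two future tokens instead of one. At inference the extra heads can be dropped for ordinary next-token decoding, or used to draft several tokens that the next-token head then verifies, which is where the reported speedup comes from.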
2. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Prometheus 2 is an open-source language model specialized in evaluating other language models. It improves on earlier open evaluators by supporting both direct assessment and pairwise ranking, each with user-defined evaluation criteria. It aims to produce judgments that closely match those of humans and of proprietary LM judges such as GPT-4, offering an open alternative to relying on closed models for evaluation.
3. ChatQA: Building GPT-4 Level Conversational QA Models
This paper introduces ChatQA, a family of conversational question-answering (QA) models that obtain GPT-4 level accuracies. It uses a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from LLMs.
4. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
This is the official paper for phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens. Its overall performance, as measured by academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5.
5. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
The authors propose an instruction hierarchy that trains models to prioritize trusted (privileged) instructions over others, making them more robust to attacks such as prompt injections.
6. X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design
This paper reports a mixture-of-experts strategy for creating fine-tuned LLMs using a deep, layer-wise, token-level approach based on low-rank adaptation (LoRA). The X-LoRA model can be easily implemented for any existing LLM without modifying the underlying structure.
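The core idea behind LoRA, and the general way a mixture of LoRA adapters combines them, can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: it assumes each adapter adds a low-rank update (alpha / r) · B · A · x to a frozen weight W, and that the per-token mixing weights come from a softmax over gate logits passed in directly (in X-LoRA they are predicted by a scaling head from the model's hidden states). All function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lora_delta(a: Matrix, b: Matrix, x: list[float], alpha: float, r: int) -> list[float]:
    """Low-rank update (alpha / r) * B @ (A @ x); A is r x d_in, B is d_out x r."""
    return [alpha / r * y for y in matvec(b, matvec(a, x))]


def mixed_lora_forward(w: Matrix, adapters: list[tuple[Matrix, Matrix]],
                       gate_logits: list[float], x: list[float],
                       alpha: float, r: int) -> list[float]:
    """Frozen base layer plus a softmax-weighted sum of LoRA adapter updates, for one token."""
    out = matvec(w, x)
    for weight, (a, b) in zip(softmax(gate_logits), adapters):
        delta = lora_delta(a, b, x, alpha, r)
        out = [o + weight * d for o, d in zip(out, delta)]
    return out
```

With a single adapter the softmax weight is 1, and this reduces to plain LoRA. Because W is never modified, adapters can be added or swapped without touching the base model, which is what lets the approach wrap an existing LLM.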
Quick Links
1. CoreWeave raises $1.1B to expand its GPU cloud infrastructure network. The company said it will use the new investment to fuel growth across all areas of its business and expand into new geographic regions.
2. Anthropic rolls out an iOS app for Claude. It can act as a chatbot, and users can also upload photos straight to the app for “image analysis.” The Claude app will be free to all users of the Claude AI models, including free users.
3. Microsoft’s first Responsible AI Transparency Report outlines the steps the company took last year to ship AI platforms responsibly. The company says it created 30 responsible AI tools in the past year, grew its responsible AI team, and required teams building generative AI applications to measure and map risks throughout the development cycle.
Who’s Hiring in AI!
Full stack developer - AI Studio @Vonage (Israel)
AI Prompt & Language Specialist (ENGLISH) @Keywords Studios (Remote)
Machine Learning Scientist @Amazon (Berlin, Germany)
GPU Computing Capacity Optimization Engineer @NVIDIA (US/Remote)
Java Developer @Capco (Bangalore, India)
AI Engineer Intern @Radar Roster (Remote)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #98 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.