What happened this week in AI by Louie
This week, we had a broad range of AI developments, from a new LLM software developer agent (Devin) to new open-source models (such as Grok), new humanoid robot demos (Figure AI), and potential new major distribution partnerships (Google Gemini with Apple). This week, Nvidia is also holding its GTC conference, and it revealed its new Blackwell platform and B100 chip, with many improvements compared to the H100 — the current workhorse of the LLM industry.
Cognition, an applied AI lab focused on reasoning, introduced Devin, an autonomous AI software engineer. Devin is equipped with common developer tools within a sandboxed compute environment, including the shell, code editor, and browser. It can plan and execute complex engineering tasks requiring thousands of decisions. It can recall relevant context at every step, learn how to use unfamiliar technologies, train and fine-tune its own AI models, and autonomously find and fix bugs in codebases. Devin is currently in early access, and you can request access by filling out this form.
In other news, anticipation has been building for the next generation of LLMs, particularly GPT-4.5/5 and Gemini Ultra 2.0. Sam Altman recently discussed OpenAI’s plans for its next model on the Lex Fridman Podcast, but it is unclear whether GPT-5 will arrive this year or whether we will first get GPT-4.5. Perhaps this depends on whether OpenAI believes its latest model and training run deliver enough performance to justify the GPT-5 hype! Altman’s comments suggest a gradual rollout of various advancements before the launch of a GPT-5-like model: “We’ll release in the coming months many different things before GPT-5-like model called that, or not called that, or a little bit worse or a little bit better than what you’d expect from a GPT-5, I think we have a lot of other important things to release first.” During the interview, Altman also denied attempting to raise $7 trillion to boost his GPU chip supply.
Finally, in potentially significant news, Apple is reportedly in talks with Google about using Gemini for generative AI features on iPhones, a potential game changer for Google’s LLM platform. Apple is already working on bringing several AI features to its upcoming iOS 18 operating system based on its own AI models, but these enhancements are likely to focus on features that run on its devices rather than being delivered from the cloud. Negotiations are ongoing, and the outcome could reshape the landscape of AI integration in consumer devices, with implications likely to be revealed in the coming months. Together with Gemini’s possible integration into Android, a deal would greatly expand the model’s distribution and generate far more user feedback for iterating on and improving the product.
Why should you care?
Devin is a great example of the power of the current generation of LLMs when combined with well-thought-through agent workflows, tailored UI/UX, and improvements to reasoning and planning. Nevertheless, anticipation is building for the next generation of LLMs using the latest algorithmic breakthroughs and the huge number of H100 and TPU chips rolled out last year.
While GPT-5 may not be imminent, it sounds like OpenAI has many updates in the pipeline. As for GPT-5, as Altman said, “I’m excited about being smarter….I think the really special thing happening is that it’s not like it gets better in this one area and worse in others. It’s getting better across the board.” It remains to be seen whether LLMs can sustain their momentum and justify the hype.
- Louie Peters — Towards AI Co-founder and CEO
This issue is brought to you by TruEra:
Are you building LLM apps? Then, you need LLM Observability — the ability to test, evaluate, and monitor your apps.
In the past 18 months, thousands of developers have tinkered with LLM apps, but very few of those experiments ever make it to production. Join TruEra for this webinar on how you can use LLM Observability to create and monitor high-performance GenAI apps…fast!
At the webinar on March 27th, you will learn:
What is LLM Observability? And how does it help you build better apps faster?
What kind of testing should you be doing?
How should you monitor your app in production, and what metrics matter?
Hottest News
1. Introducing Devin, the First AI Software Engineer
Devin is an autonomous AI software engineer from Cognition, created to augment coding teams. It can plan its way through complex challenges and integrates with dev tools for iterative development. Devin outperformed earlier AI models on the SWE-bench coding benchmark, showing proficiency in learning and debugging: it autonomously resolved 13.86% of the benchmark’s real-world GitHub issues.
2. Anthropic Has Introduced Claude 3 Haiku
Anthropic introduced Claude 3 Haiku, its fastest and most cost-effective AI model yet. It features strong vision capabilities and top performance on industry benchmarks. Haiku is designed for a wide range of enterprise applications and can process 21K tokens (~30 pages) per second for prompts under 32K tokens.
3. X.ai Announced Grok-1 Model: Open-Source and 314B Parameters
xAI recently announced the release of its Grok-1 large language model (LLM) under the open-source Apache 2.0 license, allowing users to access the model’s source code and weights for commercial and private uses. This 314 billion parameter model uses a Mixture-of-Experts (MoE) architecture. The release is the raw base model from Grok-1’s pre-training phase, not the refined Grok AI Assistant.
4. Figure 01, World’s First Commercially Available General-Purpose Humanoid Robot
OpenAI has partnered with robotics firm Figure to create a humanoid robot powered by its GPT family of language models. Figure 01 is the world’s first commercially available general-purpose humanoid robot to interact with its environment and acquire new knowledge over time. Its purpose is to address labor shortages in industries by providing an efficient and adaptable workforce.
5. World’s Most Extensive AI Rules Approved in EU Despite Criticism
The European Parliament has passed a comprehensive AI Act for risk-based regulation of artificial intelligence, mandating stringent consumer protections and maintaining human oversight. With its implementation set for 2025, the legislation is expected to influence global tech firms and potentially set a precedent for future international AI regulations.
Five 5-minute reads/videos to keep you learning
The new “Multi-Needle + Reasoning” benchmark highlights the limitations of LLMs with long contexts. It shows that while LLMs perform well when retrieving single facts from extensive data (the “Needle in a Haystack” scenario), their efficiency declines when tasked with finding multiple facts and reasoning about them. This article walks through benchmark usage and discusses results on GPT-4.
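To make the idea concrete, here is a minimal sketch of how a multi-needle test could be set up and scored. It is not the benchmark's actual code: the helper names (build_haystack, score_retrieval), the needle sentences, and the mocked model answer are all hypothetical stand-ins for illustration, and a real run would send the haystack to an LLM instead of using a fixed string.

```python
import random

def build_haystack(needles, filler, total_sentences, seed=0):
    """Scatter needle sentences at random positions among filler sentences."""
    rng = random.Random(seed)
    sentences = [filler] * total_sentences
    positions = sorted(rng.sample(range(total_sentences + 1), len(needles)))
    # Insert from the back so the earlier insertion indices stay valid.
    for pos, needle in reversed(list(zip(positions, needles))):
        sentences.insert(pos, needle)
    return " ".join(sentences)

def score_retrieval(answer, expected_facts):
    """Return the fraction of expected facts that appear in the answer."""
    answer = answer.lower()
    found = sum(1 for fact in expected_facts if fact.lower() in answer)
    return found / len(expected_facts)

# Hypothetical needles and the facts we expect the model to retrieve.
needles = [
    "The secret ingredient is figs.",
    "The secret ingredient is prosciutto.",
    "The secret ingredient is goat cheese.",
]
facts = ["figs", "prosciutto", "goat cheese"]
filler = "The sky was grey that day."
haystack = build_haystack(needles, filler, 200)

# Stand-in for an LLM response: a mock answer that recalls only two needles.
mock_answer = "The secret ingredients are figs and goat cheese."
score = score_retrieval(mock_answer, facts)
print(f"retrieved {score:.0%} of the needles")
```

Substring matching is a deliberately crude scorer; the reasoning variant of the benchmark would need a check on the final answer rather than on the presence of each fact.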
2. Enhancing RAG-Based Application Accuracy by Constructing and Leveraging Knowledge Graphs
Graph Retrieval Augmented Generation (Graph RAG) is gaining traction in data retrieval. It utilizes graph databases to enhance the context of information. This blog demonstrates how tools like Neo4j can help you add the graph construction module to LangChain.
3. How To Use LLMs Locally With Ollama and Python
This article will walk you through using Ollama, a command-line tool that allows you to download, explore, and use large language models (LLM) on your local PC with GPU support. You will also learn to use Ollama’s commands via the command line and in a Python environment.
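As a rough sketch of what calling a local model from Python can look like, the snippet below talks to Ollama's local REST endpoint using only the standard library. This is not the article's code: it assumes an Ollama server running on the default port 11434, and the helper names (build_request, generate) and the model name in the note below are placeholders chosen for illustration.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model, prompt, timeout=120):
    """Send the prompt to the server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With streaming disabled, the reply is a single JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

Calling generate("llama2", "Why is the sky blue?") only works once the server is running and the model has been downloaded, for example with `ollama pull llama2`; swap in whichever model you have pulled locally.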
4. Fine-Tune Mixtral-8x7B Quantized With AQLM (2-Bit) on Your GPU
This is a step-by-step guide for fine-tuning Mixtral-8x7B quantized with AQLM using only 16 GB of GPU RAM. It also discusses optimizing the fine-tuning hyperparameters to reduce memory consumption further while maintaining a good performance.
5. Sora, Groq, and Virtual Reality
The words virtual reality, augmented reality, mixed reality, and Metaverse have been used for decades, both in science fiction and in products, to describe what Apple is calling spatial computing. This blog post traces the various terms and dives into these concepts.
Repositories & Tools
1. AnswerDotAI/rerankers library provides tools for reranking documents. It can be integrated into your retrieval pipeline.
2. Full Stack FastAPI Template is a modern web application template created using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS, and more.
3. Skyvern AI automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows.
4. LiteLLM allows users to call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, etc.]
Top Papers of The Week!
1. MM1: Methods, Analysis & Insights From Multimodal LLM Pre-Training
Apple’s research team has unveiled MM1, a family of state-of-the-art multimodal AI models capable of processing visual and linguistic information. The family includes a 30 billion parameter model that demonstrates superior few-shot learning abilities and excels in multimodal tasks such as Visual Question Answering (VQA) and image captioning.
2. Unlocking the Conversion of Web Screenshots Into HTML Code With the WebSight Dataset
The paper presents WebSight, a synthetic dataset of 2 million HTML and screenshot pairs designed to improve vision-language models (VLMs) in web development tasks, such as translating UI screenshots to HTML code. The authors demonstrate the VLM’s enhanced performance on this dataset and contribute to the AI community by open-sourcing WebSight, encouraging further research in applying VLMs to web development.
3. Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
The rapid expansion of scientific articles presents a challenge for thorough literature analysis. LLMs offer a potential solution with their summarization capabilities but struggle with the multimodal elements in scientific content. Uni-SMART (Universal Science Multimodal Analysis and Research Transformer) has been developed to comprehend and analyze complex multimodal data in scientific literature.
4. Stealing Part of a Production Language Model
This paper introduces the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2. With API access, these attacks recover a transformer model’s embedding projection layer (up to symmetries). For under $20, this attack extracts the entire projection matrix of OpenAI’s Ada and Babbage language models.
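The intuition behind this kind of attack can be shown on a toy model. Every logit vector is the projection matrix applied to a hidden state, so a stack of logit vectors can never have rank higher than the hidden dimension. The sketch below is an illustration under strong assumptions, not the paper's method: the "model" is a small hand-built integer matrix, query_logits is a hypothetical stand-in for an API that returns full logits, and exact rank replaces the numerical analysis needed on real, noisy outputs.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Exact rank via Gaussian elimination over rational numbers."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Toy sizes. The attacker is assumed to know VOCAB but not HIDDEN.
HIDDEN, VOCAB, QUERIES = 4, 12, 9

# Toy "secret" projection matrix (VOCAB x HIDDEN), full column rank by construction.
W = [[(i + 1) ** j for j in range(HIDDEN)] for i in range(VOCAB)]

def query_logits(q):
    """Hypothetical stand-in for an API call returning the full logit vector."""
    hidden_state = [(q + 2) ** j for j in range(HIDDEN)]  # made-up final hidden state
    return [sum(w * h for w, h in zip(row, hidden_state)) for row in W]

# Collect more logit vectors than the suspected hidden size and measure their rank.
logit_matrix = [query_logits(q) for q in range(QUERIES)]
recovered_hidden_dim = matrix_rank(logit_matrix)
print(recovered_hidden_dim)
```

Here the rank of the 9 x 12 logit matrix comes out as 4, the toy hidden dimension, even though the attacker only ever saw outputs. Production APIs rarely expose full logits, so a real attack has to reconstruct them from more limited outputs first.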
5. MoAI: Mixture of All Intelligence for Large Language and Vision Models
This paper presents a new large language and vision model (LLVM), Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, scene graph generation (SGG), and OCR models. MoAI operates through two newly introduced modules: MoAI-Compressor (for aligning and condensing the output) and MoAI-Mixer (to blend visual, auxiliary, and language features).
Quick Links
1. Towards AI curated a list of The Top 13 AI-powered CRM Platforms, including their features, pros, cons, pricing, and user/expert opinions.
2. Nvidia is in advanced negotiations to acquire Run:ai, an AI infrastructure orchestration and management platform. The deal’s value is estimated at hundreds of millions of dollars and could reach $1 billion.
3. Apple acquired the AI startup DarwinAI, specializing in vision-based tech for observing components during manufacturing and improving efficiency. While Apple and DarwinAI haven’t announced this deal, several startup team members joined Apple’s machine learning teams in January.
4. Midjourney debuts a feature for generating consistent characters across multiple AI-generated images. As the feature progresses and is refined, it could take Midjourney further from being a cool toy or ideation source into more of a professional tool.
Who’s Hiring in AI!
Our friends at Barnacle Labs are hiring! Barnacle Labs is building solutions that make a difference in the world. They’re the team behind NanCI from the US government’s National Cancer Institute (NCI) — an app that “uses AI to find the scientific content and connections that help cancer researchers build a successful and fulfilling career to end cancer as we know it.” As a company, they’re exclusively focused on GenAI, where they help clients spot opportunities and build solutions that change things. Join them!
ML Ops Engineer @Barnacle Labs (Remote / London, UK)
Full Stack Engineer @Barnacle Labs (Remote / London, UK)
Senior Machine Learning Engineer — Large Language Models & Generative AI @Apple (Cupertino, CA, USA)
Lead Software Engineer — Quantum Computing and Algorithms @PASQAL (Remote/Freelancer)
Customer Success Technical Leader for AI and Cloud @INTEL (Arizona, USA)
Advanced AI Full Stack Engineer @McAfee, LLC (Remote)
Machine Learning Engineer @Adobe (San Jose, CA, USA)
Binance Accelerator Program — Data Analyst (Big Data) @Binance (Remote/Internship)
Data Scientist — 1 Year Fellowship @Tech Impact (Wilmington, DE, USA)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!
This AI newsletter is all you need #91 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.