TAI #122: LLMs for Enterprise Tasks; Agent Builders or Fully Custom Pipelines?
Also: Nvidia Nemotron-70B, Ministral 3B & 8B, IBM Granite 3.0, Allegro, Movie Gen, and more.
What happened this week in AI by Louie
This week, the focus on customizing LLMs for enterprises gained further momentum with Microsoft’s announcement of Copilot Studio agents, following Salesforce’s launch of AgentForce last month. These no/low-code tools allow businesses to build AI-powered agents that handle customized tasks, similar to OpenAI’s GPT Store, but with added flexibility and integration for specific business workflows and tool use. We think LLM agent abilities have also been enhanced by OpenAI’s new o1 model series, which allows the LLM to use more inference tokens and spend more time planning its actions.
We have long argued that while LLMs are getting very close to adding huge productivity gains across much of the economy, “out of the box” foundation models generally fall short of the reliability threshold needed for productivity gains on enterprise tasks. After factoring in the time spent getting ChatGPT or Claude working for your task and then diligently checking and correcting its answers, the productivity gains quickly shrink. This problem can be partially solved by task customization and internal data access via no-code agent/GPT builder platforms, and we think these will be very valuable products and stepping stones for many companies. However, we still believe a lot more flexibility and hard work are needed to truly optimize an LLM for specific datasets, companies, and workflows and get the best possible reliability and user experience. This work can’t be fully achieved with no-code tools or without knowledge of how LLMs work, but it also requires a different skill set from both traditional software development and machine learning engineering. For this reason, at Towards AI we are focused on training the new discipline of “LLM Developers”: creating a workforce that builds customized LLM pipelines on top of foundation LLMs for a huge array of specific workflows and products.
Microsoft is introducing customizable autonomous agents through Copilot Studio, moving from private to public preview starting next month. These agents, designed to perform complex tasks independently, are embedded within Microsoft’s existing productivity tools like Dynamics 365. Through Copilot Studio, organizations can now build agents tailored to their specific business needs using a low-code interface. The platform integrates enterprise data and LLMs, allowing these agents to automate workflows, interact with other systems, and operate based on defined triggers and business logic. Microsoft is also introducing ten new autonomous agents within Dynamics 365, specifically targeting areas like sales, service, finance, and supply chain management. These agents can perform tasks such as prioritizing sales leads or optimizing supply chain operations, enabling teams to focus on more strategic work.
Several companies have adopted Microsoft’s Copilot agents. McKinsey & Company created an agent that reduces client onboarding time by 90% and administrative tasks by 30%, while Thomson Reuters uses an agent to cut some legal due diligence tasks by 50%. At Microsoft, these AI tools are also driving internal success, with one sales team increasing revenue per seller by 9.4% and closing 20% more deals. Additionally, the use of agents has led to a 21.5% boost in marketing conversion rates and a 42% improvement in HR self-service accuracy.
Why should you care?
LLM-driven productivity tools can be extremely lucrative for the entrepreneurs and companies that build or adopt them. This technology has the power to truly transform the global economy, but a lot of customization work is needed to get there, and that creates a huge opportunity for software developers and AI experts to join the effort. To unlock the potential benefits of LLMs, we need to build reliable products on top of these models. This includes bringing in industry expertise and developing a thorough understanding of the use case and the problems you are solving in your specific workflow, company, or industry niche. Foundation model capabilities will keep improving, but we expect that your custom pipeline, custom dataset, tailored RAG pipeline, hand-picked fine-tuning and multi-shot task examples, custom evaluations, custom UI/UX, and thorough understanding of your use case and customers’ problems will always add incremental reliability and ease of use. This is what produces the best-in-class product for your niche and creates the most value for companies.
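To make the kind of customization we mean more concrete, here is a minimal, illustrative sketch of a pipeline that grounds a foundation model in your own documents and hand-picked few-shot examples. The documents, example, and model name below are placeholders rather than a recommended setup, and the retrieval step is deliberately naive:

```python
# Illustrative sketch only: retrieve an internal document, add a hand-picked
# few-shot example, and send the grounded prompt to a foundation model.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder internal documents; in practice this is your curated company data.
docs = [
    "Refunds are processed within 14 days of the return being received.",
    "Premium plans include 24/7 phone support and a dedicated account manager.",
]
doc_embs = encoder.encode(docs, convert_to_tensor=True)

# Hand-picked few-shot example showing the answer style you want.
few_shot = "Q: How long do refunds take?\nA: Refunds are processed within 14 days.\n\n"

def answer(question: str) -> str:
    # Retrieve the most relevant document to ground the model's answer.
    hit = util.semantic_search(
        encoder.encode(question, convert_to_tensor=True), doc_embs, top_k=1
    )[0][0]
    prompt = f"{few_shot}Context: {docs[hit['corpus_id']]}\nQ: {question}\nA:"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("Does the premium plan come with phone support?"))
```

Even a sketch this small shows where the real work lives: curating the documents and few-shot examples, evaluating answers against your use case, and iterating on the retrieval and prompting choices.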
We think agent builder platforms such as Microsoft Copilot Studio and enterprise adoption of ChatGPT are likely to be a taster and stepping stone for companies exploring the huge benefits of LLMs. This will accelerate LLM adoption at enterprises and trigger both the development of in-house customized LLM tools and the exploration of best-in-class third-party LLM pipelines for specific workflows and industries. If you want to take part in this opportunity, we are very close to releasing the most comprehensive practical LLM Python developer course out there: From Beginner to Advanced LLM Developer, with 85+ lessons progressing all the way from dataset collection and curation to deployment of a working advanced LLM pipeline. More about this next week!
— Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Nvidia Just Dropped a New AI Model on the Level of OpenAI’s GPT-4
Nvidia has introduced the Llama-3.1-Nemotron-70B-Instruct model, which outperforms GPT-4 on some benchmarks. The model was aligned using ~20,000 prompt-response pairs from human and synthetic data to make it more helpful. This continues Nvidia’s expansion from GPU manufacturing into providing more external AI software and models.
2. Mistral Releases New AI Models Optimized for Laptops and Phones
Mistral has introduced “Les Ministraux,” a series of AI models, Ministral 3B and Ministral 8B, optimized for edge devices like laptops and phones. These models focus on privacy-first, low-latency applications such as on-device translation and local analytics. Available for download or via Mistral’s cloud platform, they reportedly outperform competitors in AI benchmarks. This launch follows Mistral’s recent $640 million funding round, signaling continued expansion in AI offerings.
3. INTELLECT-1: Launching the First Decentralized Training of a 10B Parameter Model
INTELLECT-1 introduces the first decentralized training of a 10-billion-parameter AI model using the OpenDiLoCo method, aiming to democratize AI development by reducing centralized control and enhancing open-source access. The project focuses on optimizing communication and compute efficiency, inviting public contributions to advance open-source AI capabilities.
4. OpenAI & Microsoft Reportedly Hire Banks To Renegotiate Partnership Terms
OpenAI and Microsoft are renegotiating their partnership terms, with Goldman Sachs and Morgan Stanley advising, to redefine Microsoft’s stake and governance in OpenAI after its transition to a benefit corporation. This follows OpenAI’s efforts to secure cheaper cloud services and a $10 billion deal with Oracle. Despite a projected $5 billion loss this year, OpenAI reportedly aims for profitability by 2029, with internal forecasts of $100 billion in revenue by then.
5. IBM Debuts Open-Source Granite 3.0 LLMs for Enterprise AI
IBM is expanding its enterprise AI business with the launch of the third generation of Granite LLMs. A core element of the new generation is the continued focus on real open-source enterprise AI. IBM is also ensuring that the models can be fine-tuned for enterprise AI with its InstructLab capabilities. The new models include general-purpose Granite 3.0 options at 2 billion and 8 billion parameters. There are also Mixture-of-Experts (MoE) models: Granite 3.0 3B A800M Instruct, Granite 3.0 1B A400M Instruct, Granite 3.0 3B A800M Base, and Granite 3.0 1B A400M Base.
6. Former OpenAI CTO Mira Murati Is Reportedly Fundraising for a New AI Startup
Mira Murati, the OpenAI CTO who announced her departure last month, is raising VC funding for a new AI startup, according to Reuters. This startup will reportedly focus on building AI products based on proprietary models and could raise more than $100 million in this round.
7. Rhymes AI Released Allegro: Advanced Video Generation Model
Rhymes AI announced the open-source release of Allegro, an advanced text-to-video model. It can generate detailed 6-second videos at 15 FPS and 720x1280 resolution, which can be interpolated to 30 FPS with EMA-VFI. Allegro combines a 175M-parameter VideoVAE with a 2.8B-parameter VideoDiT model, supports multiple precisions (FP32, BF16, FP16), and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading.
Five 5-minute reads/videos to keep you learning
1. Machines of Loving Grace
Dario Amodei, CEO of Anthropic, emphasizes the transformative benefits of AI in his essay, advocating for a balanced discourse that acknowledges both risks and positive impacts. He envisions AI enhancing health, economics, and governance, stressing the importance of managing risks to achieve significant societal advancements.
2. How To Build a Custom Text Classifier Without Days of Human Labeling
This post explains how to build a text classification model by combining LLMs and human feedback, drastically reducing the time needed to deploy a supervised model for a specialized use case. It walks through auto-labeling a dataset by defining its fields, labels, and annotation guidelines, having humans review and improve the LLM-suggested labels, training a specialized SetFit model, and comparing it against an LLM few-shot classifier (see the SetFit sketch after this list).
3. AI Is Confusing — Here’s Your Cheat Sheet
The AI industry is filled with jargon, and it can be challenging to understand what’s actually happening with each new development. This article compiles a list of some of the most common AI terms. It explains what they mean and why they’re important.
4. The article explores the challenges of mitigating human biases in language model datasets for fine-tuning LLMs. It dives into the complex interplay between confounds and various biases during this process, discussing how human biases can inadvertently shape AI models and impact their fairness and accuracy.
5. What Geoffrey Hinton’s Nobel Prize Means for the AI World
The article discusses Geoffrey Hinton’s Nobel Prize in the context of AI and dives into the broader impact this accolade could have on the perception of AI research, its ethical considerations, and its future direction.
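As referenced in item 2 above, here is a minimal sketch of the final SetFit training step, assuming setfit >= 1.0 and a small human-reviewed labeled set; the base model and the tiny inline dataset are illustrative placeholders:

```python
# Minimal sketch: train a SetFit classifier on a small, human-reviewed labeled set.
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny placeholder dataset; in practice this is your reviewed, auto-labeled data.
train_ds = Dataset.from_dict({
    "text": ["Great battery life", "Screen cracked after a week", "Fast shipping"],
    "label": [1, 0, 1],  # 1 = positive, 0 = negative
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=8, num_epochs=1),
    train_dataset=train_ds,
)
trainer.train()

# The fine-tuned model can now be compared against the LLM few-shot baseline.
print(model.predict(["Terrible customer support", "Works exactly as advertised"]))
```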
Repositories & Tools
1. Tabled is a library that uses the Surya tool to detect and extract tables from various document types, converting them into markdown, CSV, or HTML formats.
2. Mini Omni 2 is an omni-interactive model that can understand image, audio, and text inputs and has end-to-end voice conversations with users.
3. Phidata is a framework for building agentic systems that can be used to build intelligent agents, run those agents as a software application, and optimize your agentic system.
4. CoTracker is a model for tracking any point (pixel) on a video.
Top Papers of The Week
1. Movie Gen: A Cast of Media Foundation Models
Meta’s Movie Gen introduces advanced foundation models for generating high-quality 1080p HD videos with synchronized audio. These models support instruction-based editing, personalized video creation, text-to-video synthesis, and video-to-audio generation, with the largest model featuring 30 billion parameters.
2. Aria: An Open Multimodal Native Mixture-of-Experts Model
This paper introduces Aria, an open multimodal native model with best-in-class performance across various multimodal, language, and coding tasks. Aria is a mixture-of-experts model with 3.9B and 3.5B activated parameters per visual and text token, respectively. It outperforms Pixtral-12B and Llama3.2-11B and is competitive with the best proprietary models on various multimodal tasks.
3. Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
The paper introduces a World-model-augmented (WMA) web agent designed to address the limitations of current large language models in web navigation tasks, particularly their lack of predictive “world models.” By simulating action outcomes, the WMA agent enhances decision-making.
4. Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Meta Decision Transformer (Meta-DT) is an innovative approach in offline meta-reinforcement learning that integrates transformer architecture with world model disentanglement for improved task representation. It enhances generalization using a context-aware world model and trajectory-based prompts.
5. LightRAG: Simple and Fast Retrieval-Augmented Generation
LightRAG incorporates graph structures into text indexing and retrieval. It employs a dual-level retrieval system that combines low-level (entity-focused) and high-level (theme-focused) knowledge discovery for more comprehensive retrieval. Integrating graph structures with vector representations also makes it efficient to retrieve related entities and their relationships, significantly improving response times while maintaining contextual relevance (a conceptual sketch follows at the end of this list).
6. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Only a few “retrieval heads” are essential for processing long contexts and need full attention over all tokens; the remaining “streaming heads” focus on recent tokens and do not require full attention. Based on this insight, the paper introduces DuoAttention, a framework that applies a full KV cache only to retrieval heads while using a lightweight, constant-length KV cache for streaming heads, reducing the LLM’s decoding and pre-filling memory usage and latency without compromising its long-context abilities (sketched below).
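To illustrate the caching policy behind DuoAttention (item 6 above), here is a toy sketch rather than the authors’ implementation: retrieval heads keep the full KV cache, while streaming heads keep only a few initial “sink” tokens plus a sliding window of recent tokens.

```python
# Toy illustration of the DuoAttention caching idea (not the authors' code).
from collections import deque

class HeadKVCache:
    """Per-head cache: full history for retrieval heads, sinks + window otherwise."""

    def __init__(self, is_retrieval_head: bool, num_sink: int = 4, window: int = 256):
        self.is_retrieval_head = is_retrieval_head
        self.num_sink = num_sink
        self.full = []                      # unbounded cache for retrieval heads
        self.sink = []                      # first few "attention sink" tokens
        self.recent = deque(maxlen=window)  # sliding window of recent tokens

    def append(self, kv):
        if self.is_retrieval_head:
            self.full.append(kv)            # memory grows with sequence length
        elif len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)          # oldest entries are evicted automatically

    def cached_tokens(self):
        return self.full if self.is_retrieval_head else self.sink + list(self.recent)

# A streaming head's cache stays bounded no matter how long the sequence gets.
cache = HeadKVCache(is_retrieval_head=False, window=8)
for token_kv in range(1000):
    cache.append(token_kv)
print(len(cache.cached_tokens()))  # 4 sink tokens + 8 recent tokens = 12
```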
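Similarly, here is a hypothetical illustration of the dual-level retrieval idea in LightRAG (item 5 above), not the paper’s actual code or API: low-level retrieval finds specific entities via vector search, while high-level retrieval pulls the relationships around them from a knowledge graph; the toy graph, encoder, and entities are placeholders.

```python
# Hypothetical sketch of dual-level retrieval: vector search over entities (low level)
# plus graph relationships around the matched entities (high level).
import networkx as nx
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy knowledge graph: nodes are entities, edges carry relation descriptions.
graph = nx.Graph()
graph.add_edge("Acme Corp", "Berlin", relation="headquartered in")
graph.add_edge("Acme Corp", "solar panels", relation="manufactures")

entities = list(graph.nodes)
entity_embs = encoder.encode(entities, convert_to_tensor=True)

def dual_level_retrieve(query: str, top_k: int = 2):
    q_emb = encoder.encode(query, convert_to_tensor=True)
    # Low level: most similar entities by vector search.
    hits = util.semantic_search(q_emb, entity_embs, top_k=top_k)[0]
    matched = [entities[h["corpus_id"]] for h in hits]
    # High level: relationships (edges) around those entities from the graph.
    relations = [
        f"{u} --{d['relation']}--> {v}"
        for ent in matched
        for u, v, d in graph.edges(ent, data=True)
    ]
    return matched, relations  # both levels would be merged into the LLM's context

print(dual_level_retrieve("Where is Acme Corp based?"))
```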
Quick Links
1. Meta has released an AI tool called the ‘Self-Taught Evaluator’ that can assess and improve the accuracy of other AI models without human intervention. The paper introducing the ‘Self-Taught Evaluator’ describes how the model follows a “chain of thought” approach similar to the one OpenAI’s o1 model uses to ‘think’ before responding.
2. UCLA researchers introduced SLIViT, a deep-learning framework that teaches itself quickly to automatically analyze and diagnose MRIs and other 3D medical images — with accuracy matching that of medical specialists in a fraction of the time.
3. Adobe’s Firefly Video Model is launching across a handful of new tools, including some right inside Premiere Pro that will allow creatives to extend footage and generate video from still images and text prompts.
4. xAI released its long-awaited API. The API currently offers only one model, “grok-beta,” which costs $5 per million input tokens and $15 per million output tokens.
Who’s Hiring in AI
Machine Learning Systems Engineer, RL Engineering @Anthropic (Bay Area, CA, USA)
PhD AI/ML Engineering Internship — Recommender Systems & Search @LinkedIn (Mountain View, CA, USA)
AI Technical Writer and Developer for Large Language Models @Towards AI Inc (Remote)
Technical Writer, AI/ML Docs @Amazon (Seattle, WA, USA)
Senior Fullstack Engineer, Conversations AI @Postscript (Remote/USA)
R&D Engineering Intern @New York Times (New York, NY, USA)
Intern, AI Research Scientist — 3D Generation @Autodesk (London, UK)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.