Towards AI #103: Apple integrates GenAI

Also, Qwen2, "Kling" text-to-video, Buffer of Thoughts, and more!

Jun 11, 2024

What happened this week in AI by Louie

While the week started with some impressive new model releases from China (the open Qwen2 LLM and the Kling text-to-video model), anticipation was building throughout towards Apple’s WWDC keynote and AI announcements. As with any Gen AI production use case, Apple had to decide which features to build into its products first and how to implement them, and to choose between many tradeoffs. What are a feature’s user benefits relative to its risks (like hallucinations and reputational damage from viral failure cases)? Do we prioritize capability or latency and cost (which can involve a decision between on-device and cloud models of various sizes)? Do we use open-source models, in-house models, or external closed models? How do we balance user privacy and data security against ease of use and the potential collection of future training and human feedback data? Towards AI can help with these questions, by the way, with our customized Generative AI courses and consultancy!

Apple has chosen to start with three tiers of intelligence for different features: 1) a small in-house on-device LLM with around 3 billion parameters, 2) a larger in-house server-based LLM (which looks a little above GPT-3.5 level), with inference on Apple silicon and many new privacy and security features, and 3) a ChatGPT integration with Siri for access to the capabilities of GPT-4o. Many of Apple’s first features are geared around smarter search (including a semantic understanding of media), prioritization of alerts and emails, transcription, summarization, writing, and image tools. There are also hints of more agentic capabilities, with Siri enabled to take actions in and across apps.

Why should you care?

With Apple’s 1 billion users and often trend-setting products, we think Apple’s AI choices are important for the direction of the whole industry. While the integration of ChatGPT into Siri seems like a big win for OpenAI, we do not think the relationship feels exclusive. Apple indicated it intends to later integrate other models such as Google’s Gemini, and we think its new “App Intents API” and ability to connect third-party apps to Siri will likely lead to an open playground of third-party LLMs and products being integrated to various degrees. At the same time, however, data security and privacy for the often highly personal data stored on your iPhones and Macs are much easier to manage with on-device models or Apple’s private cloud models (though we still expect skepticism about how safe your data is here), so we expect pressure towards vertical integration for many features and capabilities. In any case, we think Apple’s late entry into Generative AI and long-overdue revamp of Siri will provide a lot of opportunities for AI and LLM developers going forward!

—Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. Qwen2 Released

The Qwen2 series is an advancement over Qwen1.5, introducing five models with new features such as support for 27 additional languages and improved coding and mathematics performance. The standout Qwen2-72B offers improved safety and can comprehend lengthy contexts of up to 128K tokens. These models are available on Hugging Face and ModelScope.

2. Mistral Launches Fine-Tuning Tools To Make Customizing Its Models Easier and Faster

Mistral introduced mistral-finetune for developers who want to fine-tune Mistral’s open-source models on their own infrastructure. The codebase is built on the LoRA training paradigm. Mistral also offers serverless fine-tuning through its API, which users can try by registering on la Plateforme.
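
For readers unfamiliar with LoRA, here is a minimal sketch of the general idea in plain Python. It is not mistral-finetune's API: the function names (`lora_forward`, `trainable_params`) and the toy matrices are assumptions made for illustration. LoRA keeps the pretrained weight matrix W frozen and trains two small matrices A and B, so the layer computes W x + (alpha / rank) * B (A x).

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x: Vector, W: Matrix, A: Matrix, B: Matrix,
                 alpha: float, rank: int) -> Vector:
    """Frozen base projection plus a scaled low-rank update.

    W: (d_out x d_in) frozen pretrained weights
    A: (rank x d_in) trainable down-projection
    B: (d_out x rank) trainable up-projection
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_out * d_in, rank * (d_in + d_out)


if __name__ == "__main__":
    full, lora = trainable_params(4096, 4096, 8)
    print(f"full: {full:,} params, LoRA (rank 8): {lora:,} params")
```

The parameter count is the main appeal: for a 4096 x 4096 matrix, a rank-8 adapter trains 65,536 values instead of roughly 16.8 million.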

3. OpenAI Is Rebooting Its Robotics Team

OpenAI is reinstating its robotics division, focusing on creating AI models for robotic applications in collaboration with external robotics companies. This is a strategic pivot from producing in-house hardware to empowering humanoid robots through partnerships, as evidenced by investments in entities like Figure AI.

4. OpenAI and Google DeepMind Workers Warn of AI Industry Risks in Open Letter

A group of current and former employees from prominent artificial intelligence companies, including OpenAI and Google DeepMind, have issued an open letter calling for increased transparency and protections for whistleblowers within the AI industry. The letter, which calls for a “right to warn about artificial intelligence,” is one of the most public statements about the dangers of AI.

5. Chinese Company Kuaishou Releases Kling

Chinese short-video app Kuaishou has launched a text-to-video service similar to OpenAI’s Sora. The Kling AI Model, in the trial stage, can process text into video clips up to 2 minutes long with 1080p resolution, supporting various aspect ratios.

Five 5-minute reads/videos to keep you learning

1. LLM Merging Competition: Building LLMs Efficiently through Merging

The article introduces a competition that challenges participants to integrate multiple fine-tuned LLMs to improve their performance and adaptability to novel tasks. Competitors will utilize pre-trained expert models with up to 8 billion parameters from the Hugging Face Model Hub, available under research-friendly licenses. The competition aims to minimize the costs and challenges of training LLMs from the ground up by utilizing existing models.
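
As a rough illustration of what "merging" can mean, the sketch below shows one of the simplest baselines: a weighted average of parameters across fine-tuned models that share an architecture. This is an assumption chosen for illustration, not the competition's prescribed method, and the `average_merge` helper and toy state dicts are hypothetical.

```python
from typing import Dict, List, Optional

# A toy "state dict": parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_merge(models: List[StateDict],
                  weights: Optional[List[float]] = None) -> StateDict:
    """Merge models with identical parameter names and shapes by weighted averaging."""
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")

    merged: StateDict = {}
    for name, reference in models[0].items():
        for m in models:
            if name not in m or len(m[name]) != len(reference):
                raise ValueError(f"parameter {name!r} does not match across models")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(len(reference))
        ]
    return merged


if __name__ == "__main__":
    math_expert = {"layer.weight": [1.0, 3.0]}
    code_expert = {"layer.weight": [3.0, 5.0]}
    print(average_merge([math_expert, code_expert]))
```

Real merging methods typically go further than this, for example by resolving conflicts between the models' parameter changes, but they work on the same kind of per-parameter inputs.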

2. Scholars: AI Isn’t “Hallucinating” — It’s Bullshitting

We know AI models hallucinate, but scholars Michael Townsend Hicks, James Humphries, and Joe Slater from the University of Glasgow argue that these inaccuracies are better understood as “bullshit.” This article summarizes their reasoning.

3. Claude’s Character

It is essential to train AI models to have good character traits and to keep those traits as they become more capable. This article from Anthropic explains the process behind crafting the personality of its Claude AI model, using ‘Character Training’ to help instill curiosity, thoughtfulness, and openness to diverse viewpoints.

4. Extracting Concepts from GPT-4

Researchers have employed sparse autoencoders to break down GPT-4’s internal representations into 16 million features, many of them human-interpretable, allowing for enhanced comprehension of the model’s processes. In this post, OpenAI explains the approach further. They have also shared a paper detailing their experiments and methods.

5. Token-wise Influential Training Data Retrieval for Large Language Models

This article introduces RapidIn, a framework designed to efficiently estimate the influence of training data on large language models (LLMs) by compressing gradient vectors into low-dimensional representations called RapidGrads. RapidIn addresses challenges related to scalability, computational efficiency, and handling massive datasets.

Repositories & Tools

1. Vectorize, built for RAG, turns unstructured data into perfectly optimized vector search indexes.

2. Spreadsheet is all you need: a nanoGPT pipeline packed in a spreadsheet created to understand how GPT works.

3. Replicate allows you to run and fine-tune open-source AI models using an API.

4. transformers.js allows you to run Transformers models directly in your browser.

5. Build your own X is a compilation of well-written, step-by-step guides for re-creating technologies like AR, Bots, Torrent, etc., from scratch.

Top Papers of The Week!

1. Open-endedness is Essential for Artificial Superhuman Intelligence

This paper argues that open-endedness — the ability to create new, learnable ‘artifacts’ — is the key to achieving artificial superhuman intelligence (ASI). It provides a concrete formal definition of open-endedness through the lens of novelty and learnability. It also examines the safety implications of generally capable open-ended AI.

2. Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Seed-TTS encompasses advanced autoregressive and non-autoregressive text-to-speech models capable of generating human-like speech with emotional variability, speaker similarity, and naturalness. It also showcases proficiency in end-to-end speech generation and editing through a diffusion-based architecture.

3. Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

This paper introduces Buffer of Thoughts (BoT), a thought-augmented reasoning approach for enhancing large language models’ accuracy, efficiency, and robustness. It uses a meta-buffer to store a series of informative high-level thoughts called thought-templates. For each problem, it retrieves a relevant thought-template and adaptively instantiates it with specific reasoning structures.
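
The store, retrieve, and instantiate loop described above can be sketched in a few lines of Python. Everything here is a simplified assumption rather than the paper's implementation: the `MetaBuffer` and `ThoughtTemplate` names are hypothetical, and word overlap (Jaccard similarity) stands in for whatever retrieval the authors use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThoughtTemplate:
    name: str
    description: str  # what kind of problem the template applies to
    template: str     # high-level reasoning outline with a {problem} slot


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace-separated words (toy stand-in)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class MetaBuffer:
    """Stores thought-templates and retrieves the closest one for a problem."""

    def __init__(self) -> None:
        self.templates: List[ThoughtTemplate] = []

    def add(self, template: ThoughtTemplate) -> None:
        self.templates.append(template)

    def retrieve(self, problem: str) -> ThoughtTemplate:
        if not self.templates:
            raise LookupError("meta-buffer is empty")
        return max(self.templates, key=lambda t: similarity(t.description, problem))


def instantiate(template: ThoughtTemplate, problem: str) -> str:
    """Fill the retrieved template with the concrete problem to build a prompt."""
    return template.template.format(problem=problem)


if __name__ == "__main__":
    buffer = MetaBuffer()
    buffer.add(ThoughtTemplate(
        "algebra",
        "solve an equation for an unknown variable",
        "Isolate the unknown step by step, then verify.\nProblem: {problem}",
    ))
    buffer.add(ThoughtTemplate(
        "sorting",
        "sort a list of numbers",
        "Compare neighbours and swap until ordered.\nProblem: {problem}",
    ))
    task = "solve the equation 2x + 3 = 7 for x"
    print(instantiate(buffer.retrieve(task), task))
```

In a real system the instantiated prompt would then be passed to an LLM; that step is omitted here to keep the sketch self-contained.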

4. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

This paper analyzes the relationship between Transformers and state-space models (SSMs) through structured matrices, introducing a theoretical framework that connects the two. It also presents an improved architecture, Mamba-2, whose core layer is 2–8 times faster than its predecessor’s while maintaining comparable performance in language modeling tasks.
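
To give a feel for the duality, here is a toy sketch that assumes the simplest possible case: a scalar hidden state, with the recurrence h_t = a_t * h_(t-1) + b_t * x_t and output y_t = c_t * h_t. The same outputs can be computed by multiplying the input by a lower-triangular (causal) matrix, which resembles a masked attention matrix. The function names are hypothetical, and the paper's actual formulation is more general than this.

```python
from typing import List


def ssm_recurrent(a: List[float], b: List[float], c: List[float],
                  x: List[float]) -> List[float]:
    """Sequential (RNN-style) evaluation of a scalar-state SSM."""
    h = 0.0
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def ssm_matrix(a: List[float], b: List[float], c: List[float]) -> List[List[float]]:
    """Build M with M[t][s] = c[t] * (a[s+1] * ... * a[t]) * b[s] for s <= t, else 0."""
    n = len(a)
    M = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # running product a[s+1] * ... * a[t]
        for s in range(t, -1, -1):
            M[t][s] = c[t] * decay * b[s]
            decay *= a[s]
    return M


def matvec(M: List[List[float]], v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


if __name__ == "__main__":
    a, b, c = [0.5, 0.9, 0.8], [1.0, 2.0, 0.5], [1.0, 1.0, 2.0]
    x = [1.0, -1.0, 3.0]
    print(ssm_recurrent(a, b, c, x))
    print(matvec(ssm_matrix(a, b, c), x))
```

Both calls print the same sequence up to floating-point rounding: the recurrent form runs in linear time, while the matrix form exposes the attention-like structure.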

5. Matching Anything by Segmenting Anything

This paper proposes MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. MASA learns instance-level correspondence through exhaustive data transformations, and leverages object segmentation from the Segment Anything Model (SAM).

Quick Links

1. Stability AI has launched Stable Audio Open, an open AI model that generates sound from text descriptions and is geared towards non-commercial use. The model was trained on around 486,000 royalty-free samples from free music libraries, such as Freesound and the Free Music Archive.

2. AI-powered search startup Perplexity is facing accusations of plagiarizing content from news outlets like Forbes, CNBC, and Bloomberg through its Perplexity Pages feature. While Perplexity includes small logos linking to the sources, the posts do not mention the publications by name.

3. Hugging Face and Pollen Robotics created an open-source robot. Pollen designed the humanoid robot and partnered with Hugging Face to train it to do various household tasks and safely interact with humans and dogs.

Who’s Hiring in AI!

GenAI and AIML Solutions Architect @Amazon (Courbevoie, France)

Technical Curriculum Developer (Data & AI) @Databricks (Remote)

Data Scientist — LLM @TikTok (Vancouver, Canada)

Product Manager — Data Integrations @One Model (Australia/Remote)

Lead Python Developer — 2045628 @Gameface Associates Ltd (London / Hybrid)

Computer Programmer / AI Manager @Monster Group (York, UK)

Specialist in Large Language Models and NLP for an Architectural Firm — KTP Associate @Newcastle University (Newcastle upon Tyne, UK)

Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.

If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!

This AI newsletter is all you need #103 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
