TAI #159: China's Open-Model Offensive vs. Meta's Multi-Billion-Dollar Gamble on AI Talent Acquisition
Also, Google's Gemma 3n, Tencent's Hunyuan A13B, Baidu's ERNIE 4.5, and the high price of trade secrets.
What happened this week in AI by Louie
This week felt like a tale of two AI strategies unfolding in parallel. In China, the open-source movement gained further momentum as Baidu followed Tencent and Alibaba with strong new open-weight models. Meanwhile, in the US, as discussed last week, the AI race has escalated into a high-stakes talent war, with a seemingly desperate Meta spending billions to buy its way back to the frontier. The contrast highlights a growing divergence in how the world’s major tech ecosystems are approaching the future of AI.
In the Chinese AI industry, most leading players now follow an open-weight strategy, and the field is becoming increasingly competitive. Tencent released Hunyuan-A13B, an efficient 80B-parameter Mixture-of-Experts (MoE) model with 13B active parameters. It boasts a 256K context window and a novel “fast and slow thinking” mode that lets it toggle between rapid responses and more deliberative, chain-of-thought-style reasoning. Not to be outdone, Baidu joined the open-weight movement and unveiled its ERNIE 4.5 family, a suite of ten multimodal models headlined by a 424B-parameter MoE (47B active). Its key innovation is a heterogeneous-modality architecture that uses dedicated parameters for different data types (text, image) within the same model, improving cross-modal understanding without compromising text performance. This follows Alibaba’s recent release of Qwen-VLo, a unified multimodal model that not only understands but also generates and edits images, with impressive instruction following via a progressive, left-to-right generation process.
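To make the “13B active out of 80B total” efficiency claim concrete, here is a minimal toy sketch of top-k MoE routing in NumPy. All sizes and names are illustrative, not Hunyuan-A13B’s actual architecture: the point is simply that a router selects a few experts per token, so only a fraction of the parameters run on any given input.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, k=2):
    # The router scores every expert, but only the top-k actually run.
    logits = x @ gate_w                      # (n_experts,)
    topk = np.argsort(logits)[-k:]           # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over the selected experts
    # Combine only the selected experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

d, n_experts = 8, 16                         # toy dimensions
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
gate_w = rng.normal(size=(d, n_experts))

x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts, each token touches only 1/8 of the expert parameters, which is the same trade-off (at vastly larger scale) behind 13B-active/80B-total models.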
While the US is often characterized by its closed labs, Google also contributed a significant open model this week with Gemma 3n. This new mobile-first, multimodal family (E2B and E4B effective parameters) introduces several architectural innovations. At its core is the MatFormer architecture, a nested transformer that enables “elastic inference” and lets developers create custom-sized models. To manage memory, Gemma 3n uses Per-Layer Embeddings (PLE), which offloads a large portion of the model’s 8B total parameters to the CPU, leaving a more manageable 4B footprint on the accelerator. It’s a formidable on-device model, with the E4B variant becoming the first sub-10B model to surpass a 1300 Elo score on LMArena.
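The MatFormer “nested model” idea can be illustrated with a toy sketch (purely illustrative, not Gemma 3n’s implementation): a smaller submodel reuses a prefix slice of the full model’s weights, so one set of trained weights yields multiple deployable sizes, and inference can shrink or grow elastically.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden_full = 8, 64

# One shared FFN; nested submodels reuse a prefix of its hidden units.
W_in = rng.normal(size=(d, hidden_full))
W_out = rng.normal(size=(hidden_full, d))

def ffn(x, hidden):
    """Run the FFN using only the first `hidden` units (elastic inference)."""
    h = np.maximum(x @ W_in[:, :hidden], 0)  # ReLU over a prefix slice
    return h @ W_out[:hidden, :]

x = rng.normal(size=d)
full = ffn(x, hidden_full)        # full-width path ("E4B-like")
small = ffn(x, hidden_full // 2)  # nested submodel at half the FFN width
print(full.shape, small.shape)
```

Because the small path is a literal slice of the big one, no separate checkpoint is needed for the smaller deployment target, which is the practical appeal for on-device use.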
Meta’s recent strategy feels like a high-stakes scramble to regain its footing. After the lackluster release of Llama 4, the company is on an unprecedented spending and hiring spree to overhaul its AI efforts, now consolidated under the “Meta Superintelligence Labs” banner. The centerpiece is the nearly $15 billion deal for a 49% stake in Scale AI, widely seen as an acqui-hire of its CEO, Alexandr Wang, who becomes Meta’s new Chief AI Officer.
Meta’s audacious talent grab continued this week: Wang is now joined at Meta by former GitHub CEO Nat Friedman and, likely soon, Safe Superintelligence (SSI) CEO Daniel Gross, following a buyout of the pair’s venture fund. Beyond AI leadership, Meta has also had success raiding its rivals’ research talent. The list of recent hires reads like a who’s-who of the people who built the very models Meta is chasing. From OpenAI, Meta has poached Trapit Bansal (pioneered RL on chain-of-thought with o1), Shuchao Bi (co-creator of GPT-4o voice), Huiwen Chang (co-creator of GPT-4o’s image generation), and Shengjia Zhao (co-creator of ChatGPT, GPT-4, and all mini models). From Google, they’ve hired Jack Rae (pre-training tech lead for Gemini 2.5) and Pei Sun (post-training for Gemini). They even snagged Joel Pobar, a key inference lead from Anthropic.
Meta’s aggressive recruitment highlights a unique feature of the Silicon Valley ecosystem: the general absence of non-compete agreements. This makes it remarkably easy for companies to hire not just talent, but also the invaluable, unwritten trade secrets they carry: knowledge about data recipes, training infrastructure, and architectural dead ends from rival labs. This makes the calculus for how much an employee with inside knowledge is worth incredibly complex. From one perspective, Meta’s huge outlays may actually be a bargain. Meta’s models have been underperforming relative to its investment in GPUs, made all the more clear by DeepSeek’s success early this year. Acquiring this institutional knowledge could let Meta bypass years of costly trial and error, better utilize its existing compute investment, and catch up in the critical AI race.
Why should you care?
The AI labs are clearly bifurcating into two distinct ecosystems, each with its own way of shaping the competitive field. China is increasingly defined by its open-weight model strategy, where major players like Tencent, Alibaba, DeepSeek, and now Baidu release powerful models publicly. This approach fosters broad-based innovation and effectively prevents any single domestic company from achieving a monopoly on advanced AI.
The US, in contrast, appears to be a largely closed-model ecosystem, but the reality is more complex. Due to the absence of non-competes in Silicon Valley, intellectual property flows with surprising freedom. A core architectural breakthrough or a novel training objective can be communicated in a short conversation, making top talent the primary carriers of trade secrets. This has created a perpetual “merry-go-round” where leading labs — OpenAI, Google, Anthropic, and now Meta — continuously poach talent and ideas from one another.
Until now, Meta has been the primary champion of open-weight models among the US frontier labs, making it an outlier. While Google releases strong open models like Gemma, they are strategically positioned below their state-of-the-art Gemini series. OpenAI’s long-promised open-weight model is likely to take the same approach. But can Meta’s open-source ethos survive its own multi-billion-dollar spending spree? As the company gambles vast sums to close the gap with its rivals, the question arises whether it can justify giving away the fruits of such an expensive effort, especially if it succeeds in reaching the frontier.
Ultimately, both systems, China’s explicit openness and the US’s de facto knowledge sharing via talent mobility, make a single global AI monopoly less likely, and that shifts the competitive dynamics. For LLM users and developers, this is great news: the two modes of idea sharing create a competitive flywheel that drives down prices and pressures labs to offer more developer-friendly features. If one company holds back on fine-tuning, another will surely use it as a selling point.
If breakthrough ideas can’t be hoarded for long, the decisive advantages become the factors that are hardest to replicate: access to the most compute and the best proprietary data. The ability to not just train the largest models, but to do so more frequently, remains a critical edge in the race to the frontier.
— Louie Peters — Towards AI Co-founder and CEO
Hottest News
1. Alibaba Qwen Team Releases Qwen-VLo
Alibaba’s Qwen team has released Qwen-VLo, a new model in the Qwen family designed to unify multimodal understanding and generation within a single framework. Qwen-VLo supports generating, editing, and refining visual content from text, sketches, and commands across languages, as well as through step-by-step scene construction. While the architecture details aren’t fully disclosed, the model likely builds on Qwen-VL’s Transformer base, with improvements in cross-modal attention, adaptive fine-tuning, and structured spatial and semantic grounding.
2. Tencent Introduces Hunyuan-A13B Open-Source LLM
Tencent has open-sourced Hunyuan-A13B, a Mixture-of-Experts model with 13B active parameters out of 80B total. The model matches the performance of o1 and DeepSeek models on standard benchmarks and is optimized for long-text reasoning and agentic tool use. Alongside the model, Tencent released two benchmark datasets: ArtifactsBench, which supports visual and interactive code evaluation, and C3-Bench, designed to surface agent-specific vulnerabilities and interpretability challenges.
3. Baidu Open Sources ERNIE 4.5 Models
Baidu has released the ERNIE 4.5 family of models, achieving strong performance across instruction following, world knowledge, and multimodal reasoning. Trained with PaddlePaddle and achieving 47% Model FLOPs Utilization (MFU) on its largest model, the family comprises 10 models: dense and MoE variants, with the largest reaching 424B total parameters (47B active). The models show competitive results across text and multimodal tasks.
4. Google Released a Full Version of Gemma 3n
Google has launched the full release of Gemma 3n, a Matryoshka Transformer–based model built for on-device multimodal processing. Using Per-Layer Embeddings (PLE), Gemma 3n can improve quality without exceeding device memory limits. The E2B and E4B versions, with 5B and 8B parameters, respectively, are optimized for elastic inference across CPUs and accelerators, enabling fast and efficient deployment on edge devices.
5. Google DeepMind Releases AlphaGenome
DeepMind has introduced AlphaGenome, a new deep learning framework designed to predict the regulatory consequences of DNA sequence variations across a wide spectrum of biological modalities. Accepting sequences up to 1 megabase, AlphaGenome generates base-level predictions across modalities such as splicing, chromatin accessibility, and gene expression. It uses a transformer-based U-Net architecture and processes 131 kilobase (kb) sequence chunks in parallel on TPUv3 hardware for detailed, context-aware outputs.
6. Anthropic Introduced Economic Futures Program
Anthropic launched its Economic Futures Program, a new initiative that supports research on AI’s impact on the labor market and global economy and develops policy proposals to prepare for the shift. The program spans three main areas, starting with grants to researchers investigating AI’s effects on labor, productivity, and value creation.
7. Creative Commons Debuts CC Signals, a Framework for an Open AI Ecosystem
Non-profit Creative Commons announced the launch of a new project, CC Signals, which will allow dataset holders to specify how their content can or cannot be reused by machines, such as for training AI models. It aims to provide a legal and technical solution that establishes a framework for dataset sharing, intended for use between those who control the data and those who utilize it to train AI.
Six 5-minute reads/videos to keep you learning
1. Build Agentic Systems Using Reasoning LLMs
This piece walks through the foundations of building agentic systems with reasoning LLMs like Claude 3.7 and o3. It covers core design patterns, including Agentic RAG, LLM-as-a-Judge, and hybrid prompting, while also addressing known model limitations and how to work around them.
2. Optimizing E-Commerce Coupons With Bayesian Models, Transformers, and RL
This article presents a hybrid system that combines Bayesian modeling, Transformers, and reinforcement learning to optimize coupon distribution in e-commerce. A Bayesian survival model estimates repurchase probability, a Transformer forecasts the profit impact of issuing a coupon, and a Dyna-Q agent refines the strategy using simulated outcomes. The result is a daily list of optimized offers targeting high-value customers with better precision.
3. MemoryOS MCP + RAG Agent That Can Remember Anything
To address the short memory window of LLMs, this post introduces MemoryOS, a structured memory architecture designed to retain context and preferences over extended interactions, exposed to agents via the Model Context Protocol (MCP). Conversations are organized into manageable chunks and divided into short-term, mid-term, and long-term tiers. Frequently accessed “hot” segments are analyzed to build persistent user profiles, resulting in more consistent and personalized interactions.
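The tiered-memory idea can be sketched in a few lines of plain Python. This is a toy illustration of the short/mid/long-term promotion pattern, not the MemoryOS API: new turns enter short-term memory, overflow is condensed into a mid-term store, and topics that recur often enough get promoted into a persistent long-term profile.

```python
from collections import deque

class TieredMemory:
    """Toy three-tier memory (illustrative only, not the MemoryOS API)."""

    def __init__(self, short_cap=3, promote_after=2):
        self.short = deque(maxlen=short_cap)  # most recent turns, verbatim
        self.mid = []                         # condensed overflow "summaries"
        self.long = {}                        # persistent user profile
        self.hits = {}                        # topic frequency counter
        self.promote_after = promote_after

    def add_turn(self, topic, text):
        # When short-term is full, condense the oldest turn into mid-term
        # before the deque evicts it.
        if len(self.short) == self.short.maxlen:
            old_topic, old_text = self.short[0]
            self.mid.append(f"{old_topic}: {old_text[:40]}")
        self.short.append((topic, text))
        # Promote "hot" topics into the long-term profile.
        self.hits[topic] = self.hits.get(topic, 0) + 1
        if self.hits[topic] >= self.promote_after:
            self.long[topic] = text

mem = TieredMemory()
for t in ["food", "travel", "food", "code", "music"]:
    mem.add_turn(t, f"user talked about {t}")
print(sorted(mem.long))  # ['food'] -- the only topic seen twice
```

A real system would summarize with an LLM and score “heat” with recency-weighted access counts, but the promotion flow is the same shape.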
4. Building an AI-Powered Smart Travel Planner with Multi-Agent AI and LangGraph
This guide details the development of an AI travel planner using LangGraph to coordinate multiple specialized agents, powered by Llama 3.x. Each agent handles a distinct task, including itinerary generation, activity suggestions, weather lookup, and live web queries. The piece walks through wiring the system together and building an interactive Streamlit frontend for dynamic, user-driven planning.
5. Data Preprocessing for Effective Machine Learning Models
This article offers a hands-on overview of data preprocessing techniques essential for training ML models. It covers handling missing data through various imputation techniques, such as mean, KNN, and regression, explaining the assumptions for each. It then details numerical feature scaling, comparing normalization, standardization, and robust scaling for different data distributions. It also explores methods for encoding categorical variables, including one-hot, label, and target encoding.
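The three preprocessing steps the article covers can be demonstrated end to end with NumPy alone; this is a minimal sketch with made-up data, not the article’s own code.

```python
import numpy as np

# Toy numeric column with missing values, plus a categorical column.
age = np.array([22.0, np.nan, 35.0, 58.0, np.nan])
city = np.array(["ny", "sf", "ny", "la", "sf"])

# 1) Mean imputation: replace NaN with the mean of the observed values.
mean = np.nanmean(age)
age_filled = np.where(np.isnan(age), mean, age)

# 2) Standardization: zero mean, unit variance. (Robust scaling, using the
#    median and IQR instead, is preferable for heavy-tailed data.)
age_std = (age_filled - age_filled.mean()) / age_filled.std()

# 3) One-hot encoding for the categorical column.
cats = sorted(set(city))                      # ['la', 'ny', 'sf']
one_hot = np.array([[c == cat for cat in cats] for c in city], dtype=float)

print(one_hot.shape)  # (5, 3): one column per category
```

In practice, scikit-learn’s `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` package these same steps with fit/transform semantics so the training-set statistics are reused at inference time.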
6. Intro To AI Agents And LangGraph
This introductory post explains how to use LangGraph to build LLM-powered agents and workflows. It introduces the graph’s basic components — nodes, edges, and state — and demonstrates how they enable modular, task-specific logic. The example shows how to grant an agent access to external tools, such as retrieving the current time, and briefly explores other architectures, including ReAct and RAG. It concludes with a reminder on the importance of validation for achieving production readiness.
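The nodes/edges/state model the post describes can be sketched in plain Python (a conceptual illustration, not the actual LangGraph API): each node is a function that reads and updates a shared state dict and returns the name of the next node to run, with a tool node granting access to the current time as in the post’s example.

```python
import datetime

def plan(state):
    # Route to the tool node only when the question needs it (a toy "edge").
    state["question"] = state["question"].strip()
    return "tool" if "time" in state["question"] else "answer"

def tool(state):
    # Hypothetical tool node: fetch the current time for the agent.
    state["tool_result"] = datetime.datetime.now().isoformat()
    return "answer"

def answer(state):
    state["answer"] = state.get("tool_result", "no tool needed")
    return None  # terminal node: no outgoing edge

NODES = {"plan": plan, "tool": tool, "answer": answer}

def run_graph(state, entry="plan"):
    node = entry
    while node is not None:
        node = NODES[node](state)  # each node picks the edge to follow
    return state

out = run_graph({"question": "what time is it?"})
print("answer" in out)  # True
```

LangGraph’s `StateGraph` formalizes exactly this loop, with typed state, conditional edges, and checkpointing on top.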
Repositories & Tools
1. Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruction, and reasoning models.
2. LMCache is an LLM serving engine extension designed to reduce TTFT and increase throughput, particularly in long-context scenarios.
3. Perplexica is an open-source alternative to Perplexity AI.
Top Papers of The Week
1. MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
MEM1 is a reinforcement learning framework that enables language agents to handle long-horizon, multi-turn tasks without incurring increased memory costs. Instead of storing full histories, it maintains a compact internal state, updating it at each step to integrate new inputs while discarding irrelevant context. Tested on tasks such as web QA and online shopping, MEM1 achieved up to 3.5 times higher performance and used 3.7 times less memory than larger models, while generalizing well to unseen task sequences.
2. Steering Conceptual Bias via Transformer Latent-Subspace Activation
This paper proposes a gradient-refined adaptive activation steering framework (G-ACT) to address the challenge of steering scientific code generation toward specific programming languages in LLMs. It evaluates five causal LLMs on scientific coding prompts. G-ACT clusters per-prompt activation differences into steering directions and uses lightweight per-layer probes that are trained and refined online to select suitable steering vectors. The framework supports concept-level control while ensuring scalability and interpretability, providing a practical method for achieving reproducible behavior in agentic systems that require consistent programming language choices for scientific computing tasks.
3. ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
ProtoReasoning improves reasoning in LLMs by utilizing structured prototype representations, such as Prolog and PDDL. This system includes an automated pipeline to translate problems into these formats, a reliable verification setup using interpreters, and scalable problem synthesis without manual labeling. Models trained with these prototypes showed measurable gains in logical reasoning (+4.7%), planning (+6.3%), general reasoning (+4.0%), and math (+1.0%) tasks.
4. Unified Vision-Language-Action Model
UniVLA is a multimodal model that jointly learns vision, language, and action as discrete token sequences in an autoregressive setup. Designed for learning from large-scale video data, it integrates world modeling to capture causal dynamics and supports transfer to downstream policy learning. UniVLA is especially effective for long-horizon, multimodal tasks.
5. Confidential Inference Systems
This paper outlines Anthropic’s approach to confidential inference, ensuring both model weights and user data remain encrypted throughout inference, only decrypted within a verified, hardware-isolated environment. The system uses a small, trusted loader running in a Trusted Execution Environment (TEE), which decrypts inputs and weights just in time, invokes the accelerator, and re-encrypts the outputs. Integrity is enforced through cryptographic attestation, while input-dependent key release guarantees that only verified, signed code can access secrets. This setup secures model confidentiality, even from infrastructure operators, and prevents data leaks from rogue or compromised components.
Quick Links
1. Google unveiled Gemini CLI, an open-source AI tool for terminals. While best known for its coding abilities, the tool offers lightweight access to Gemini and excels at content generation, problem-solving, deep research, and task management.
2. Manus released an agentic browser called Cloud Browser that can sync login states, access the web, and carry out tasks on the user’s behalf. After a manual first step, it can automate similar workflows without running into login interruptions. It’s now available for trial.
3. OpenAI is utilizing Google’s TPUs to power ChatGPT and other products, marking the first time it has used non-NVIDIA chips. While the most advanced TPUs are reserved for Google’s internal AI teams, companies such as Apple, Safe Superintelligence, and Cohere also utilize Google Cloud TPUs, partly due to their past engineering ties with Google. It remains unclear whether OpenAI will utilize TPUs for training its models.
Who’s Hiring in AI
Tech Lead — Acceleration @Perplexity (San Francisco, CA, USA)
Internal AI Solutions Trainee @Sword Health (Remote)
Big Data Developer @Nexxen (Tel Aviv, Israel)
Senior Software Engineer (f/m/d) — AI/Data @LeanIX Jobs (Berlin, Germany)
Data Project Coordinator, London @Isomorphic Labs (London, UK)
Senior AI/ML Scientist @Leidos (San Diego, CA, USA)
Research Scientist @Humana (Charleston, SC, USA)
AI/ML Engineer (Conversational AI) @Technology & Product (Remote/Japan)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.