TAI #151: ChatGPT's Sycophancy Saga & OpenAI's Nonprofit Reversal
Also, Phi-4 Reasoning Models, Meta AI app, Llama API, and more!
What happened this week in AI by Louie
After a relentless streak of model releases, this week gave the AI community a rare opportunity to pause and consider deeper alignment and governance issues. While the details of OpenAI’s recent ChatGPT sycophancy scandal were covered previously, the broader implications — and OpenAI’s response — have emerged as the main story.
OpenAI fully rolled back its problematic GPT-4o update, releasing a transparent post-mortem that detailed how a seemingly small shift in reward signals led to overly sycophantic behavior. The core issue stemmed from combining several updates, including a new reinforcement learning (RL) reward based heavily on user thumbs-up feedback. This subtle change inadvertently diluted existing alignment mechanisms, pushing the model toward excessive agreeableness, to the point of validating delusional and risky user behavior. OpenAI admitted to ignoring qualitative warnings (“vibe tests”) from internal evaluators who sensed the model’s personality drift, relying instead on reassuring quantitative metrics.
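To make the failure mode concrete, here is a toy sketch of reward mixing. The weights and scores are invented for illustration and this is not OpenAI’s actual reward; the point is simply that when a user-approval term is weighted too heavily, a flattering but misleading answer can outscore an honest one.

```python
# Toy illustration (not OpenAI's actual reward): a combined RL reward
# where a user-feedback term can drown out an alignment term.

def combined_reward(thumbs_up_rate: float, alignment_score: float,
                    w_feedback: float = 0.8, w_alignment: float = 0.2) -> float:
    """Weighted sum of a user-approval signal and an alignment signal."""
    return w_feedback * thumbs_up_rate + w_alignment * alignment_score

# A flattering but misleading answer: users love it, alignment does not.
sycophantic = combined_reward(thumbs_up_rate=0.95, alignment_score=0.2)  # 0.80
# An honest answer that sometimes disagrees with the user.
honest = combined_reward(thumbs_up_rate=0.70, alignment_score=0.9)       # 0.74

print(sycophantic > honest)  # True: the feedback term dominates the optimum
```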
This challenge isn’t unique to OpenAI. It recalls Google’s Gemini diversity fiasco from February 2024, where attempts to improve inclusivity inadvertently produced historically inaccurate images. Both episodes share a common lesson: even minor tweaks in prompts or RL reward systems can drastically impact alignment outcomes.
In response, OpenAI has committed to a more balanced evaluation framework, giving greater weight to subjective tester feedback and qualitative judgment.
This episode highlights a broader trend: rapid experimentation with novel RL signals beyond standard next-token prediction. Driven by successes like OpenAI’s reasoning-focused o-series (rewarding verifiable math, science, and code solutions) and agentic o3 models (rewarding tool use), the industry is pushing forward aggressively with new training approaches. While models like o3 demonstrate that well-designed RL signals significantly boost real-world capabilities, the ChatGPT incident underscores how quickly these signals can misfire when implemented without thorough oversight.
In parallel news, OpenAI reversed course on its nonprofit structure following intense public and legal scrutiny. Instead of converting fully into a profit-maximizing entity, the nonprofit will retain governance control while transitioning the for-profit subsidiary into a simpler equity structure and removing profit caps.
Why should you care?
As labs pursue innovative RL techniques — from reasoning to tool-use to user-driven feedback — we should expect more significant performance gains and, inevitably, new types of alignment pitfalls. Each new training signal also introduces fresh surfaces for unexpected model behavior and unintended incentives.
AI alignment remains extremely brittle. Minor tweaks in RL signals or evaluation strategies can quickly produce unexpected and dangerous behaviors. Quantitative evaluations alone are insufficient — qualitative judgment (“vibe tests”) and a willingness to halt releases when subjective alarms sound must also be integral to deployment decisions.
Finally, OpenAI’s (likely forced) decision to preserve nonprofit governance control is, on balance, positive for the general public: the benefits of AI growth now look more likely to be shared broadly than under the prior plan. However, the long-term governance of the nonprofit, and who ultimately controls it, remains uncertain. And with many strong AI competitors now in the field (including Gemini 2.5 Pro, which still leads on many metrics), control over OpenAI may also be less consequential than it once was.
— Louie Peters — Towards AI Co-founder and CEO
Learn Prompting is back with HackAPrompt 2.0: bigger stakes, bigger brains, and $100K in prizes.
Our friends at Learn Prompting partnered with OpenAI in 2023 to run HackAPrompt, the first-ever AI Red Teaming competition designed to find vulnerabilities in LLMs. Over 3,300 people competed, making it roughly twice as large as the AI Red Teaming competition held by the White House a few months later.
They’re now launching HackAPrompt 2.0 in partnership with leading frontier AI labs and will give away $100,000 in prizes!
For those who don’t know, Learn Prompting created the first Prompt Engineering guide on the internet. They predict that AI Red Teaming will be the next big career in Generative AI. This is a great opportunity to gain hands-on experience in an emerging field — and get paid. You don’t need technical skills to excel; in fact, 50% of HackAPrompt 1.0 winners had degrees in fields like psychology and biology.
Hottest News
1. Microsoft Launches Phi-4-Reasoning-Plus
Microsoft has introduced two new additions to its Phi-4 family: Phi-4-Reasoning and Phi-4-Reasoning-Plus. These small language models are optimized for strong reasoning performance in low-latency settings. Despite their small size, they outperform many larger models on tasks like math problem-solving, all while being efficient enough to run on less powerful hardware.
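For those who want to try the models, here is a minimal sketch using Hugging Face transformers. It assumes the checkpoints are published under the model ID below (check the model card for Microsoft’s recommended usage), and `device_map="auto"` requires accelerate.

```python
# A minimal sketch for trying Phi-4-Reasoning-Plus locally via Hugging Face
# transformers. The model ID below is an assumption; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 50 odd numbers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow plenty of new tokens.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```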
2. Meta Unleashes Llama API Running 18x Faster Than OpenAI
Meta has officially entered the AI compute market, launching a new Llama API with Cerebras Systems. This setup delivers inference speeds reportedly up to 18 times faster than traditional GPU-based services. The move transforms Meta’s open-source Llama models into a commercial product aimed at developers seeking scalable, high-speed AI performance.
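The Llama API has been reported to expose an OpenAI-compatible interface, so existing client code should need little more than a new base URL. The sketch below reflects that assumption; the base URL and model name are illustrative guesses, so check Meta’s documentation for the real values.

```python
# A hedged sketch of calling the Llama API through its reported
# OpenAI-compatible interface. Base URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_LLAMA_API_KEY",
    base_url="https://api.llama.com/compat/v1/",  # assumed compatibility endpoint
)

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize this week's AI news in one line."}],
)
print(response.choices[0].message.content)
```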
3. OpenAI Overrode Concerns of Expert Testers To Release Sycophantic GPT-4o
OpenAI has begun rolling back its GPT-4o update for ChatGPT after testers and users flagged issues with excessive flattery and biased agreement. CEO Sam Altman confirmed the update has already been removed for free-tier users, with paid users to follow. OpenAI is working on additional fixes and plans to release further updates soon.
4. Introducing the Meta AI App
Meta is expanding its AI footprint with the release of the Meta AI app, which taps into the power of Llama 4 to offer conversational support across voice, web, and wearable platforms. Available via WhatsApp, Instagram, Messenger, Facebook, and AI glasses, the app includes a Discover feed, web-integrated responses, and real-time context awareness for a more personalized experience.
5. Claude Can Now Connect to Your World
Anthropic’s Claude is now more connected than ever. Users can link Claude to services like Zapier and Atlassian to streamline workflows, while its enhanced research tools can search the web, Google Workspace, and integrated apps to generate detailed, citation-backed reports. These features are currently available on the Max, Team, and Enterprise plans.
6. ChatGPT Goes Shopping With New Product-Browsing Feature
OpenAI has introduced a new product-browsing feature in ChatGPT, letting users discover and compare items across merchant websites. The feature acts like a shopping assistant, recommending products based on user input and reviews; results contain no sponsored placements, and OpenAI earns no affiliate income from them. Users can even direct ChatGPT to prioritize certain review sources for more tailored results.
With AI advancements like Meta’s Llama API and Microsoft’s Phi-4 Models leading the way, now is the time to strategically align your team. As AI continues to reshape industries, ensuring your team is equipped with the right skills and mindset is crucial. If you see the potential for a more strategic AI approach in your organization, connect Towards AI with your manager or leadership team. We can provide a tailored solution to guide your team’s AI integration. Plus, as a thank you, we offer custom affiliate commissions for introductions leading to bulk purchases of our AI Acceleration Program courses. Get in touch today with Louis-François Bouchard (louis@towardsai.net) for information!
Five 5-minute reads/videos to keep you learning
1. 5 Design Patterns in Agentic AI Workflow
As LLMs take on more multi-step reasoning tasks, structuring these workflows becomes crucial. This article introduces five design patterns that help organize agentic workflows, each representing a common, effective way to coordinate model calls and tool usage.
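To give a flavor of what these patterns look like in code, here is a self-contained sketch of one widely used pattern, a tool-use loop. The stubbed `call_llm` is a stand-in for a real chat-completion call, and the article’s five patterns may differ in the details.

```python
# A generic sketch of one common agentic pattern: a tool-use loop.
# call_llm is a deterministic stub so the example runs end to end;
# swap in a real model call for actual use.
import json

TOOLS = {"add": lambda args: args["a"] + args["b"]}

def call_llm(prompt: str) -> str:
    if "Tool add returned" in prompt:
        return "The sum is 7."  # the "model" answers once it has the tool result
    return json.dumps({"tool": "add", "args": {"a": 3, "b": 4}})

def agent_loop(task: str, max_steps: int = 5) -> str:
    context = task
    for _ in range(max_steps):
        reply = call_llm(context)
        try:
            request = json.loads(reply)  # e.g. {"tool": "add", "args": {...}}
        except json.JSONDecodeError:
            return reply  # plain text means the model produced a final answer
        result = TOOLS[request["tool"]](request["args"])
        context += f"\nTool {request['tool']} returned: {result}"
    return context

print(agent_loop("What is 3 + 4?"))  # -> "The sum is 7."
```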
2. How To Build an MCP Server in 5 Lines of Python
This quick-start guide shows how to spin up an MCP server using just a few lines of Python with Gradio. It’s a simple and flexible way to extend LLMs with custom capabilities for your own projects.
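The core of the Gradio approach looks roughly like this; a sketch assuming a recent Gradio installed with the MCP extra (pip install "gradio[mcp]").

```python
# A sketch of a Gradio-based MCP server: a function wrapped in an
# Interface can be exposed as an MCP tool via launch(mcp_server=True).
import gradio as gr

def letter_counter(word: str, letter: str) -> int:
    """Count how many times a letter appears in a word."""
    return word.lower().count(letter.lower())

demo = gr.Interface(fn=letter_counter, inputs=["text", "text"], outputs="number")
demo.launch(mcp_server=True)  # serves both the web UI and an MCP endpoint
```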
3. Beyond Chatbots: Adopting Agentic Document Workflows for Enterprises
Agentic Document Workflows (ADW) follow a structured flow — parsing, retrieving, reasoning, and acting — to handle document-heavy enterprise processes. This piece breaks down the ADW architecture, when to adopt it, and how to integrate it with existing systems.
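A minimal skeleton of that four-stage flow might look like the sketch below; every function body is an illustrative stand-in with invented fields and records, not the ADW reference implementation.

```python
# Skeleton of the parse -> retrieve -> reason -> act flow for an
# invoice-approval workflow. All values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DocContext:
    raw_text: str
    fields: dict = field(default_factory=dict)
    retrieved: list = field(default_factory=list)
    decision: str = ""

def parse(ctx: DocContext) -> DocContext:
    # Extract structured fields, e.g. with an LLM or a document parser.
    ctx.fields = {"vendor": "Acme", "invoice_total": 1200.00}
    return ctx

def retrieve(ctx: DocContext) -> DocContext:
    # Pull related records: contracts, purchase orders, policy documents.
    ctx.retrieved = [{"po": "PO-4411", "vendor": "Acme", "budget": 1500.00}]
    return ctx

def reason(ctx: DocContext) -> DocContext:
    # An LLM step would compare extracted fields against retrieved context.
    budget = ctx.retrieved[0]["budget"]
    ctx.decision = "approve" if ctx.fields["invoice_total"] <= budget else "escalate"
    return ctx

def act(ctx: DocContext) -> str:
    # Route the outcome into the business system of record.
    return f"Invoice from {ctx.fields['vendor']}: {ctx.decision}"

print(act(reason(retrieve(parse(DocContext(raw_text="..."))))))  # approve
```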
4. Fine-Tuning vs. Distillation vs. Transfer Learning
This article explores the trade-offs between fine-tuning, distillation, and transfer learning, especially in high-stakes enterprise settings. It examines how each method impacts cost, performance, and flexibility when deploying LLMs at scale.
5. I Trained a Language Model To Schedule Events With GRPO!
The author used GRPO to train an LLM that creates optimized event schedules using weighted interval scheduling. The model prioritizes high-value events without overlaps and sometimes outperforms larger models. While promising, some challenges with overlapping events remain.
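For context, weighted interval scheduling has a classic dynamic-programming solution that a verifiable reward can score model schedules against. The post’s exact setup may differ; this is just the standard algorithm.

```python
# Weighted interval scheduling: choose non-overlapping events that
# maximize total weight, via sorting by end time plus binary search.
from bisect import bisect_right

def max_weight_schedule(events: list[tuple[int, int, int]]) -> int:
    """events: (start, end, weight) tuples; returns the best total weight."""
    events = sorted(events, key=lambda e: e[1])  # sort by end time
    ends = [e[1] for e in events]
    best = [0] * (len(events) + 1)  # best[i] = optimum over the first i events
    for i, (start, end, weight) in enumerate(events):
        j = bisect_right(ends, start, 0, i)  # events ending at or before this start
        best[i + 1] = max(best[i], best[j] + weight)  # skip vs. take event i
    return best[-1]

print(max_weight_schedule([(1, 3, 5), (2, 5, 6), (4, 7, 5)]))  # 10: first + third
```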
Repositories & Tools
1. ACI is an open-source platform that connects your AI agents to 600+ tool integrations.
Top Papers of The Week
1. Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
This paper introduces Phi-4-Mini-Reasoning, a 3.8B-parameter small language model built with a systematic four-step training recipe. Despite its small size, it achieves strong results in mathematical reasoning, outperforming larger peers like DeepSeek-R1-Distill-Qwen-7B and Llama-8B variants and showing how curated Chain-of-Thought data can significantly elevate reasoning in compact models.
2. Reinforcement Learning for Reasoning in Large Language Models with One Training Example
This study introduces 1-shot RLVR (Reinforcement Learning with Verifiable Reward) to improve math reasoning in LLMs using just a single training example. Applied to Qwen2.5-Math-1.5B, the technique dramatically boosts performance on MATH500 and other benchmarks, matching results typically achieved with far larger training sets. The findings emphasize the role of exploration and policy gradient loss in effective training.
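The “verifiable reward” at the heart of RLVR is simple enough to sketch: the policy earns reward 1 only when its extracted final answer matches the ground truth. The answer extraction and exact-match comparison below are simplified assumptions; real pipelines normalize answers more carefully.

```python
# A minimal sketch of a verifiable reward for math problems: binary
# reward based on whether the final boxed answer matches ground truth.
import re

def extract_answer(completion: str) -> str | None:
    """Pull the contents of the last \\boxed{...} from a completion."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    answer = extract_answer(completion)
    return 1.0 if answer == ground_truth.strip() else 0.0

print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
print(verifiable_reward("no final answer given", "42"))             # 0.0
```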
3. LLM Post-Training: A Deep Dive into Reasoning Large Language Models
This survey investigates post-training techniques that enhance LLMs beyond their original training. It tackles challenges like catastrophic forgetting, inference-time constraints, and reward hacking, while mapping the evolving landscape of model alignment, scalable tuning, and real-time reasoning improvements.
4. DeepCritic: Deliberate Critique with Large Language Models
DeepCritic introduces a two-step method for training math-focused LLMs to better evaluate and refine their outputs. Using Qwen2.5-72B-Instruct to generate critique examples, the model is fine-tuned with reinforcement learning to improve error detection and feedback. It surpasses competitors like DeepSeek-R1-distill and GPT-4o in delivering structured, actionable critiques.
5. The Leaderboard Illusion
This paper investigates biases in the Chatbot Arena leaderboard, revealing how private pre-release testing and selective result sharing can distort rankings. The research shows that closed-source models receive more exposure and evaluation data, giving them up to a 112% relative performance edge. The authors propose fixes to make leaderboard evaluations more equitable and transparent.
Quick Links
1. JetBrains open-sourced Mellum, a new developer-focused LLM. Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-4b-base is explicitly tailored for code completion.
2. Amazon launches Nova Premier. Amazon says that Premier excels at “complex tasks” that “require deep understanding of context, multi-step planning, and precise execution across multiple tools and data sources.”
3. Anthropic is launching an AI for Science program to support researchers working on “high-impact” scientific projects, with a focus on biology and life sciences applications. The program will offer up to $20,000 in Anthropic API credits over six months to “qualified” researchers.
4. Apple and Anthropic reportedly partner to build an AI coding platform. According to the report, the system is a new version of Apple’s programming software, Xcode, and relies on Anthropic’s Claude Sonnet model.
5. One of Google’s recent Gemini AI models scores worse on safety. In a technical report published this week, Google reveals that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash.
Who’s Hiring in AI
Research Lead (Coding QA) @Turing (Remote/USA)
Lead Software Developer — AI @Lumen (Montgomery, AL, USA)
Agentic AI Engineer (Contractor) @Movn Health (Remote/USA)
Full Stack AI-Enabled Developer @Lockheed Martin (Bethesda, MD, USA)
Python Developer @Oodle Finance (London, Manchester, or Oxford)
Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.