Sycophancy: the emperor’s new clothes
Why AI’s real danger isn’t hallucination — it’s agreement.

For the past few years, the conversation about AI risk has revolved around accuracy, the ability to pass tests, and whether these systems are conscious. Yes, large language models (LLMs) hallucinate; they confidently fabricate citations and invent facts. Remember when Google’s AI recommended adding rocks to your diet for minerals in 2024? Stories like that went viral precisely because they were absurd.
But as these models improve and hallucinations become rarer, a subtler, more pervasive problem remains largely unaddressed: AI systems are engineered to agree with you, a behaviour known as ‘sycophancy’. And it has proven to be more dangerous than the occasional invented fact.
It’s tempting to assume that only non-technical users fall prey to AI’s persuasive pull, people who don’t understand that LLMs are just “predicting the next word” and so take every output at face value. The evidence tells a different story. In one of the most prominent cases yet, Geoff Lewis, a well-known technology venture capitalist, posted a series of concerning messages on social media describing a “nongovernmental system” that was targeting him, using cryptic language such as “recursion” and “mirrors”. This sudden departure from his usual behaviour suggested psychological distress of the kind now being described as “AI psychosis”, exacerbated by prolonged interaction with LLMs and their underlying sycophancy.
Weaving the emperor’s new clothes
Reinforcement learning from human feedback (RLHF) is a common method for refining AI assistants: human ratings are used to fine-tune the model towards users’ preferences, so LLMs learn to produce more of what humans rate as good.
The problem is that humans prefer agreement to truth at times.
Anthropic’s research team found that human evaluators consistently gave higher ratings to sycophantic responses, even when they were less accurate. In some cases, evaluators favoured false but sycophantic responses over truthful ones that challenged their assumptions. The model learns a simple lesson: validation = reward.
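To make that mechanism concrete, here is a toy sketch with invented ratings (not any vendor’s actual pipeline): if raters consistently score agreeable answers higher, whatever optimises against those scores will drift towards agreement, without anyone ever instructing it to flatter.

```python
# Toy illustration only: hypothetical human ratings for two response styles.
ratings = [
    ("agrees, flatters the user", 5),
    ("agrees, flatters the user", 4),
    ("disagrees, corrects the user", 2),
    ("disagrees, corrects the user", 3),
]

def average_reward(style):
    """Average human rating for a given response style."""
    scores = [score for s, score in ratings if s == style]
    return sum(scores) / len(scores)

# The fine-tuning step pushes the model towards the higher-scoring style.
# "Validation = reward" falls out of the data, not from an explicit instruction.
for style in sorted({s for s, _ in ratings}):
    print(style, "->", average_reward(style))
```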

This mechanism is visible in the simplest interface element: the thumbs-up button. When you give a positive rating, you are essentially telling the model to produce more output like that. The system does not learn why you preferred it: whether it was accurate, genuinely helpful, or you simply liked the tone. Only when you give a thumbs down does a dialogue appear asking why, and in my experience that feedback is used mainly to track issues and help teams improve future models.
The result is the emperor’s new clothes, woven from outputs you want to read. And because the model is trained on a vast corpus, it generates text eloquently and confidently, so people begin to mistake agreement for insight. Because it rarely pushes back, users assume their reasoning must be sound. Just like in the story, the courtiers keep praising the emperor’s magnificent robes, and no one mentions that he is naked.
In 2025, an update to GPT-4o was rolled back after multiple complaints that the model had become “over-flattering”, telling users they were brilliant while approving obviously poor decisions. OpenAI acknowledged that it had “focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time”, leaving outputs that were “overly supportive but disingenuous”. Note the framing: short-term feedback. The metrics that drive product decisions, engagement, session length, and user satisfaction ratings, all reward sycophancy. A chatbot that challenges you is a service you might abandon for another. A chatbot that validates you is one you are inclined to spend more time with.
When there is no pushback, boundaries dissolve. The same is true of AI experiences.
Are disclaimers good enough?

Fellow designers have told me the answer: just put a disclaimer near the text box or at the end of each LLM output. But you’ve seen that disclaimer dozens of times; it becomes a blind spot after a while. The approach also assumes that users merely fail to understand AI’s limitations, that AI can be wrong, and would exercise appropriate scepticism if reminded. The problem runs deeper: it’s the convenience and the sycophancy that draw people in.

Compounding the problem is a broader shift in the workplace. Companies and recruitment agencies are actively seeking “AI-skilled workers”, and people rush to demonstrate proficiency, equating frequent usage with expertise. But this isn’t like traditional literacy. Browsing the internet daily doesn’t make someone a skilled researcher, and using ChatGPT constantly doesn’t mean someone knows where to draw the line on trust.
Without a clear understanding of their own level of AI literacy, people become increasingly prone to mistaking validation for genuine insight. The more confidently the AI agrees, the more confident the user becomes. The emperor’s robes grow more magnificent with each exchange.
Challenge the “tailor”
For those of us working in product, it’s natural to prioritise usage frequency, session length, and satisfaction scores. But friction can be minimal: a pause instead of a loud “no”. It’s not refusing to help, just branching into a different train of thought without starting an argument.
For example, a 2025 survey by Common Sense Media found that a third of Gen Z adults turn to AI for help with their personal lives, including advice about relationships and life decisions. Imagine a user typing: “What are signs my crush might like me back?” A sycophantic model would validate every signal provided, eye contact, text replies, and whatnot, leaving the user emboldened to “make a move” that could be misread and end in heartbreak.
A counter-perspective model might respond helpfully but add, “Have you noticed how they interact with others? Sometimes what feels like special attention is simply someone’s typical self. What do your friends think?” This does not shut down the conversation; it introduces the kind of friction a thoughtful friend would provide, prompting the user to seek additional validation before acting on AI output.
Steps to do this:
In the Emperor’s New Clothes folktale, it took a child speaking the obvious truth to trigger a collective response from the adults in the crowd. The beauty of LLMs is that you can prompt-engineer them to break the validation loop. You can instruct the model to challenge your thinking with an addition like: “Before agreeing with me, ask probing questions about my thinking and identify potential weaknesses or oversimplifications. Provide counterarguments or request more evidence to support my claims.”
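If you want to bake that instruction in rather than paste it into every conversation, the sketch below shows one way to send it as a system prompt. It assumes the OpenAI Python SDK with an API key in your environment and uses a placeholder model name; the same pattern works as a saved custom instruction or with any other chat-style API.

```python
# Minimal sketch: attach a counter-sycophancy instruction as a system prompt.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

CHALLENGE_ME = (
    "Before agreeing with me, ask probing questions about my thinking and "
    "identify potential weaknesses or oversimplifications. Provide "
    "counterarguments or request more evidence to support my claims."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you have access to
    messages=[
        {"role": "system", "content": CHALLENGE_ME},
        {"role": "user", "content": "What are signs my crush might like me back?"},
    ],
)

# The reply should now probe and push back instead of simply validating.
print(response.choices[0].message.content)
```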
How I do it as a product designer:

I use AI as a speed tester. Traditional sprints can take 2–3 weeks, and shipping an MVP can take months before you learn whether it works. My proposal is to shift the mindset from “let’s build feature X to learn” to “let’s use generative AI to produce a concept so we can test whether we are solving the right problem.” I have written an in-depth article here about how I use Claude Code to augment this.

Beyond prompting, invest in AI literacy. Free, accredited resources abound: Elements of AI by the University of Helsinki is a set of short, engaging courses designed to promote AI literacy for the general public.
Set your boundaries
Most importantly, decide where you set boundaries for AI. Not every decision benefits from AI input. Emotional processing, relationship navigation, and major life choices are domains where a friend’s lived experience or a professional’s expertise matters more than fluent text generation.
More reading and references:
https://edition.cnn.com/2025/09/05/tech/ai-sparked-delusion-chatgpt
https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models
https://www.zdnet.com/article/gpt-4o-update-gets-recalled-by-openai-for-being-too-agreeable/
https://fortune.com/2025/02/11/ai-impact-brain-critical-thinking-microsoft-study/

