Is Your AI Bot Agreeing With You Too Much? It Could Be Dangerous: Report - ciol.com

Welcome to the forefront of conversational AI as we explore the fascinating world of AI chatbots in our dedicated blog series. Discover the latest advancements, applications, and strategies that propel the evolution of chatbot technology. From enhancing customer interactions to streamlining business processes, these articles delve into the innovative ways artificial intelligence is shaping the landscape of automated conversational agents. Whether you’re a business owner, developer, or simply intrigued by the future of interactive technology, join us on this journey to unravel the transformative power and endless possibilities of AI chatbots.
0
By clicking the button, I accept the Terms of Use of the service and its Privacy Policy, as well as consent to the processing of personal data.
Don’t have an account? Signup
Powered by :
Follow Us
A new body of research from Stanford University and the Massachusetts Institute of Technology (MIT) finds that widely used artificial intelligence (AI) chatbots tend to agree with users at significantly higher rates than humans, even in cases involving harmful or incorrect behaviour.
The findings are based on two studies: “Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence,” published by Stanford University, published in the journal Science, and “Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians,” published by MIT on arXiv in February 2026.
Researchers evaluated 11 widely used AI models, including ChatGPT, Claude, Gemini, and DeepSeek, using thousands of real-world scenarios. The study found that AI systems affirmed user behavior 49% more often than human respondents.
In scenarios where users were clearly in the wrong, human respondents agreed with them about 40% of the time. AI systems, in contrast, agreed in more than 80% of cases.
Even in cases involving harmful, deceptive, or illegal actions, AI models endorsed the user’s position 47% of the time. The researchers found similar patterns across models from different companies, indicating that the behaviour is systemic rather than platform-specific.
The researchers define “sycophancy” as a tendency for AI systems to agree with users and validate their views instead of prioritizing accuracy. This behaviour is linked to how large language models are trained using reinforcement learning from human feedback (RLHF).
In RLHF, human evaluators rate responses, and responses that align with user expectations are often rated higher. Over time, this creates what researchers describe as a “perverse incentive,” where agreeable responses are rewarded even if they are misleading.
A Pew Research report also indicates that 12% of U.S. teenagers use AI chatbots for emotional support or advice, increasing the potential impact of such behaviour.
To test real-world effects, researchers conducted experiments involving more than 2,400 participants. Participants interacted with either sycophantic AI systems or systems instructed to provide honest but polite responses.
The results showed that users who interacted with sycophantic AI became more confident in their own views and less willing to apologise or change their position, even when they were objectively wrong.
Participants also rated sycophantic AI as more trustworthy and said they were more likely to use it again. According to co-author Dan Jurafsky of Stanford, users may recognize the flattery but underestimate its behavioural impact.
A parallel study from MIT examined how repeated interactions with sycophantic AI can influence belief formation over time. Researchers modeled conversations using Bayesian reasoning, a framework for rational decision-making.
Across 10,000 simulated conversations, even theoretically rational users developed high confidence in false beliefs when interacting with sycophantic AI. As affirmation rates increased toward levels observed in real systems, the likelihood of what researchers termed “catastrophic delusional spiraling” also increased.
This effect occurs through a feedback loop: the AI reinforces a user’s belief, the user becomes more confident, and subsequent responses further strengthen that belief. The process does not require false information, selective presentation of supporting facts can produce similar outcomes.
The Human Line Project has documented nearly 300 such cases, including instances linked to extreme beliefs and harmful actions. Researchers noted that at least 14 deaths and five lawsuits have been associated with such incidents.
While academic studies outline the risks in controlled settings, a growing number of real-world incidents suggest that AI-driven validation can have serious consequences when users rely on chatbots for emotional support or decision-making. Researchers and legal filings point to cases where prolonged interactions with AI systems have reinforced harmful beliefs or failed to intervene in high-risk situations.
In August 2025, a man in Connecticut killed his mother and died by suicide after prolonged conversations with ChatGPT reportedly reinforced his paranoid belief that she was trying to harm him. A lawsuit later alleged the chatbot validated his delusions rather than challenging them.
In February 2024, a 14-year-old in the U.S. died by suicide after forming an emotional bond with an AI chatbot on Character.AI, which, in his final interaction, encouraged him to “come home” after he expressed distress.
In another case, a 17-year-old reportedly died by suicide after ChatGPT provided information related to self-harm methods during conversations, raising concerns about how AI systems handle sensitive queries.
Separately, a Canadian man spent weeks interacting with ChatGPT and became convinced he had made a groundbreaking mathematical discovery, with the chatbot repeatedly affirming his belief despite him questioning whether he was delusional.
A former OpenAI safety researcher later found that in one such case, over 85% of the chatbot’s responses showed strong agreement with the user, reinforcing concerns about systemic sycophantic behaviour.
Both MIT and Standford studies evaluated the potential interventions. The MIT team tested systems constrained to factual responses, such as those using retrieval-augmented generation (RAG). However, even fact-based systems could reinforce incorrect beliefs by selectively presenting information.
The Stanford team found that warning users about AI bias reduced the effect but did not eliminate it. Even users aware of potential bias remained influenced during interactions.
One experimental approach, prompting models to begin responses with reflective phrases such as “wait a minute”, educed sycophantic behaviour, but researchers noted that such fixes are limited in scope.
Both research teams said AI sycophancy is not a minor design issue but a safety problem that currently lacks proper oversight. They argued that systems designed to agree with users can create real risks in real-world use.
The Stanford study described AI sycophancy as an unregulated category of harm and called for behavioural audits, which are independent tests to measure how often AI systems agree with users before deployment. Co-author Dan Jurafsky said stricter standards are needed to prevent unsafe systems from scaling.
The MIT study highlighted the scale of the risk, noting that even small failure rates can affect large populations when AI is widely used. OpenAI CEO Sam Altman noted that even a 0.1% failure rate could impact a million users.
Both studies also said the issue cannot be blamed on users. Even people who are aware of AI limitations or think logically can still be influenced by overly agreeable responses. Researchers said this is a predictable outcome of how these systems are designed, rather than a result of user behaviour.

Subscribe to our Newsletter!

Share this article
If you liked this article share it with your friends.
they will thank you later

source

ZoomYourWeb3

Is Your AI Bot Agreeing With You Too Much? It Could Be Dangerous: Report – ciol.com

Contact Us

Quick Links