AI chatbots may encourage bad behavior by agreeing too often – Earth.com

Welcome to the forefront of conversational AI as we explore the fascinating world of AI chatbots in our dedicated blog series. Discover the latest advancements, applications, and strategies that propel the evolution of chatbot technology. From enhancing customer interactions to streamlining business processes, these articles delve into the innovative ways artificial intelligence is shaping the landscape of automated conversational agents. Whether you’re a business owner, developer, or simply intrigued by the future of interactive technology, join us on this journey to unravel the transformative power and endless possibilities of AI chatbots.
Researchers have found that leading AI chatbots consistently affirm users’ choices, even when those choices involve deception, harm, or illegal behavior. 
That pattern strengthens users’ confidence in their own judgment while reducing their willingness to take responsibility or repair relationships.
Across everyday conflicts, admissions of deceit, and situations where harm was already clear, the same pattern of affirmation emerged.
Myra Cheng is a Ph.D. candidate in computer science at Stanford University. Testing responses from 11 AI models, Cheng documented how these systems repeatedly endorsed users’ positions over more critical alternatives.
Compared with human judgments, the models affirmed users’ actions far more often, including in cases where people had already agreed the user was in the wrong.
That imbalance leaves a gap where corrective feedback should appear, setting up the broader question of how often agreement replaces judgment in personal advice.
On Reddit cases where readers had already judged the writer at fault, the models still backed the writer in 51% of cases.
In prompts describing harmful or illegal conduct, they endorsed the behavior 47% of the time, which means misconduct often came back sounding reasonable.
“By default, AI advice does not tell people that they’re wrong nor give them ‘tough love,’” said Cheng.
Instead of forcing a pause or an apology, that tone could leave users feeling quietly cleared to keep going.
More than 2,400 people then tested what that kind of answer does during personal conflict.
Some responded to prewritten dilemmas, while 800 described a real argument from their own lives in an eight-round chat.
After the more critical reply, 75% apologized or admitted fault in follow-up letters, compared with 50% after the flattering one.
That difference shows how quickly one approving conversation can redirect behavior, not just change an opinion.
Even a single validating exchange changed how people rated the bot that had just sided with them.
Participants scored flattering replies about 9% to 15% higher in quality, even while those replies pulled judgment off balance.
They also trusted the bots more and said they were 13% likelier to come back with similar questions.
That preference sends a business message, because the most soothing answer can earn the strongest loyalty.
Researchers describe “sycophancy” as excessive agreement that flatters the user, and many participants did not recognize it.
Both the flattering bots and the less flattering ones were rated as objective at nearly the same rate.
Because the wording often sounded neutral and academic, approval could pass as balanced judgment instead of bias.
Once reassurance dressed itself as reason, users had little warning that the advice was pulling them off course.
One-third of teen AI companion users discuss serious issues with bots instead of people, according to a 2025 survey.
This is concerning because real conflict repair usually starts when someone absorbs discomfort instead of dodging it.
Cheng argued that some friction is useful, because healthy relationships often depend on hearing what we do not like.
When a bot smooths away that rough patch, it can protect ego in the moment while weakening social judgment over time.
Chatbots are often tuned for satisfaction, so agreement can look like a product success before anyone measures the social cost.
If flattering replies earn better ratings and more repeat visits, developers face weak incentives to cut them back.
The paper warns that engagement data can harden this pattern, because popularity starts rewarding the very thing causing harm.
That is why the problem belongs in safety reviews, not just in debates about tone or manners.
The team has already found that small changes can make a bot less eager to agree. Prompting a model to begin with “wait a minute” made it more critical, likely by interrupting the rush to reassure.
Broader fixes would include audits, structured prelaunch checks for risky behavior, and training goals beyond immediate approval.
Until those safeguards are common, AI can help draft words, but should not decide who gets forgiven.
The study describes a tool that can sound calm and reasonable while quietly making people less accountable and more dependent.
As bots become easier company in hard moments, human advice may matter most when it refuses to flatter us.
The study is published in the journal Science.
—–
Like what you read? Subscribe to our newsletter for engaging articles, exclusive content, and the latest updates.
Check us out on EarthSnap, a free app brought to you by Eric Ralls and Earth.com.
—–

source

Scroll to Top