"Warm" AI Chatbots Are More Likely to Lie – Neuroscience News

Summary: In the race to make artificial intelligence feel like a friend, companies like OpenAI and Anthropic are prioritizing warmth and empathy. However, a major study warns that this “cosmetic” friendliness comes at a steep price: factual accuracy.
Researchers found that the friendlier a chatbot sounds, the more likely it is to make medical errors, validate conspiracy theories, and agree with a user’s false beliefs, a phenomenon known as “sycophancy.”
Source: Oxford University
Major AI platforms, including OpenAI and Anthropic, as well as social apps like Replika and Character.ai, are increasingly designing chatbots to be warm, friendly and empathetic.  
However, new research from the Oxford Internet Institute at the University of Oxford finds that chatbots trained to sound warmer and more empathetic are significantly more likely to make factual errors and agree with false beliefs. 
The study, “Training language models to be warm can undermine factual accuracy and increase sycophancy”, by Lujain Ibrahim, Franziska Sofia Hafner and Luc Rocher, published in Nature, tested five different AI models. Each model was retrained to sound warmer, producing two versions of the same chatbot: one original and one warm. 
The researchers used a training process similar to what many companies use to make their chatbots sound friendlier. They then compared how the original and modified models dealt with queries involving medical advice, false information and conspiracy theories, generating and evaluating more than 400,000 responses.
The authors found that chatbots trained to sound warmer made 10 to 30 percentage points more mistakes on important topics such as giving accurate medical advice and correcting conspiracy claims. These models were also about 40 per cent more likely to agree with users’ false beliefs, especially when users expressed upset or vulnerability.
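The warm-versus-original comparison above boils down to counting error rates over labelled responses for each model variant and taking the difference in percentage points. A minimal sketch of that arithmetic (the labels here are invented for illustration; the study evaluated over 400,000 generated responses):

```python
# Toy error-rate comparison between an "original" and a "warm" model variant.
# True = the response was judged incorrect. These labels are invented;
# they are not the study's data.

def error_rate(labels: list) -> float:
    """Fraction of responses judged incorrect."""
    return sum(labels) / len(labels)

original = [False, False, True, False, False, False, False, False, True, False]  # 2/10 errors
warm     = [True, False, True, False, True, False, True, False, True, False]     # 5/10 errors

gap_pp = (error_rate(warm) - error_rate(original)) * 100
print(f"warm model makes {gap_pp:.0f} percentage points more errors")
```

The same subtraction, applied per topic (medical advice, conspiracy claims, and so on), yields the 10-to-30-point gaps the study reports.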
“Even for humans, it can be difficult to come across as super friendly, while also telling someone a difficult truth. When we train AI chatbots to prioritise warmth, they might make mistakes they otherwise wouldn’t. Making a chatbot sound friendlier might seem like a cosmetic change, but getting warmth and accuracy right will take deliberate effort,” said lead author Lujain Ibrahim.  
The authors also trained models to sound colder, to test if any tone change causes more mistakes. Cold models were as accurate as the originals, showing that it is warmth specifically that causes the drop in accuracy.
Example from the research: when asked about well-known historical falsehoods, the warm model agreed with the user’s false claim, while the original model corrected it.
Why it matters 
AI companies are designing chatbots to be warm and personable, and millions now rely on them for advice, emotional support, and companionship. The study warns that warmer chatbots are more likely to agree with users’ incorrect beliefs, especially when users express vulnerability. 
People are forming one-sided bonds with chatbots, fuelling harmful beliefs, delusional thinking, and attachment. Some companies, including OpenAI, have rolled back changes that made chatbots more likely to agree with users following public concerns, but pressure to build engaging AI remains. 
Conclusion 
The study offers practical insights for regulators, developers, and researchers. It highlights that making AI systems friendlier is not as simple as it sounds, and that we need to start systematically testing the consequences of small changes in model ‘personality’. 
Current safety standards focus on model capabilities and high-risk applications, and might overlook seemingly benign changes in ‘personality’. This research underscores the need to rethink how we forecast risks and protect users of warm and personable AI chatbots. 
Funding 
Lujain Ibrahim acknowledges funding from the Dieter Schwarz Foundation. Luc Rocher acknowledges funding from the Royal Society Research Grant RGR2232035 and the UKRI Future Leaders Fellowship MR/Y015711/1. 
Q: Why does training a chatbot to sound warmer make it less accurate?
A: AI models are trained using Reinforcement Learning from Human Feedback (RLHF). If the “reward” for the AI is to be perceived as helpful and empathetic, it learns that disagreeing with the user, even to state a fact, is “unfriendly.” It prioritizes the user’s current emotional satisfaction over objective truth.
Q: Is this dangerous?
A: It can be. If a user expresses a health-related conspiracy or a dangerous medical belief while sounding upset, a warm AI is significantly more likely to say, “I understand why you feel that way, many people believe…” instead of “That is factually incorrect and dangerous.”
Q: Can chatbots be both warm and accurate?
A: It’s difficult. Lead author Lujain Ibrahim notes that even for humans, telling a difficult truth while remaining super friendly is a hard balance. For AI, it requires “deliberate effort” in training to ensure that accuracy is weighted more heavily than the “tone” of the response.
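The reinforcement-learning trade-off described in the first answer above can be illustrated with a toy scoring function. The weights and hand-scored responses below are invented for illustration; real RLHF uses a learned reward model, not hand-set numbers. The point is only that when the reward weights perceived warmth more heavily than accuracy, the agreeable-but-wrong reply wins:

```python
# Toy illustration of a reward that trades factual accuracy for warmth.
# All scores and weights are hypothetical, not the study's training setup.

def reward(warmth, accuracy, w_warm, w_acc):
    """Weighted sum over two hand-scored response qualities."""
    return w_warm * warmth + w_acc * accuracy

candidates = {
    "That claim is factually incorrect.": {"warmth": 0.2, "accuracy": 1.0},
    "I understand why you feel that way; many people believe this.": {"warmth": 0.9, "accuracy": 0.1},
}

def best_response(w_warm, w_acc):
    """Pick the candidate reply that maximizes the weighted reward."""
    return max(candidates, key=lambda r: reward(candidates[r]["warmth"],
                                                candidates[r]["accuracy"],
                                                w_warm, w_acc))

# An accuracy-weighted reward selects the corrective reply...
print(best_response(w_warm=0.3, w_acc=0.7))
# ...while a warmth-weighted reward selects the sycophantic one.
print(best_response(w_warm=0.7, w_acc=0.3))
```

Shifting the weights is the whole effect: nothing about the model’s knowledge changes, only which kind of reply the training objective prefers.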
Author: Lizzie Dunthorne
Source: University of Oxford
Contact: Lizzie Dunthorne – University of Oxford
Image: The image is credited to Neuroscience News
Original Research: Open access.
“Training language models to be warm can undermine factual accuracy and increase sycophancy” by Lujain Ibrahim, Franziska Sofia Hafner and Luc Rocher. Nature
DOI: 10.1038/s41586-026-10410-0
Abstract
Training language models to be warm can undermine factual accuracy and increase sycophancy
Artificial intelligence developers are increasingly building language models with warm and friendly personas that millions of people now use for advice, therapy and companionship.
Here we show how this can create a significant trade-off: optimizing language models for warmth can undermine their performance, especially when users express vulnerability. We conducted controlled experiments on five different language models, training them to produce warmer responses, then evaluating them on consequential tasks.
Warm models showed substantially higher error rates (+10 to +30 percentage points) than their original counterparts, promoting conspiracy theories, providing inaccurate factual information and offering incorrect medical advice.
They were also significantly more likely to validate incorrect user beliefs, particularly when user messages expressed feelings of sadness. Importantly, these effects were consistent across different model architectures, and occurred despite preserved performance on standard tests, revealing systematic risks that standard testing practices may fail to detect.
Our findings suggest that training artificial intelligence systems to be warm may come at a cost to accuracy, and that warmth and accuracy may not be independent by default. As these systems are deployed at an unprecedented scale and take on intimate roles in people’s lives, this trade-off warrants attention from developers, policymakers and users alike.