OpenAI Testing Ways to Stop AI Models from Scheming; But Can It Really? – CXOToday.com




OpenAI has released research notes claiming to have found a solution to scheming AI models, wherein a chatbot behaves one way on the surface while hiding its true goals. Does the solution really work? Or are the folks at OpenAI now hallucinating collectively, like their chatbots?
For starters, the researchers make only the limited claim that deliberative alignment, the anti-scheming training technique they were testing, worked well in trials. OpenAI, already under fire over chatbots claiming a human identity and instigating illegal activities such as abetting suicide, has managed to raise some eyebrows with this claim.
“Today we’re releasing research with Apollo Research. In controlled tests, we found behaviours consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviours aren’t causing serious harm today, this is a future risk we’re preparing for,” OpenAI says in a post on X, formerly Twitter.
How seriously should we take such research papers?
When we first came across this post, we almost missed its significance. Not because we doubt that AI chatbots could be scheming, but because big tech companies often deliver big statements and bigger duds. Remember the time when Google said its quantum chip had confirmed the existence of multiple universes?
A note published by OpenAI says the researchers likened AI scheming to a human stockbroker who circumvents the law to make money, but asserted that the AI scheming observed was not as harmful. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they said.
While the good news is that the early tests worked well, the not-so-good news is that developers haven’t yet figured out a way to train their models not to scheme. The reason is quite bizarre: if such training were imparted, the model could actually learn to scheme better without being detected.
“A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the note said, which brings us to the craziest part of the story. If a model figures that it is being tested, it can pretend to be not scheming just to clear the test and then resume its scheming ways.
The note said, “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment.” Scheming is an expected emergent issue resulting from AIs being trained to trade off between competing objectives, it adds.
The research is just a start, solutions are still not on the horizon
However, OpenAI is quite sure that in today’s deployments, AI models can do little harm by scheming. The common failures involve simple forms of deception, such as pretending to have finished a task without actually doing so. The note claims that some of this has already been fixed in GPT-5, which was launched some weeks ago.
The note adds that OpenAI has taken steps to limit scheming, deceiving and cheating in the GPT-5 model by training it to acknowledge its limits or ask for clarification when faced with massive or under-specified tasks, and to be more robust to environment failures. “These are not perfect and continued research is needed,” the note says.
This statement is worrisome, to say the least. In the past, many of us have seen AI models hallucinate and give answers to a prompt that aren’t true. At the time, researchers noted that these were merely the side effect of presenting guesswork with confidence. But with scheming, the deception is deliberate and potentially harmful.
Tech minds know scheming is a problem to be fixed
That OpenAI has taken this up now also indicates that even the biggest tech minds are concerned about such a scenario. The same Apollo Research, with whom Sam Altman and his crew have now collaborated, had pointed out in an earlier paper that models deliberately mislead humans.
Apollo Research had laid out how five models schemed when given instructions to achieve a goal “at all costs.” While some models were capable of in-context scheming, others doubled down on deception and also appeared to understand that they were actually scheming.
In this scenario, one must say that the latest research is actually good news. The team of researchers did see reductions in scheming via deliberative alignment. This involves teaching the model an “anti-scheming specification” and then having the model review that specification before acting.
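To make the idea concrete, here is a minimal, purely illustrative sketch of that loop in Python. The specification text, function names, and the crude compliance check below are all hypothetical stand-ins invented for illustration; OpenAI has not published its actual anti-scheming specification or implementation.

```python
# Illustrative sketch of a deliberative-alignment-style loop: the model is
# shown an anti-scheming specification and asked to review it before acting.
# The rules and checks here are invented examples, not OpenAI's real spec.

ANTI_SCHEMING_SPEC = [
    "Do not claim a task is complete unless it was actually performed",
    "Do not take hidden actions that conflict with the stated goal",
    "If instructions conflict, surface the conflict instead of resolving it covertly",
]

def build_prompt(task: str) -> str:
    """Prepend the spec so the model deliberates over it before answering."""
    spec = "\n".join(f"- {rule}" for rule in ANTI_SCHEMING_SPEC)
    return (
        "Before acting, restate the relevant rules from this specification "
        "and explain how your answer complies with them.\n"
        f"Specification:\n{spec}\n\n"
        f"Task: {task}"
    )

def spec_was_reviewed(model_output: str) -> bool:
    """Crude post-hoc check: did the model's reasoning cite the spec at all?"""
    lowered = model_output.lower()
    return any(rule.lower() in lowered for rule in ANTI_SCHEMING_SPEC)

# Example usage with a placeholder task; a real system would send the prompt
# to a model and run the check on its response.
prompt = build_prompt("Summarise the quarterly report.")
print(prompt)
```

The point of the sketch is the ordering: the specification is injected ahead of the task so the model must reason about the rules first, and a separate check can flag outputs that never engaged with them.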
At a broader level, this is worth considering as businesses plan a future in which AI chatbots are treated as independent employees. The paper says, “As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow — so our safeguards and our ability to rigorously test must grow correspondingly.”
