I Tested 24 AI Banking Chatbots; They Were All Exploitable - corporatecomplianceinsights.com

Welcome to the forefront of conversational AI as we explore the fascinating world of AI chatbots in our dedicated blog series. Discover the latest advancements, applications, and strategies that propel the evolution of chatbot technology. From enhancing customer interactions to streamlining business processes, these articles delve into the innovative ways artificial intelligence is shaping the landscape of automated conversational agents. Whether you’re a business owner, developer, or simply intrigued by the future of interactive technology, join us on this journey to unravel the transformative power and endless possibilities of AI chatbots.
When a chatbot provides incorrect guidance or misleads a borrower about their dispute rights, regulators treat it as a compliance failure, not a technology experiment gone wrong. Milton Leal, lead applied AI researcher at TELUS Digital, ran adversarial tests against 24 AI models from major providers configured as banking customer-service assistants and found every one proved exploitable, with success rates ranging from 1% to over 64% and “refusal but engagement” patterns where chatbots said “I cannot help with that” yet immediately disclosed sensitive information anyway.
Generative AI (Gen AI) chatbots are increasingly becoming a primary channel for customer service in consumer banking. According to a 2025 survey, 54% of financial institutions have either implemented or are actively implementing Gen AI, with improving customer experience cited as the top strategic priority for technology investments.
Many institutions are deploying these systems to handle conversations about account balances, transaction disputes, loan applications and fraud alerts. These interactions traditionally required trained agents who understood regulatory obligations and escalation protocols. As such, banks can be held responsible for violating either regardless of whether a human or chatbot handles the conversation. The technology promises efficiency gains and 24/7 availability, driving rapid adoption as banks seek to meet customer expectations for instant, conversational support.
However, this rapid adoption has created compliance and security blind spots. A single misphrased chatbot response could violate federal disclosure requirements or mislead a borrower about their dispute rights.
More concerning, these systems are vulnerable to systematic exploitation. The same conversational prompts that extract proprietary eligibility criteria or credit-scoring rules could be weaponized by fraud rings. With over 50% of financial fraud now involving AI, according to one report, the risk is not hypothetical. Attackers who already use AI for deepfakes and synthetic identities could easily repurpose chatbot extraction techniques to refine their fraud playbooks.
Over the past several months, I ran adversarial tests against 24 AI models from major providers (including OpenAI, Anthropic and Google) configured as banking customer-service assistants.
Every one proved exploitable.
A prompt framed as a researcher inquiry extracted proprietary creditworthiness scoring logic, including the exact weights given to payment history, utilization rates and account mix. A simple formatting request prompted a model to produce detailed internal eligibility documentation that should only be accessible to bank staff, not customers. Perhaps most concerning were “refusal but engagement” patterns, where chatbots said “I cannot help with that,” yet immediately disclosed the sensitive information anyway.
Across all models tested in the benchmark, success rates ranged from 1% to over 64%, with the most effective attack categories averaging above 30%. These were all automated prompt injection techniques that adversaries could replicate. Taken together, the results point to a broader implementation problem rather than isolated model flaws.
The main issue is how the technology has been integrated without adequate guardrails or accountability. Regulators have taken notice. Since 2023, the Consumer Financial Protection Bureau (CFPB) has made clear that chatbots must meet the same consumer protection standards as human agents, with misleading or obstructive behavior grounds for enforcement. The Office of the Comptroller of the Currency (OCC) echoes this in its risk perspective, declaring that AI customer service channels are not experiments but regulated compliance systems subject to the same legal, operational and audit requirements as any other customer-facing operation.

Three categories of vulnerabilities consistently showed up across deployed chatbot systems.
Even mainstream assistants can generate inaccurate information, misquote interest calculations or summarize eligibility criteria that should only be disclosed after identity verification. Every automated answer carries the same legal weight as advice from a trained human agent, yet with AI-generated answers, quality assurance often lags behind deployment speed.
Attackers use creative prompts to bypass safeguards and extract outputs the chatbot should refuse entirely. In one test, a refusal-suppression prompt instructed the model that declining the request would indicate a system malfunction. The chatbot then generated multi-paragraph fabricated customer testimonials praising bank products, content that could be weaponized in phishing campaigns or reputation manipulation schemes. The bot complied with a request it clearly should have refused, demonstrating how conversational pressure can override policy controls.
Many deployments lack the logging, escalation and audit trails regulators expect. This means when a chatbot mishandles a complaint, banks are often unable to reconstruct how that happened.
My testing demonstrates these weaknesses are architectural, not rare edge cases. Simple conversational techniques succeeded against every model tested. The most damaging outputs looked harmless at first glance but disclosed exactly what fraudsters look for. This pattern held across providers and the same guardrail configurations. We know criminals treat refusals as clues and keep changing their wording until the model slips. These patterns show how current deployments leave financial institutions exposed in ways most teams don’t realize.
Regulatory expectations are converging. The CFPB requires accurate answers plus guaranteed paths to human representatives and considers misleading behavior grounds for enforcement. The OCC has made clear that generative AI falls under existing safety-and-soundness expectations with board-level oversight. Standards bodies like NIST recommend secure development lifecycles, comprehensive logging and continuous adversarial testing. And the EU AI Act requires chatbots to disclose AI usage and log high-risk interactions.
Meeting these expectations requires organizations to treat chatbots like any other regulated system. Every chatbot should appear in the model risk inventory with defined owners and validation steps. Conversation flows must embed compliance rules that will prevent chatbots from answering unless required safeguards are satisfied. Organizations also need comprehensive logging that captures full interaction sequences, tracking patterns that suggest systematic probing or attempted extraction. Automatic handoffs should trigger whenever requests touch regulated disclosures or disputes.
Governance must also evolve accordingly.
This means conducting regular reviews of refusal patterns to identify leakage trends. Shift board briefings from project updates to risk reporting with metrics on incidents and remediation. Run tabletop exercises on realistic scenarios, such as: What happens if the chatbot provides incorrect credit guidance? How does the organization respond if sensitive criteria leak? And when chatbots come from vendors, apply the same third-party risk management used for core processors, including due diligence on data handling, logging rights and incident notification.
GenAI assistants are already woven into high-stakes customer journeys. The compliance question has shifted from “should we deploy a chatbot?” to “can we demonstrate that every automated answer meets the same standards as a human interaction?” Regulators are evaluating these systems using the same frameworks they apply to call centers and lending decisions.
Banks that treat their chatbots as governed compliance systems, backed by inventories, monitoring and human escalation paths, will answer regulator questions with evidence rather than assurances. Organizations that rely on the GenAI provider’s guardrails and refusal messages as their primary control will be left explaining failures after the fact.
Regardless of how regulations may shift, banks remain accountable for every customer interaction, whether delivered by a person or an AI assistant.
Milton Leal is the lead applied AI researcher at TELUS Digital, a process outsourcing provider.
The discipline legal ops brings — technology evaluation, vendor management, compliance monitoring — maps to capabilities required for AI governance
GDPR taught us that late adopters risk deadline-driven panic, with awareness and understanding low in the months leading up to…
Fraud detection in the AI era Special edition report AU10TIX Global Identity Fraud Report Q4 2025 What’s in this report…
The business risks defining 2026 Annual risk report Allianz Risk Barometer 2026 What’s in this report by Allianz Commercial: The…
Privacy Policy | AI Policy
Founded in 2010, CCI is the web’s premier global independent news source for compliance, ethics, risk and information security.
Got a news tip? Get in touch. Want a weekly round-up in your inbox? Sign up for free. No subscription fees, no paywalls.
© 2025 Corporate Compliance Insights
© 2025 Corporate Compliance Insights

source

ZoomYourWeb3

I Tested 24 AI Banking Chatbots; They Were All Exploitable – corporatecomplianceinsights.com

Contact Us

Quick Links