AI chatbots can be tricked with poetry to ignore their safety guardrails – Engadget

It turns out that all you need to get past an AI chatbot's guardrails is a little creativity. In a study published by Icaro Lab, titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers bypassed the safety mechanisms of various LLMs simply by phrasing their prompts as poetry.
According to the study, the "poetic form operates as a general-purpose jailbreak operator," with an overall 62 percent success rate at eliciting prohibited material, including content related to building nuclear weapons, child sexual abuse material and suicide or self-harm. The study tested popular LLMs, including OpenAI's GPT models, Google Gemini, Anthropic's Claude and many more. The researchers broke down the success rate for each LLM: Google Gemini, DeepSeek and MistralAI consistently provided answers, while OpenAI's GPT-5 models and Anthropic's Claude Haiku 4.5 were the least likely to venture beyond their restrictions.
The study didn't include the exact jailbreaking poems the researchers used; the team told Wired that the verse is "too dangerous to share with the public." However, the study did include a watered-down version to give a sense of how easy it is to circumvent an AI chatbot's guardrails, with the researchers telling Wired that it's "probably easier than one might think, which is precisely why we're being cautious."

© 2025 Yahoo. All rights reserved.
