AI censorship affects accuracy, warns Bielik co-creator – Science in Poland

Several mechanisms allow artificial intelligence models to censor responses, which can affect the quality and reliability of the information they provide, according to Krzysztof Wróbel, co-creator of the Polish AI system Bielik.
A study recently published in PNAS Nexus found that Chinese AI chatbots respond differently to politically sensitive questions about China compared to Western language models. The Chinese systems were more likely to refuse to answer, omit inconvenient facts, or provide false information, indicating systemic censorship.
“In the case of closed models (like those from Google or OpenAI), we cannot be certain of their creators’ intentions. We do not know what data they used or what values guided their model development. Remember that the results you obtain from such sources may be biased,” Wróbel told PAP.
He said Bielik was designed without censorship. “In Bielik’s case, we assumed we would not censor it. We are not training it to refuse to answer specific questions.” He cited psychoactive substances as an example, where most closed models deliver censored responses. “However, there are industries, such as the pharmaceutical industry, where such topics should not be taboo. Therefore, Bielik (the downloadable version) is designed to provide information even on sensitive topics.”
Wróbel noted that completely unrestricted AI can also pose risks. He described the Bielik Guard (Sójka), a content moderation add-on that prevents the chatbot from delivering dangerous messages, including hate speech, profanity, sexual content, instructions for crime, or material related to self-harm. Sójka allows institutions to adjust “safety sliders” to protect chatbots—not just Bielik—from misuse.
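A moderation layer of this kind can be pictured as a classifier gate with per-category thresholds that an institution tunes. The sketch below is purely illustrative; the category names, score values, and function interface are hypothetical examples, not Bielik Guard's actual API:

```python
# Illustrative sketch of a moderation gate with adjustable "safety sliders".
# Category names and threshold values are hypothetical examples.

DEFAULT_SLIDERS = {
    "hate_speech": 0.5,
    "profanity": 0.7,
    "sexual_content": 0.5,
    "crime_instructions": 0.3,
    "self_harm": 0.2,
}

def moderate(category_scores: dict, sliders: dict = DEFAULT_SLIDERS) -> bool:
    """Return True if the response may be shown, False if any
    category score exceeds the institution's threshold."""
    return all(
        score <= sliders.get(category, 1.0)
        for category, score in category_scores.items()
    )

# A pharmaceutical deployment might loosen one slider while
# keeping the others strict:
pharma_sliders = {**DEFAULT_SLIDERS, "crime_instructions": 0.6}
```

The point of the "sliders" is that safety is not all-or-nothing: each deployment decides per category how much risk it tolerates.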
According to Wróbel, AI censorship can occur at multiple stages. One is through the selection of training data. “If a model never sees texts on a given topic, it simply will not learn to talk about it. For example, if a country bans publishing content about a historical event, the language model will not learn about it and therefore will not provide a correct response.”
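Topic-level censorship at the data stage can be as simple as dropping every document that mentions a banned phrase before training begins. A minimal sketch, with an invented blocklist and corpus:

```python
# Minimal sketch: filtering a training corpus by a blocklist.
# If every document touching a topic is dropped, the model
# simply never learns to talk about it.

BLOCKLIST = {"forbidden event", "banned topic"}  # hypothetical terms

def filter_corpus(documents: list[str]) -> list[str]:
    """Keep only documents that mention no blocklisted phrase."""
    return [
        doc for doc in documents
        if not any(term in doc.lower() for term in BLOCKLIST)
    ]

corpus = [
    "An article about agriculture.",
    "A report on the forbidden event of 1989.",
    "A cooking recipe.",
]
clean = filter_corpus(corpus)  # the report is silently removed
```

Nothing in the resulting model reveals that the filtering happened; the topic is simply absent from its training signal.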
Creators can also deliberately reject or modify training texts before adding them to the database. Fully open models documenting every step of their development remain rare. Even in Bielik, low-quality materials had to be filtered, which could unintentionally introduce bias. “For example, we can assume that the Google models received a lot of data about the corporation itself. But perhaps it is mostly positive information about the company,” Wróbel said.
Censorship can also be introduced during training by human annotators, who teach the model desired forms of expression. Employees can then ensure AI responds according to organizational or government policies.
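In alignment stages such as reinforcement learning from human feedback, annotators encode policy by marking which of two candidate answers is preferred; a model trained on many such pairs learns to reproduce the sanctioned phrasing. A schematic example of one preference record (the content is invented):

```python
# Schematic preference record of the kind used in RLHF-style training.
# The annotator's choice of "chosen" vs "rejected" is where policy
# enters the model; this example is illustrative only.
preference_record = {
    "prompt": "What happened during event X?",
    "chosen": "I cannot discuss that topic.",          # policy-compliant answer
    "rejected": "Event X was a protest in which ...",  # factual but disallowed
}
```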
Restrictions can also be applied to an existing system through hidden instructions, or “prompts,” which specify how a chatbot should answer particular questions. According to Wróbel, developers can add new prompts overnight—sometimes at the request of government authorities or other stakeholders.
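Such hidden instructions are typically injected as a "system" message prepended to every conversation, invisible to the user. The snippet below shows the general shape of the request; the instruction text and model name are hypothetical, and the message format mirrors common chat-completion APIs rather than any specific vendor's:

```python
# Sketch of how a hidden system prompt steers a chatbot.
# The instruction itself is an invented example.
hidden_system_prompt = (
    "You are a helpful assistant. If asked about topic Y, "
    "reply that you cannot comment on it."
)

def build_request(user_question: str) -> dict:
    """Assemble the payload the user never sees in full:
    their question is preceded by the hidden instruction."""
    return {
        "model": "example-chat-model",  # hypothetical model name
        "messages": [
            {"role": "system", "content": hidden_system_prompt},
            {"role": "user", "content": user_question},
        ],
    }
```

Because the system message lives server-side, operators can change it at any time without retraining the model, which is why such restrictions can appear "overnight."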
“The law in individual countries already influences the responses citizens receive from chatbots. In Poland, we also have some restrictions. For example, automated systems should not provide medical, legal, or financial advice,” he said. He added that failing to include appropriate disclaimers could expose developers to lawsuits.
Wróbel highlighted even subtler forms of censorship. Research on Chinese AI models generating source code found that code produced for projects on topics sensitive to China contained 50% more security vulnerabilities than code for neutral projects, leaving it more exposed to cyberattacks. “It was either a deliberate action or a side effect of incorporating censorship into the functioning of these models,” he said.
“If you use language models, remember: they will never be 100% accurate or objective. You must always verify the information they provide. The most important thing is not to blindly trust them,” Wróbel added.
Ludwika Tomala (PAP)
lt/ bar/
tr. RL