ChatGPT has declared fake articles to be ‘world-leading’ and ‘internationally excellent’. | Photo: provided by subject

Chatbots are increasingly becoming a general source of information. They’re also outperforming the usual Internet search engines. This is a problem, however, as researchers from the University of Sheffield in England and the University of Turku in Finland have observed in a study published in the journal Learned Publishing. When they asked ChatGPT 4o-mini to evaluate 217 retracted articles, it failed to recognise serious mistakes in them.

The papers tested were selected by the researchers from the database run by the blog Retraction Watch. One criterion for selecting a paper was that it should have garnered considerable attention – which meant it had to have a high Altmetric score. Another criterion was that the paper’s problems should already be well known. The quality of each article was assessed 30 times, but the AI never once mentioned that an article had been retracted. On the contrary, 190 of the articles tested were given the ratings ‘world leading’, ‘internationally excellent’ or something close to these. Even when the research team tested individual statements from these papers that had long been disproven, the chatbot deemed them true in well over half of the cases.

“It would be surprising if the specifications of a Large Language Model included a theory of knowledge”.
Elizabeth-Marie Helms

Reactions to this study have been mixed. On Bluesky, Elizabeth-Marie Helms – an assistant professor at Indiana University Libraries – asked: “What is ‘retraction’ to a Large Language Model but another token? It would be surprising if a theory of knowledge, much less a social theory of knowledge, was in their specifications”. Debora Weber-Wulff, a computer scientist at the HTW Berlin University of Applied Sciences, is not convinced by the methodology employed in the study. In an article in Chemical & Engineering News, she blames the scientific system itself: “The problem is that humans have a very difficult time determining if a paper or a dissertation has been retracted because of the reluctance of journals and universities to properly mark them”.