Researchers at Montreal’s Centre hospitalier universitaire Sainte-Justine (CHU Sainte-Justine) and the Montreal Children’s Hospital recently asked ChatGPT 20 medical questions. The artificial intelligence engine provided answers of doubtful quality, including factual errors and fabricated references. The results of the research were published in Mayo Clinic Proceedings: Digital Health.
“These results are alarming, given that trust is a pillar of scientific communication,” said Dr. Jocelyn Gravel, lead author of the study and emergency physician at CHU Sainte-Justine. “ChatGPT users should pay close attention to the references provided before incorporating them into medical manuscripts.”
The researchers cautioned scientists who might be tempted to use the ChatGPT artificial intelligence model for writing medical texts, recommending that they instead direct their questions to a professional.
For this study, which the group claims is the first to assess the quality and accuracy of the references provided by ChatGPT, the researchers drew their questions from existing studies and asked ChatGPT to back up its answers with references. They then had the software’s responses rated on a scale of 0 to 100 per cent by the authors of the articles from which the questions originated.
Seventeen authors agreed to review the responses and rated them as being of questionable quality, with a median score of 60 per cent. They also found five major and seven minor factual errors. For example, ChatGPT suggested administering an anti-inflammatory drug by injection when it should instead be taken orally. In another instance, it overstated the global mortality rate associated with Shigella infections tenfold.
Of the references provided, 69 per cent were fabricated, yet appeared authentic. Ninety-five per cent of these used the names of authors who had previously published articles on a related topic, or of recognized organizations such as the U.S. Centers for Disease Control and Prevention or the U.S. Food and Drug Administration. All had titles related to the subject and used the names of well-known journals or websites. Even the real references were problematic, with almost half of them containing errors.
The researchers then questioned ChatGPT about the accuracy of the references provided. In one case, the AI argued that “the references are available on PubMed” and provided a web link to other publications unrelated to the issue. In another case, the software replied, “I strive to provide the most accurate and up-to-date information I have, but errors or inaccuracies may occur.”
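One practical safeguard against the problem described above is to check each citation against PubMed itself rather than trusting the model's assurances. A minimal sketch in Python, using NCBI's public E-utilities `esearch` endpoint; the helper name `pubmed_lookup_url` is hypothetical, not part of the study:

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public, documented by NCBI).
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_lookup_url(title: str) -> str:
    """Build an E-utilities query that searches PubMed for an article title.

    A cited reference whose title returns zero hits is a candidate
    fabrication and should be verified by hand before use.
    """
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": f"{title}[Title]",  # restrict the match to the title field
        "retmode": "json",       # ask for a machine-readable response
    }
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"
```

Fetching the resulting URL and inspecting the hit count in the JSON response (for instance with `urllib.request` or `requests`) would flag titles that PubMed has never indexed, which is exactly the failure mode the study documents.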
According to Dr. Esli Osmanlliu, emergency physician at the Montreal Children’s Hospital and scientist from the Child Health and Human Development Program at the Research Institute of the McGill University Health Centre, “The importance of correct references in science is undeniable. The quality and breadth of references provided in authentic studies demonstrate that researchers have conducted a comprehensive literature review and are familiar with the topic. This process allows results to be integrated into the context of previous work, a fundamental aspect of the advancement of medical research. Not providing references is one thing, but creating fake references would be considered fraudulent for researchers.”
“Researchers using ChatGPT could be misled by false information, as clear, seemingly consistent, and stylistically appealing references can hide low-quality content,” the researcher continued.