Oxford University Study Shows Large Language Models (LLMs) Pose Risk to Science with False Answers
Large Language Models (LLMs) are the generative AI models that power chatbots such as Google Bard and OpenAI’s ChatGPT. Several studies and surveys indicate a meteoric rise in the use of LLMs over the last 12 months. However, LLMs suffer from a critical vulnerability: AI hallucination.
A study by Professors Brent Mittelstadt, Chris Russell, and Sandra Wachter of the Oxford Internet Institute shows that LLMs pose a risk to science with false answers. The paper, published in Nature Human Behaviour, reveals that LLMs produce untruthful responses, also referred to as hallucinations, which deviate from contextual logic, external facts, or both.
The Oxford Internet Institute functions as a multidisciplinary research and educational unit within the University of Oxford, focusing on the social science aspects of the Internet.
The findings of the study show that LLMs are designed to produce helpful and convincing responses without any overriding guarantees regarding their accuracy or alignment with fact.
Earlier this year, the Future of Life Institute issued an open letter calling for a pause on research and experiments on some AI systems to address some of the serious threats posed by the technology. The open letter was signed by nearly 1,200 individuals, including prominent technology leaders such as Elon Musk and Steve Wozniak.
One of the reasons for this problem is the unreliability of LLMs’ sources. LLMs are trained on large datasets of text drawn from various sources, which can contain false or non-factual information.
The lead author of the Oxford study, Director of Research, Associate Professor and Senior Research Fellow, Dr Brent Mittelstadt, explains, ‘People using LLMs often anthropomorphize the technology, where they trust it as a human-like information source. This is, in part, due to the design of LLMs as helpful, human-sounding agents that converse with users and answer seemingly any question with confident-sounding, well-written text. The result of this is that users can easily be convinced that responses are accurate even when they have no basis in fact or present a biased or partial version of the truth.’
The researchers recommend that clear expectations should be set around what LLMs can responsibly and helpfully contribute, and that users should be mindful of the inherent risks. For tasks where truth and factual information are critical, such as the field of science, the use of LLMs should be restricted.
The increasing use of LLMs as knowledge bases exposes users both to regurgitated false information that was present in the training data and to ‘hallucinations’ - false information spontaneously generated by the LLM that was not present in the training data.
According to Prof Wachter, “The way in which LLMs are used matters. In the scientific community, it is vital that we have confidence in factual information, so it is important to use LLMs responsibly. If LLMs are used to generate and disseminate scientific articles, serious harm could result.”
Similar recommendations are made by Prof Russell, “It’s important to take a step back from the opportunities LLMs offer and consider whether we want to give those opportunities to a technology, just because we can.”
The authors of the paper argue that the use of LLMs for certain tasks should be limited to “zero-shot translators”, where the LLM is provided with appropriate information and asked to transform it into the desired output. This method of using LLMs helps boost productivity while limiting the risks of false information. The authors acknowledge the potential of generative AI in supporting scientific workflows, but they are clear that scrutiny of output is vital to protecting science.
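The “zero-shot translator” pattern can be sketched in a few lines: the user supplies vetted information and asks the model only to transform it, rather than to answer from its own training data. The helper below is a hypothetical illustration of building such a prompt; the function name and prompt wording are our own, not drawn from the paper.

```python
def build_zero_shot_prompt(source_text: str, task: str) -> str:
    """Build a prompt asking the model to transform ONLY the supplied
    text, rather than answering from its internal 'knowledge'."""
    return (
        "Using ONLY the information between the markers below, "
        f"{task}. Do not add facts that are not in the text.\n"
        "--- BEGIN SOURCE ---\n"
        f"{source_text}\n"
        "--- END SOURCE ---"
    )

# Example: turning vetted experimental notes into prose.
notes = (
    "- Sample size: 120 participants\n"
    "- Method: double-blind randomized trial\n"
    "- Result: no significant effect observed"
)
prompt = build_zero_shot_prompt(
    notes, "rewrite these notes as one formal paragraph"
)
```

Because all factual content is supplied by the user, any error in the output is easier to detect: the result only needs to be checked against the source text, not against the world.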