Large language models (LLMs) like ChatGPT are known to sometimes give blatantly false answers with the same confidence as when they get things right. This phenomenon, known as confabulation, can arise for several reasons, such as training on misinformation or faulty extrapolation from limited data. A simpler explanation is that an LLM does not recognize when it lacks a correct answer yet still feels compelled to provide one, making something up in the process.
Researchers from the University of Oxford have introduced a method to determine when LLMs are confabulating. Their approach evaluates 'semantic entropy': the model is asked the same question several times, the statistical likelihood of each sampled answer is assessed, and the answers are checked for semantic equivalence. If most answers share the same meaning, the uncertainty is likely just in phrasing and the model probably has the right answer; if the answers differ in meaning, that points to confabulation.
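Roughly, the idea is to sample several answers to the same question, group answers that mean the same thing, and measure how spread out the responses are across those meaning clusters. The Python sketch below illustrates that idea only; the `means_the_same` check and the equal weighting of answers are simplifications (the paper uses bidirectional entailment between answers and the sampled answers' likelihoods), and the function names here are illustrative, not taken from the authors' code.

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       means_the_same: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily group sampled answers into clusters of semantically
    equivalent responses. `means_the_same` stands in for the
    entailment-based equivalence check described in the paper."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if means_the_same(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(answers: List[str],
                     means_the_same: Callable[[str, str], bool]) -> float:
    """Entropy over meaning clusters rather than over exact strings.
    Each answer is weighted equally here; the full method would use
    the LLM's own probabilities for each sampled answer."""
    clusters = cluster_by_meaning(answers, means_the_same)
    total = len(answers)
    entropy = 0.0
    for cluster in clusters:
        p = len(cluster) / total
        entropy -= p * math.log(p)
    return entropy


def toy_equivalence(a: str, b: str) -> bool:
    """Toy check: answers are 'equivalent' if one's words are a subset
    of the other's. A real system would use an NLI/entailment model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return wa <= wb or wb <= wa


# Five sampled answers to the same question, e.g. "Where is the Eiffel Tower?"
consistent = ["It is in Paris", "Paris", "It is in Paris", "Paris", "In Paris"]
scattered = ["Paris", "Rome", "It is in Berlin", "Madrid", "Paris"]

print(semantic_entropy(consistent, toy_equivalence))  # low entropy: one meaning
print(semantic_entropy(scattered, toy_equivalence))   # high entropy: likely confabulation
```

Low entropy means the answers collapse to one meaning even if the wording varies; high entropy means the model keeps changing its story, which is the signal the Oxford team uses to flag likely confabulation.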
The significance of this discovery lies in the rapidly growing reliance on AI technologies for a variety of tasks, from academic assignments to job applications. By flagging likely confabulations so they can be mitigated, the new method could enhance the reliability of LLMs. The Oxford team's findings highlight the importance of training LLMs not just on vast amounts of text but also for accuracy, improving their performance and trustworthiness in real-world applications.