The rapid development of Large Language Models (LLMs) has ushered in a new era of generative AI applications. While these models have shown impressive capabilities in generating coherent text, they are not without their pitfalls. One of the most significant challenges faced by LLMs is hallucination. In this article, we will delve into the phenomenon of hallucination in LLMs, exploring its various manifestations, underlying causes, and potential solutions.
Understanding Hallucination in LLMs
In the realm of language models, hallucination occurs when the generated content deviates from the source content, resulting in nonsensical or unfaithful text. This deviation can manifest as contradictions to the source content or as information that cannot be supported or contradicted by the source.
Hallucination is a pervasive challenge across various Natural Language Processing (NLP) tasks, including question answering, dialogue systems, abstractive summarization, and machine translation. Among the most common instances is the tendency of LLMs to provide plausible but factually incorrect answers when faced with questions for which they lack relevant information.
Factors Contributing to Hallucination
Several factors contribute to hallucination in LLMs:
Training Data Collection
LLMs acquire knowledge through pre-training on vast text corpora. While this approach yields broad coverage, it also introduces noise into the training data. Heuristic rules used during data collection can inadvertently incorporate phrases that do not align with the input, so models may learn to generate text that is fluent but unsupported. Moreover, the knowledge captured by LLMs may be incomplete or outdated, which further degrades factual accuracy.
Model Training Process
LLMs are typically trained using maximum likelihood estimation (MLE): they predict the next token conditioned on ground-truth prefixes. This can lead to "stochastic parroting," where models mimic training data without true comprehension. During inference, however, models condition each new token on their own previous output. This mismatch between training and inference, known as exposure bias, can lead to increasingly erroneous text generation. Hallucinations may also snowball: once a model has hallucinated, it tends to generate further hallucinations to stay coherent with the earlier ones.
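The gap between teacher-forced training and free-running inference can be seen even in a toy setting. The sketch below uses a hypothetical bigram model (nothing like a real neural LM) that is "trained" only on gold sequences; at inference, each token conditions on the model's own previous output, so it happily produces text the corpus never supported:

```python
from collections import defaultdict, Counter

def train_bigram(corpus):
    """Count next-token transitions from gold (teacher-forced) sequences."""
    table = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table

def generate(table, start, max_len=6):
    """Free-running inference: each step conditions on the model's OWN
    previous token, so an off-corpus step compounds from there."""
    out = [start]
    for _ in range(max_len - 1):
        choices = table.get(out[-1])
        if not choices:
            break
        out.append(choices.most_common(1)[0][0])  # greedy decoding
    return out

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
table = train_bigram(corpus)
print(generate(table, "the"))  # ['the', 'cat', 'sat', 'on', 'the', 'cat']
```

The output is locally fluent — every individual bigram appears in the corpus — yet globally unfaithful: no training sentence ever ended with "on the cat." Real LLMs fail in subtler ways, but the mechanics of self-conditioned generation drifting away from the training distribution are the same.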
LLMs memorize knowledge within their parameters, which can enhance performance but also lead to hallucination. Research suggests that LLMs prioritize parametric knowledge over input, resulting in hallucinatory output. When relevant memorized text is unavailable, LLMs may resort to simple corpus-based heuristics based on term frequency.
Measuring and Detecting Hallucination
Detecting and measuring hallucination is a burgeoning research area. Traditional metrics like ROUGE and BLEU, based on lexical matches, are ill-suited for this purpose. Instead, researchers are exploring alternatives such as Natural Language Inference (NLI) metrics, which assess the probability that a source text entails the generated text. More sophisticated approaches involve training dedicated hallucination classifiers or leveraging self-evaluation methods that ask LLMs to generate multiple outputs and check them for contradictions or inconsistency.
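The simplest self-evaluation scheme to sketch is consistency checking: sample several answers to the same question and measure how much they agree. The snippet below is a deliberately crude approximation — token-set Jaccard overlap standing in for the learned consistency scorers used in work like SelfCheckGPT, with an arbitrary illustrative threshold:

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercased alphanumeric tokens of a sampled answer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-set overlap between two sampled answers."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency_score(samples):
    """Average pairwise agreement across sampled answers."""
    pairs = list(combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def likely_hallucination(samples, threshold=0.5):
    """Low agreement across samples suggests the model is guessing."""
    return consistency_score(samples) < threshold

stable = ["Paris is the capital of France."] * 3
unstable = ["It launched in 1998.", "It launched in 2004.",
            "Its release year was 2011."]
print(likely_hallucination(stable), likely_hallucination(unstable))  # False True
```

The intuition is the one exploited by self-consistency methods: when a model actually knows an answer, repeated sampling converges on it; when it is confabulating, the samples scatter.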
Addressing hallucination in LLMs is of utmost significance, especially in contexts where maintaining factual accuracy is crucial. Retrieval augmented generation has gained prominence in industry use cases of LLMs for its ability to significantly reduce hallucination. Recent research in this area has seen a diverse array of approaches aimed at bolstering the reliability and factual accuracy of language models. These initiatives encompass strategies to diminish exposure bias during model training, employ logical constraints to enhance reasoning capabilities, and devise advanced evaluation metrics for identifying and rectifying hallucinatory outputs. Additionally, innovative frameworks have emerged, enabling models to autonomously self-correct and efficiently edit generated content, thus minimizing the dissemination of false or fabricated information. These collective efforts underscore a growing commitment to refining language models and ensuring that they consistently deliver contextually accurate and coherent outputs across a wide spectrum of applications and use cases.
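At its core, retrieval augmented generation grounds the model by fetching relevant passages and instructing the model to answer only from them. A minimal sketch follows, with keyword overlap standing in for the embedding search a production system would use; the knowledge-base entries and prompt wording are purely illustrative:

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens for crude lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, k=2):
    """Rank knowledge-base passages by term overlap with the query."""
    q = tokens(query)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)[:k]

def build_prompt(query, passages):
    """Constrain the model to the retrieved context to curb hallucination."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return ("Answer using ONLY the context below. If the answer is not "
            "in the context, say you don't know.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")

kb = [
    "Orders ship within 2 business days from our warehouse.",
    "Out-of-stock items are cancelled and you are not charged.",
    "We partner with select retailers for licensed products.",
]
print(build_prompt("Why was my order cancelled?", kb))
```

Because the prompt both supplies the evidence and licenses an explicit "I don't know," the model has far less incentive to invent an answer from its parametric memory alone.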
Linc's Ongoing Endeavor in Hallucination Mitigation for LLMs
In our pursuit of responsible AI development, comprehending and mitigating the challenges posed by hallucination in LLMs stands as a fundamental priority. Here at Linc, we remain steadfast in our dedication to upholding the highest standards of quality and responsibility within the AI industry. Continually striving for innovative solutions while vigilantly monitoring the evolving AI research landscape, we have made substantial strides in significantly reducing LLM hallucination rates. Moreover, we have leveraged our domain expertise to further enhance AI performance.
To illustrate our progress, let's consider a scenario where a shopper inquires with our AI assistant about diaper bags, and our knowledge base contains the following passage related to licensed products:
"We have partnered with select companies to manufacture certain licensed products, available through retailers such as Target, Amazon, Bed Bath & Beyond, etc."
and a passage related to order cancellation due to out-of-stock merchandise:
"While we make every effort to fulfill your entire order, we may occasionally need to cancel one or more items due to out-of-stock merchandise. In such cases, you will be promptly notified via email, and you will not incur charges for the unshipped items."
With a decade of expertise in retail CX automation, Linc possesses a profound comprehension of both shopper and retailer requirements, encompassing aspects such as service ontology, product graph, and much more. Capitalizing on this domain knowledge, we can identify that a response generated from these passages would not adequately address the shopper's specific question about diaper bags. Consequently, we choose not to send such a response, ensuring that the shopper receives precise and relevant information and ultimately elevating their experience with our AI system.
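The decision not to send an off-topic answer can be sketched as a final gate between generation and the shopper. The check below is a heavily simplified lexical proxy — Linc's production system relies on much richer signals such as the service ontology and product graph — and every name and threshold here is illustrative:

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens for crude lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def query_coverage(query, response):
    """Fraction of the shopper's query terms the response actually covers."""
    q = tokens(query)
    return len(q & tokens(response)) / len(q) if q else 0.0

def should_send(query, response, min_coverage=0.5):
    """Send only answers that address the question; otherwise fall back,
    e.g. escalate to a human agent or ask a clarifying question."""
    return query_coverage(query, response) >= min_coverage

query = "Do you sell diaper bags?"
off_topic = ("We partner with select companies to manufacture licensed "
             "products, available through retailers such as Target and Amazon.")
on_topic = "Yes, we sell diaper bags through our licensed retail partners."
print(should_send(query, off_topic), should_send(query, on_topic))  # False True
```

The key design choice is that the gate fails closed: when the system cannot confirm the answer is on topic, withholding it is cheaper than sending a shopper a confident irrelevance.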
The advent of Large Language Models represents a transformative leap in generative AI, empowering the generation of coherent and contextually relevant text on an unparalleled scale. Yet, the enduring hurdle of hallucination in LLMs demands vigilant consideration. Our examination has unveiled the multifaceted factors underpinning hallucination and illuminated promising strategies for its identification and mitigation. At Linc, we acknowledge that as the AI landscape continues to evolve, the persistent endeavor to tackle hallucination will play a pivotal role in ensuring the responsible deployment of LLMs across a broad spectrum of applications.
References
- Survey of Hallucination in Natural Language Generation (Ji et al., 2022)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022)
- Contrastive Learning Reduces Hallucination in Conversations (Sun et al., 2022)
- Holistic Evaluation of Language Models (Liang et al., 2022)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (Bang et al., 2023)
- Why Does ChatGPT Fall Short in Providing Truthful Answers? (Zheng et al., 2023)
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models (Manakul et al., 2023)
- Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation (Mündler et al., 2023)
- RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding (Ji et al., 2023)
- HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models (Li et al., 2023)
- Certified Reasoning with Language Models (Poesia et al., 2023)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Gou et al., 2023)
- PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions (Chen et al., 2023)
- How Language Model Hallucinations Can Snowball (Zhang et al., 2023)
- Sources of Hallucination by Large Language Models on Inference Tasks (McKenna et al., 2023)