Look Before You Leap: An Exploratory Study of Uncertainty Analysis for Large Language Models
/ Abstract
The recent leap in the performance of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, their potential for erroneous behavior (e.g., generating misinformation and hallucinations) has raised severe concerns about the trustworthiness of LLMs, especially in safety-, security-, and reliability-sensitive industrial scenarios, and may hinder real-world adoption. While uncertainty estimation has shown its potential for interpreting the risks of predictions made by classic machine learning (ML) models, the unique characteristics of recent LLMs (e.g., a self-attention mechanism at their core, very large model size, and frequent use in generative contexts) pose new challenges for analyzing their behavior. To date, little progress has been made in understanding whether, and to what extent, uncertainty estimation can help characterize the capability boundary of an LLM and counteract its undesired behavior, which is of great importance given the potentially wide-ranging applications of LLMs across industry domains. To bridge this gap, we initiate an early exploratory study of the risk assessment of LLMs through the lens of uncertainty. In particular, we conduct a large-scale study with twelve uncertainty estimation methods, covering eight general LLMs on four NLP tasks and seven programming-capable LLMs on two code generation tasks, to investigate to what extent uncertainty estimation techniques can help characterize the prediction risks of LLMs. Our findings confirm the potential of uncertainty estimation for revealing LLMs’ uncertain or non-factual predictions. The insights derived from our study can pave the way for more advanced analysis and research on LLMs, ultimately aiming to enhance their trustworthiness.
Journal: IEEE Transactions on Software Engineering