Can LLMs Detect Their Own Hallucinations?

/ Authors

Sora Kadotani, Kosuke Nishida, Kyosuke Nishida

/ Abstract

Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a classification task of a sentence. We propose a framework for estimating LLMs'capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-$3.5$ Turbo with CoT detected $58.2\%$ of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.

Journal: ArXiv

DOI: 10.48550/arXiv.2511.11087