Optimizing open-domain question answering with graph-based retrieval augmented generation
Authors

Abstract
In this work, we benchmark various graph-based retrieval-augmented generation (RAG) systems across a broad spectrum of query types, including OLTP-style (fact-based) and OLAP-style (thematic) queries, to address the complex demands of open-domain question answering (QA). Traditional RAG methods often fall short on nuanced, multi-document synthesis tasks. Structuring knowledge as graphs enables retrieval of context with greater semantic depth, enhancing the quality of language model responses. We explore graph-based RAG methodologies and introduce TREX, a novel, cost-effective alternative that combines graph-based indexing with vector-based retrieval. Our benchmarking across four diverse datasets highlights the strengths of different RAG methodologies, demonstrates TREX's ability to handle multiple open-domain QA types, and reveals the limitations of current evaluation methods. We publicly release these datasets at https://github.com/microsoft/graphrag-benchmarking-datasets to facilitate further research and benchmarking. Our findings underscore the potential of augmenting large language models with advanced retrieval capabilities and scalable graph-based AI solutions.
Journal: Proceedings of the 1st workshop connecting academia and industry on Modern Integrated Database and AI Systems