An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care
/ Authors
Z. Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, C. Xue
Rachel Chong, Quan V. Hoang, L. Raghavan, Y. Tham, C. Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, E. Lamoureux, S. Saw, V. Nangia, S. Panda-Jonas, Jie Xu, Y. Wang, Xinxing Xu, J. Jonas, T. Y. Wong, R. Goh, Yong Liu, Ching-Yu Cheng
/ Abstract
We present Meta-EyeFM, an integrated language-vision foundation model designed for conversational diagnostics and triaging in primary eye care. By combining a large language model (LLM) with eight task-specific vision foundation models (VFMs), Meta-EyeFM dynamically routes user queries and fundus photographs to the most appropriate VFM (routing accuracy 96.8%). It demonstrates high performance in detecting ocular diseases (area under the receiver operating characteristic curve [AUC] ≥91.2%), differentiating disease severity (AUC ≥82%), identifying ocular signs (AUC ≥77.9%), and predicting systemic conditions such as diabetes (AUC ≥79.8%). Meta-EyeFM is 11%–43% more accurate than the Gemini-1.5-Flash and GPT-4o LLMs and generally outperforms junior ophthalmologist and optometrist graders in detecting eye diseases. Its conversational interface and robust generalizability support its role as a diagnostic decision-support tool in community settings. Through self-supervised learning and a user-friendly platform, Meta-EyeFM addresses the scarcity of skilled eye care professionals, offering scalable, explainable AI to enhance vision screening and disease triage globally.
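The routing architecture described above — an LLM dispatching a user query plus a fundus photograph to one of several task-specific VFMs — can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the task names and the keyword-based dispatcher are assumptions, substituting for the paper's LLM router and trained VFMs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VFM:
    """A task-specific vision foundation model (stubbed here)."""
    name: str
    handler: Callable[[bytes], str]  # fundus image bytes -> finding text

# Hypothetical subset of the eight tasks, loosely following the abstract
# (disease detection, severity grading, systemic-condition prediction).
VFMS = {
    "disease_detection": VFM("disease_detection", lambda img: "no referable disease"),
    "severity_grading": VFM("severity_grading", lambda img: "mild severity"),
    "systemic_prediction": VFM("systemic_prediction", lambda img: "low diabetes risk"),
}

# Keyword routing stands in for the LLM's intent classification.
KEYWORDS = {
    "glaucoma": "disease_detection",
    "severity": "severity_grading",
    "diabetes": "systemic_prediction",
}

def route(query: str, fundus_image: bytes) -> str:
    """Dispatch the query and image to the most appropriate VFM."""
    for word, task in KEYWORDS.items():
        if word in query.lower():
            return VFMS[task].handler(fundus_image)
    # Fall back to general disease detection when intent is unclear.
    return VFMS["disease_detection"].handler(fundus_image)
```

In the actual system, the dispatcher is the LLM itself (reported routing accuracy 96.8%), and each handler is a fine-tuned vision foundation model rather than a stub.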
Journal: Cell Reports Medicine