Towards deployment-centric multimodal AI beyond vision and language
/ Authors
Xianyuan Liu, Jiayang Zhang, Shuo Zhou, T. L. V. D. Plas, Avish Vijayaraghavan, A. Grishina, Mengdie Zhuang, Daniel Schofield, C. Tomlinson, Yuhan Wang, Ruizhe Li, Louisa van Zeeland, Sina Tabakhi, Cyndie Demeocq, Xiang Li, Arunav Das, Orlando Timmerman, Thomas Baldwin-McDonald, Jinge Wu, Peizhen Bai, Zahraa Al Sahili, Omnia Alwazzan, T. Do, M. N. Suvon, Angelina Wang, Lucia Cipolina-Kun, Luigi Andrea Moretti, Lucas Farndale, Nitisha Jain, Natalia Efremova, Yan Ge, M. Varela, Hak-Keung Lam, Oya Çeliktutan, Ben R. Evans, Alejandro Coca-Castro, Honghan Wu, Z. Abdallah, Chen Chen, V. Danchev, N. Tkachenko, Lei Lu, Tingting Zhu, Gregory G. Slabaugh, Roger K. Moore, William K. Cheung, Peter H. Charlton, Haiping Lu
/ Abstract
Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction and decision-making across disciplines such as healthcare, science and engineering. However, most multimodal AI advances focus on models for vision and language data, and their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early on to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasize deeper integration across multiple levels of multimodality through stakeholder engagement and interdisciplinary collaboration to broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability and finance. By fostering interdisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.

Multimodal AI combines different types of data to improve decision-making in fields such as healthcare and engineering, but work so far has focused on vision and language models. To make these systems more usable in the real world, Liu et al. discuss the need to develop approaches with deployment in mind from the start, working closely with experts across relevant disciplines.
Journal: Nature Machine Intelligence