Towards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks
/ Authors
/ Abstract
Embodied AI systems, integrating Large Vision-Language Models (LVLMs) and Large Language Models (LLMs) with physical actuators and sensors, face unique robustness and security challenges stemming from the complex interplay between perception, cognition, and actuation in real-world environments. This survey provides a systematic analysis of these vulnerabilities and associated attack surfaces. We propose a tripartite vulnerability taxonomy comprising foundational, integration, and contextual risks: foundational vulnerabilities arise from inherent limitations in current AI architectures and training paradigms; integration vulnerabilities emerge from the composition of cyber-physical components; and contextual vulnerabilities stem from dynamic physical environments and deployment conditions. Correspondingly, we present a comprehensive attack taxonomy that encompasses foundational attacks on LLMs/LVLMs (including logits-based, optimization-based, prompt-based, and cross-modality attacks), integration-level cybersecurity threats (such as man-in-the-middle, firmware, side-channel, and supply chain attacks), and contextual attacks (primarily sensor spoofing across multiple modalities). We further examine representative failure modes of the cognitive core, review existing evaluation methodologies and benchmarks, and synthesize a multi-layer defense framework that integrates perceptual redundancy, runtime monitoring, and hardware-enforced safety mechanisms. This work offers a unified conceptual framework and practical roadmap for designing and evaluating robust, secure Embodied AI systems in safety-critical real-world deployments.
Journal: ACM Computing Surveys
DOI: 10.1145/3806048