Neural Architectural Backdoors
/ Authors
/ Abstract
This paper asks the intriguing question: is it possible to exploit neural architecture search (NAS) as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers. Compared with existing attacks, EVAS demonstrates many interesting properties: (i) it does not require polluting training data or perturbing model parameters; (ii) it is agnostic to downstream fine-tuning or even re-training from scratch; (iii) it naturally evades defenses that rely on inspecting model parameters or training data. With extensive evaluation on benchmark datasets, we show that EVAS features high evasiveness, transferability, and robustness, thereby expanding the adversary’s design spectrum. We further characterize the mechanisms underlying EVAS, which are possibly explainable by architecture-level “shortcuts” that recognize trigger patterns. This work raises concerns about the current practice of NAS and points to potential directions to develop effective countermeasures.