GPUnion: Autonomous GPU Sharing on Campus
/ Authors
/ Abstract
A pronounced imbalance in GPU resources exists on campus, where some laboratories own underutilized servers while others lack the compute needed for AI research. GPU sharing can alleviate this disparity, while existing platforms typically rely on centralized oversight and persistent allocation models, conflicting with the voluntary and autonomous nature of academic resource ownership. We present GPUnion, a campus-scale GPU sharing platform enabling voluntary participation while preserving full provider autonomy. GPUnion incorporates three core mechanisms: i) container-based task dispatching and execution, ii) resource provider-first architecture, and iii) resilient execution featuring automatic checkpointing and migration. Case studies across multiple campus scenarios demonstrate 30% more GPU utilization improvement, 40% increase in interactive sessions, and 94% successful workload migration during provider departures.
Journal: Proceedings of the 24th ACM Workshop on Hot Topics in Networks