PartIR: Composing SPMD Partitioning Strategies for Machine Learning
/ Abstract
Training modern large neural networks (NNs) requires a combination of parallelization strategies, including data, model, and optimizer sharding. To address the growing complexity of these strategies, we introduce PartIR, a hardware- and runtime-agnostic NN partitioning system. PartIR is: 1) Expressive: it allows the composition of multiple sharding strategies, whether user-defined or automatically derived; 2) Decoupled: strategies are kept separate from the ML implementation; and 3) Predictable: it follows a set of well-defined general rules to partition the NN. PartIR uses a schedule-like API that incrementally rewrites the ML program's intermediate representation (IR) after each strategy is applied, allowing simulators and users to verify each strategy's performance. PartIR has been used successfully to train large models and across diverse model architectures, demonstrating its predictability, expressiveness, and performance.
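To make the schedule-like, incremental-rewriting idea concrete, the following is a minimal hypothetical sketch in Python. All names here (`Schedule`, `Program`, `shard`, `apply`) are illustrative stand-ins, not PartIR's real API: each scheduled action is a rewrite applied in order to a toy "IR", so the intermediate program after every strategy remains available for inspection or cost simulation.

```python
# Hypothetical sketch of a schedule-like partitioning API.
# `Schedule`, `Program`, `shard`, and `apply` are illustrative names,
# NOT PartIR's actual interface.
from dataclasses import dataclass, field


@dataclass
class Program:
    # Toy "IR": maps each array name to the list of mesh axes it is sharded over.
    shardings: dict = field(default_factory=dict)


@dataclass
class Schedule:
    mesh: dict                          # mesh axis name -> size, e.g. {"data": 8}
    actions: list = field(default_factory=list)

    def shard(self, array: str, axis: str) -> "Schedule":
        # Record one strategy step: shard `array` over mesh `axis`.
        self.actions.append((array, axis))
        return self

    def apply(self, program: Program) -> Program:
        # Apply strategies in order, rewriting the IR after each step so a
        # simulator or user could verify the intermediate program's performance.
        for array, axis in self.actions:
            program.shardings.setdefault(array, []).append(axis)
        return program


# Compose data parallelism and model parallelism on a 2D device mesh.
sched = Schedule(mesh={"data": 8, "model": 4})
sched.shard("activations", "data").shard("weights", "model")
prog = sched.apply(Program())
```

The point of the sketch is the composition pattern: each strategy is expressed separately from the model code, and rewrites accumulate incrementally rather than being produced by a single monolithic pass.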
Journal: Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1