GUST: Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication
/ Authors
/ Abstract
Sparse matrix-vector multiplication (SpMV) plays a vital role in various scientific and engineering fields, from scientific computing to machine learning. Traditional general-purpose processors often fall short of their peak performance with sparse data, leading to the development of domain-specific architectures to enhance SpMV. Yet these specialized approaches, whether tailored explicitly for SpMV or adapted from matrix-matrix multiplication accelerators, still face challenges in fully utilizing hardware resources as a result of sparsity. To tackle this problem, we introduce GUST, a hardware/software co-design whose key insight lies in separating multipliers and adders in the hardware, thereby enabling resource sharing across multiple rows and columns, leading to efficient hardware utilization and ameliorating the negative performance impacts of sparsity. Resource sharing, however, can lead to collisions, a problem we address through a specially devised edge-coloring scheduling algorithm. Our comparisons with various prior domain-specific architectures using real-world datasets show the effectiveness of GUST, with an average hardware utilization of 33.67%. We further evaluate GUST by comparing the SpMV execution time and energy consumption of length-256 and length-87 GUST with those of a length-256 1-dimensional systolic array (1D), achieving an average speedup of 411× and 108×, and an energy efficiency improvement of 137× and 148×, respectively. To assess the implementation aspect, we compare the resource consumption of GUST with 1D as a baseline through FPGA synthesis. Length-256 GUST uses the same number of arithmetic units as length-256 1D, while length-87 GUST uses considerably fewer. We also compare GUST with Serpens, a state-of-the-art FPGA-based SpMV accelerator, with GUST achieving lower execution time on seven out of nine matrices and lower energy consumption on four.
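To make the scheduling idea concrete, the following is a minimal sketch of edge coloring applied to a sparse matrix. It is not the GUST algorithm itself, which the abstract does not specify. It assumes the matrix is viewed as a bipartite graph with rows and columns as vertices and nonzeros as edges, and that a collision means two nonzeros in the same time step share a row or a column. The function names `greedy_edge_coloring` and `spmv_by_steps` and the plain greedy strategy are hypothetical choices made for illustration.

```python
from collections import defaultdict


def greedy_edge_coloring(nonzeros):
    """Give each (row, col) nonzero the smallest step unused by its row and column.

    Assumption: a "color" is a time step, and two nonzeros collide when they
    share a row or a column within the same step. This is a generic greedy
    heuristic, not the scheduling algorithm from the paper.
    """
    row_used = defaultdict(set)  # steps already taken by each row
    col_used = defaultdict(set)  # steps already taken by each column
    schedule = {}
    for r, c in nonzeros:
        step = 0
        while step in row_used[r] or step in col_used[c]:
            step += 1
        schedule[(r, c)] = step
        row_used[r].add(step)
        col_used[c].add(step)
    return schedule


def spmv_by_steps(entries, x, n_rows):
    """Compute y = A @ x by processing nonzeros one collision-free step at a time."""
    schedule = greedy_edge_coloring([(r, c) for r, c, _ in entries])
    steps = defaultdict(list)
    for r, c, v in entries:
        steps[schedule[(r, c)]].append((r, c, v))
    y = [0.0] * n_rows
    for step in sorted(steps):
        # Within one step, every row and every column appears at most once.
        for r, c, v in steps[step]:
            y[r] += v * x[c]
    return y, schedule


if __name__ == "__main__":
    # Hypothetical 3x3 matrix in coordinate form: (row, col, value).
    entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    x = [1.0, 2.0, 3.0]
    y, schedule = spmv_by_steps(entries, x, n_rows=3)
    print("result:", y)
    print("steps used:", max(schedule.values()) + 1)
```

In this sketch the number of colors is the number of time steps, so fewer colors would mean more nonzeros processed per step. A greedy pass does not guarantee the minimum number of colors; it is used here only because it is short and easy to check.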
Journal: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4