SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multiprecision DNN Inference
/ Authors
/ Abstract
Deploying deep neural networks (DNNs) on resource-constrained edge platforms is hindered by their substantial computation and storage demands. Quantized multiprecision DNNs (MP-DNNs) offer a promising solution to these limitations but pose challenges for existing RISC-V processors due to complex instructions, suboptimal parallel processing, and inefficient dataflow mapping. To tackle these challenges, SPEED, a scalable RISC-V vector (RVV) processor, is proposed to enable efficient MP-DNN inference, incorporating innovations in customized instructions, hardware architecture, and dataflow mapping. First, dedicated customized RISC-V instructions are introduced based on RVV extensions to reduce instruction complexity, allowing SPEED to support processing precisions ranging from 4 to 16 bit with minimal hardware overhead. Second, a parameterized multiprecision tensor unit (MPTU) is developed and integrated within the scalable module to enhance parallel processing capability by providing reconfigurable parallelism that matches the computation patterns of diverse MP-DNNs. Finally, a flexible mixed dataflow method is adopted to improve computational and energy efficiency according to the computing patterns of different DNN operators. SPEED is synthesized in TSMC 28-nm technology. Experimental results show that SPEED achieves a peak throughput of 737.9 GOPS and an energy efficiency of 1383.4 GOPS/W for 4-bit operators. Furthermore, SPEED exhibits superior area efficiency compared with prior RVV processors, with improvements of $5.9\sim 26.9\times$ and $8.2\sim 18.5\times$ for 8-bit operators and best integer performance, respectively, which highlights SPEED's significant potential for efficient MP-DNN inference.
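As a back-of-the-envelope check derived here from the two reported 4-bit figures (the paper itself does not state this value in the abstract), the peak throughput and energy efficiency together imply a power draw of roughly
$$P \approx \frac{737.9\ \text{GOPS}}{1383.4\ \text{GOPS/W}} \approx 0.53\ \text{W},$$
a budget consistent with the resource-constrained edge deployment targeted above.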
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems