PUL: Pre-load in Software for Caches Wouldn't Always Play Along

cs.DB

/ Authors

Arthur Bernhardt, Sajjad Tamimi, Florian Stock, Andreas Koch, Ilia Petrov

/ Abstract

Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based prefetching exhibits higher efficiency, improving with newer CPU generations. In this paper we investigate software-based, post-Moore systems that offload operations to intelligent memories. We show that software-based prefetching has even higher potential in near-data processing settings by maximizing compute utilization through compute/IO interleaving.