The Hitchhiker's Guide to Programming and Optimizing Cache Coherent Heterogeneous Systems: CXL, NVLink-C2C, and AMD Infinity Fabric
/ Authors
Zixuan Wang, Suyash Mahar, Luyi Li, Jangseon Park, Jinpyo Kim, Theodore Michailidis, Yue Pan, Mingyao Shen, Tajana Rosing, D. Tullsen
and 2 more authors
/ Abstract
We present a thorough analysis of the use of modern heterogeneous systems interconnected by various cachecoherent links, including CXL, NVLink-C2C, and Infinity Fabric. We studied a wide range of server systems that combined CPUs from different vendors and various types of coherent memory devices, including CXL memory expander, CXL pool, CXL shared memory, GH200 GPU, and AMD MI300a HBM. For this study, we developed a heterogeneous memory benchmark suite, Heimdall, to profile the performance of such heterogeneous systems and present a detailed performance comparison across systems. By leveraging H E I M DA L L , we unveiled the detailed architecture design in these systems, drew observations on optimizing performance for workloads, and pointed out directions for future development of cache coherent heterogeneous systems.