On the Importance of Sampling in Training GCNs: Tighter Analysis and Variance Reduction
/ Authors
/ Abstract
Graph Convolutional Networks (GCNs) have achieved impressive empirical advances across a wide variety of graph-related applications. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of evolving parameters during optimization. In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget. The motivating impetus for the proposed schema is a careful analysis of the variance of sampling methods, where it is shown that the induced variance can be decomposed into node embedding approximation variance (zeroth-order variance) during forward propagation and layerwise-gradient variance (first-order variance) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an O(1/T) convergence rate. We complement our theoretical results by integrating the proposed schema into different sampling methods and applying them to different large real-world graphs. Code is publicly available at https://github.com/CongWeilin/SGCN.git.
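To make the notion of node embedding approximation (zeroth-order) variance concrete, the following is a minimal sketch, not the paper's method. It assumes scalar node features, a single aggregation step for one node, and uniform node sampling with replacement; the function names `full_aggregate`, `sampled_aggregate`, and `empirical_mean_and_variance` are hypothetical and chosen only for illustration. The sketch shows that a sampled aggregation can be unbiased for the exact one while its variance depends on how many nodes are sampled. It does not implement the doubly variance reduction schema and says nothing about first-order (gradient) variance.

```python
import random


def full_aggregate(adj_row, features):
    """Exact aggregation for one node: sum_j a_ij * h_j."""
    return sum(a * h for a, h in zip(adj_row, features))


def sampled_aggregate(adj_row, features, num_samples, rng):
    """Estimate the aggregation from nodes sampled uniformly with replacement.

    Assumption for illustration: uniform sampling, not any sampler from the paper.
    """
    n = len(features)
    picks = [rng.randrange(n) for _ in range(num_samples)]
    # Scaling by n / num_samples makes the expectation equal the full sum.
    return (n / num_samples) * sum(adj_row[j] * features[j] for j in picks)


def empirical_mean_and_variance(adj_row, features, num_samples, trials, seed=0):
    """Monte Carlo mean and variance of the sampled estimator."""
    rng = random.Random(seed)
    estimates = [
        sampled_aggregate(adj_row, features, num_samples, rng)
        for _ in range(trials)
    ]
    mean = sum(estimates) / trials
    variance = sum((e - mean) ** 2 for e in estimates) / trials
    return mean, variance


if __name__ == "__main__":
    # Toy node with 8 neighbors, row-normalized weights, scalar features 0..7.
    adj_row = [1 / 8] * 8
    features = [float(k) for k in range(8)]

    print("exact:", full_aggregate(adj_row, features))
    for num_samples in (2, 4):
        mean, variance = empirical_mean_and_variance(
            adj_row, features, num_samples, trials=20000
        )
        print(f"samples={num_samples} mean={mean:.3f} variance={variance:.3f}")
```

In this toy setting the estimator's mean stays close to the exact aggregation for both sample sizes, while the empirical variance shrinks as more nodes are sampled. This is the kind of forward-propagation error that the abstract refers to as zeroth-order variance.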