Limit results for distributed estimation of invariant subspaces in multiple networks inference and PCA
/ Authors
/ Abstract
Several statistical problems, such as multiple heterogeneous graph analysis, distributed PCA, integrative data analysis, and simultaneous dimension reduction of images, can involve a collection of $m$ matrices whose leading subspaces $U^{(i)}$ consist of a shared subspace $U_c$ and individual subspaces $U_s^{(i)}$. We consider a distributed estimation procedure that first obtains $\hat U^{(i)}$ as the leading singular vectors of each observed noisy matrix, then computes the leading left singular vectors of the concatenated matrix $[\hat U^{(1)}|\hat U^{(2)}|\dots|\hat U^{(m)}]$ as $\hat U_c$, and finally computes the leading singular vectors of the projection of each $\hat U^{(i)}$ onto the orthogonal complement of $\hat U_c$ as $\hat U_s^{(i)}$. In this paper, we provide a framework for deriving limit results for such distributed estimation procedures, including expansions of the estimation errors for both the common and individual subspaces and their asymptotically normal approximations. We apply this framework specifically to (1) parameter estimation for multiple heterogeneous random graphs with shared subspaces, and (2) distributed PCA for independent sub-Gaussian random vectors with spiked covariance structures. Leveraging these results, we also consider a two-sample test of the null hypothesis that a pair of random graphs have the same edge probabilities, and present a test statistic that converges in distribution to a central (resp., non-central) $\chi^2$ distribution under the null (resp., local alternative) hypothesis.
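The three-step procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name, the ranks `d` (per-matrix subspace dimension) and `d_c` (common subspace dimension), and the synthetic usage below are all illustrative assumptions.

```python
import numpy as np

def estimate_subspaces(A_list, d, d_c):
    """Distributed estimation of common and individual subspaces (illustrative sketch).

    A_list : list of m noisy (n x n) matrices; d : rank of each leading subspace;
    d_c : dimension of the shared subspace. Returns (U_c_hat, list of U_s_hat^(i)).
    """
    # Step 1: leading d left singular vectors of each observed matrix A^(i).
    U_hats = [np.linalg.svd(A, full_matrices=False)[0][:, :d] for A in A_list]

    # Step 2: common subspace = leading d_c left singular vectors of the
    # column concatenation [U_hat^(1) | U_hat^(2) | ... | U_hat^(m)].
    U_c_hat = np.linalg.svd(np.hstack(U_hats), full_matrices=False)[0][:, :d_c]

    # Step 3: project each U_hat^(i) onto the orthogonal complement of U_c_hat,
    # then take the leading d - d_c left singular vectors as U_s_hat^(i).
    n = U_c_hat.shape[0]
    P_perp = np.eye(n) - U_c_hat @ U_c_hat.T
    U_s_hats = [np.linalg.svd(P_perp @ U, full_matrices=False)[0][:, : d - d_c]
                for U in U_hats]
    return U_c_hat, U_s_hats

# Illustrative usage on synthetic low-rank matrices.
rng = np.random.default_rng(0)
n, m, d_c, d = 50, 3, 2, 3
A_list = []
for _ in range(m):
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))          # left factor
    A_list.append(Q @ np.diag(np.linspace(10, 8, d)) @ rng.standard_normal((d, n)))
U_c_hat, U_s_hats = estimate_subspaces(A_list, d, d_c)
```

By construction, each $\hat U_s^{(i)}$ lies in the orthogonal complement of $\hat U_c$, so $\hat U_c^\top \hat U_s^{(i)} \approx 0$.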