Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets
stat.ME
/ Abstract
Longitudinal binary or count functional data are common in neuroscience, but are often too large to analyze with existing functional regression methods. We propose a one-step penalized generalized estimating equations (GEE) approach that supports generalized functional outcomes (e.g., count, binary, proportion, continuous) and remains fast even for datasets with many clusters and large cluster sizes. The method accommodates both functional and scalar covariates, and the one-step estimation framework enables efficient smoothing parameter selection and joint confidence interval construction. Importantly, this semi-parametric approach yields coefficient confidence intervals that are asymptotically valid even under working correlation misspecification. By developing a general theory for adaptive one-step M-estimation, we prove that the coefficient estimates are asymptotically normal and as efficient as the fully iterated estimator; we verify these theoretical properties in simulations. We illustrate the benefits of our approach for analyzing large-scale neural recordings by applying it to a recent calcium imaging dataset published in Nature, showing that our method reveals important timing effects that are obscured in non-functional analyses. In doing so, we also demonstrate scaling to common neuroscience dataset sizes: the one-step estimator fits a dataset with 150,000 (binary) functional outcomes, each observed at 120 functional domain points, in only 6.5 minutes on a laptop without parallelization. We release our methods in the R package 'fastfGEE', which supports a wide range of link functions and working covariance structures.