Unbiased Sampling of Facebook
/ Authors
/ Abstract
The popularity of online social networks (OSNs) has given rise to a number of measurement studies that provide a first step towards their understanding. So far, such studies have been based either on complete data sets provided directly by the OSN itself or on Breadth-First-Search (BFS) crawling of the social graph, which does not guarantee good statistical properties of the collected sample. In this paper, we crawl the publicly available social graph and present the first unbiased sampling of Facebook (FB) users using a Metropolis-Hastings random walk with multiple chains. We study the convergence properties of the walk and demonstrate the uniformity of the collected sample with respect to multiple metrics of interest. We compare our crawling technique with baseline algorithms, namely BFS and simple random walk, as well as with the "ground truth" obtained through truly uniform sampling of userIDs. Our contributions lie both in the measurement methodology and in the collected sample. With regard to the methodology, our measurement technique (i) applies and combines known results from random walk sampling specifically in the OSN context and (ii) addresses system implementation aspects that have so far made the measurement of Facebook challenging. With respect to the collected sample: (i) it is the first representative sample of FB users, which we plan to make publicly available; (ii) we characterize several key properties of the data set and find that some of them differ substantially from what was previously believed based on non-representative OSN samples.
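The difference between a simple random walk and a Metropolis-Hastings random walk (MHRW) can be illustrated on a toy graph. The following is a minimal Python sketch, not the authors' crawler. It assumes the whole graph fits in an in-memory adjacency dict (the hypothetical `TOY_GRAPH`), uses a fixed burn-in instead of formal convergence diagnostics, and takes the chains' start nodes as given; the helper names `mhrw_walk`, `simple_rw_walk` and `run_chains` are illustrative only. From the current node v, the MHRW proposes a uniformly random neighbor w and accepts the move with probability min(1, k_v / k_w), where k denotes degree; otherwise it stays at v. This makes the uniform distribution over nodes stationary, whereas a simple random walk visits each node in proportion to its degree.

```python
import random
from collections import Counter

# Hypothetical toy graph: hub 0 linked to nodes 1..5, plus edges 1-2 and 3-4
# (the triangles make it non-bipartite, so the simple random walk is aperiodic).
TOY_GRAPH = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3],
    5: [0],
}


def mhrw_walk(adj, start, n_steps, rng):
    """Metropolis-Hastings random walk whose stationary distribution is uniform."""
    v = start
    samples = []
    for _ in range(n_steps):
        w = rng.choice(adj[v])  # propose a neighbor uniformly at random
        # Accept with probability min(1, deg(v) / deg(w)); otherwise stay at v.
        # Note that deg(w) must be known even when the move is rejected.
        if rng.random() < len(adj[v]) / len(adj[w]):
            v = w
        samples.append(v)
    return samples


def simple_rw_walk(adj, start, n_steps, rng):
    """Simple random walk: visits nodes proportionally to their degree."""
    v = start
    samples = []
    for _ in range(n_steps):
        v = rng.choice(adj[v])
        samples.append(v)
    return samples


def run_chains(adj, walk, starts, n_steps, burn_in, seed=0):
    """Run one independent chain per start node, drop each chain's burn-in, pool the rest."""
    rng = random.Random(seed)
    pooled = []
    for start in starts:
        pooled.extend(walk(adj, start, n_steps, rng)[burn_in:])
    return pooled


if __name__ == "__main__":
    starts = [1, 3, 5]  # assumed seed nodes for three chains
    for name, walk in [("MHRW", mhrw_walk), ("RW", simple_rw_walk)]:
        s = run_chains(TOY_GRAPH, walk, starts, n_steps=20000, burn_in=1000, seed=42)
        share_hub = Counter(s)[0] / len(s)
        print(f"{name}: fraction of samples at hub node 0 = {share_hub:.3f}")
```

On this graph the hub has degree 5 out of a total degree of 14, so the simple random walk should spend roughly 5/14 (about 0.36) of its samples at node 0, while the MHRW should approach the uniform share of 1/6 (about 0.17). The price of uniformity is that rejected proposals repeat the current node, so consecutive MHRW samples are more correlated.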