Christian Borgs, Jennifer T. Chayes, Souvik Dhara, Subhabrata Sen
We investigate structural properties of large, sparse random graphs through the lens of "sampling convergence" (Borgs et al. (2017)). Sampling convergence generalizes left convergence to sparse graphs, and describes the limit in terms of a "graphex". We introduce a notion of sampling convergence for sequences of multigraphs, and establish the graphex limit for the configuration model, a preferential attachment model, the generalized random graph, and a bipartite variant of the configuration model. The results for the configuration model, preferential attachment model and bipartite configuration model provide necessary and sufficient conditions for these random graph models to converge. The limit for the configuration model and the preferential attachment model is an augmented version of an exchangeable random graph model introduced by Caron and Fox (2017).
Souvik Dhara
Random graphs have played an instrumental role in modelling real-world networks arising from the internet topology, social networks, or even protein-interaction networks within cells. Percolation, on the other hand, has been the fundamental model for understanding robustness and spread of epidemics on these networks. From a mathematical perspective, percolation is one of the simplest models that exhibits a phase transition, and fascinating features are observed around the critical point. In this thesis, we prove limit theorems about structural properties of the connected components obtained from percolation on random graphs at criticality. The results are obtained for random graphs with general degree sequence, and we identify different universality classes for the critical behavior based on moment assumptions on the degree distribution.
Souvik Dhara, Remco van der Hofstad, Johan S. H. van Leeuwaarden
In this paper, we study the critical behavior of percolation on a configuration model with degree distribution satisfying an infinite second-moment condition, which includes power-law degrees with exponent $τ\in (2,3)$. It is well known that, in this regime, many canonical random graph models, such as the configuration model, are robust in the sense that the giant component is not destroyed when the percolation probability stays bounded away from zero. Thus, the critical behavior is observed when the percolation probability tends to zero with the network size, despite the fact that the average degree remains bounded. In this paper, we initiate the study of critical random graphs in the infinite second-moment regime by identifying the critical window for the configuration model. We prove scaling limits for component sizes and surplus edges, and show that the maximum diameter of the critical components is of order $\log n$, which contrasts with the previous universality classes arising in the literature. This introduces a third and novel universality class for the critical behavior of percolation on random networks, which is not covered by the multiplicative coalescent framework due to Aldous and Limic (1998). We also prove concentration of the component sizes outside the critical window, and that a unique, complex giant component emerges after the critical window. This completes the picture for the percolation phase transition on the configuration model.
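The objects studied above can be simulated in a few lines. The following is a minimal sketch, not the authors' method: it pairs half-edges uniformly at random to build a configuration-model multigraph, keeps each edge independently with a given probability, and reports the component sizes. The function name `percolated_component_sizes` and its parameters are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def percolated_component_sizes(degrees, retain_prob, rng):
    """Component sizes after bond percolation on a configuration-model multigraph."""
    # One half-edge (stub) per unit of degree; the total must be even to pair them all.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    # A uniform shuffle followed by pairing consecutive stubs gives a uniform matching.
    rng.shuffle(stubs)

    parent = list(range(len(degrees)))  # union-find forest over vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i in range(0, len(stubs), 2):
        # Keep each edge independently with probability retain_prob.
        if rng.random() < retain_prob:
            a, b = find(stubs[i]), find(stubs[i + 1])
            if a != b:
                parent[a] = b

    sizes = Counter(find(v) for v in range(len(degrees)))
    return sorted(sizes.values(), reverse=True)
```

Calling `percolated_component_sizes(degrees, pi_n, random.Random(0))` with a heavy-tailed degree list and a small `pi_n` returns the component sizes in decreasing order, so the largest clusters can be inspected as the retention probability varies.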
Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad, Sanchayan Sen
We study limits of the largest connected components (viewed as metric spaces) obtained by critical percolation on uniformly chosen graphs and configuration models with heavy-tailed degrees. For rank-one inhomogeneous random graphs, such results were derived by Bhamidi, van der Hofstad, Sen [Probab. Theory Relat. Fields 2018]. We develop general principles under which the same scaling limits as in the rank-one case can be obtained. Of independent interest, we derive refined asymptotics for various susceptibility functions and the maximal diameter in the barely subcritical regime.
Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon
Community detection is the problem of identifying community structure in graphs. Often the graph is modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community. The probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of {\em censored} community detection with two communities, where most of the data is missing as the status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability $p$ while vertices in opposite communities are connected with probability $q$. The connectivity status of a given pair of vertices $\{u,v\}$ is revealed with probability $α$, independently across all pairs, where $α= \frac{t \log(n)}{n}$. We establish the information-theoretic threshold $t_c(p,q)$, such that no algorithm succeeds in recovering the communities exactly when $t < t_c(p,q)$. We show that when $t > t_c(p,q)$, a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly. While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case where the connection probabilities inside the two communities are allowed to be different. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on the top two eigenvectors of the weighted, signed adjacency matrix fails.
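To make the censored observation model concrete, here is a minimal sampling sketch. It follows the description above (reveal probability $α = t\log(n)/n$, edge probability $p$ within and $q$ across communities); the uniform random assignment of labels and the name `sample_censored_sbm` are assumptions made for illustration, and the spectral recovery step is not shown.

```python
import math
import random


def sample_censored_sbm(n, p, q, t, rng):
    """Return (labels, observed) for a two-community censored block model.

    observed maps each revealed pair (u, v) with u < v to True (edge) or
    False (non-edge); pairs absent from the dict are censored.
    """
    alpha = t * math.log(n) / n  # reveal probability from the abstract
    # Assumption: each vertex picks its community uniformly at random.
    labels = [rng.choice((+1, -1)) for _ in range(n)]
    observed = {}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < alpha:  # the status of {u, v} is revealed
                edge_prob = p if labels[u] == labels[v] else q
                observed[(u, v)] = rng.random() < edge_prob
    return labels, observed
```

The three possible states of a pair (edge, non-edge, censored) are what a weighted, signed adjacency matrix would encode.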
Souvik Dhara, Remco van der Hofstad, Johan S. H. van Leeuwaarden, Sanchayan Sen
We investigate the component sizes of the critical configuration model, as well as the related problem of critical percolation on a supercritical configuration model. We show that, at criticality, the finite third moment assumption on the asymptotic degree distribution is enough to guarantee that the sizes of the largest connected components are of the order $n^{2/3}$ and the re-scaled component sizes (ordered in a decreasing manner) converge to the ordered excursion lengths of an inhomogeneous Brownian motion with a parabolic drift. We use percolation to study the evolution of these component sizes while passing through the critical window and show that the vector of percolation cluster sizes, considered as a process in the critical window, converges to the multiplicative coalescent process in the sense of finite-dimensional distributions. This behavior was first observed for Erdős-Rényi random graphs by Aldous (1997), and our results support the empirical evidence that the nature of the phase transition is universal across a wide array of random-graph models. Further, we show that the re-scaled component sizes and surplus edges converge jointly under a strong topology, at each fixed location of the scaling window.
Debankur Mukherjee, Souvik Dhara, Sem Borst, Johan S. H. van Leeuwaarden
A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all. Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations.
Souvik Dhara, Remco van der Hofstad
We study the giant component problem slightly above the critical regime for percolation on Poissonian random graphs in the scale-free regime, where the vertex weights and degrees have a diverging second moment. Critical percolation on scale-free random graphs has been observed to have incredibly subtle features that are markedly different from those in random graphs with converging second moment. In particular, the critical window for percolation depends sensitively on whether we consider single- or multi-edge versions of the Poissonian random graph. In this paper, and together with our companion paper with Bhamidi, we build a bridge between these two cases. Our results characterize the part of the barely supercritical regime where the sizes of the giant components are approximately the same for the single- and multi-edge settings. The methods for establishing concentration of the giant for the single- and multi-edge versions are quite different. While the analysis in the multi-edge case is based on scaling limits of exploration processes, the single-edge setting requires identification of a core structure inside certain high-degree vertices that forms the giant component.
Souvik Dhara, Subhabrata Sen
Consider the random graph sampled uniformly from the set of all simple graphs with a given degree sequence. Under mild conditions on the degrees, we establish a Large Deviation Principle (LDP) for these random graphs, viewed as elements of the graphon space. As a corollary of our result, we obtain LDPs for functionals continuous with respect to the cut metric, and obtain an asymptotic enumeration formula for graphs with given degrees, subject to an additional constraint on the value of a continuous functional. Our assumptions on the degrees are identical to those of Chatterjee, Diaconis and Sly (2011), who derived the almost sure graphon limit for these random graphs.
Souvik Dhara, Johan S. H. van Leeuwaarden, Debankur Mukherjee
A notorious problem in mathematics and physics is to create a solvable model for random sequential adsorption of non-overlapping congruent spheres in the $d$-dimensional Euclidean space with $d\geq 2$. Spheres arrive sequentially at uniformly chosen locations in space and are accepted only when there is no overlap with previously deposited spheres. Due to spatial correlations, characterizing the fraction of accepted spheres remains largely intractable. We study this fraction by taking a novel approach that compares random sequential adsorption in Euclidean space to the nearest-neighbor blocking on a sequence of clustered random graphs. This random network model can be thought of as a corrected mean-field model for the interaction graph between the attempted spheres. Using functional limit theorems, we characterize the fraction of accepted spheres and its fluctuations.
Souvik Dhara, Debankur Mukherjee, Subhabrata Sen
The $k$-section width and the Max-Cut for the configuration model are shown to exhibit phase transitions according to the values of certain parameters of the asymptotic degree distribution. These transitions mirror those observed on Erdős-Rényi random graphs, established by Luczak and McDiarmid (2001), and Coppersmith et al. (2004), respectively.
Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon
Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), as well as in a censored variant of the SBM. Here we show that this optimality is somewhat universal as it carries over to other planted substructures such as the planted dense subgraph problem and submatrix localization problem, as well as to a censored version of the planted dense subgraph problem.
Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad, Sanchayan Sen
We establish the global lower mass-bound property for the largest connected components in the critical window for the configuration model when the degree distribution has an infinite third moment. The scaling limit of the critical percolation clusters, viewed as measured metric spaces, was established in [7] with respect to the Gromov-weak topology. Our result extends those scaling limit results to the stronger Gromov-Hausdorff-Prokhorov topology under slightly stronger assumptions on the degree distribution. This implies the distributional convergence of global functionals such as the diameter of the largest critical components. Further, our result gives a sufficient condition for compactness of the random metric spaces that arise as scaling limits of critical clusters in the heavy-tailed regime.
Christian Borgs, Jennifer T. Chayes, Souvik Dhara, Subhabrata Sen
Kallenberg (2005) provided a necessary and sufficient condition for the local finiteness of a jointly exchangeable random measure on $\mathbb{R}_+^2$. Here we note an additional condition that was missing in Kallenberg's theorem, but was implicitly used in the proof. We also provide a counter-example when the additional condition does not hold.
Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad
We study the critical behavior for percolation on inhomogeneous random networks on $n$ vertices, where the weights of the vertices follow a power-law distribution with exponent $τ\in (2,3)$. Such networks, often referred to as scale-free networks, exhibit critical behavior when the percolation probability tends to zero at an appropriate rate, as $n\to\infty$. We identify the critical window for a host of scale-free random graph models such as the Norros-Reittu model, Chung-Lu model and generalized random graphs. Surprisingly, there exists a finite time inside the critical window, after which, we see a sudden emergence of a tiny giant component. This is a novel behavior which is in contrast with the critical behavior in other known universality classes with $τ\in (3,4)$ and $τ>4$. Precisely, for edge-retention probabilities $π_n = λn^{-(3-τ)/2}$, there is an explicitly computable $λ_c>0$ such that the critical window is of the form $λ\in (0,λ_c),$ where the largest clusters have size of order $n^β$ with $β=(τ^2-4τ+5)/[2(τ-1)]\in[\sqrt{2}-1, \tfrac{1}{2})$ and have non-degenerate scaling limits, while in the supercritical regime $λ> λ_c$, a unique `tiny giant' component of size $\sqrt{n}$ emerges. For $λ\in (0,λ_c),$ the scaling limit of the maximum component sizes can be described in terms of components of a one-dimensional inhomogeneous percolation model on $\mathbb{Z}_+$ studied in a seminal work by Durrett and Kesten. For $λ>λ_c$, we prove that the sudden emergence of the tiny giant is caused by a phase transition inside a smaller core of vertices of weight $Ω(\sqrt{n})$.
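The stated range of the exponent $β$ can be checked directly from the formula above; the following short calculation uses nothing beyond that formula.

$$β(τ)=\frac{τ^2-4τ+5}{2(τ-1)},\qquad β'(τ)=\frac{τ^2-2τ-1}{2(τ-1)^2}.$$

The derivative vanishes on $(2,3)$ only at $τ=1+\sqrt{2}$, where $β$ attains its minimum

$$β(1+\sqrt{2})=\frac{4-2\sqrt{2}}{2\sqrt{2}}=\sqrt{2}-1\approx 0.414,$$

while at the excluded endpoints $β(2)=β(3)=\tfrac{1}{2}$. Hence $β\in[\sqrt{2}-1,\tfrac{1}{2})$ for $τ\in(2,3)$, so the critical clusters of size $n^β$ are always smaller than the tiny giant of size $\sqrt{n}$.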
Souvik Dhara, Johan S. H. van Leeuwaarden, Debankur Mukherjee
We investigate Random Sequential Adsorption (RSA) on a random graph via the following greedy algorithm: Order the $n$ vertices at random, and sequentially declare each vertex either active or frozen, depending on some local rule in terms of the state of the neighboring vertices. The classical RSA rule declares a vertex active if none of its neighbors is, in which case the set of active nodes forms an independent set of the graph. We generalize this nearest-neighbor blocking rule in three ways and apply it to the Erdős-Rényi random graph. We consider these generalizations in the large-graph limit $n\to\infty$ and characterize the jamming constant, the limiting proportion of active vertices in the maximal greedy set.
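The classical rule described above is short enough to sketch directly. This minimal example covers only the nearest-neighbor blocking rule on an Erdős-Rényi graph, not the three generalizations studied in the paper; the function names `erdos_renyi` and `greedy_rsa` are hypothetical.

```python
import random


def erdos_renyi(n, edge_prob, rng):
    """Adjacency sets of an Erdős-Rényi graph G(n, edge_prob)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < edge_prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_rsa(adj, rng):
    """Classical nearest-neighbor blocking.

    Visit the vertices in uniformly random order and declare a vertex active
    only if none of its neighbors is already active; otherwise it is frozen.
    """
    order = list(range(len(adj)))
    rng.shuffle(order)
    active = set()
    for v in order:
        if not (adj[v] & active):  # empty intersection: no active neighbor
            active.add(v)
    return active
```

For a graph `adj` on `n` vertices, `len(greedy_rsa(adj, rng)) / n` is a finite-size estimate of the jamming constant; the active set it returns is a maximal independent set by construction.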
Souvik Dhara, Remco van der Hofstad, Johan S. H. van Leeuwaarden, Sanchayan Sen
We study the critical behavior of the component sizes for the configuration model when the tail of the degree distribution of a randomly chosen vertex is a regularly-varying function with exponent $τ-1$, where $τ\in (3,4)$. The component sizes are shown to be of the order $n^{(τ-2)/(τ-1)}L(n)^{-1}$ for some slowly-varying function $L(\cdot)$. We show that the re-scaled ordered component sizes converge in distribution to the ordered excursions of a thinned Lévy process. This proves that the scaling limits for the component sizes for these heavy-tailed configuration models are in a different universality class compared to the Erdős-Rényi random graphs. Also the joint re-scaled vector of ordered component sizes and their surplus edges is shown to have a distributional limit under a strong topology. Our proof resolves a conjecture by Joseph, Ann. Appl. Probab. (2014) about the scaling limits of uniform simple graphs with i.i.d. degrees in the critical window, and sheds light on the relation between the scaling limits obtained by Joseph and in this paper, which appear to be quite different. Further, we use percolation to study the evolution of the component sizes and the surplus edges within the critical scaling window, which is shown to converge, in the sense of finite-dimensional distributions, to the augmented multiplicative coalescent process introduced by Bhamidi et al., Probab. Theory Related Fields (2014). The main results of this paper are proved under rather general assumptions on the vertex degrees. We also discuss how these assumptions are satisfied by some of the frameworks that have been studied previously.
Souvik Dhara, Debankur Mukherjee, Kavita Ramanan
For an $n\times n$ matrix $A_n$, the $r\to p$ operator norm is defined as $$\|A_n\|_{r\to p}:= \sup_{\mathbf{x}\in\mathbb{R}^n:\|\mathbf{x} \|_r\leq 1 } \|A_n\mathbf{x} \|_p\quad\text{for}\quad r,p\geq 1.$$ For different choices of $r$ and $p$, this norm corresponds to key quantities that arise in diverse applications including matrix condition number estimation, clustering of data, and construction of oblivious routing schemes in transportation networks. This article considers $r\to p$ norms of symmetric random matrices with nonnegative entries, including adjacency matrices of Erdős-Rényi random graphs, matrices with positive sub-Gaussian entries, and certain sparse matrices. For $1<p\leq r<\infty$, the asymptotic normality, as $n\to\infty$, of the appropriately centered and scaled norm $\|A_n\|_{r\to p}$ is established. When $p \geq 2$, this is shown to imply asymptotic normality of the solution to the $\ell_p$ quadratic maximization problem, also known as the $\ell_p$ Grothendieck problem. Furthermore, a sharp $\ell_\infty$-approximation bound for the unique maximizing vector in the definition of $\|A_n\|_{r\to p}$ is obtained, and may be viewed as an $\ell_\infty$-stability result of the maximizer under random perturbations of the matrix with mean entries. This result is in fact shown to hold for a broad class of deterministic sequences of matrices having certain asymptotic expansion properties. The results obtained can be viewed as a generalization of the seminal results of Füredi and Komlós (1981) on asymptotic normality of the largest singular value of a class of symmetric random matrices. In the general case with $1<p\leq r< \infty$, spectral methods are no longer applicable, and so a new approach is developed involving a refined convergence analysis of a nonlinear power method and a perturbation bound on the maximizing vector, which may be of independent interest.
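To illustrate what a nonlinear power method for this norm can look like, here is a minimal sketch. It iterates the stationarity condition of the maximization problem, $\mathbf{x}\leftarrow ψ_{r'}(A^{\top}ψ_p(A\mathbf{x}))$ followed by normalization in $\ell_r$, with $ψ_q(t)=\mathrm{sign}(t)|t|^{q-1}$ and $r'=r/(r-1)$. Whether this matches the exact scheme analyzed in the paper is an assumption, no convergence guarantee is claimed, and the name `r_to_p_norm` is hypothetical.

```python
def _psi(y, q):
    """Apply t -> sign(t) * |t|**(q - 1) entrywise."""
    return [(abs(t) ** (q - 1)) * (1 if t >= 0 else -1) for t in y]


def _norm(y, q):
    """The l_q norm of a vector given as a list."""
    return sum(abs(t) ** q for t in y) ** (1.0 / q)


def r_to_p_norm(A, r, p, iters=500):
    """Estimate ||A||_{r->p} for a matrix A (list of rows) with nonnegative
    entries, assuming 1 < p <= r < infinity."""
    n_rows, n_cols = len(A), len(A[0])
    r_conj = r / (r - 1.0)  # Hölder conjugate of r
    x = [n_cols ** (-1.0 / r)] * n_cols  # positive start with ||x||_r = 1
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
        g = _psi(Ax, p)
        # z = A^T g, the gradient direction of ||Ax||_p^p up to a constant.
        z = [sum(A[i][j] * g[i] for i in range(n_rows)) for j in range(n_cols)]
        x = _psi(z, r_conj)
        scale = _norm(x, r)
        if scale == 0.0:  # A maps the iterate to zero, e.g. the zero matrix
            return 0.0
        x = [t / scale for t in x]
    Ax = [sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
    return _norm(Ax, p)
```

For $r=p=2$ the update reduces to the ordinary power method on $A^{\top}A$, so the sketch returns the largest singular value in that case.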
Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon
Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral algorithms using two matrices in a graph partitioning problem. We use two different matrices resulting from two different encodings of the same graph and then combine the spectral information coming from these two matrices. We analyze a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs. In particular, we consider the problem of recovering community assignments exactly in the censored stochastic block model, where each edge status is revealed independently with some probability. We show that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold. Further, we show that for most choices of the parameters, any spectral algorithm based on one matrix is suboptimal. The latter observation is in contrast to our prior works (2022a, 2022b) which showed that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral algorithm based on one matrix achieves the information theoretic threshold. We additionally provide more general geometric conditions for the (sub)-optimality of spectral algorithms.
Aman Barot, Shankar Bhamidi, Souvik Dhara
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g., nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
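The co-occurrence statistic at the heart of these methods can be sketched as follows. This is a minimal illustration using simple random walks only (the DeepWalk-style setting); the biased second-order walks of node2vec and the embedding step are not shown, and the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Simple random walk visiting `length` vertices.

    adj is a list of neighbor lists; every vertex is assumed to have a neighbor.
    """
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def cooccurrence_counts(adj, walks_per_node, walk_length, window, rng):
    """Count unordered pairs {u, v} that appear within `window` steps of each
    other in the sampled walks; keys are (min, max) tuples."""
    counts = Counter()
    for start in range(len(adj)):
        for _ in range(walks_per_node):
            walk = random_walk(adj, start, walk_length, rng)
            for i, u in enumerate(walk):
                for v in walk[i + 1 : i + 1 + window]:
                    counts[(min(u, v), max(u, v))] += 1
    return counts
```

The resulting counts form the co-occurrence matrix; increasing `walk_length` or `window` trades more computation for more observed pairs on sparse graphs.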