Jianfeng Lu, Cody Murphey, Stefan Steinerberger
We study the problem of predicting highly localized low-lying eigenfunctions $(-Δ+V) φ= λφ$ in bounded domains $Ω\subset \mathbb{R}^d$ for rapidly varying potentials $V$. Filoche & Mayboroda introduced the function $1/u$, where $(-Δ+ V)u=1$, as a suitable regularization of $V$ from whose minima one can predict the location of eigenfunctions with high accuracy. We propose a fast method that produces a landscape that is exceedingly similar, can be used for the same purposes and can be computed very efficiently: the computation time on an $n \times n$ grid, for example, is merely $\mathcal{O}(n^2 \log{n})$, the cost of two FFTs.
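A minimal one-dimensional sketch of the landscape idea (the grid size and the random potential are illustrative assumptions; the fast FFT-based method itself is not reproduced here): solve $(-Δ+V)u=1$ directly and compare the minima of $1/u$ with the ground state.

```python
import numpy as np

# Hypothetical 1D illustration of the Filoche-Mayboroda landscape:
# solve (-Δ + V) u = 1 on a grid; minima of 1/u predict where low-lying
# eigenfunctions localize. Grid size and potential are illustrative.
rng = np.random.default_rng(0)
n = 400
h = 1.0 / (n + 1)
V = 8000.0 * rng.random(n)            # rapidly varying random potential

# Dirichlet finite-difference discretization of -d^2/dx^2 + V
L = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2 + np.diag(V)

u = np.linalg.solve(L, np.ones(n))    # landscape: (-Δ + V) u = 1
landscape = 1.0 / u

# The ground state should localize near the deepest minimum of 1/u.
vals, vecs = np.linalg.eigh(L)
ground = np.abs(vecs[:, 0])
print(abs(int(np.argmax(ground)) - int(np.argmin(landscape))))
```

Since the discretized operator is an M-matrix, $u > 0$, and the classical landscape inequality $λ_1 \geq 1/\max u$ holds in the discrete setting as well.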
Stefan Steinerberger
An easy consequence of Kantorovich-Rubinstein duality is the following: if $f:[0,1]^d \rightarrow \mathbb{R}$ is Lipschitz and $\left\{x_1, \dots, x_N \right\} \subset [0,1]^d$, then $$ \left| \int_{[0,1]^d} f(x) dx - \frac{1}{N} \sum_{k=1}^{N}{f(x_k)} \right| \leq \left\| \nabla f \right\|_{L^{\infty}} \cdot W_1\left( \frac{1}{N} \sum_{k=1}^{N}{δ_{x_k}} , dx\right),$$ where $W_1$ denotes the $1-$Wasserstein (or Earth Mover's) Distance. We prove another such inequality with a smaller norm on $\nabla f$ and a larger Wasserstein distance. Our inequality is sharp when the points are very regular, i.e. $W_{\infty} \sim N^{-1/d}$. This prompts the question of whether these two inequalities are specific instances of an entire underlying family of estimates capturing a duality between transport distance and function space.
Jianfeng Lu, Stefan Steinerberger
We study synchronization properties of systems of Kuramoto oscillators. The problem can also be understood as a question about the properties of an energy landscape created by a graph. More formally, let $G=(V,E)$ be a connected graph and let $(a_{ij})_{i,j=1}^{n}$ denote its adjacency matrix. Let the function $f:\mathbb{T}^n \rightarrow \mathbb{R}$ be given by $$ f(θ_1, \dots, θ_n) = \sum_{i,j=1}^{n}{ a_{ij} \cos{(θ_i - θ_j)}}.$$ This function has a global maximum when $θ_i = θ$ for all $1\leq i \leq n$. It is known that if every vertex is connected to at least $μ(n-1)$ other vertices for $μ$ sufficiently large, then every local maximum is global. Taylor proved this for $μ\geq 0.9395$ and Ling, Xu \& Bandeira improved this to $μ\geq 0.7929$. We give a slight improvement to $μ\geq 0.7889$. Townsend, Stillman \& Strogatz suggested that the critical value might be $μ_c = 0.75$.
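A quick numerical illustration (graph, step size, and iteration count are assumed choices, not from the paper): on the complete graph, where $μ= 1$ and every local maximum of $f$ is global, gradient ascent from a random initialization should reach the synchronized state.

```python
import numpy as np

# Gradient ascent on the Kuramoto energy f(θ) = Σ a_ij cos(θ_i - θ_j)
# for the complete graph (μ = 1): the flow should synchronize, i.e.
# approach the global maximum f = n(n-1).
rng = np.random.default_rng(1)
n = 12
A = np.ones((n, n)) - np.eye(n)        # complete graph adjacency

def f(theta):
    return np.sum(A * np.cos(theta[:, None] - theta[None, :]))

theta = rng.uniform(0, 2 * np.pi, n)
for _ in range(5000):
    # ∂f/∂θ_i = -2 Σ_j a_ij sin(θ_i - θ_j)
    grad = -2 * np.sum(A * np.sin(theta[:, None] - theta[None, :]), axis=1)
    theta += 0.01 * grad

print(f(theta) / (n * (n - 1)))        # should approach 1
```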
Ariel Jaffe, Yuval Kluger, Ofir Lindenbaum, Jonathan Patsenker, Erez Peterfreund, Stefan Steinerberger
word2vec, due to Mikolov \textit{et al.} (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, a theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
Ofir Lindenbaum, Stefan Steinerberger
We study the problem of exact support recovery: given an (unknown) vector $θ\in \left\{-1,0,1\right\}^D$, we are given access to the noisy measurement $$ y = Xθ+ ω,$$ where $X \in \mathbb{R}^{N \times D}$ is a (known) Gaussian matrix and the noise $ω\in \mathbb{R}^N$ is an (unknown) Gaussian vector. How small can we choose $N$ and still reliably recover the support of $θ$? We present RAWLS (Randomly Aggregated UnWeighted Least Squares Support Recovery): the main idea is to take random subsets of the $N$ equations, perform a least squares recovery over this reduced bit of information and then average over many random subsets. We show that the proposed procedure can provably recover an approximation of $θ$ and demonstrate its use in support recovery through numerical examples.
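A minimal sketch of the RAWLS idea as described above (problem sizes, subset size, noise level, and number of rounds are illustrative assumptions): draw random subsets of the equations, solve least squares on each subset, and average.

```python
import numpy as np

# RAWLS sketch: average minimum-norm least squares solutions computed
# from random subsets of the rows. All sizes here are illustrative.
rng = np.random.default_rng(2)
N, D, s = 80, 100, 5                         # underdetermined: N < D
theta = np.zeros(D)
theta[:s] = np.sign(rng.standard_normal(s))  # support on first s coords
X = rng.standard_normal((N, D))
y = X @ theta + 0.1 * rng.standard_normal(N)

rounds = 500
est = np.zeros(D)
for _ in range(rounds):
    rows = rng.choice(N, size=40, replace=False)
    sol, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
    est += sol
est /= rounds

# The averaged estimate should correlate positively with theta.
print(float(est @ theta))
```

The averaged estimator approximates a rescaled copy of $θ$, so its largest entries (in absolute value) indicate the support.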
Stefan Steinerberger, Aleh Tsyvinski
We consider a small set of axioms for income averaging -- recursivity, continuity, and a boundary condition for the present. These properties yield a unique averaging function: the density of reflected Brownian motion with drift, started at the current income and moving over the past incomes. When averaging is done over the short past, the weighting function asymptotically converges to a Gaussian. When averaging is done over the long horizon, the weighting function converges to the exponential distribution. For all intermediate averaging scales, we derive an explicit solution that interpolates between the two.
Jeremy G. Hoskins, Stefan Steinerberger
Let $x_1, \dots, x_n$ be $n$ independent and identically distributed random variables with mean zero, unit variance, and finite moments of all orders. We study the random polynomial $p_n$ having roots at $x_1, \dots, x_n$. We prove that for $\ell \in \mathbb{N}$ fixed as $n \rightarrow \infty$, the $(n-\ell)-$th derivative of $p_n$ behaves like a Hermite polynomial: for $x$ in a compact interval,$${n^{\ell/2}} \frac{\ell!}{n!} \cdot p_n^{(n-\ell)}\left( \frac{x}{\sqrt{n}}\right) \rightarrow He_{\ell}(x + γ_n),$$ where $He_{\ell}$ is the $\ell-$th probabilists' Hermite polynomial and $γ_n$ is a random variable converging to the standard $\mathcal{N}(0,1)$ Gaussian as $n \rightarrow \infty$. Thus, there is a universality phenomenon when differentiating a random polynomial many times: the remaining roots follow a Wigner semicircle distribution.
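A small numerical check of the stated scaling (the values of $n$ and $\ell$ are illustrative): after $n-\ell$ derivatives and the normalization above, one obtains a degree-$\ell$ polynomial whose leading coefficient is exactly $1$, matching the monic Hermite polynomial $He_\ell$.

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

# Monic polynomial with n i.i.d. standard normal roots; take n-l
# derivatives and rescale as in the theorem. Values are illustrative.
rng = np.random.default_rng(3)
n, l = 30, 3
roots = rng.standard_normal(n)
coeffs = P.polyfromroots(roots)           # p_n, monic of degree n

d = coeffs.copy()
for _ in range(n - l):
    d = P.polyder(d)                      # coefficients of p_n^{(n-l)}

# q(x) = n^{l/2} * (l!/n!) * p_n^{(n-l)}(x / sqrt(n))
scale = n ** (l / 2) * math.factorial(l) / math.factorial(n)
q = scale * d * (1 / np.sqrt(n)) ** np.arange(l + 1)
print(q[-1])                              # leading coefficient: 1
```

The leading coefficient is $1$ by construction ($p_n$ monic, $p_n^{(n-\ell)}$ has leading term $(n!/\ell!)\,x^\ell$), which is the sanity check the scaling must pass.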
Stefan Steinerberger
Let $Ω\subset \mathbb{R}^2$ be a bounded, convex domain and let $-Δφ_1 = μ_1 φ_1$ be the first nontrivial Laplacian eigenfunction with Neumann boundary conditions. The Hot Spots conjecture claims that the maximum and minimum are attained at the boundary. We show that they are attained far away from one another: if $x_1, x_2 \in Ω$ satisfy $\|x_1 - x_2\| = \mbox{diam}(Ω)$, then every maximum and minimum is attained within distance $c\cdot \mbox{inrad}(Ω)$ of $x_1$ and $x_2$, where $c$ is a universal constant (and this scaling is optimal up to the value of $c$).
Louis Brown, Stefan Steinerberger
We discuss the classical problem of measuring the regularity of distribution of sets of $N$ points in $\mathbb{T}^d$. A recent line of investigation is to study the cost ($=$ mass $\times$ distance) necessary to move Dirac measures placed at these points to the uniform distribution. We show that Kronecker sequences attain the optimal transport distance rate in $d \geq 3$ dimensions. This shows that for differentiable $f: \mathbb{T}^d \rightarrow \mathbb{R}$ and badly approximable vectors $α\in \mathbb{R}^d$, we have $$ \left| \int_{\mathbb{T}^d} f(x) dx - \frac{1}{N} \sum_{k=1}^{N} f(k α) \right| \leq c_α \frac{ \| \nabla f\|^{(d-1)/d}_{L^{\infty}}\| \nabla f\|^{1/d}_{L^{2}} }{N^{1/d}}.$$ We note that the result holds uniformly for a sequence rather than a set. Simultaneously, it refines the classical integration error for Lipschitz functions, $\| \nabla f\|_{L^{\infty}} N^{-1/d}$. We obtain a similar improvement for numerical integration with respect to the regular grid. The main ingredient is an estimate involving the Fourier coefficients of a measure; this allows existing estimates to be conveniently `recycled'. We present several open problems.
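A quick illustration of Kronecker-sequence quadrature on $\mathbb{T}^2$ (the vector $α= (\sqrt{2}, \sqrt{3})$ and the test function are standard, assumed choices): averaging a smooth function over the orbit $\{kα \bmod 1\}$ approximates its integral well.

```python
import numpy as np

# Quadrature with the Kronecker sequence k*alpha mod 1 on the torus.
# f has integral 0 over T^2, so the sample mean is the error itself.
alpha = np.array([np.sqrt(2.0), np.sqrt(3.0)])

def f(x):
    return np.cos(2 * np.pi * x[:, 0]) * np.cos(2 * np.pi * x[:, 1])

errs = {}
for N in (100, 10000):
    pts = (np.arange(1, N + 1)[:, None] * alpha) % 1.0
    errs[N] = abs(f(pts).mean())
    print(N, errs[N])
```

For this $f$ the error is a pair of Dirichlet-kernel sums and decays like $1/N$, far faster than the $N^{-1/2}$ rate of random sampling.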
Stefan Steinerberger
We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms: if $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
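A minimal sketch of the setting (the matrix, step rule, and iteration count are assumed illustrative choices): stochastic gradient descent on $\|Ax-b\|_2^2$ with uniform row sampling, here normalized per row (a Kaczmarz-type step), drives the residual of an invertible system to zero.

```python
import numpy as np

# Row-sampled stochastic gradient descent for ||Ax - b||^2 with a
# Kaczmarz-type normalized step. The matrix is a well-conditioned
# illustrative choice (Gaussian perturbation of 3*I).
rng = np.random.default_rng(4)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n) + 3 * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
for _ in range(100000):
    i = rng.integers(n)
    a = A[i]
    x -= ((a @ x - b[i]) / (a @ a)) * a   # project onto row i's hyperplane
print(np.linalg.norm(A @ x - b))
```

Each step is an orthogonal projection toward one hyperplane $\{x: a_i^T x = b_i\}$, so $\|x_k - x\|$ is monotonically nonincreasing and decays linearly in expectation for invertible $A$.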
Stefan Steinerberger
The purpose of this note is to point out that the theory of expander graphs leads to an interesting test of whether $n$ real numbers $x_1, \dots, x_n$ could be $n$ independent samples of a random variable. To any distinct real numbers $x_1, \dots, x_n$, we associate a 4-regular graph $G$ as follows: using $π$ to denote the permutation ordering the elements, $x_{π(1)} < x_{π(2)} < \dots < x_{π(n)}$, we build a graph on $\left\{1, \dots, n\right\}$ by connecting $i$ and $i+1$ (cyclically) and $π(i)$ and $π(i+1)$ (cyclically). If the numbers are i.i.d. samples, then a result of Friedman implies that $G$ is close to Ramanujan. This suggests a test for whether these numbers are i.i.d.: compute the second largest (in absolute value) eigenvalue $λ$ of the adjacency matrix. The larger $λ- 2\sqrt{3}$, the less likely it is for the numbers to be i.i.d. We explain why this is a reasonable test and give many examples.
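The test is easy to implement directly from the description (sample size and distribution below are illustrative assumptions):

```python
import numpy as np

# Build the 4-regular (multi)graph from the ranks and inspect the
# second largest |eigenvalue| of its adjacency matrix; for i.i.d.
# samples it should be near the Ramanujan bound 2*sqrt(3) ~ 3.46.
def second_eigenvalue(x):
    n = len(x)
    pi = np.argsort(x)                    # pi[k] = index of k-th smallest
    A = np.zeros((n, n))
    for i in range(n):
        for (u, v) in ((i, (i + 1) % n), (pi[i], pi[(i + 1) % n])):
            A[u, v] += 1
            A[v, u] += 1
    ev = np.sort(np.abs(np.linalg.eigvalsh(A)))
    return ev[-2]

rng = np.random.default_rng(5)
iid = rng.standard_normal(500)
lam_iid = second_eigenvalue(iid)          # near 2*sqrt(3)
lam_sorted = second_eigenvalue(np.sort(iid))  # doubled cycle: near 4
print(lam_iid, lam_sorted)
```

For already-sorted input, $π$ is the identity and the graph is a doubled cycle, so the statistic is maximal: sorted data loudly fails the i.i.d. test.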
Trevor J. Richards, Stefan Steinerberger
Let $p:\mathbb{C} \rightarrow \mathbb{C}$ be a polynomial. The Gauss-Lucas theorem states that its critical points, $p'(z) = 0$, are contained in the convex hull of its roots. A recent quantitative version due to Totik shows that if almost all roots are contained in a bounded convex domain $K \subset \mathbb{C}$, then almost all roots of the derivative $p'$ are in an $\varepsilon-$neighborhood $K_{\varepsilon}$ (in a precise sense). We prove another quantitative version: if a polynomial $p$ has $n$ roots in $K$ and $\lesssim c_{K, \varepsilon} (n/\log{n})$ roots outside of $K$, then $p'$ has at least $n-1$ roots in $K_{\varepsilon}$. This establishes, up to a logarithm, a conjecture of the first author; we also discuss an open problem whose solution would imply the full conjecture.
Stefan Steinerberger
A central result of Sturm-Liouville theory (also called the Sturm-Hurwitz Theorem) states that if $φ_k$ is a sequence of eigenfunctions of a second order differential operator on the interval $I \subset \mathbb{R}$, then any linear combination satisfies a uniform bound on the roots $$ \# \left\{x \in I:\sum_{k \geq n}{ a_k φ_k(x)} = 0 \right\} \geq n-1.$$ We provide a sharp (up to logarithmic factors) generalization to two dimensions: let $(M,g)$ be a compact two-dimensional manifold (with or without boundary), and let $(φ_k)$ denote the sequence of eigenfunctions of a uniformly elliptic operator $-\mbox{div}(a(\cdot) \nabla)$ (with Dirichlet or Neumann boundary conditions). Then, for any linear combination of eigenfunctions above a certain index $n$, $$ f = \sum_{k \geq n}{a_k φ_k} ~ \mbox{we have} \quad \mathcal{H}^1 \left\{ x: f(x) = 0\right\} \gtrsim_{} \frac{\sqrt{n}}{\sqrt{\log{n}}} \log \left(n \frac{\|f\|_{L^2(M)}}{\|f\|_{L^1(M)}} \right)^{-1/2} \frac{\|f\|_{L^1(M)}}{\| f \|_{L^{\infty}(M)}} .$$ Examples on $M=\mathbb{T}^2$ and $M=\mathbb{S}^2$ show that this is optimal up to the logarithmic factors. The proof uses optimal transport and a new inequality for the Wasserstein metric $W_p$: if $f(x)dx$ and $g(x)dx$ are two absolutely continuous measures on a two-dimensional domain $M$ with continuous densities and the same total mass, then, for all $1 \leq p <\infty$, $$ W_p(f(x)dx, g(x) dx) \cdot \mathcal{H}^1 \left\{x \in M: f(x) = g(x) \right\} \gtrsim_{M,p} \frac{\|f-g\|_{L^1(M)}^{1+1/p}}{\|f-g\|_{L^{\infty}(M)}}.$$
Jianfeng Lu, Christopher D. Sogge, Stefan Steinerberger
We consider Laplacian eigenfunctions on a $d-$dimensional bounded domain $M$ (or a $d-$dimensional compact manifold $M$) with Dirichlet conditions. These operators give rise to a sequence of eigenfunctions $(e_\ell)_{\ell \in \mathbb{N}}$. We study the subspace of all pointwise products $$ A_n = \mbox{span} \left\{ e_i(x) e_j(x): 1 \leq i,j \leq n\right\} \subseteq L^2(M).$$ Clearly, that vector space has dimension $\mbox{dim}(A_n) \leq n(n+1)/2$. We prove that products $e_i e_j$ of eigenfunctions are simple in a certain sense: for any $\varepsilon > 0$, there exists a low-dimensional vector space $B_n$ that almost contains all products. More precisely, denoting the orthogonal projection $Π_{B_n}:L^2(M) \rightarrow B_n$, we have $$ \forall~1 \leq i,j \leq n~ \qquad \|e_ie_j - Π_{B_n}( e_i e_j) \|_{L^2} \leq \varepsilon$$ and the size of the space $\mbox{dim}(B_n)$ is relatively small: for every $δ> 0$, $$ \mbox{dim}(B_n) \lesssim_{M,δ} \varepsilon^{-δ} n^{1+δ}.$$ We obtain the same sort of bounds for products of arbitrary length, as well as for approximation in the $H^{-1}$ norm. Pointwise products of eigenfunctions are low-rank. This has implications, among other things, for the validity of fast algorithms in electronic structure computations.
Jianfeng Lu, Stefan Steinerberger
We consider the variational problem of cross-entropy loss with $n$ feature vectors on a unit hypersphere in $\mathbb{R}^d$. We prove that when $d \geq n - 1$, the global minimum is given by the simplex equiangular tight frame, which justifies the neural collapse behavior. We also prove that as $n \rightarrow \infty$ with fixed $d$, the minimizing points will distribute uniformly on the hypersphere and show a connection with the frame potential of Benedetto & Fickus.
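The simplex equiangular tight frame mentioned above has a standard construction (shown here as an assumed illustration): center the canonical basis of $\mathbb{R}^n$ and rescale, giving $n$ unit vectors in an $(n-1)$-dimensional hyperplane, consistent with the condition $d \geq n-1$.

```python
import numpy as np

# Simplex ETF: n unit vectors with all pairwise inner products equal
# to -1/(n-1), living in an (n-1)-dimensional subspace.
n = 6
V = np.sqrt(n / (n - 1)) * (np.eye(n) - np.ones((n, n)) / n)
G = V @ V.T                               # Gram matrix

print(np.diag(G))                         # all ones (unit vectors)
print(G[0, 1])                            # equals -1/(n-1)
```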
Stefan Steinerberger
Phase retrieval is concerned with recovering a function $f$ from the absolute value of its Fourier transform $|\widehat{f}|$. We study the stability properties of this problem in Lebesgue spaces. Our main result shows that $$ \| f-g\|_{L^2(\mathbb{R}^n)} \leq 2\cdot \| |\widehat{f}| - |\widehat{g}| \|_{L^2(\mathbb{R}^n)} + h_f\left( \|f-g\|^{}_{L^p(\mathbb{R}^n)}\right) + J(\widehat{f}, \widehat{g}),$$ where $1 \leq p < 2$, $h_f$ is an explicit nonlinear function depending on the smoothness of $f$ and $J$ is an explicit term capturing the invariance under translations. A noteworthy aspect is that the stability is phrased in terms of $L^p$ for $1 \leq p < 2$; since, usually, $L^p$ cannot be used to control $L^2$, the stability estimate has the flavor of an inverse Hölder inequality. It seems conceivable that the estimate is optimal up to constants.
Stefan Steinerberger
We consider the nonlinear Poisson equation $-Δu = f(u)$ in domains $Ω\subset \mathbb{R}^n$ with Dirichlet boundary conditions on $\partial Ω$. We show (for monotonically increasing concave $f$ with small Lipschitz constant) that if $D^2 u$ is negative semi-definite on the boundary, then $u$ is concave. A conjecture of Saint Venant from 1856 (proven by Polya in 1948) is that among all domains $Ω$ of fixed measure, the solution of $-Δu =1$ assumes its largest maximum when $Ω$ is a ball. We extend this to $-Δu =f(u)$ for monotonically increasing $f$ with small Lipschitz constant.
Ronald R. Coifman, Nicholas F. Marshall, Stefan Steinerberger
Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.
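One natural relaxation of this question (an assumed illustration, not necessarily the paper's method): a function smooth with respect to every $G_k$ has small Dirichlet energy $f^T L_k f$ for each graph Laplacian $L_k$, so one can minimize the summed energy over nonconstant unit vectors, i.e. take the eigenvector of $L_1 + \dots + L_m$ for the smallest nonzero eigenvalue.

```python
import numpy as np

# Two graphs on the same vertex set; the second eigenvector of the
# summed Laplacian is a nonconstant function that is jointly smooth.
def laplacian(n, edges):
    L = np.zeros((n, n))
    for (u, v) in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

n = 6
L1 = laplacian(n, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])          # path
L2 = laplacian(n, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])  # cycle

vals, vecs = np.linalg.eigh(L1 + L2)
f = vecs[:, 1]               # eigenvector of smallest nonzero eigenvalue
print(vals[1], f @ np.ones(n))
```

Since both graphs are connected, the kernel of $L_1 + L_2$ consists exactly of the constants, so $f$ is automatically orthogonal to them.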
Eric C. Chi, Stefan Steinerberger
Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization of \begin{eqnarray*} u(γ) & = & \underset{u_1, \dots, u_n }{\arg\min}\;\sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + γ\sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j\rVert},\\ \end{eqnarray*} where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in the $\left\{x_1, \dots, x_n\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n \right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v \in \mathbb{R}^p$ such that, when viewed from $x$, `most' points lie in the direction $v$: \begin{eqnarray*} \frac{1}{n-1}\sum_{i=1 \atop x_i \neq x}^{n}{ \left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} & \geq & \frac{1}{4}. \end{eqnarray*}
Stefan Steinerberger
We introduce a notion of curvature on finite, combinatorial graphs. It can be computed easily by solving a linear system of equations. We show that graphs with curvature bounded below by $K>0$ have diameter bounded by $\mbox{diam}(G) \leq 2/K$ (a Bonnet-Myers theorem), that $\mbox{diam}(G) = 2/K$ implies that $G$ has constant curvature (a Cheng theorem), and that there is a spectral gap $λ_1 \geq K/(2n)$ (a Lichnerowicz theorem). We compute the curvature for several families of graphs; it often coincides with the Ollivier curvature or the Lin-Lu-Yau curvature. The von Neumann minimax theorem features prominently in the proofs.