Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\varepsilon$ with respect to a given family of groups. For every fixed $\kappa > 0$, in the regime $|G| \le \varepsilon^{-\kappa}$, we prove that $\widetilde{\Theta}(\varepsilon^{-3})$ samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as $\widetilde{\Theta}(\varepsilon^{-2})$, and shows that mean-ECE multicalibration is as difficult in the batch setting as it is in the online setting, in contrast to marginal calibration, which is strictly more difficult in the online setting. By contrast, for $\kappa = 0$ the sample complexity of multicalibration remains $\widetilde{\Theta}(\varepsilon^{-2})$, exhibiting a sharp threshold phenomenon. More generally, we establish matching upper and lower bounds, up to polylogarithmic factors, for a weighted $L_p$ multicalibration metric for all $1 \le p \le 2$, with optimal exponent $3/p$. We also extend the lower-bound template to a regular class of elicitable properties, and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for calibrating properties including expectiles and bounded-density quantiles.
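As a point of reference (not the paper's estimator or algorithm), a binned empirical version of group-wise ECE can be computed as in the following sketch; the binning, the worst-case aggregation over groups, and all names are illustrative choices, and the paper's exact multicalibration metric may differ.

    # Minimal sketch (not the paper's algorithm): empirical multicalibration error of a
    # predictor, measured as the worst group-wise binned ECE over a family of groups.
    import numpy as np

    def binned_ece(preds, labels, n_bins=20):
        """Expected calibration error of predictions in [0,1] against binary labels."""
        bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
        err = 0.0
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                err += mask.mean() * abs(preds[mask].mean() - labels[mask].mean())
        return err

    def multicalibration_error(preds, labels, groups):
        """Worst-case ECE over groups; `groups` is a list of boolean membership arrays."""
        return max(binned_ece(preds[g], labels[g]) for g in groups if g.any())

    # Toy usage with synthetic data and two overlapping groups.
    rng = np.random.default_rng(0)
    x = rng.uniform(size=10_000)
    y = rng.binomial(1, x)                       # labels are calibrated to x overall ...
    preds = np.clip(x + 0.1 * (x > 0.5), 0, 1)   # ... but the predictor is off on one group
    groups = [np.ones_like(y, dtype=bool), x > 0.5]
    print(multicalibration_error(preds, y, groups))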
Zhigen Zhao, Shonosuke Sugasawa
Empirical Bayes methods are widely used for large-scale inference, yet most classical approaches assume homoscedastic observations and focus primarily on posterior mean estimation. We develop a nonparametric empirical Bayes framework for the heteroscedastic normal means problem with unequal and unknown variances. Our first contribution is a generalized Tweedie-type identity that expresses the Bayes estimator entirely in terms of the joint marginal density of the observed statistics and its partial derivatives, extending classical Tweedie's formula to settings with heterogeneous and unknown variances. Our second contribution is to introduce a moment-generating-function representation that enables recovery of the full posterior distribution within the $f$-modeling paradigm without specifying or estimating the prior distribution. The resulting method provides a unified framework for point estimation, uncertainty quantification, and hypothesis testing while accommodating arbitrary dependence between means and variances. Simulation studies and real-data analysis demonstrate that the proposed approach achieves accurate shrinkage estimation and reliable posterior inference in heterogeneous data environments.
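For orientation, the classical homoscedastic identity that the first contribution generalizes is Tweedie's formula: if $X_i \mid \theta_i \sim N(\theta_i, \sigma^2)$ with known common $\sigma^2$ and marginal density $m$, then
\[
\mathbb{E}[\theta_i \mid X_i = x] \;=\; x + \sigma^2 \frac{d}{dx}\log m(x),
\qquad
\mathrm{Var}(\theta_i \mid X_i = x) \;=\; \sigma^2\Bigl(1 + \sigma^2 \frac{d^2}{dx^2}\log m(x)\Bigr).
\]
The generalized identity of the abstract replaces $m$ by the joint marginal density of the observed statistics and its partial derivatives; its exact form is given in the paper.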
Sebastian Arnold, Yo Joong Choe, Marco Scarsini, Ilia Tsetlin
How can we monitor, in real time, whether one uncertain prospect has any upside over another? To answer this question, we develop a novel family of sequential, anytime-valid tests for stochastic dominance (SD; also known as stochastic ordering), a classical and popular notion for comparing entire distribution functions. The problem is distinct from the popular problem of testing for dominance in means, which would not capture distributional differences beyond the first moment. We first derive powerful, nonparametric e-processes that quantify evidence against the null hypothesis that one prospect is dominated by another. For first-order SD, these e-processes are constructed as a mixture of asymptotically growth-rate optimal e-variables and yield a test of power one. The approach further generalizes to sequential testing for SD beyond the first order, including any higher-order SD. Empirically, we demonstrate that the resulting sequential tests are competitive with existing non-sequential SD tests in terms of power, while achieving validity under continuous monitoring that existing methods do not. Finally, we sketch the complementary and challenging problem of testing the non-SD null hypothesis, which asks whether a prospect has a definite upside, and describe the conditions under which we can derive a nontrivial anytime-valid test.
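For readers unfamiliar with the terminology, the standard definitions (not specific to this paper) are as follows: a nonnegative process $(E_n)_{n \ge 1}$ is an e-process for a null $H_0$ if $\mathbb{E}_P[E_\tau] \le 1$ for every $P \in H_0$ and every stopping time $\tau$, and an anytime-valid level-$\alpha$ test rejects as soon as $E_n \ge 1/\alpha$. Validity under continuous monitoring follows from a Ville-type argument (Markov's inequality applied at the first crossing time):
\[
\sup_{P \in H_0} P\bigl(\exists\, n \ge 1 : E_n \ge 1/\alpha\bigr) \;\le\; \alpha .
\]
A test of power one is one whose e-process eventually crosses $1/\alpha$ almost surely under every alternative.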
Tim Kutta, Nina Dörnemann, Piotr Kokoszka
Functional data analysis (FDA) is concerned with the analysis of infinite-dimensional data functions. Functional principal component analysis (FPCA) is a key method to obtain finite-dimensional summaries. Consistency of FPCA has been theoretically established for sufficiently regular data functions. However, empirical evidence shows that FPCA can become severely inconsistent when the underlying functions are too rough. This paper provides the first theoretical explanation for this phenomenon. We propose a model that explicitly captures the roughness of functional data and allows us to quantify the resulting bias of FPCA, depending on the functional roughness. The model undergoes a phase transition marking the point at which FPCA becomes entirely uninformative. Based on these probabilistic results, we discuss diagnostic tests for informative principal components. As an additional contribution, we derive results on spectral statistics that may serve as a foundation for goodness-of-fit tests for rough functional data. Mathematically, our approach combines recent advances in random matrix theory and generic chaining with tools from FDA. We illustrate the effects of roughness on FPCA using simulations, as well as climate and environmental datasets.
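To fix ideas about the object under study (a plain discretized FPCA, not the paper's roughness model or diagnostics), principal components can be computed from the eigendecomposition of the pointwise sample covariance; all names and the toy roughness level below are illustrative.

    # Minimal sketch of discretized FPCA: estimate principal components from n curves
    # observed on a common equispaced grid of p points.
    import numpy as np

    def fpca(curves, grid, n_components=3):
        """curves: (n, p) array of functions sampled on `grid`; returns eigenvalues/eigenfunctions."""
        dx = grid[1] - grid[0]                       # assume an equispaced grid
        centered = curves - curves.mean(axis=0)
        cov = centered.T @ centered / len(curves)    # pointwise sample covariance (p, p)
        evals, evecs = np.linalg.eigh(cov * dx)      # quadrature weight approximates the integral operator
        order = np.argsort(evals)[::-1][:n_components]
        return evals[order], evecs[:, order] / np.sqrt(dx)   # L2-normalized eigenfunctions

    # Toy usage: smooth curves plus rough (white-noise-like) perturbations.
    rng = np.random.default_rng(1)
    grid = np.linspace(0, 1, 200)
    smooth = rng.normal(size=(100, 1)) * np.sin(np.pi * grid)
    rough = 0.5 * rng.normal(size=(100, 200))        # roughness that can bias the leading components
    evals, efuns = fpca(smooth + rough, grid)
    print(evals)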
Manli Cheng, Yangjianchen Xu, Qinglong Tian, Pengfei Li
Logistic regression is widely used to model the propensity score in the analysis of nonignorable missing data. However, goodness-of-fit testing for this propensity score model has received limited attention in the literature. In this paper, we propose a new goodness-of-fit testing procedure for the logistic propensity score model under nonignorable missing data. The proposed test is based on an unweighted sum of squared residuals constructed from the marginal missingness mechanism and accommodates the partial observability of the outcome. We establish the asymptotic distribution of the test statistic under both the null hypothesis and general alternatives, and develop a bootstrap procedure with theoretical guarantees to approximate its null distribution. We show that the resulting bootstrap test attains asymptotically correct size and is consistent, with power converging to one under model misspecification. Simulation studies and a real data application demonstrate that the proposed method performs well in finite samples.
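For orientation only: the generic unweighted sum-of-squared-residuals statistic for a fitted logistic model, calibrated by a parametric bootstrap. The paper's test additionally handles the nonignorable missingness mechanism and the partial observability of the outcome, which this sketch ignores; the function names and the bootstrap scheme below are illustrative.

    # Generic unweighted sum-of-squared-residuals goodness-of-fit sketch for a logistic model
    # (illustrative; not the paper's test for nonignorable missingness).
    import numpy as np
    import statsmodels.api as sm

    def ssr_statistic(X, r):
        """Fit logistic P(R=1 | X) by ML and return the unweighted sum of squared residuals."""
        fit = sm.Logit(r, sm.add_constant(X)).fit(disp=0)
        p_hat = fit.predict(sm.add_constant(X))
        return np.sum((r - p_hat) ** 2), p_hat

    def bootstrap_pvalue(X, r, n_boot=200, seed=0):
        """Parametric bootstrap under the fitted model to approximate the null distribution."""
        rng = np.random.default_rng(seed)
        t_obs, p_hat = ssr_statistic(X, r)
        t_null = np.array([ssr_statistic(X, rng.binomial(1, p_hat))[0] for _ in range(n_boot)])
        return np.mean(t_null >= t_obs)   # one-sided comparison; centered/two-sided versions are also common

    # Toy usage with a correctly specified logistic missingness model.
    X = np.random.default_rng(3).normal(size=(500, 2))
    r = np.random.default_rng(4).binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    print(bootstrap_pvalue(X, r))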
Nick W. Koning
We introduce the E-measure: a measure-like generalization of the E-value to a class of hypotheses. Unlike classical measures, E-measures are closed under infima instead of addition. They arise from a compatibility axiom with logical implications: there should be at least as much evidence against more specific hypotheses. We show that E-measures are the only non-dominated such objects, if the hypothesis class is closed under intersections. We propose to use the E-measure to present all the relevant evidence for a problem, where the relevance is captured by the choice of hypothesis class. We showcase this by applying the E-measure to decision making, inducing a hypothesis class from the uncertain consequences of decisions. This results in uniform E-consequence bounds on decisions, which nest high-probability loss bounds. Correcting for multiplicity, we consider 'familywise evidence' and 'false evidence rate' control, generalizing from errors and discoveries to continuous evidence. Remarkably, E-measures control these without multiplicity correction if the hypothesis class is intersection-closed. Moreover, we obtain a 'frequentist' notion of updating from E-prior to E-posterior. Abstracting the notion of a 'hypothesis', we advocate for using E-measures for any unknown quantity, leading to predictive E-measures.
R. Labouriau
Many important statistical models fall outside classical moment-based methods due to the non-existence of moments or moment generating functions. We propose a generalised probabilistic framework in which densities are replaced by pairs $(T,\varphi)$, where $T \in \mathcal{S}'(\mathbb{R})$ is a tempered distribution and $\varphi \in \mathcal{S}(\mathbb{R})$ is a Schwartz kernel. Expectations are defined via the action of distributions on regularised test functions, yielding well-defined weak moments, weak characteristic functions, and weak cumulants of all orders. These extend classical quantities and retain key algebraic properties such as additivity under independence and natural affine transformation rules. The main results are: (i) a systematic algebra of weak cumulants; (ii) a weak moment problem where existence of all moments holds unconditionally and uniqueness depends on the kernel, with uniqueness results under Gaussian kernels (via Hermite completeness), positive Schwartz kernels with square-integrable densities (via a Carleman-type criterion), and kernels with exponential decay (via Denjoy-Carleman quasi-analyticity); and (iii) a weak central limit theorem formulated as convergence of weak characteristic functions to a Gaussian limit, covering cases where the classical theorem fails. The framework is illustrated with Student's $t$, stable, and hyperbolic distributions. As a statistical consequence, the weak first moment yields a consistent estimator of the location parameter in the Cauchy model, where no classical moment-based estimator exists. A full statistical treatment is given in a companion paper.
Hongjian Wang, Aaditya Ramdas
We develop e-values and e-processes for testing the null hypothesis that a distribution over nonnegative integers is monotone, and that a distribution over integers is unimodal given a certain mode. Our e-processes lead to tests of power one under any non-null distribution with a sequence of i.i.d. observations, and consistent set-valued mode estimators that eventually equal the true set of modes. Additionally, we characterize the set of all e-values, and therefore the set of all valid tests, with one monotone and unimodal observation, as well as the most powerful e-value for a fixed alternative. We then show that many of our results can be generalized to continuous random variables, relating them to existing results in the shape-constrained inference literature.
Marko Lalovic, Nicos Georgiou, Istvan Z. Kiss
We develop likelihood-based inference for finite-state birth-death processes with composite birth rates, in which multiple distinct mechanisms contribute additively to the total birth intensity. Our main motivating example is an SIS epidemic model with pairwise and higher-order transmission. The process is observed through a single aggregate trajectory, and in the main setting of interest, birth events are unmarked. This creates a deconvolution problem in event space: the state is one-dimensional, but the mechanism underlying each birth is latent. We formulate the inference under a Doob $h$-transformed $Q$-process, which is time-homogeneous and ergodic and provides an asymptotic surrogate for the law of the original process conditioned on long survival. We derive the corresponding conditional likelihood and study both the conditional maximum likelihood estimator and a quasi-maximum likelihood estimator based on a simplified working score. Under the Doob-transform law, we prove consistency and asymptotic normality for both estimators, with asymptotic covariance determined by the inverse Fisher and inverse Godambe information matrices, respectively. We also showcase a practical one-dimensional test for the presence of a specific higher-order birth mechanism.
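To illustrate the kind of process described (a Gillespie-style simulation of a birth-death SIS chain whose birth intensity is a sum of a pairwise and a higher-order mechanism), a minimal sketch follows. The specific rate forms, parameter names, and values are assumptions for illustration, not the paper's model.

    # Minimal Gillespie-style sketch (rate forms are illustrative assumptions): SIS
    # birth-death chain on N nodes with composite birth (infection) intensity.
    import numpy as np

    def simulate_sis(N=200, beta2=1.5, beta3=2.0, gamma=1.0, k0=5, t_max=50.0, seed=0):
        rng = np.random.default_rng(seed)
        k, t, path = k0, 0.0, [(0.0, k0)]
        while t < t_max and 0 < k < N:
            s = (N - k) / N
            birth_pair = beta2 * k * s                 # pairwise transmission
            birth_high = beta3 * k * (k - 1) / N * s   # higher-order (e.g. triangle-mediated) transmission
            death = gamma * k                          # recovery
            total = birth_pair + birth_high + death
            t += rng.exponential(1.0 / total)
            # the observer sees only +1/-1 jumps; which birth mechanism fired is latent
            k += 1 if rng.uniform() < (birth_pair + birth_high) / total else -1
            path.append((t, k))
        return path

    path = simulate_sis()
    print(len(path), path[-1])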
Huanyan Zhu, Cheng Li
Gaussian processes are widely used for accurate emulation of unknown surfaces in sequential design of expensive simulation experiments. Integrated mean squared error (IMSE) is an effective acquisition function for sequential designs based on Gaussian processes. However, existing approaches struggle with its implementation because the required integrals often lack closed-form expressions for most kernel functions. We propose a novel and computationally efficient Hilbert space Gaussian process approximation for the IMSE acquisition function, where a truncated eigenbasis representation of the integral enables closed-form evaluation. We establish sharp global non-asymptotic bounds for both the approximation error of isotropic kernels and the resulting error in the acquisition function. In a series of numerical experiments with $\gamma$-stabilizing, the proposed method achieves substantially lower prediction error and reduced computation time compared to existing benchmarks. These results demonstrate that the proposed Hilbert space Gaussian process framework provides an accurate and computationally efficient approach for Gaussian process based sequential design.
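For context, a standard reduced-rank Hilbert space Gaussian process approximation (in the spirit of Solin and Särkkä) on an interval $[-L, L]$ expands a stationary kernel with spectral density $S$ as
\[
k(x, x') \;\approx\; \sum_{j=1}^{J} S\bigl(\sqrt{\lambda_j}\bigr)\, \phi_j(x)\, \phi_j(x'),
\qquad
\phi_j(x) = \frac{1}{\sqrt{L}} \sin\Bigl(\frac{\pi j (x + L)}{2L}\Bigr),
\qquad
\lambda_j = \Bigl(\frac{\pi j}{2L}\Bigr)^{2}.
\]
Since the approximate posterior variance is then a quadratic form in the $\phi_j$, the integrals over the design region that enter IMSE reduce to integrals of products of sinusoids, which are available in closed form. Whether the paper uses exactly this basis is an assumption here, but this is the mechanism that makes closed-form evaluation of the acquisition function possible.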
Nils Lid Hjort
This invited paper proposes and discusses several Bayesian attempts at nonparametric and semiparametric density estimation. The main categories of these ideas are as follows: (1) Build a nonparametric prior around a given parametric model. We look at cases where the nonparametric part of the construction is a Dirichlet process or relatives thereof. (2) Express the density as an additive expansion of orthogonal basis functions, and place priors on the coefficients. Here attention is given to a certain robust Hermite expansion around the normal distribution. Multiplicative expansions are also considered. (3) Express the unknown density as locally being of a certain parametric form, then construct suitable local likelihood functions to express information content, and place local priors on the local parameters.
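For approach (2), a generic form of such an expansion (stated only as context; the paper's exact prior specification and its robustness modifications may differ) is
\[
f(x) \;=\; f_0(x \mid \theta)\Bigl\{1 + \sum_{j \ge 1} c_j\, \psi_j(x)\Bigr\},
\]
where $f_0(\cdot \mid \theta)$ is the parametric baseline (e.g. the normal density), $\{\psi_j\}$ is an orthonormal system with respect to $f_0$ (Hermite polynomials in the normal case), and priors are placed on $\theta$ and on the coefficients $(c_1, c_2, \ldots)$, typically shrinking $c_j$ toward zero as $j$ grows so that the prior concentrates near the baseline model, with suitable modifications to keep $f$ nonnegative.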
Shubhada Agrawal, Aaditya Ramdas
Consider betting against a sequence of data in $[0,1]$, where one is allowed to make any bet that is fair if the data have a conditional mean $m_0 \in (0,1)$. Cover's universal portfolio algorithm delivers a worst-case regret of $O(\ln n)$ compared to the best constant bet in hindsight, and this bound is unimprovable against adversarially generated data. In this work, we present a novel mixture betting strategy that combines insights from Robbins and Cover, and exhibits a different behavior: it eventually produces a regret of $O(\ln \ln n)$ on \emph{almost} all paths (a measure-one set of paths if each conditional mean equals $m_0$ and intrinsic variance increases to $\infty$), but has an $O(\ln n)$ regret on the complement (a measure zero set of paths). Our paper appears to be the first to point out the value in hedging two very different strategies to achieve a best-of-both-worlds adaptivity to stochastic data and protection against adversarial data. We contrast our results to those in~\cite{agrawal2025regret} for a sub-Gaussian mixture on unbounded data: their worst-case regret has to be unbounded, but a similar hedging delivers both an optimal betting growth-rate and an almost sure $\ln\ln n$ regret on stochastic data. Finally, our strategy witnesses a sharp game-theoretic upper law of the iterated logarithm, analogous to~\cite{shafer2005probability}.
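To fix notation (standard background, not the new strategy): a constant bet $\lambda$ against outcomes $x_1, x_2, \ldots \in [0,1]$ with presumed conditional mean $m_0$ has wealth
\[
W_n(\lambda) \;=\; \prod_{t=1}^{n}\bigl(1 + \lambda\,(x_t - m_0)\bigr),
\qquad
\lambda \in \Bigl[-\tfrac{1}{1 - m_0},\; \tfrac{1}{m_0}\Bigr],
\]
where the range of $\lambda$ keeps the wealth nonnegative for all $x_t \in [0,1]$. The regret of a strategy with wealth $W_n$ is $\sup_{\lambda} \ln W_n(\lambda) - \ln W_n$, and Cover's universal portfolio takes $W_n = \int W_n(\lambda)\, \pi(\lambda)\, d\lambda$ for a prior $\pi$ on this range, yielding the $O(\ln n)$ worst-case regret quoted above; the hedged strategy of the abstract improves this to $O(\ln \ln n)$ on almost all stochastic paths while retaining $O(\ln n)$ in the worst case.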
Tianyi Chen, Mohammad Sharifi Kiasari, Sijing Yu, Youngser Park, Avanti Athreya, Vince Lyzinski, Carey E Priebe, Zachary Lubberts
Inference for time series of networks often relies on accurate vertex correspondence between network realizations at different times. In practice, however, such vertex alignments can be misspecified or unknown. We study the impact of vertex alignment on changepoint localization for dynamic networks through two illustrative models, each with a similar changepoint, with the key distinction being whether changepoint information is contained in marginal or joint distributions of the time-varying latent positions. We compare localization techniques ranging from the simple network statistic of average degree to the modern procedure of Euclidean mirrors. In one model, vertex misalignment causes little error, and in the other, it impairs localization in ways that cannot be corrected through graph matching or optimal transport, which we show are closely related in this setting. Our results demonstrate that robust network inference necessitates reckoning with the subtle interplay of marginal and joint information in the observed network time series.
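The simplest of the localization statistics mentioned above, average degree with a naive largest-jump localizer, is sketched below; it is invariant to vertex relabeling, which is precisely why it is unaffected by misalignment but blind to changes carried only by the joint structure of the latent positions. (Euclidean mirrors are more involved and not sketched here; the function name is illustrative.)

    # Average degree per time step with a naive largest-change localizer (illustrative).
    import numpy as np

    def average_degree_changepoint(adjacency_series):
        """adjacency_series: (T, n, n) array of symmetric adjacency matrices."""
        avg_deg = adjacency_series.sum(axis=(1, 2)) / adjacency_series.shape[1]
        jump = np.abs(np.diff(avg_deg))
        return int(np.argmax(jump)) + 1, avg_deg   # estimated changepoint index and the statistic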
Beibei Li, Wenge Guo
In many statistical applications, particularly in clinical studies, hypotheses may carry different levels of importance, motivating the use of weighted multiple testing procedures (wMTPs) to control the familywise error rate (FWER). Among these approaches, two weighted Holm procedures are commonly used: the weighted Holm procedure (WHP), which is based on ordered weighted $p$-values, and the weighted alternative Holm procedure (WAP), which relies on ordered raw $p$-values. This paper provides a systematic comparison of these two procedures, along with practical recommendations for their use. We first examine their corresponding closed testing procedures (CTPs) and show that WHP is uniformly more powerful than WAP. We further investigate their structural properties, demonstrating that WAP, while consonant, lacks monotonicity. To facilitate communication with non-statisticians, we introduce graphical representations of both procedures using a common initial graph and distinct updating strategies. In addition, we derive adjusted $p$-values and adjusted weighted $p$-values for both methods. Finally, we establish an optimality result: WHP cannot be improved by enlarging any of its critical values without violating FWER control, whereas WAP is optimal only under specific conditions. Simulation studies support these theoretical findings and highlight the superior FWER control and average power of WHP.
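One standard formulation of the weighted Holm step-down procedure, ordering hypotheses by their weighted $p$-values $p_i / w_i$, is sketched below for orientation only; the paper's exact WHP and WAP definitions, their graphical representations, and adjusted $p$-values should be taken from the paper itself.

    # One standard weighted Holm step-down (ordering by p_i / w_i); illustrative only.
    import numpy as np

    def weighted_holm(pvals, weights, alpha=0.05):
        """Return a boolean rejection vector controlling the FWER at level alpha."""
        p, w = np.asarray(pvals, float), np.asarray(weights, float)
        order = np.argsort(p / w)                  # step through hypotheses by weighted p-value
        reject = np.zeros(len(p), dtype=bool)
        remaining_weight = w.sum()
        for idx in order:
            if p[idx] <= alpha * w[idx] / remaining_weight:
                reject[idx] = True
                remaining_weight -= w[idx]
            else:
                break                              # step-down: stop at the first non-rejection
        return reject

    print(weighted_holm([0.001, 0.02, 0.2, 0.04], weights=[2, 1, 1, 1]))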
Manuel Fernandez, Ludovic Stephan, Yizhe Zhu
We study the community detection problem in the non-uniform hypergraph stochastic block model (HSBM), where hyperedges of varying sizes coexist. This setting captures higher-order and multi-view interactions and raises a fundamental question: can multiple uniform hypergraph layers below the detection threshold be combined to enable weak recovery? We answer this question by establishing a Kesten--Stigum-type bound for weak recovery in a general class of non-uniform HSBMs with $r$ blocks, generated according to multiple symmetric probability tensors. In the case $r=2$, we show that weak recovery is possible whenever the sum of the signal-to-noise ratios across all uniform hypergraph layers exceeds one, thereby confirming the positive part of a conjecture in (Chodrow et al., 2023). Moreover, we provide a polynomial-time spectral algorithm that achieves this threshold via an optimally weighted non-backtracking operator. For the unweighted non-backtracking matrix, our spectral method attains a different algorithmic threshold, also conjectured in (Chodrow et al., 2023). Our approach develops a spectral theory for weighted non-backtracking operators on non-uniform hypergraphs, including a precise characterization of outlier eigenvalues and eigenvector overlaps. We introduce a novel Ihara--Bass formula tailored to weighted non-uniform hypergraphs, which yields an efficient low-dimensional representation and leads to a provable spectral reconstruction algorithm. Taken together, these results provide a principled and computationally efficient approach to clustering in non-uniform hypergraphs, and highlight the role of optimal weighting in aggregating heterogeneous higher-order interactions.
Yan Zhang
This work makes two advances in the study of the (approximate) nonparametric maximum likelihood estimator (NPMLE) for exponential family mixture models. First, we develop a data-compression strategy that reduces the cost of repeated likelihood evaluations in NPMLE computation to logarithmic order in the sample size. Second, we show that, for a broad class of approximate NPMLEs, the resulting marginal density estimator attains an almost parametric rate of convergence.
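As a baseline for what is being accelerated (not the paper's data-compression scheme or its theory), the grid-based EM iteration for the Kiefer-Wolfowitz NPMLE of a Gaussian location mixture looks as follows; the expensive object is the $n \times \text{grid}$ likelihood matrix whose repeated evaluation the paper's compression reduces. Grid size, iteration count, and names are illustrative.

    # Baseline grid-based EM for the Gaussian-location-mixture NPMLE (illustrative).
    import numpy as np
    from scipy.stats import norm

    def npmle_em(x, sigma=1.0, grid_size=300, n_iter=200):
        """Return grid atoms and estimated mixing weights maximizing the mixture likelihood."""
        grid = np.linspace(x.min(), x.max(), grid_size)
        lik = norm.pdf(x[:, None], loc=grid[None, :], scale=sigma)   # (n, grid_size) likelihood matrix
        w = np.full(grid_size, 1.0 / grid_size)
        for _ in range(n_iter):
            post = lik * w
            post /= post.sum(axis=1, keepdims=True)   # E-step: posterior over atoms per observation
            w = post.mean(axis=0)                     # M-step: update mixing weights
        return grid, w

    x = np.random.default_rng(2).normal(loc=np.repeat([-2.0, 2.0], 500), scale=1.0)
    grid, w = npmle_em(x)
    print(grid[np.argsort(w)[-5:]])                   # atoms carrying the most mass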
Guillaume Gautier, Rémi Bardenet, Michal Valko
The standard Monte Carlo estimator $\widehat{I}_N^{\mathrm{MC}}$ of $\int f\,d\omega$ relies on independent samples from $\omega$ and has variance of order $1/N$. Replacing the samples with a determinantal point process (DPP), a repulsive distribution, makes the estimator consistent, with variance rates that depend on how the DPP is adapted to $f$ and $\omega$. We examine two existing DPP-based estimators: the first, by Bardenet & Hardy (2020), achieves a rate of $\mathcal{O}(N^{-(1+1/d)})$ for smooth $f$ but relies on a fixed DPP; the second, by Ermakov & Zolotukhin (1960), is unbiased with a rate of order $1/N$, like Monte Carlo, but its DPP is tailored to $f$. We revisit these estimators, generalize them to continuous settings, and provide sampling algorithms.
Junhao Bian, Yilin Bi, Tao Zhou
Hypergraphs are an effective and widely adopted tool for characterizing higher-order interactions in complex systems. The most intuitive and commonly used mathematical instrument for representing a hypergraph is the incidence matrix, in which each entry is binary, indicating whether the corresponding node belongs to the corresponding hyperedge. Although the incidence matrix has become a foundational tool for hypergraph analysis and mining, we argue that its binary nature is insufficient to accurately capture the complexity of node-hyperedge relationships arising from the fact that different hyperedges can contain vastly different numbers of nodes. Accordingly, based on the resource allocation process on hypergraphs, we propose a continuous-valued matrix to quantify the proximity between nodes and hyperedges. To verify the effectiveness of the proposed proximity matrix, we investigate three important tasks in hypergraph mining: link prediction, vital node identification, and community detection. Experimental results on numerous real-world hypergraphs show that simple algorithms centered on the proximity matrix significantly outperform benchmark algorithms across these three tasks.
Andrew D. McRae, Richard Y. Zhang
Low-rank matrix recovery can be solved to statistical optimality by convex matrix optimization under the classical assumption of restricted isometry property (RIP). However, for large problems, the convex formulation is commonly replaced by a smooth rank-constrained factored nonconvex problem for which algorithmic theory typically only guarantees convergence to second-order critical points. In this paper, we develop a sharp and statistically optimal theory for second-order critical points of the factored nonconvex matrix LASSO (nuclear-norm--regularized least-squares estimator) under RIP with particular emphasis on the overparametrized regime where the search rank $r$ exceeds the ground-truth rank $r_*$. Our recovery error bounds reveal the precise role of nuclear norm regularization, interpolating between the classical convex rate and known rates for the unregularized nonconvex problem. Complementing this positive result, we give examples showing that, contrary to popular belief, rank overparametrization does not always improve the optimization landscape even under RIP. This negative result raises questions about the fundamental statistical recovery capability of rank-constrained nonconvex approaches in comparison to convex approaches which have worse computational scaling. All of our results generalize to arbitrary convex functions with nuclear-norm regularization under restricted strong convexity and smoothness. In particular, we give sharp conditions under which second-order critical points of the nonconvex problem either (1) approximately recover low-rank approximate minima of the convex problem or (2) exactly recover a low-rank global optimum if one exists.
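For orientation, the factored formulation referred to in the abstract typically rests on the standard variational characterization of the nuclear norm (valid whenever $r$ is at least the rank of $X$):
\[
\|X\|_* \;=\; \min_{U V^{\top} = X} \tfrac{1}{2}\bigl(\|U\|_F^2 + \|V\|_F^2\bigr),
\qquad
\min_{U \in \mathbb{R}^{n \times r},\, V \in \mathbb{R}^{m \times r}} \tfrac{1}{2}\bigl\|\mathcal{A}(U V^{\top}) - b\bigr\|_2^2 + \tfrac{\lambda}{2}\bigl(\|U\|_F^2 + \|V\|_F^2\bigr),
\]
so the global minima of the smooth nonconvex problem on the right match the convex matrix LASSO whenever the search rank $r$ is at least the rank of a convex solution. The paper's results concern the second-order critical points of such factored problems, including the overparametrized regime $r > r_*$; whether the paper uses this exact parametrization is an assumption here.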
Pierre-François Massiani, Sebastian Schulze, Mattes Mollenhauer
We introduce the concept of an asymptotic e-process, which is a doubly indexed stochastic process $(E_{m,n})_{m,n\in\mathbb{N}}$ that approximates an e-process with monitoring time $n$ in terms of a suitable limiting behavior for an approximation parameter $m\to \infty$. This theory is motivated by practical applications in sequential hypothesis testing, in which e-variables can only be constructed approximately from observations due to model misspecification or estimation errors. We derive an asymptotic version of Ville's inequality, which bounds excursion probabilities of $(E_{m,n})_{m,n\in\mathbb{N}}$ over some threshold uniformly over $n$ up to a time horizon $r_m$ that is determined by the quality of process approximation over $m$. We investigate properties of asymptotic e-processes, their connections to asymptotic supermartingales, and provide examples of how they can be constructed from asymptotic e-variables.