Sebastiano A. Piccolo, Andrea Tagarelli
Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet, methods addressing all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes in which edges represent dependency relationships only across the two groups, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of k contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game, deriving a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.
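To make the peeling idea concrete, the following is a minimal Python sketch in the spirit of MinCov, under stated assumptions: the bipartite graph is given as a dict mapping each item to its set of supporting contributors, and the redundancy-aware score weights each supported item by the inverse of its current number of supporters, so uniquely supported items count fully. All names are illustrative; this is not the authors' implementation.

```python
from collections import defaultdict

def peel_critical_set(item_supporters, k):
    """Iterative peeling sketch: repeatedly remove the contributor whose
    supported items are least redundantly covered, then propagate isolation.
    item_supporters: dict item -> set of contributors."""
    supporters = {i: set(cs) for i, cs in item_supporters.items()}
    contrib_items = defaultdict(set)
    for item, cs in supporters.items():
        for c in cs:
            contrib_items[c].add(item)

    removed, isolated = [], set()
    for _ in range(k):
        if not contrib_items:
            break
        # Redundancy-aware score: an item with one remaining supporter
        # contributes 1, an item with two supporters contributes 1/2, etc.
        best = max(contrib_items, key=lambda c: sum(
            1.0 / len(supporters[i]) for i in contrib_items[c]))
        removed.append(best)
        for item in contrib_items.pop(best):
            supporters[item].discard(best)
            if not supporters[item]:      # item lost its last contributor
                isolated.add(item)
        for c in contrib_items:           # isolated items no longer score
            contrib_items[c] -= isolated
    return removed, isolated

# Tiny example: removing 'a' then 'b' isolates items i1, i2, i3.
g = {"i1": {"a"}, "i2": {"a", "b"}, "i3": {"b"}, "i4": {"c"}}
print(peel_critical_set(g, 2))
```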
Mariko I. Ito, Hiroyuki Hasada, Yudai Honma, Takaaki Ohnishi, Tsutomu Watanabe, Kazuyuki Aihara
Market instability has been extensively studied using mathematical approaches to characterize complex trading dynamics and detect structural change points. This study explores the potential for early warning of market instability by applying the Dynamical Network Marker (DNM) theory to order placement and execution data from the Tokyo Stock Exchange. DNM theory identifies indicators associated with critical slowing down -- a precursor to critical transitions -- in high-dimensional systems of many interacting elements. In this study, market participants are identified using virtual server IDs from the trading system, and multivariate time series representing their trading activities are constructed. This framework treats each participant as an interacting element, thereby enabling the application of DNM theory to the resulting time series. The results suggest that early warning signals of large price movements can be detected on a daily time scale. These findings highlight the potential to develop practical DNM-based early-warning systems for large price movements by further refining forecasting horizons and integrating multiple time series capturing different aspects of trading behavior.
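DNM-style indicators typically combine rising variance and rising intra-group correlation of a candidate group of elements with falling correlation between that group and the rest of the system. Below is a hedged numpy sketch of one common composite index of this form, assuming a T-by-N multivariate time series; it is a generic illustration, not the exact indicator constructed in this study.

```python
import numpy as np

def dnm_index(X, group, window=50):
    """Sliding-window composite index in the DNM spirit:
    (mean SD of group members) * (mean intra-group |corr|)
      / (mean |corr| between group and the rest).
    X: (T, N) time series of N interacting elements; group: list of indices."""
    T, N = X.shape
    others = [j for j in range(N) if j not in set(group)]
    scores = []
    for t in range(window, T + 1):
        W = X[t - window:t]
        C = np.corrcoef(W, rowvar=False)
        sd_in = W[:, group].std(axis=0).mean()
        C_in = C[np.ix_(group, group)]
        pcc_in = np.abs(C_in[~np.eye(len(group), dtype=bool)]).mean()
        pcc_out = np.abs(C[np.ix_(group, others)]).mean()
        scores.append(sd_in * pcc_in / max(pcc_out, 1e-12))
    return np.array(scores)   # a sharp rise flags critical slowing down
```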
Shady E. Ahmed, Hui Wan, Saad Qadeer, Panos Stinis, Kezhen Chong, Mohammad Taufiq Hassan Mozumder, Kai Zhang, Ann S. Almgren
Toward the goal of using Scientific Machine Learning (SciML) emulators to improve the numerical representation of aerosol processes in global atmospheric models, we explore the emulation of aerosol microphysics processes under cloud-free conditions in the 4-mode Modal Aerosol Module (MAM4) within the Energy Exascale Earth System Model version 2 (E3SMv2). To develop an in-depth understanding of the challenges and opportunities in applying SciML to aerosol processes, we begin with a simple feedforward neural network architecture that has been used in earlier studies, but we systematically examine key emulator design choices, including architecture complexity and variable normalization, while closely monitoring training convergence behavior. Our results show that optimization convergence, scaling strategy, and network complexity strongly influence emulation accuracy. When effective scaling is applied and convergence is achieved, the relatively simple architecture, used together with a moderate network size, can reproduce key features of the microphysics-induced aerosol concentration changes with promising accuracy. These findings provide practical clues for the next stages of emulator development; they also provide general insights that are likely applicable to the emulation of other aerosol processes, as well as other atmospheric physics involving multi-scale variability.
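As a concrete illustration of two of the design choices examined, variable normalization and network size, here is a hedged PyTorch sketch of a feedforward emulator with z-score scaling frozen from training-set statistics; the class name, widths, and depth are placeholders, not the study's configuration.

```python
import torch
import torch.nn as nn

class AerosolEmulator(nn.Module):
    """Feedforward emulator sketch: inputs and outputs are z-score
    normalized with statistics computed once on the training set."""
    def __init__(self, x_mean, x_std, y_mean, y_std, hidden=256, depth=3):
        super().__init__()
        self.register_buffer("x_mean", x_mean)
        self.register_buffer("x_std", x_std)
        self.register_buffer("y_mean", y_mean)
        self.register_buffer("y_std", y_std)
        layers, d_in = [], x_mean.numel()
        for _ in range(depth):
            layers += [nn.Linear(d_in, hidden), nn.ReLU()]
            d_in = hidden
        layers.append(nn.Linear(d_in, y_mean.numel()))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        z = (x - self.x_mean) / self.x_std             # normalize inputs
        return self.net(z) * self.y_std + self.y_mean  # de-normalize outputs
```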
Haitao Gao, Aaryash Bharadwaj
The Smith Hat tile, discovered in 2023, is the first known aperiodic monotile. Its simple structure, constructed from only 8 kites, makes it a natural and well-motivated object of study in percolation theory. The primary goal of this paper is to estimate the critical threshold $p_c$ for both site and bond Bernoulli percolation on the Smith Hat tile $(1, \sqrt{3})$ using Monte Carlo simulation. Our findings are $p_c^s = 0.822725 \pm 0.000044$ for site percolation and $p_c^b = 0.798161 \pm 0.000044$ for bond (edge) percolation, together with $0.544247 \pm 0.000101$ for site percolation on the dual graph.
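To illustrate the Monte Carlo machinery, here is a minimal sketch estimating spanning probabilities for Bernoulli site percolation with union-find; for brevity it uses a square lattice (site $p_c \approx 0.5927$) as a stand-in, whereas the actual study must build the adjacency graph of the Hat tiling.

```python
import numpy as np

def spans(p, L, rng):
    """One site-percolation trial on an L x L square lattice: open each site
    with probability p and test top-to-bottom connectivity via union-find."""
    open_ = rng.random((L, L)) < p
    parent = np.arange(L * L + 2)
    TOP, BOT = L * L, L * L + 1          # virtual top/bottom nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i in range(L):
        for j in range(L):
            if not open_[i, j]:
                continue
            k = i * L + j
            if i == 0: union(k, TOP)
            if i == L - 1: union(k, BOT)
            if i > 0 and open_[i - 1, j]: union(k, k - L)
            if j > 0 and open_[i, j - 1]: union(k, k - 1)
    return find(TOP) == find(BOT)

rng = np.random.default_rng(0)
for p in (0.55, 0.59, 0.63):   # spanning probability jumps near p_c
    print(p, sum(spans(p, 64, rng) for _ in range(100)) / 100)
```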
Ranit Das, Marie Hein, Gregor Kasieczka, Michael Krämer, Lukas Lang, Radha Mastandrea, Louis Moureaux, Alexander Mück, David Shih
An enormous amount of R&D effort has resulted in many new resonant anomaly detection methods being proposed in recent years. However, the vast majority of previous R&D studies have suffered from two limitations: they have focused on a very small set of simulated signal benchmark models; and they have either used small sets of carefully crafted high-level jet substructure observables, which can be highly performant but are prone to model dependence, or the full collider event phase space, which is more agnostic but suffers from reduced sensitivity. In this work, we address both limitations: we formulate a number of new simulated signal benchmarks, which we make publicly available in a format fully compatible with the LHCO R&D benchmark; and we explore a high-level, yet highly agnostic, observable set consisting of Energy Flow Polynomials in addition to the usual subjettiness variables. We evaluate this "kitchen sink" observable set for both an idealized anomaly detector and the CWoLa hunting task, along with three baseline observable sets (the Baseline LHC Olympics set, subjettiness observables, and Energy Flow Polynomials). We find that our kitchen sink approach is the most sensitive to a broad range of signal types. Furthermore, we show that an attribute bagging variant, in which each ensemble member is trained on a random subset of substructure observables, yields comparable anomaly detection performance while significantly reducing training cost.
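The attribute-bagging variant is simple to sketch: each ensemble member is trained on a random subset of the substructure observables and the members' scores are averaged. In the hedged sketch below, fit_fn is a placeholder for any classifier factory exposing predict_proba (e.g. a scikit-learn gradient-boosting model); nothing here is the authors' exact setup.

```python
import numpy as np

def attribute_bagging_scores(X_train, y_train, X_test,
                             fit_fn, n_members=10, n_feats=8, rng=None):
    """Train each ensemble member on a random feature subset and average
    the anomaly scores over members."""
    rng = rng or np.random.default_rng(0)
    scores = np.zeros(len(X_test))
    for _ in range(n_members):
        cols = rng.choice(X_train.shape[1], size=n_feats, replace=False)
        model = fit_fn(X_train[:, cols], y_train)
        scores += model.predict_proba(X_test[:, cols])[:, 1]
    return scores / n_members

# e.g. with scikit-learn:
#   from sklearn.ensemble import HistGradientBoostingClassifier
#   fit_fn = lambda X, y: HistGradientBoostingClassifier().fit(X, y)
```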
Dinh Triem Phan, Jérôme Bobin, Cheick Thiam, Christophe Bobin
Identifying and quantifying $\gamma$-emitting radionuclides, considering spectral deformation from $\gamma$-interactions in radioactive source surroundings, present a significant challenge in $\gamma$-ray spectrometry. In that context, a hybrid machine learning method has been previously proposed to jointly estimate the counting and spectral signatures of $\gamma$-emitters under conditions of spectral variability. This paper addresses the uncertainty quantification of the estimators (i.e., the counting and the variable $\lambda$ which characterizes the spectral signatures) obtained by this spectral unmixing algorithm. The focus is on the coverage interval, as defined by the GUM, which corresponds closely to a credible interval in the Bayesian framework. Given the inverse problem and the constraints associated with spectral deformation, two Bayesian methods, Laplace approximation and Markov Chain Monte Carlo, have been developed for uncertainty quantification to ensure robust decision-making. The Laplace approximation technique approximates the posterior distribution by a Gaussian distribution, while the Markov Chain Monte Carlo technique samples the posterior distribution. This study evaluates these two methods in terms of the precision of the coverage interval, using the long-run success rate over repeated Monte Carlo samples. Numerical experiments show that both methods yield similar results close to the expected success rate of 95.4\% when constraints related to spectral signature deformation and counting are inactive. However, when constraints are active or the background counting significantly dominates that of other radionuclides, the Laplace approximation method deviates from the expected long-run success rate due to the non-Gaussian posterior distribution. In such cases, the Markov Chain Monte Carlo method still provides robust results.
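For a concrete picture of the two interval constructions, here is a hedged one-dimensional sketch for a generic log-posterior; the real unmixing posterior is multivariate and constrained, so this only illustrates the mechanics (under the Gaussian Laplace approximation, MAP $\pm 2\sigma$ covers about 95.4\%, while the Metropolis percentile interval makes no Gaussianity assumption).

```python
import numpy as np

def laplace_interval(logpost, theta_map, k=2.0, eps=1e-4):
    """Laplace sketch: sigma from the numeric second derivative at the MAP;
    MAP +/- 2 sigma targets ~95.4% coverage if the posterior is Gaussian."""
    h = (logpost(theta_map + eps) - 2 * logpost(theta_map)
         + logpost(theta_map - eps)) / eps**2
    sigma = 1.0 / np.sqrt(-h)
    return theta_map - k * sigma, theta_map + k * sigma

def mcmc_interval(logpost, theta0, n=20000, step=0.1, rng=None):
    """Random-walk Metropolis sketch: percentile interval from samples;
    remains valid when constraints make the posterior non-Gaussian."""
    rng = rng or np.random.default_rng(0)
    th, lp, samples = theta0, logpost(theta0), []
    for _ in range(n):
        prop = th + step * rng.normal()
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            th, lp = prop, lp_prop
        samples.append(th)
    return np.percentile(samples[n // 10:], [2.3, 97.7])  # drop burn-in
```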
Andrew Fowlie
Prompted by misconceptions in the recent literature, we review the justifications for naturalness arguments and Occam's razor found in Bayesian statistics. We discuss the automatic Occam's razor that emerges in Bayesian formalism, bringing together points of view from diverse fields, including statistics, social sciences, physics and machine learning. In pedagogical calculations, we demonstrate that this automatic razor disfavors unnatural models in which predictions must be fine-tuned to agree with observation.
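A toy version of such a pedagogical calculation: two models each predict an observable as d = theta + noise and differ only in how widely theta must range, i.e., how much fine-tuning is needed to hit the observed value. The numbers are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

d, sigma_noise = 0.5, 0.1   # observed value and measurement noise
# Marginal likelihood (evidence) in closed form: Z = N(d; 0, prior^2 + noise^2)
for name, sigma_prior in [("natural (narrow prior)", 1.0),
                          ("fine-tuned (wide prior)", 100.0)]:
    Z = norm.pdf(d, loc=0.0, scale=np.hypot(sigma_prior, sigma_noise))
    print(f"{name}: evidence = {Z:.4g}")
# The wide-prior model spreads its predictions thinly over many possible
# outcomes, so the automatic Occam razor suppresses its evidence by
# roughly a factor of sigma_prior: the Bayes factor here is ~90 to 1.
```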
Zinovy Malkin
Possibilities are considered to simplify the computation of several statistical functions used to test statistical hypotheses when processing observations: the inverse normal distribution, the Student's t-distribution, and the criterion for rejecting outliers. For these three cases, simple approximation expressions are proposed for the quantiles of these statistical distributions, which are accurate enough for most practical applications.
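The paper's specific expressions are not reproduced here, but the following hedged sketch shows the flavor of such approximations, using a classic short closed-form formula for the normal quantile checked against scipy; it is a generic example, not the approximation proposed in this work.

```python
import numpy as np
from scipy.stats import norm

def z_approx(p):
    """Classic one-line approximation to the standard normal quantile
    (Tukey-lambda style); roughly a few parts in 10^3 in the central range."""
    p = np.asarray(p, dtype=float)
    return 4.91 * (p**0.14 - (1.0 - p)**0.14)

for p in (0.90, 0.95, 0.975, 0.995):
    print(p, round(float(z_approx(p)), 4), round(norm.ppf(p), 4))
```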
Baoqiang Ma, Djennifer K. Madzia-Madzou, Rosa C. J. Kraaijveld, Jin Ouyang
For head and neck cancer (HNC) patients, prognostic outcome prediction can support personalized treatment strategy selection. Improving the prediction performance of HNC outcomes has been extensively explored by using advanced artificial intelligence (AI) techniques on PET/CT data. However, the interpretability of AI remains a critical obstacle to its clinical adoption. Unlike previous HNC studies that empirically selected explainable AI (XAI) techniques, we are the first to comprehensively evaluate and rank 13 XAI methods across 24 metrics, covering faithfulness, robustness, complexity, and plausibility. Experimental results on the multi-center HECKTOR challenge dataset show large variations across evaluation aspects among different XAI methods, with Integrated Gradients (IG) and DeepLIFT (DL) consistently obtaining high rankings for faithfulness, complexity, and plausibility. This work highlights the importance of comprehensive XAI method evaluation and can be extended to other medical imaging tasks.
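Integrated Gradients, one of the consistently top-ranked methods, is compact enough to sketch: average the input gradient along a straight path from a baseline to the input, then scale by the input-baseline difference. A hedged PyTorch sketch assuming the model exposes a scalar output per sample (e.g. the target-class logit):

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=64):
    """IG sketch for a single input x: attribution =
    (x - baseline) * mean of grad(model) along the baseline-to-x path."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)   # (steps, *x.shape) batch
    path.requires_grad_(True)
    out = model(path).sum()                     # scalar output per sample
    grads = torch.autograd.grad(out, path)[0]
    return (x - baseline) * grads.mean(dim=0)
```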
Matthew Mould, Rodrigo Tenorio, Davide Gerosa
Gravitational-wave events are interpreted in terms of Bayesian posteriors for their source properties inferred under unphysical reference priors. Though these parameter estimates are important intermediate data products for downstream analyses, across the catalog they provide generically biased source properties and are therefore unsuitable for direct astrophysical interpretation. Hierarchical parameter estimation is the solution, where joint analysis of the entire catalog of observations not only reduces statistical uncertainties but actually informs the correct prior. The population-informed source properties derived in this way are naturally suited to astrophysical interpretation and catalog statistics, such as the identification of exceptional events from previous and ongoing observing runs. Using the latest LIGO-Virgo-KAGRA data, we thus demonstrate that population inference is not optional for interpreting gravitational-wave observations.
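The mechanics behind population-informed source properties can be sketched as importance reweighting: divide out the unphysical reference prior from each event's posterior samples and weight by the population prior instead. A hedged generic sketch (vectorized log-prior callables are assumed; this is not the authors' pipeline):

```python
import numpy as np

def population_reweight(theta_samples, log_pop_prior, log_pe_prior, rng=None):
    """Reweight per-event posterior samples, drawn under the reference
    prior pi_PE, to the population-informed posterior:
    w_i proportional to p_pop(theta_i) / pi_PE(theta_i)."""
    logw = log_pop_prior(theta_samples) - log_pe_prior(theta_samples)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(theta_samples), size=len(theta_samples), p=w)
    return theta_samples[idx], w   # resampled draws and normalized weights
```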
Tingjia Miao, Wenkai Jin, Muhua Zhang, Jinxin Tan, Yuelin Hu, Tu Guo, Jiejun Zhang, Yuhan Wang, Wenbo Li, Yinuo Gao, Shuo Chen, Weiqi Jiang, Yayun Hu, Zixing Lei, Xianghe Pang, Zexi Liu, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen
The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real-world research. In this work, we present research-oriented evaluations in theoretical and computational physics, a natural testbed with comprehensive domain knowledge, complex reasoning, and verifiable end-to-end workflows without reliance on experiments. Here we introduce PRL-Bench (Physics Research by LLMs), a benchmark designed to systematically map the capability boundaries of LLMs in executing end-to-end physics research. Constructed from 100 curated papers from the latest issues of Physical Review Letters since August 2025 and validated by domain experts, PRL-Bench covers five major theory- and computation-intensive subfields of modern physics: astrophysics, condensed matter physics, high-energy physics, quantum information, and statistical physics. Each task in the benchmark is designed to replicate the core properties of authentic scientific research, including exploration-oriented formulation, long-horizon workflows, and objective verifiability, thereby reconstructing the essential reasoning processes and research workflows of real physics research. Evaluation across frontier models shows that performance remains limited, with the best overall score below 50, revealing a pronounced gap between current LLM capabilities and the demands of real scientific research. PRL-Bench serves as a reliable testbed for assessing next-generation AI scientists, advancing AI systems toward autonomous scientific discovery.
E. Russeil, R. Lunnan, J. Peloton, S. Schulze, P. J. Pessi, D. Perley, J. Sollerman, A. Gkini, Y. Hu, T. -W. Chen, E. C. Bellm, T. X. Chen, B. Rusholme
Superluminous supernovae (SLSNe) are among the most luminous stellar explosions known, yet they remain poorly understood. Because they are intrinsically rare, efficiently identifying them in the large alert streams produced by modern time-domain surveys is essential for enabling spectroscopic follow-up. We present NOMAI, a machine learning classifier designed to identify SLSN candidates directly from photometric alerts in the ZTF stream, using light curves accumulated over at least 30 days. It does not require any spectroscopic redshift and is running in real time within the Fink broker. ZTF light curves are transformed into a set of physically motivated features derived primarily from model-fitting procedures using SALT2 and Rainbow, a blackbody-based multi-band fitting framework. These features are used to train an XGBoost classifier on a curated dataset of labeled ZTF sources constructed using literature samples of SLSNe, along with TNS and internal ZTF labeled sources. The final training dataset contains 5280 unique sources, including 225 spectroscopically classified SLSNe. On the training sample, the classifier reaches 66% completeness and 58% purity. Deployed within the Fink broker, NOMAI has been running continuously since 18/12/2025 on the ZTF alert stream and publicly reports SLSN candidates every night by automatically posting them to dedicated communication channels. Based on this, we also report on a first two-month evaluation period, during which the classifier successfully recovered 22 of the 24 active SLSNe reported on the Transient Name Server. The achieved performance demonstrates that the classifier provides a valuable tool for experts to efficiently scan the alert stream and identify promising candidates. In the near future, NOMAI is intended to be adapted to operate on the Legacy Survey of Space and Time conducted by the Vera C. Rubin Observatory.
Masahiko Saito, Tomoe Kishimoto, Junichi Tanaka
Ensuring the reproducibility of physics results is one of the crucial challenges in high-energy physics (HEP). In this study, we develop a proof-of-concept system that uses large language models (LLMs) to extract analysis procedures from HEP publications and generate executable analysis code for reproducing published results. Our method consists of two stages. In the first stage, open-weight LLMs extract event selection criteria, object definitions, and other relevant analysis information from a target paper and, when necessary, from its referenced publications, and then produce a structured selection list. In the second stage, the structured selection list is used to generate analysis code, which is then executed and validated iteratively. As a benchmark, we use the ATLAS $H \to ZZ^{*} \to 4\ell$ analysis based on proton-proton collision data recorded in 2015 and 2016 and released as ATLAS Open Data. This benchmark allows direct comparison between the generated results and the published analysis, as well as comparison with a manually developed baseline implementation. We separately evaluate selection extraction and code generation in order to clarify the current capabilities and limitations of open-weight LLMs for HEP analysis reproduction. Our initial results show that recent open-weight models can recover many documented selection criteria from papers and references, and that in some runs they can generate event selections fully matching a baseline implementation at the event level. At the same time, stochasticity, hallucination, and execution failure remain significant challenges. These results suggest that LLMs are already promising as human-in-the-loop tools for reproducibility support, although they are not yet reliable as fully autonomous HEP analysis agents. In this paper, we report the design of the prototype system and its initial performance evaluation.
Biwei Dai, Po-Wen Chang, Wahid Bhimji, Paolo Calafiura, Ragansu Chakkappai, Yuan-Tang Chou, Sascha Diefenbacher, Jordan Dudley, Ibrahim Elsharkawy, Steven Farrell, Isabelle Guyon, Chris Harris, Elham E Khoda, Benjamin Nachman, David Rousseau, Uroš Seljak, Ihsan Ullah, Yulei Zhang
Weak gravitational lensing, the correlated distortion of background galaxy shapes by foreground structures, is a powerful probe of the matter distribution in our universe and allows accurate constraints on the cosmological model. In recent years, high-order statistics and machine learning (ML) techniques have been applied to weak lensing data to extract the nonlinear information beyond traditional two-point analysis. However, these methods typically rely on cosmological simulations, which poses several challenges: simulations are computationally expensive, limiting most realistic setups to a low training data regime; inaccurate modeling of systematics in the simulations creates distribution shifts that can bias cosmological parameter constraints; and varying simulation setups across studies make method comparison difficult. To address these difficulties, we present the first weak lensing benchmark dataset with several realistic systematics and launch the FAIR Universe Weak Lensing Machine Learning Uncertainty Challenge. The challenge focuses on measuring the fundamental properties of the universe from weak lensing data with limited training set and potential distribution shifts, while providing a standardized benchmark for rigorous comparison across methods. Organized in two phases, the challenge will bring together the physics and ML communities to advance the methodologies for handling systematic uncertainties, data efficiency, and distribution shifts in weak lensing analysis with ML, ultimately facilitating the deployment of ML approaches into upcoming weak lensing survey analysis.
Louis González, Saad Bhamla
Dynamic soaring allows seabirds to harvest mechanical energy from vertical wind shear, but field trajectories lack a benchmark for comparing flight performances across species. We derive a reduced lower bound on transport effort from a simplified Hamilton-Jacobi-Bellman optimal-control model in which slow flight incurs an induced-drag penalty, fast flight incurs a dissipative penalty, and wind shear supplies an effective energetic subsidy. After species-specific normalization of transport speed and an accelerometer-based effort proxy, we map wandering albatrosses, Cory's shearwaters, and Eurasian oystercatchers into a common reduced speed-effort plane and estimate their empirical lower frontiers. The albatross frontier lies closest to the reduced bound, consistent with near-optimal wind-energy harvesting. The shearwater frontier is systematically displaced above it, and oystercatchers occupy a distinct non-soaring regime. The resulting framework places specialist dynamic soaring, mixed flap-gliding, and non-soaring flight in a common mechanical representation and provides a reduced benchmark for comparing wind-assisted flight across species using field trajectories.
I. D. Kolesnikov, D. A. Maksimov, V. M. Moskvitin, N. Semenova
This study examines the impact of additive and multiplicative noise on both a single leaky integrate-and-fire (LIF) neuron and a trained spiking neural network (SNN). Noise was introduced at different stages of neural processing, including the input current, membrane potential, and output spike generation. The results show that multiplicative noise applied to the membrane potential has the most detrimental effect on network performance, leading to a significant degradation in accuracy. This is primarily due to its tendency to suppress membrane potentials toward large negative values, effectively silencing neuronal activity. To address this issue, input pre-filtering strategies were evaluated, with a sigmoid-based filter demonstrating the best performance by shifting inputs to a strictly positive range. Under these conditions, additive noise in the input current becomes the dominant source of performance degradation, while other noise configurations reduce accuracy by no more than 1\%, even at high noise intensity. Additionally, the study compares the effects of common and uncommon noise across neuron populations in the hidden layer, revealing that SNNs exhibit greater robustness to common noise. Overall, the findings identify the most critical noise mechanisms affecting SNNs and provide practical approaches for improving their robustness.
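For concreteness, a minimal numpy sketch of a single LIF neuron with the two membrane-potential noise types discussed; parameter values are illustrative. Multiplicative noise scales with the potential itself, which is why it can drive the potential toward large negative values and silence the neuron.

```python
import numpy as np

def lif_spikes(I, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0,
               sigma_add=0.0, sigma_mul=0.0, rng=None):
    """Leaky integrate-and-fire with additive and multiplicative
    membrane-potential noise. I: input current per time step."""
    rng = rng or np.random.default_rng(0)
    v, spikes = v_reset, []
    for I_t in I:
        noise = sigma_add * rng.normal() + sigma_mul * v * rng.normal()
        v += dt / tau * (-v + I_t) + np.sqrt(dt) * noise
        if v >= v_th:
            spikes.append(True)
            v = v_reset
        else:
            spikes.append(False)
    return np.array(spikes)
```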
Nafis Fuad
Comparison of two probability density/mass functions (PDFs/PMFs) is ubiquitous in various forms of scientific analysis, including machine learning, optimization problems, and hypothesis tests. Numerous distance metrics have already been proposed and are regularly used for this purpose. In this document, we report a data-driven systematic comparison among several such metrics. The metrics considered here are the Hellinger distance, Wasserstein distance (1D), $\sqrt{JS}$ distance, $L_\infty$ norm, Kolmogorov-Smirnov distance, and Fisher-Rao metric. We perform this comparison using electron and photon events from a decaying $^{83}$Kr isotope, collected through an HPGe spectrometer operating under cryo-vacuum conditions. To accomplish this, first, a dimensionless Parameter of Interest (PoI) was established; then PDFs/PMFs were generated from the data; and finally, the stability of the PoI under various criteria, such as sample size, discretization length, and normalizing functions, was studied and the results were summarized. In this report, we also propose a list of properties that a normalizing function should have and utilize them in the comparison.
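All of the compared metrics are short to implement for PMFs on a shared support; below is a hedged sketch using scipy where a routine exists. Note that scipy's jensenshannon already returns the square root of the JS divergence, and the discrete Fisher-Rao distance equals 2 arccos of the Bhattacharyya coefficient.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

def pmf_distances(p, q, support):
    """Distances between two PMFs p, q defined on the same support."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)   # Bhattacharyya coeff.
    return {
        "hellinger": np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2)),
        "wasserstein_1d": wasserstein_distance(support, support, p, q),
        "sqrt_js": jensenshannon(p, q, base=2),
        "linf": np.max(np.abs(p - q)),
        "kolmogorov_smirnov": np.max(np.abs(np.cumsum(p) - np.cumsum(q))),
        "fisher_rao": 2.0 * np.arccos(bc),
    }
```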
Leonardo Solidoro, Sebastian H. Völkel, Silke Weinfurtner
The emergence of precision gravity simulators in quantum and fluid systems is opening new avenues for probing curved-spacetime physics and black-hole phenomenology under controlled laboratory conditions. In parallel, advances in understanding how fundamental physics can be probed in the spectral signatures of black holes and exotic compact objects motivate the development of modern spectroscopic techniques within analogue-gravity experiments. In this work, we model the spectral properties of analogue black holes sourced by broadband stochastic noise, a crucial aspect in realistic experiments that poses substantial challenges for established data-analysis techniques. Using simulation-based inference, we demonstrate that the physical parameters encoded in noisy spectra can be reliably extracted, showing that these techniques provide a powerful tool for studying both spacetime properties and boundary effects in gravity simulators.
Gregor Krzmanc, Vinicius Mikuni, Benjamin Nachman, Callum Wilkinson
Future AI-based studies in particle physics will likely start from a foundation model to accelerate training and enhance sensitivity. As a step towards a general-purpose foundation model for particle physics, we investigate whether the OmniLearned foundation model pre-trained on diverse high-$Q^2$ simulated and real $pp$ and $ep$ collisions can be effectively transferred to a few-GeV fixed-target neutrino experiment. We process MINERvA neutrino--nucleus scattering events and evaluate pre-trained models on two types of tasks: regression of available energy and binary classification of charged-current pion final states ($\mathrm{CC1π^{\pm}}$, $\mathrm{CCNπ^{\pm}}$, and $\mathrm{CC1π^{0}}$). Pre-trained OmniLearned models consistently outperform similarly sized models trained from scratch, achieving better overall performance at the same compute budget, as well as achieving better performance at the same number of training steps. These results suggest that particle-level foundation models acquire inductive biases that generalize across large differences in energy scale, detector technology, and underlying physics processes, pointing toward a paradigm of detector-agnostic inference in particle physics.
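The transfer recipe itself is generic and easy to sketch: keep the pre-trained particle-level backbone and attach a fresh task head for regression or classification. A hedged PyTorch sketch, where the backbone's out_features attribute is an assumption for illustration, not OmniLearned's actual interface:

```python
import torch.nn as nn

def build_finetune_model(backbone, out_dim, freeze_backbone=False):
    """Attach a fresh head to a pre-trained backbone: out_dim=1 for
    available-energy regression, out_dim=n_classes for pion final states."""
    if freeze_backbone:                      # linear-probe style transfer
        for p in backbone.parameters():
            p.requires_grad = False
    head = nn.Sequential(nn.Linear(backbone.out_features, 128),
                         nn.ReLU(),
                         nn.Linear(128, out_dim))
    return nn.Sequential(backbone, head)
```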
Seong-Hoon Jang, Di Zhang, Xue Jia, Hung Ba Tran, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Yusuke Ohashi, Toyoto Sato, Kiyoe Konno, Shin-ichi Orimo, Hao Li
Hydrogen is a promising energy carrier, yet its practical deployment is limited by the lack of storage materials that simultaneously achieve high storage capacity ($w$) and practical equilibrium pressure at room temperature ($P_{\rm eq,RT}$). Interstitial metal hydrides offer fast kinetics and favorable thermodynamics (high $P_{\rm eq,RT}$) but suffer from intrinsically low $w$. Here, we establish a physically interpretable, data-driven framework to uncover descriptor-property relationships in interstitial hydrides using a curated database of pressure-composition-temperature measurements (Digital Hydrogen Platform, DigHyd) and white-box symbolic regression. Strikingly, the analysis reveals a clear separation of governing mechanisms, in which $w$ is governed by geometric and lattice conditions, captured by the average atomic radius ($\left\langle r_M \right\rangle$) and average thermal conductivity ($\left\langle \kappa \right\rangle$), with an optimal regime of $\left\langle r_M \right\rangle \sim 1.47$ Å and relatively low $\left\langle \kappa \right\rangle$. In contrast, $P_{\rm eq,RT}$ is governed by elastic properties, captured by the average shear modulus ($\left\langle G \right\rangle$) and average Poisson's ratio ($\left\langle \nu \right\rangle$), reflecting the role of lattice rigidity and mechanical compliance. These relationships are translated into compositional optimization pathways that follow the descriptor trends above, enabling the design of candidate materials with enhanced $w$ under practical equilibrium conditions ($P_{\rm eq,RT} \sim 0.1$ MPa). This work establishes a general, interpretable strategy for physics-informed design of energy materials systems.