Irene Aldridge, Jolie An, Riley Burke, Michael Cao, Chia-Yi Chien, Kexin Deng, Ruipeng Deng, Yichen Gao, Olivia Guo, Shunran He, Zheng Li, George Lin, Weihang Lin, Percy Lyu, Alex Ng, Qi Wang, Hanxi Xiao, Dora Xu, Yuanyuan Xue, Sheng Zhang, Sirui Zhang, Yun Zhang, Sirui Zhao, Xiaolong Zhao, Yihan Zhao, Waner Zheng
The emergence of agentic artificial intelligence (AI) represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention. This comprehensive survey synthesizes recent advances in agentic AI across multiple dimensions of financial operations, including system architecture, market applications, regulatory frameworks, and systemic implications. We examine how agentic AI differs from traditional algorithmic trading and generative AI through its capacity for goal-oriented autonomy, continuous learning, and multi-agent coordination. Our analysis shows that while agentic AI offers substantial potential for enhanced market efficiency, liquidity provision, and risk management, it also introduces novel challenges related to market stability, regulatory compliance, interpretability, and systemic risk. Through a systematic review of foundational research, technical architectures, market applications, and governance frameworks, this survey provides scholars and practitioners with a structured understanding of how agentic AI is reshaping financial markets and identifies critical research directions for ensuring that these systems enhance both operational efficiency and market resilience.
Tengyuan Liang
Treatment effect distributions are not identified without restrictions on the joint distribution of potential outcomes. Existing approaches either impose rank preservation -- a strong assumption -- or derive partial identification bounds that are often wide. We show that a single scalar parameter, rank stickiness, suffices for nonparametric point identification while permitting rank violations. The identified joint distribution -- the coupling that maximizes average rank correlation subject to a relative entropy constraint, which we call the Bregman-Sinkhorn copula -- is uniquely determined by the marginals and rank stickiness. Its conditional distribution is an exponential tilt of the marginal with a Bregman divergence as the exponent, yielding closed-form conditional moments and rank violation probabilities; the copula nests the comonotonic and Gaussian copulas as special cases. The empirical Bregman-Sinkhorn copula converges at the parametric $\sqrt{n}$-rate with a Gaussian process limit, despite the infinite-dimensional parameter space. We apply the framework to estimate the full treatment effect distribution, derive a variance estimator for the average treatment effect tighter than the Fréchet--Hoeffding and Neyman bounds, and extend to observational studies under unconfoundedness.
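As a schematic of the structure the abstract describes, the entropy-constrained coupling can be written in Sinkhorn form; the notation below is illustrative rather than the paper's exact definitions:

```latex
% Schematic form of the entropy-constrained coupling. With marginal
% densities f_0, f_1 and tilt parameter \lambda implied by the
% rank-stickiness constraint, Lagrangian duality gives a Sinkhorn-type
% solution
\[
  \pi_\lambda(y_0, y_1)
  = a_\lambda(y_0)\, b_\lambda(y_1)\,
    \exp\!\bigl\{\lambda\, c\bigl(F_0(y_0), F_1(y_1)\bigr)\bigr\}\,
    f_0(y_0)\, f_1(y_1),
\]
% where c(u, v) is the rank-correlation kernel being maximized and the
% Sinkhorn scalings a_\lambda, b_\lambda are pinned down by requiring
% \pi_\lambda to reproduce the marginals. At \lambda = 0 the coupling is
% independent; as \lambda grows it approaches the comonotonic
% (rank-preserving) coupling.
```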
Lars van der Laan, Mark van der Laan
We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.
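A minimal sketch of the calibrate-then-correct idea, assuming labels missing completely at random; the score function, sample sizes, and variable names below are hypothetical, and this is not the API of the authors' ppi_aipw package:

```python
# Minimal sketch of isotonic-calibrated prediction-powered mean estimation.
# Setup: small labeled sample (x_l, y_l), large unlabeled sample x_u, and a
# black-box score that may be miscalibrated.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n_lab, n_unlab = 200, 10_000
def score(x):                    # hypothetical miscalibrated prediction model
    return 0.5 * x + 1.0
x_l = rng.normal(size=n_lab)
y_l = x_l + rng.normal(scale=0.5, size=n_lab)
x_u = rng.normal(size=n_unlab)

# Step 1: post-hoc isotonic calibration of the score on the labeled sample.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(score(x_l), y_l)

# Step 2: AIPW-style semisupervised mean: average the calibrated score over
# all units, then add the labeled-sample residual correction that protects
# against remaining miscalibration.
m_all = iso.predict(score(np.concatenate([x_l, x_u]))).mean()
resid = (y_l - iso.predict(score(x_l))).mean()
theta_hat = m_all + resid
print(theta_hat)                 # estimate of E[Y]
```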
David Gunawan
Survey data are widely used to study how income inequality, poverty, and welfare evolve over time. A common practice is to estimate the income distribution separately for each year, treating annual observations as independent cross-sections. For population subgroups with relatively small sample sizes, however, this approach can produce unstable parameter estimates, imprecise inference for inequality and poverty measures, and potentially misleading posterior probabilities of Lorenz and stochastic dominance. This paper develops flexible Bayesian models for time-varying income distributions that borrow strength across adjacent years by allowing the parameters of income distributions to evolve dynamically. We consider a random walk specification and an extended model with shrinkage priors. The proposed framework yields coherent inference for the full income distributions over time, as well as for associated inequality measures, poverty indices, and dominance probabilities. Simulation studies show that, relative to independent year-by-year models, the proposed approach produces substantially more precise and stable inference, while avoiding spurious variation in welfare comparisons. An application to two population subgroups in the Household, Income and Labour Dynamics in Australia survey, Aboriginal Australians and residents of the Australian Capital Territory (ACT), shows that the dynamic models deliver improved inference for income distributions and related welfare measures, and can change conclusions about distributional dominance over time.
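A stylized version of the dynamic specification, with the Gaussian random walk form and notation as illustrative assumptions:

```latex
% Stylized random walk specification: \theta_t parametrizes the year-t
% income distribution and evolves as
\[
  \theta_t = \theta_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \tau^2 I),
\]
% so sparsely observed years borrow strength from their neighbours. The
% extended model places a shrinkage prior on the innovation scale \tau,
% pulling the fit toward parameter constancy unless the data favour
% genuine time variation.
```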
Olivia Martin, Amar Venugopal
Local government meetings are the most common formal channel through which residents speak directly with elected officials, contest policies, and shape local agendas. However, data constraints typically limit the empirical study of these meetings to agendas, single cities, or short time horizons. We collect and transcribe a massive new dataset of city council meetings from 115 California cities over the last decade, using advanced transcription and diarization techniques to analyze the speech content of the meetings themselves. We document two sets of descriptive findings: First, city council meetings are frequent, long, and vary modestly across towns and time in topical content. Second, public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population, and public participation surges when topics related to land use and zoning are included in meeting agendas. Given this skew, we examine the main policy lever municipalities have to shift participation patterns: meeting access costs. Exploiting pandemic-era variation in remote access, we show that eliminating remote options reduces the number of speakers, but does not clearly change the composition of speakers. Collectively, these results provide the most comprehensive empirical portrait to date of who participates in local democracy, what draws them in, and how institutional design choices shape both the volume and composition of public input.
Irene Aldridge, Gavhar Annaeva, Leyla Beriker, Zhiheng Cai, Samyak Choudhary, Camila Godoy, Kaicheng Gong, Zitao Huang, Jonah Ji, Hetvi Kharvasiya, Heng Li, Yuxuan Li, Tianchi Ma, Qingcheng Meng, Ruiyang Shi, Ananya Shrivastava, Jiaqi Wang, Yifan Wang, Zihua Wu, Jiayang Xu, Yuheng Yan, Zijun Zeng, Bowen Zhang, Francesco Zhang
Blockchain technology is widely expected to reduce transaction costs by automating contract enforcement and eliminating intermediaries; yet, the execution costs imposed by network congestion have received little attention in the operations management literature. We study on-chain peak shaving, the systematic scheduling of Ethereum transactions toward low-congestion windows to reduce gas fee exposure. We use transaction-level data from seven firms across seven industries (N = 62,142 transactions, January-March 2026). Gas fees vary significantly throughout the day: the peak-hour premium at 10 AM Eastern Time reaches USD 0.220 per transaction above the overnight baseline, driven primarily by speculative-arbitrage demand rather than operational activity. Firm-level scheduling responses are heterogeneous and not uniformly disciplined. Only three of seven firms transact disproportionately during off-peak hours; four transact counter-cyclically, concentrated in peak windows due to external deadlines or governance cycles. This heterogeneity is explained by two moderators: transaction deferrability and gas intensity. We formalize these into an On-Chain Scheduling Matrix that maps firms to four regimes: 1) full peak shaving, 2) selective peak shaving, 3) cost provisioning, and 4) accept-market-rate, with regime membership predicting both fee savings and residual cost floors (40-92 percent of actual expenditure). Theoretically, we extend Transaction Cost Economics to account for time-varying execution costs imposed by congestion externalities. In addition to extending Williamson's original cost taxonomy, we introduce a dual classification of gas fees as execution costs in timing but maladaptation costs in origin. The findings reposition on-chain gas-fee management alongside energy procurement and foreign exchange hedging as a domain requiring systematic operational planning.
Samuele Centorrino, Christopher F. Parmeter
Causal inference methods (instrumental variables, difference-in-differences, regression discontinuity, etc.) are primary tools used across many social science milieus. One area where their application has lagged, however, is the study of productivity and efficiency. A main reason is that the stochastic frontier model does not immediately lend itself to a causal framework when interest hinges on an error component of the model. This paper reviews the nascent literature that merges the stochastic frontier literature with causal inference methods. We discuss modeling approaches and empirical issues likely to be relevant for applied researchers in this area. The review shows how the stochastic frontier model can be placed within a causal framework, surveys existing work that has made inroads in this area, addresses challenges that have yet to be met, and discusses core findings.
Simon Hirsch, Florian Ziel
Electricity price forecasting supports decision-making in energy markets and asset operation. Probabilistic forecasts are increasingly adopted to explicitly quantify uncertainty, typically issued as quantile predictions or ensembles of the full predictive distribution. However, how improvements in statistical forecast quality translate into economic value remains unclear. Battery storage arbitrage in day-ahead markets is a popular application-based benchmark for this purpose. We analyze quantile-based trading strategies (QBTS) and identify two critical flaws: they do not incentivize honest probabilistic forecasting and they ignore the intertemporal dependence structure of electricity prices. We therefore frame battery optimization as a stochastic program based on fully probabilistic forecasts and examine decision quality measurement for risk-neutral and risk-averse settings under different uncertainty models. Our discussion touches both sides of the coin: How reliable is the economic evaluation of forecasting models through (simplified) application studies? And how do improvements in statistical forecast quality for stochastic programs relate to decision quality and economic performance? We provide theoretical justification and empirical evidence from a case study on the German electricity market. Our results highlight the pitfalls of ranking forecasting models through battery trading strategies. We conclude with implications for evaluation practice and directions for future research in application-based forecast assessment.
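As a sketch of the stochastic-program framing, the snippet below solves a risk-neutral day-ahead battery arbitrage problem; with a linear objective and a here-and-now schedule, the expected-value objective reduces to optimizing against the ensemble-mean price path. All parameter values are hypothetical, and this is not the paper's exact formulation:

```python
# Risk-neutral day-ahead battery arbitrage as a linear program.
import numpy as np
from scipy.optimize import linprog

T, p_max, e_max, eta = 24, 1.0, 4.0, 0.9   # hours, MW, MWh, one-way efficiency
rng = np.random.default_rng(1)
shape = 60 + 25 * np.sin(2 * np.pi * (np.arange(T) - 6) / T)  # daily price shape
scenarios = shape + rng.normal(0, 10, size=(100, T))  # forecast ensemble (EUR/MWh)
p_bar = scenarios.mean(axis=0)             # risk-neutral reduction: mean path

# Decision vector x = [charge_1..T, discharge_1..T]; minimize cost minus revenue.
c = np.concatenate([p_bar, -p_bar])
L = np.tril(np.ones((T, T)))               # cumulative-sum operator for the SOC
A_soc = np.hstack([eta * L, -L / eta])     # SOC_t = sum_{s<=t} (eta*ch_s - dis_s/eta)
A_ub = np.vstack([A_soc, -A_soc])          # enforce SOC_t <= e_max and SOC_t >= 0
b_ub = np.concatenate([np.full(T, e_max), np.zeros(T)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, p_max)] * (2 * T))
charge, discharge = res.x[:T], res.x[T:]
print("expected arbitrage profit (EUR):", -res.fun)
```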
Lu Yu, Xiang Li
Financial firms have gone through three major technological waves: computerization in the 1980s and 1990s, the rise of indexing and passive investing in the 2000s and 2010s, and the AI and automation wave from roughly 2015 to the present. This project studies how much labor is required to manage capital across those waves by tracking a simple productivity measure: assets under management per employee. Using a small panel of representative firms, we compare changes in AUM per employee, revenue per employee, and operating expense intensity over time. The goal is not to identify causal effects, but to document stylized facts about how technology changes the scale of asset management work.
Ana Maria Herrera, Elena Pesavento, Alessia Scudiero
We propose a clustered local projection (clustered LP) method to estimate impulse response functions in a class of time-varying models where parameter variation is linked to a low-dimensional matrix of observables. We show that the clustered LP recovers the conditional average response when the driving variables are exogenous and a weighted average of the conditional marginal effects when they are endogenous. We propose an iterative estimation method that first classifies the data using k-means, estimates impulse response functions via GMM, and evaluates differences across clustered LP estimates. Our Monte Carlo simulations illustrate the ability of clustered LP to approximate the conditional average response function. We employ our technique to examine how uncertainty influences the transmission of a contractionary monetary policy shock to the 5- and 10-year U.S. nominal Treasury yields. Our estimation results suggest macroeconomic and monetary policy uncertainty operate through complementary but distinct channels: the former primarily amplifies the risk compensation embedded in the term premium, while the latter governs the speed and persistence with which markets revise their expectations about the future rate path following a monetary policy shock.
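A stylized sketch of the two-step procedure on simulated data, with plain OLS local projections standing in for the paper's GMM step; the data-generating process and variable names are illustrative assumptions:

```python
# Clustered local projections: classify periods with k-means, then run
# local projections within each cluster, horizon by horizon.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
T, H = 2_000, 8
regime = rng.choice([0, 1], size=T)                 # latent state driving variation
z = np.column_stack([4.0 * regime - 2.0 + 0.5 * rng.normal(size=T),
                     rng.normal(size=T)])           # observed driving variables
shock = rng.normal(size=T)
beta = 0.5 + 0.8 * regime                           # regime-specific impact
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + beta[t] * shock[t] + 0.2 * rng.normal()

# Step 1: classify periods with k-means on the observables.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)

# Step 2: local projections within each cluster (OLS as a stand-in for GMM).
for k in range(2):
    idx = np.where(labels[: T - H] == k)[0]
    irf = [np.linalg.lstsq(np.column_stack([np.ones(idx.size), shock[idx]]),
                           y[idx + h], rcond=None)[0][1] for h in range(H)]
    print(f"cluster {k}: IRF {np.round(irf, 2)}")
```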
Artūras Juodis, Martin Weidner
We revisit panel regressions with unobserved heterogeneity through the lens of variance-weighted average treatment effects. Building on established results for cross-sectional OLS and one-way fixed effects panels, we show that two-way panel estimators with latent factors, specifically the principal components estimator of Greenaway-McGrevy, Han and Sul (2012) and the interactive fixed effects estimator of Bai (2009), also converge to interpretable estimands under fully nonparametric assumptions. Both estimators consistently estimate the same variance-weighted average of unit-time-specific treatment effects, where the weights are proportional to the conditional variance of the regressor given the unobserved heterogeneity. The result requires the number of estimated factors to grow with the sample size and applies to the single regressor case. We discuss the challenges that arise when extending to multiple regressors and to inference.
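Schematically, the common probability limit described above can be written as below; the notation is illustrative:

```latex
% Variance-weighted average effect recovered by both estimators (schematic):
% \beta_{it} is the unit-time effect, X_{it} the single regressor.
\[
  \beta^{*}
  = \frac{\mathbb{E}\left[\omega_{it}\,\beta_{it}\right]}
         {\mathbb{E}\left[\omega_{it}\right]},
  \qquad
  \omega_{it} = \operatorname{Var}\!\left(X_{it} \mid \text{unobserved heterogeneity}\right),
\]
% so unit-time cells where the regressor retains more conditional variation
% receive proportionally more weight.
```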
Maximilian Kasy, Elizabeth Linos, Sanaz Mobasseri
This paper develops a framework for identification, estimation, and inference on the causal mechanisms driving endogenous social network formation. Identification is challenging because of unobserved confounders and reverse causality; inference is complicated by questions of equilibrium and sampling. We leverage repeated observations of a network over time and random variation in initial ties to address challenges to causal identification. Our design-based approach sidesteps questions of sampling and asymptotics by treating both the set of nodes (individuals) and potential outcomes as non-random. We apply our approach to data from a large professional services firm, where new hires are randomly assigned to project teams within offices. We estimate the causal effect on tie formation of indirect ties, network degree, and local network density. Indirect ties have a strong and significant positive effect on tie formation, while the effects of degree and density are smaller and less robust.
Yukai Yang, Rickard Sandberg
Subsample-based estimation is a standard tool for achieving robustness to outliers in econometric models. This paper shows that, in dynamic time series settings, such procedures are fundamentally invalid under contamination, even under oracle knowledge of contamination locations. The key issue is that contamination propagates through the model's residual filter and distorts the estimation criterion itself. As a result, removing contaminated observations does not, in general, restore the uncontaminated objective or ensure consistency. We characterise this failure as a structural incompatibility between pointwise subsampling and residual propagation. To address it, we propose a propagation-compatible transformation of index sets, formalised through a patch removal operator that removes the residual footprint of contamination. Under suitable conditions, the proposed operator leaves the estimator asymptotically unchanged under the uncontaminated model, while restoring consistency for the clean-data parameter under contamination. The results apply to a broad class of residual-based estimators and show that valid subsample-based estimation in dynamic models requires explicit control of residual propagation.
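A stylized AR(1) illustration of why pointwise deletion fails and how a patch removal operator that deletes the residual footprint restores the clean-data estimate; the one-lag footprint and function names are assumptions for this toy model, not the paper's general construction:

```python
# Pointwise deletion vs. patch removal in a contaminated AR(1).
import numpy as np

rng = np.random.default_rng(0)
T, rho = 5_000, 0.6
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + rng.normal()
bad = np.arange(1_000, 1_010)          # oracle-known contamination locations
y_c = y.copy()
y_c[bad] += 50.0                       # additive outliers

def ar1_ls(y, keep):
    """Least squares for rho using only pairs (y[t-1], y[t]) with t in keep."""
    t = np.array(sorted(keep))
    t = t[t >= 1]
    return (y[t - 1] @ y[t]) / (y[t - 1] @ y[t - 1])

all_t = set(range(T))
# Pointwise deletion: the pair at t+1 still uses the contaminated y[t] as a
# regressor, so contamination propagates through the residual filter.
naive = ar1_ls(y_c, all_t - set(bad))
# Patch removal: delete the full residual footprint of the contamination.
patched = ar1_ls(y_c, all_t - set(bad) - set(bad + 1))
print(naive, patched)   # naive is distorted toward zero; patched recovers ~0.6
```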
Ziming Lin, Fang Han
Double/debiased machine learning (DML) provides a general framework for inference with high-dimensional or otherwise complex nuisance parameters by combining Neyman-orthogonal scores with cross-fitting, thereby circumventing classical Donsker-type conditions in many modern machine-learning settings. Despite its strong empirical performance, bootstrap inference for DML estimators has received little theoretical justification. This is particularly noteworthy since bootstrap methods are suggested and used for inference on DML estimators, even though bootstrap procedures can fail for estimators that are root-$n$ consistent and asymptotically normal. This paper fills this gap by establishing bootstrap validity for DML estimators under general exchangeably weighted resampling schemes, with Efron's bootstrap as a special case. Under exactly the same conditions required for the validity of DML itself, we prove that the bootstrap law converges conditionally weakly to the sampling law of the original estimator.
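A minimal sketch of the setting, assuming a partially linear model: cross-fitted DML followed by an exchangeably weighted bootstrap (here multinomial weights, i.e., Efron's bootstrap) that reweights the orthogonal score while holding the nuisances fixed. The estimator details are illustrative assumptions, not the paper's exact construction:

```python
# DML for a partially linear model plus an exchangeably weighted bootstrap.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 5))
D = np.sin(X[:, 0]) + rng.normal(size=n)
Y = 0.5 * D + X[:, 1] ** 2 + rng.normal(size=n)

# Cross-fitted nuisances m(X) = E[D|X] and l(X) = E[Y|X].
folds = rng.permutation(n) % 2
m_hat, l_hat = np.empty(n), np.empty(n)
for k in (0, 1):
    tr, te = folds != k, folds == k
    m_hat[te] = RandomForestRegressor(random_state=0).fit(X[tr], D[tr]).predict(X[te])
    l_hat[te] = RandomForestRegressor(random_state=0).fit(X[tr], Y[tr]).predict(X[te])

u, v = D - m_hat, Y - l_hat
theta = (u @ v) / (u @ u)                  # solves the Neyman-orthogonal score

# Bootstrap: reweight the score with exchangeable weights, nuisances fixed.
boot = []
for _ in range(500):
    w = rng.multinomial(n, np.ones(n) / n)  # Efron's bootstrap weights
    boot.append((w * u) @ v / ((w * u) @ u))
print(theta, np.std(boot))                 # point estimate, bootstrap SE
```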
Daniel Aronoff, Kristian Praizner, Armin Sabouri
Bitcoin transaction fees will become more important as the block subsidy declines, but fee formation is hard to study with blockchain data alone because the relevant queueing environment is unobserved. We develop and estimate a structural model of Bitcoin fee choice that treats the mempool as a market for scarce blockspace. We assemble a novel high-frequency mempool panel from a self-run Bitcoin node that records transaction arrivals, exits, block inclusion, fee-bumping events, and congestion snapshots. We characterize the fee market as a Vickrey-Clarke-Groves mechanism and derive an equation to estimate fees. In the first stage, we estimate a monotone delay technology linking fee-rate priority and network state to expected confirmation delay. We then estimate how fees respond to that delay technology and to transaction characteristics. We find that congestion is the main determinant of delay; that the marginal value of priority is priced into fees, increasing in the gradient of confirmation-time reduction per step up the fee queue; and that transactors' choices of RBF, CPFP, and block conditions have economically important effects on fees.
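A stylized sketch of the first-stage idea on synthetic data; the log-log functional form and variable names are assumptions, not the paper's specification:

```python
# First stage: estimating a monotone delay technology from synthetic
# mempool data, then computing the priority gradient that the second
# stage would relate back to observed fees.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
congestion = rng.gamma(2.0, 1.0, size=n)        # mempool backlog at arrival
fee_rate = rng.gamma(2.0, 5.0, size=n)          # chosen fee-rate priority (sat/vB)
delay = np.exp(1.0 + 0.8 * np.log(congestion)   # expected confirmation delay:
               - 0.6 * np.log(fee_rate)         # rises with congestion, falls
               + rng.normal(scale=0.3, size=n)) # in fee-rate priority

X = np.column_stack([np.ones(n), np.log(congestion), np.log(fee_rate)])
beta, *_ = np.linalg.lstsq(X, np.log(delay), rcond=None)

# Marginal value of priority: gradient of expected delay in the fee rate.
delay_hat = np.exp(X @ beta)
d_delay_d_fee = beta[2] * delay_hat / fee_rate
print(beta, d_delay_d_fee.mean())
```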
Nima Afsharhajari, Jonathan Yu-Meng Li
Sparsity or complexity? In modern high-dimensional asset pricing, these are often viewed as competing principles: richer feature spaces appear to favor complexity, while economic intuition has long favored parsimony. We show that this tension is misplaced. We distinguish capacity sparsity, the dimensionality of the candidate feature space, from factor sparsity, the parsimonious structure of priced risks, and argue that the two are complements: expanding capacity enables the discovery of factor sparsity. Revisiting the benchmark empirical design of Didisheim et al. (2025) and pushing it to higher complexity regimes, we show that nonlinear feature expansions combined with basis pursuit yield portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold. The evidence shows that the gains from complexity arise not from retaining more factors, but from enlarging the space from which a sparse structure of priced risks can be identified. The virtue of complexity in asset pricing operates through factor sparsity.
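A toy illustration of the capacity/factor-sparsity distinction, with a wide random-feature expansion supplying capacity and an $\ell_1$ fit (sklearn's Lasso, standing in for basis pursuit) selecting a sparse structure, compared against a ridgeless fit; all data and tuning values are hypothetical:

```python
# Capacity (wide random features) plus an l1 penalty vs. a ridgeless fit.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_train, n_test, d, P = 200, 2_000, 10, 1_000     # P >> n: high complexity
X = rng.normal(size=(n_train + n_test, d))
y = np.sin(X[:, 0]) + 0.5 * np.sign(X[:, 1]) + 0.3 * rng.normal(size=len(X))

W = rng.normal(size=(d, P)) / np.sqrt(d)           # random nonlinear features
b = rng.uniform(0, 2 * np.pi, size=P)
F = np.cos(X @ W + b)

F_tr, F_te, y_tr, y_te = F[:n_train], F[n_train:], y[:n_train], y[n_train:]
ridgeless = F_te @ (np.linalg.pinv(F_tr) @ y_tr)   # minimum-norm interpolant
sparse = Lasso(alpha=0.01, max_iter=50_000).fit(F_tr, y_tr).predict(F_te)

mse = lambda p: np.mean((p - y_te) ** 2)
print(f"ridgeless: {mse(ridgeless):.3f}  sparse: {mse(sparse):.3f}")
```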
Saad Bin Shafiq
Enterprise hiring systems generate data across multiple disconnected platforms: applicant tracking systems (ATS) record candidate profiles, human resource information systems (HRIS) record performance outcomes, and behavioral assessments capture personality and behavioral dimensions. Each system operates independently, and the reasoning behind hiring decisions is lost when managers retire, transfer, or leave. Decision traces are structured evidence chains connecting screening inputs, assessment signals, and production outcomes. They have been theorized but never operationalized at production scale. We present, to our knowledge, the first such study: a deployment at a Fortune 500 insurance carrier (N=10,765 agents hired, 2022-2025), where connecting three siloed data systems produced three findings. First, of 8,181 unique skills parsed from ATS profiles (3,597 testable), not a single keyword predicts production after Bonferroni correction; 30 are significantly anti-predictive, and the median keyword is associated with 25% lower odds of production. Requiring insurance experience alone would reject 2,863 agents who produced $17.7M in annual premium credit. Second, personality-based behavioral assessment (Predictive Index) achieves AUC=0.647 standalone and AUC=0.735 when fused with ATS and behavioral scoring data. Third, speed-to-production follows a measurable economic constant of $54/day per agent unadjusted, or $35/day controlling for source channel and tenure, moderated by behavioral score: high-scored agents capture $114/day from speed acceleration versus $41/day for low-scored agents. These findings were invisible within any single system. We discuss implications for hiring system design, the limitations of keyword-based screening, and the conditions under which institutional knowledge can be captured and operationalized.
Reca Sarfati, Vod Vilfort
Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one's baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a "free lunch," delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standard error, equivalent to approximately a 10% increase in sample size.
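A minimal sketch of the residualization step in an RCT, assuming a difference-in-means baseline and a single covariate-balance check; the influence-function covariance construction is one natural implementation, not necessarily the authors' exact one:

```python
# Residualizing a baseline estimator against a diagnostic check statistic.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2_000, 0.5
A = rng.binomial(1, p, size=n)                     # random assignment
X = rng.normal(size=n)                             # baseline covariate
Y = 1.0 + 0.5 * A + 0.8 * X + rng.normal(size=n)

# Baseline estimator: difference in means. Diagnostic check: covariate balance.
theta = Y[A == 1].mean() - Y[A == 0].mean()
check = X[A == 1].mean() - X[A == 0].mean()        # ~0 under randomization

# Influence-function contributions, used to estimate the joint sampling
# covariance of the estimator and the check.
if_theta = A / p * (Y - Y[A == 1].mean()) - (1 - A) / (1 - p) * (Y - Y[A == 0].mean())
if_check = A / p * (X - X[A == 1].mean()) - (1 - A) / (1 - p) * (X - X[A == 0].mean())
C = np.cov(if_theta, if_check)
gamma = C[0, 1] / C[1, 1]

theta_res = theta - gamma * check                  # residualized estimator
se_res = np.std(if_theta - gamma * if_check) / np.sqrt(n)
print(theta, theta_res, se_res)
```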
José Francisco Perles-Ribes
We propose a descriptive, realization-centred framework for detecting and characterising explosive and co-explosive behaviour in economic time series, which we term path-explosive behaviour. Departing from the data-generating-process (DGP) perspective that underlies recursive unit root testing, the approach operates directly on observable path properties of the realised series. Four diagnostic layers -- level geometry, growth rate dynamics, normalised curvature, and log-space behaviour -- yield statistics that discriminate between genuine self-reinforcing multiplicative growth and I(2) dynamics without distributional assumptions or asymptotic critical values. Two theoretically motivated absolute gate thresholds screen detected episodes before a composite intensity score is assigned. Co-explosive behaviour between pairs of series is assessed at the episode level through a Jaccard co-occurrence index and non-parametric intensity concordance measures. The theoretical motivation draws on the path dependence and planning irreversibility literatures to argue that, in settings where discrete institutional decisions shape growth trajectories, a realization-centred characterisation is epistemically more appropriate than a DGP-based test. A simulation study across four DGP regimes validates the framework's discriminating power and conservatism. An empirical application to real house prices, commodity prices, public debt, and Spanish tourism destinations illustrates the empirical content of the path-explosive concept and distinguishes it from speculative bubble detection.
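An illustrative sketch of two of the diagnostic layers (growth-rate dynamics and log-space curvature) on simulated explosive and I(2)-type paths; the statistics here are simplified stand-ins, not the paper's exact four layers or gate thresholds:

```python
# Path diagnostics: rising relative growth and log-convexity flag
# self-reinforcing multiplicative growth; I(2)-type paths generally show
# declining relative growth and log-concavity instead.
import numpy as np

def path_diagnostics(y):
    g = np.diff(y) / y[:-1]                        # growth-rate dynamics
    g_trend = np.polyfit(np.arange(g.size), g, 1)[0]
    log_curv = np.polyfit(np.arange(y.size), np.log(y), 2)[0]
    return g_trend, log_curv

rng = np.random.default_rng(0)
t = np.arange(200)
explosive = np.exp(5e-5 * t**2 + rng.normal(0, 0.01, 200).cumsum())
i2_like = 50 + np.cumsum(np.cumsum(rng.normal(0.05, 0.02, size=200)))

print(path_diagnostics(explosive))   # positive growth trend and log-convexity
print(path_diagnostics(i2_like))     # quadratic-like path: neither holds
```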
Pierre-Andre Chiappori, Dam Linh Nguyen, Bernard Salanie
Since Choo and Siow (2006), a burgeoning literature has analyzed matching markets when utility is perfectly transferable and the joint surplus is separable. We take stock of recent methodological developments in this area. Combining theoretical arguments and simulations, we show that the separable approach is reasonably robust to omitted variables and/or non-separabilities. We conclude with a caveat on data requirements and imbalanced datasets.