Yasushi Kawase, Warut Suksompong, Hanna Sumita, Yu Yokoi
In random assignment, fairness is often captured by stochastic-dominance envy-freeness (SD-EF). We observe that assignments satisfying SD-EF may admit decompositions that result in each agent envying another agent with high probability. To address this, we introduce decomposition envy-freeness (Dec-EF), which is a property of a decomposition rather than of an assignment matrix. We show that an SD-EF assignment matrix always admits a Dec-EF decomposition when there are at most three agents or the agents have at most two distinct preferences.
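The SD-EF condition compared here is easy to state operationally: for every prefix of an agent's preference order, the agent's cumulative allocation probability must weakly dominate every other agent's. A minimal Python sketch (our own illustrative helper `sd_envy_free` and tolerance, not the authors' code):

```python
import numpy as np

def sd_envy_free(P, prefs):
    """Check stochastic-dominance envy-freeness of a bistochastic
    assignment matrix P (rows: agents, cols: items).
    prefs[i] lists the items in agent i's strict preference order."""
    n = P.shape[0]
    for i in range(n):
        order = prefs[i]
        # cumulative probability that i receives one of its top-k items
        own = np.cumsum(P[i, order])
        for j in range(n):
            if j != i and np.any(own < np.cumsum(P[j, order]) - 1e-9):
                return False  # agent i SD-envies agent j
    return True

# The uniform assignment is SD-EF for any preferences;
# a deterministic assignment generally is not when preferences clash.
print(sd_envy_free(np.full((3, 3), 1 / 3), [[0, 1, 2]] * 3))  # True
print(sd_envy_free(np.eye(3), [[0, 1, 2]] * 3))               # False
```

Note this checks only the assignment matrix; Dec-EF, as the abstract stresses, is a property of a particular decomposition into deterministic assignments.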
Yuqing Hou
Although logit quantal response equilibrium (logit QRE) offers a natural equilibrium selection mechanism and converges to Nash equilibrium as the rationality parameter tends to infinity, its computation in extensive-form games is generally intractable when based on the normal-form representation, due to the exponential growth of the strategy space. To address this difficulty, this paper develops a sequence-form formulation of logit QRE for finite $n$-player extensive-form games with perfect recall, which avoids explicit construction of the normal form and enables compact computation. Based on this formulation, we further develop a differentiable path-following method starting from an arbitrary initial point, such that each point on the path corresponds to a logit QRE associated with a particular value of the rationality parameter, and the limiting point yields a Nash equilibrium. In this way, the proposed method provides an efficient computational framework for exploiting the equilibrium selection property of logit QRE in extensive-form games. The effectiveness of the proposed method is validated by theoretical analysis and numerical experiments.
Björn Assmann, Ulan Degenbaev
Many automated market makers can be understood through the geometry of their trading orbits, the sets of states reachable from one another through swaps. In prominent designs, this geometry is captured by a simple closed-form invariant such as the constant product $xy$ in Uniswap or a weighted geometric mean $x^w y^{1-w}$ in Balancer. This paper explains why these forms arise by deriving them from three basic assumptions: validity invariance (swaps preserve the validity of states), Pareto efficiency (no state on an orbit weakly dominates another), and unit invariance (changing measurement units does not change the mechanism). Together, these force every trading orbit of a two-asset AMM to be a level set of a weighted geometric mean $x^w y^{1-w}$. Applied pairwise, the axioms extend the classification to $n$-asset pools: orbits are level sets of $\prod_i x_i^{w_i}$ with positive weights $w_i$ summing to $1$. Imposing token-relabeling symmetry then pins down the weights, recovering the constant-product form $xy$ in the two-asset case and $\prod_i x_i$ in general. The main text provides an intuitive proof sketch and discusses fees and liquidity operations. Complete proofs and a machine-checked Lean 4 formalization accompany the paper.
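The classified invariants admit a tiny numerical illustration (a toy sketch of the level-set geometry only, with our own function name; fees and liquidity operations are ignored, as the abstract defers them to the main text):

```python
def swap(x, y, dx, w=0.5):
    """Trade dx units of asset X into a two-asset pool whose trading
    orbit is a level set of the weighted geometric mean x**w * y**(1-w).
    w = 0.5 recovers the constant-product orbit x*y = const
    (the square root of the product is an equivalent invariant)."""
    inv = x ** w * y ** (1 - w)
    new_x = x + dx
    new_y = (inv / new_x ** w) ** (1 / (1 - w))
    return new_x, new_y

# Constant-product pool: the state stays on the level set x*y = 40000
nx, ny = swap(100.0, 400.0, 25.0)
print(nx, ny)  # reserves move along the orbit; nx * ny stays 40000
```

Unit invariance is visible here: rescaling `x` or `y` by a constant rescales the invariant but leaves the orbit structure unchanged.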
Yue Gruszecki, Elliot Anshelevich
We study Nash equilibria in strategic facility location games where clients are located in an arbitrary metric space. Specifically, there are $n$ clients, and the goal is to choose a facility from a set of given locations, so that the total distance from the clients to the facility is as small as possible. While some of the clients are always truthful, $k$ of them are strategic, and will lie about their location if it benefits them. We quantify how the fraction of strategic clients affects the existence and quality of Nash equilibrium and strong equilibrium solutions, and note that even for relatively large $k$, the properties of these solutions can be much better than the results of fully strategyproof mechanisms. For Nash equilibrium, we show that it always exists, and the price of stability is very close to 1. More importantly, we prove that all Nash equilibria are within a factor of at most $\frac{n+2k}{n-2k}$ from the optimum solution, and that this price of anarchy bound is almost tight. While strong equilibrium may not exist for this setting, we prove that it always exists for line metrics, and its cost is at most $\frac{n+k}{n-k}$ times that of optimum.
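The two guarantees stated in the abstract are easy to tabulate (a trivial sketch; the function names are ours):

```python
def nash_poa_bound(n, k):
    """Upper bound (n + 2k) / (n - 2k) on the price of anarchy with
    n clients, k of them strategic (meaningful for n > 2k)."""
    return (n + 2 * k) / (n - 2 * k)

def strong_eq_bound(n, k):
    """Cost bound (n + k) / (n - k) for strong equilibria on line
    metrics (meaningful for n > k)."""
    return (n + k) / (n - k)

# Even with 10% strategic clients the guarantees stay close to 1:
print(nash_poa_bound(100, 10))   # 1.5
print(strong_eq_bound(100, 10))  # about 1.22
```

This makes concrete the abstract's point that, for relatively large $k$, equilibrium quality remains far better than what fully strategyproof mechanisms can promise.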
Junji Yan, Asrin Efe Yorulmaz, Hanchen Zhou, Tamer Başar
Modern Graphics Processing Unit (GPU)-backed services must satisfy strict latency service-level objectives (SLOs) while controlling spare-capacity cost. In multi-tenant GPU cloud platforms, this trade-off is inherently dynamic because workload demand is endogenous; specifically, pricing shapes the submissions of heterogeneous tenants, which subsequently impact congestion and delay. We formulate the joint pricing-and-scaling problem as a large-population Stackelberg game and derive an explicit equilibrium demand map. The resulting closed-loop model reveals a structural failure mode in which delay-insensitive workloads sustain a residual demand floor, making the backlog undrainable under bounded price and service capacity. This observation motivates a computable drainability guardrail that certifies uniformly negative drift in the residual-demand regime. For any fixed price-capacity pair satisfying the drainability guardrail, we establish a unique operating point and global convergence towards it under a checkable step-size condition. Building on this fixed-pair analysis, we further develop an optimizer-agnostic action shield for the full dynamic problem and show empirically that it improves safety and robustness for model-free reinforcement learning (RL) in this setting.
Andjela Mladenovic, Aaron Courville, Gauthier Gidel
In recent years, with the advancement of frontier AI, we have observed certain dynamics in open-sourcing and closed-sourcing decisions. We propose a game-theoretic model to analyze these dynamics in the current landscape of the AI race. Our model builds on an R&D race framework under a winner-takes-all setting, and it accounts for the cases where the players' actions can be either discrete or continuous (i.e., partial open-sourcing, such as open weights). We show that determining the existence of a discrete pure non-trivial Nash equilibrium is NP-hard in general but that we can transform the discrete Nash existence computation into a MIP (Mixed-Integer Programming) problem, making it tractable for small instances using a standard MIP solver. Next, we show the existence and tractability of pure Nash equilibria in the continuous version of our problem, leveraging standard convex analysis results and constructing an equivalent MIP formulation. Throughout this work, we leverage both our main technical results and the surrounding technical analysis to derive socially relevant insights that we believe can serve both to understand existing decisions and dynamics and to potentially inform new policies.
Ankit Maloo
We introduce the first version of KWBench (Knowledge Work Bench), a benchmark for unprompted problem recognition in large language models: can an LLM identify a professional scenario before attempting to solve it? Existing frontier benchmarks have saturated, and most knowledge-work evaluations to date reduce to extraction or task completion against a specification. KWBench targets the step before that: recognizing the governing structure of the situation from raw inputs alone. The benchmark contains 223 tasks sourced from practitioners across acquisitions, contract negotiations, clinical pharmacy, organizational politics, fraud analysis, and incentive design. Each task encodes a formal game-theoretic pattern (principal-agent conflict, signaling, mechanism design failure, strategic omission, coalitional dynamics, strategic interdependence) and carries structured ground truth recording the expert reading of the situation and the anticipated failure modes. Models receive raw data and a task prompt with no indication of problem type. Scoring is a three-tier rubric gated by a mandatory conjunctive check; the mandatory criteria encode the predicted wrong paths. We evaluate 16 models. The best model passes on 27.9% of tasks. The top two models agree on only 31.7% of their passes. Among the top 8, 44 tasks are solved by exactly one model; routing across the top 8 covers 50.7% of the benchmark, nearly double the best single model. Conditional on passing, quality scores converge (approximately 83% across models); unconditional scores do not. The same models articulate the relevant game-theoretic concept correctly when asked, then fail to apply it unprompted. We release KWBench to shift how frontier models are evaluated on knowledge work, scoring them on whether they recognize the right problem from the situation alone, not only on how well they execute once the problem has been framed for them.
Deep Kumar Ganguly, Chandradithya S Jonnalagadda, Pratham Chintamani, Adithya Ananth
Cooperative equilibria are fragile. When agents learn alongside each other rather than in a fixed environment, the process of learning destabilizes the cooperation they are trying to sustain: every gradient step an agent takes shifts the distribution of actions its partner will play, turning a cooperative partner into a source of stochastic noise precisely where the cooperation decision is most sensitive. We study how this co-learning noise propagates through the structure of coordination games, and find that the cooperative equilibrium, even when strongly Pareto-dominant, is exponentially unstable under standard risk-neutral learning, collapsing irreversibly once partner noise crosses the game's critical cooperation threshold. The natural response of applying distributional robustness to hedge against partner uncertainty makes things strictly worse: risk-averse return objectives penalize the high-variance cooperative action relative to defection, widening the instability region rather than shrinking it, a paradox that reveals a fundamental mismatch between the domain where robustness is applied and the domain where instability originates. We resolve this by showing that robustness should target the policy gradient update variance induced by partner uncertainty, not the return distribution. This distinction yields an algorithm whose gradient updates are modulated by an online measure of partner unpredictability, provably expanding the cooperation basin in symmetric coordination games. To unify the stability, sample complexity, and welfare consequences of this approach, we introduce the Price of Paranoia as the structural dual of the Price of Anarchy. Together with a novel Cooperation Window, it precisely characterizes how much welfare learning algorithms can recover under partner noise, pinning down the optimal degree of robustness as a closed-form balance between equilibrium stability and sample efficiency.
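The critical cooperation threshold invoked here has a closed form for a symmetric 2x2 coordination game (an illustrative sketch with our own payoff labels; the paper's exact game class and threshold definition may differ):

```python
def cooperation_threshold(a, b, c, d):
    """Critical probability p* above which cooperating is a best
    response in a symmetric 2x2 coordination game with payoffs
    (C,C)=a, (C,D)=b, (D,C)=c, (D,D)=d, assuming a > c and d > b.
    Cooperating pays iff p*a + (1-p)*b >= p*c + (1-p)*d."""
    return (d - b) / ((a - c) + (d - b))

# Stag hunt with payoffs (4, 0, 3, 2): cooperation is a best response
# only when the partner cooperates with probability at least 2/3.
print(cooperation_threshold(4, 0, 3, 2))  # 0.666...
```

Once partner noise pushes the empirical cooperation probability below this threshold, risk-neutral best responders defect, which is exactly the collapse mechanism the abstract describes.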
Yirui Zhang, Zhixuan Fang
In the conventional principal-agent problem, a principal delegates a task to an agent and formulates a contract to incentivize the agent's actions on behalf of the principal. However, this framework overlooks the information that may become available during the delegation process in some scenarios. To address this limitation, we propose a novel model that incorporates multiple intermediate states to capture such information revealed during the delegation. Furthermore, to evaluate the impact of the information embedded in these intermediate states, we introduce two distinct contracts: the pay-halfway contract, which provides payments based not only on final outcomes but also on intermediate states, and the terminate-halfway contract, which allows the principal to terminate the delegation process upon encountering undesirable intermediate states. This raises the question of whether and how these contract types can leverage intermediate-state information. In particular, we ask: Can these contract types outperform standard contracts, and if so, when and to what extent? We answer the first question affirmatively and provide several important insights regarding the second, shedding light on the circumstances in which intermediate-state-aware contracts yield substantial advantages.
Sungyong Chung, Tina Radvand, Alireza Talebpour
As automated vehicles (AVs) enter mixed traffic, proactively anticipating the evolution of human driving behavior during critical interactions, such as lane changes, is essential. However, classical Evolutionary Game Theory (EGT) fails to capture the complexity of human decision-making during lane changes. Specifically, by strictly assuming independence between agents, classical models calibrated on empirical payoffs predict a convergence to unrealistic full cooperation, contradicting the stable 42% cooperation rate observed in real-world data. To resolve this discrepancy, this study introduces a Quantum Game Theory (QGT) framework. We analyze 7,636 lane-changing interactions from the Waymo Open Motion Dataset (WOMD) to derive empirical payoff matrices via a Quantal Response Equilibrium (QRE) model. Utilizing the Marinatto-Weber (MW) quantization scheme, we introduce an entanglement parameter to mathematically embed latent correlations directly into the payoff structure of a single interaction. Our results identify a human entanglement parameter of $|b|^2_{HDV} \approx 0.52$ that accurately reproduces the observed mixed equilibrium. Furthermore, simulations of three AV deployment strategies (classical, entangled, and inverted) reveal that human adaptation depends critically on the underlying AV algorithm: while cooperative classical AVs maximize system-wide cooperation at high market penetration rates, defective inverted AVs paradoxically yield higher overall cooperation at low penetration rates by prompting more cooperative behaviors from human drivers. Consequently, rather than waiting for large scale deployment to observe these effects, stakeholders can utilize this framework to simulate repeated interactions and proactively anticipate how human driver behavior will evolve in response to specific AV software designs.
Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin
It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent work reports the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our experiments show that recent models -- with or without reasoning enabled -- consistently defect in single-shot social dilemmas. To tackle this safety concern, we present the first comparative study of game-theoretic mechanisms that are designed to enable cooperative outcomes between rational agents _in equilibrium_. Across four social dilemmas testing distinct components of robust cooperation, we evaluate the following mechanisms: (1) repeating the game for many rounds, (2) reputation systems, (3) third-party mediators to delegate decision making to, and (4) contract agreements for outcome-conditional payments between players. Among our findings, we establish that contracting and mediation are most effective in achieving cooperative outcomes between capable LLMs, and that repetition-induced cooperation deteriorates drastically when co-players vary. Moreover, we demonstrate that these cooperation mechanisms become _more effective_ under evolutionary pressures to maximize individual payoffs.
Elizabeth Baldwin, Paul Duetting, Michal Feldman, Maya Schlesinger
In the combinatorial action model of contract design, a principal delegates a complex project to an agent, incentivizing a subset of actions from a ground set of $n$ actions, via a linear contract. Computing the optimal contract is a challenging problem that generally hinges on two factors: (i) the number of "critical values," i.e., values of the linear contract parameter at which the agent's best response changes from one set to another, and (ii) the complexity of the agent's best-response problem (demand query). Prior work has used this approach to devise polynomial-time algorithms for the optimal contract problem under specific reward functions: gross substitutes, supermodular, and ultra. We develop a unified geometric framework for algorithmic contract design by establishing a fundamental link to the theory of demand types from consumer theory. Under this geometric view, bounding the number of critical values reduces to counting the best-response regions which the "contract ray" pierces. Leveraging this connection, we introduce the class of All Substitutes and Complements (ASC) functions, and show that it admits at most $O(n^2)$ critical values, strictly generalizing and unifying all previously known classes admitting polynomially many critical values. We conjecture that, under some mild assumptions, ASC is the maximal such class. Turning to the demand query aspect, we develop a new technique for efficiently computing a demand query using value queries, which works in general for "succinct" demand types. Combining these structural and algorithmic results, we obtain polynomial-time algorithms for new classes of reward functions that exhibit substitutes and complements simultaneously.
Thanh Linh Nguyen, Nguyen Van Huynh, Quoc-Viet Pham
In data-sensitive domains such as healthcare, cross-silo federated learning (CFL) allows organizations to collaboratively train AI models without sharing raw data. However, practical CFL deployments are inherently coopetitive, in which organizations cooperate during model training while competing in downstream markets. In such settings, training contributions, including data volume, quality, and diversity, can improve the global model yet inadvertently strengthen rivals. This dilemma is amplified by non-IID data, which leads to asymmetric learning gains and undermines sustained participation. While existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions, they fail to account for the costs of strengthening competitors. In this paper, we introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision. Specifically, CoCoGen+ formulates each training round as a weighted potential game, where organizations strategically decide how much synthetic data to generate by balancing learning performance gains against computational costs and competition-caused utility losses. We then provide a tractable equilibrium characterization and derive implementable generation strategies to maximize social welfare. To promote long-term collaboration, we integrate a payoff redistribution-based incentive mechanism to compensate organizations for their contributions and competition-caused utility degradation. Experiments across diverse learning tasks validate the feasibility of CoCoGen+. The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare, while CoCoGen+ outperforms baselines in efficiency.
Dongxin Guo, Jikun Wu, Siu-Ming Yiu
Large Language Model (LLM) agents are increasingly deployed in multi-agent systems requiring strategic coordination. While recent work has analyzed LLM behavior in two-player games, coalition formation, where $n$ agents dynamically form cooperative groups, remains theoretically uncharacterized. We present the first framework grounding coalition formation in LLM agent networks in hedonic game theory with formal stability guarantees. We introduce the LLM Coalition Formation Game (LCFG), establish sufficient conditions for Nash-stable partitions, and prove complexity results. Our analysis reveals that LLM agents exhibit bounded rationality characterized by $\epsilon$-rational preferences; we provide both deterministic existence guarantees and consistency-driven stability bounds whose predictions are consistent with empirical outcomes. Experiments with GPT-4, Claude-3, and Llama-3 across 2,400 episodes validate our framework: LLM coalitions achieve Nash stability in 73.2% of cases under our Coalition-of-Thought (CoalT) protocol, compared to 58.4% under chain-of-thought and 41.8% under standard prompting ($p < 0.001$). Our framework provides theoretical foundations for designing stable multi-agent LLM systems.
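Nash stability of a coalition partition has a direct operational check: no agent can gain by unilaterally joining another coalition or going solo. A minimal sketch with our own toy utility function (not the LCFG definition or the authors' code):

```python
def is_nash_stable(partition, value):
    """Check Nash stability of a coalition partition.
    partition is a list of disjoint sets of agents; value(a, c) gives
    agent a's utility for being in coalition c (with a included)."""
    agents = {a for c in partition for a in c}
    for a in agents:
        home = next(c for c in partition if a in c)
        current = value(a, home)
        # unilateral deviations: join any other coalition, or go solo
        targets = [c | {a} for c in partition if a not in c] + [{a}]
        if any(value(a, t) > current for t in targets):
            return False
    return True

# Toy hedonic game: every agent just prefers larger coalitions.
value = lambda a, c: len(c)
print(is_nash_stable([{0, 1, 2}], value))    # True: grand coalition
print(is_nash_stable([{0}, {1, 2}], value))  # False: 0 would join {1, 2}
```

With $\epsilon$-rational preferences one would relax the deviation test to `value(a, t) > current + eps`, which is how bounded rationality enlarges the set of stable partitions.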
Hillel Bar-Gera, Stephen D. Boyles, Liron Ravner
Vickrey's classic single-bottleneck departure time choice equilibrium model exhibits instability under many plausible day-to-day learning dynamics. Such instability is not observed in reality -- does this difference stem from the day-to-day dynamics or from one of the simplifying assumptions of the basic model? This paper explores a variant of the basic model with a continuous distribution of schedule delay parameters which we intuitively expect to have more favorable stability properties. To attain tractability we assume a monotonic relationship between earliness and lateness parameters. We first verify the existence and uniqueness of the equilibrium solution for this model. We then study a broad class of day-to-day dynamics satisfying local pressure and order preservation conditions. Our main contribution is a formal proof that, surprisingly, all such day-to-day dynamics in this context are unstable.
Sayan Kumar Chaki, Antoine Gourru, Julien Velcin
Fairness in language models is typically studied as a property of a single, centrally optimized model. As large language models become increasingly agentic, we propose that fairness emerges through interaction and exchange. We study this via a controlled hospital triage framework in which two agents negotiate over three structured debate rounds. One agent is aligned to a specific ethical framework via retrieval-augmented generation (RAG), while the other is either unaligned or adversarially prompted to favor demographic groups over clinical need. We find that alignment systematically shapes negotiation strategies and allocation patterns, and that neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone. Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart. We further observe that even explicitly aligned agents exhibit intrinsic biases toward certain frameworks, consistent with known left-leaning tendencies in LLMs. We connect these limits to Arrow's Impossibility Theorem: no aggregation mechanism can simultaneously satisfy all desiderata of collective rationality, and multi-agent deliberation navigates rather than resolves this constraint. Our results reposition fairness as an emergent, procedural property of decentralized agent interaction, and the system rather than the individual agent as the appropriate unit of evaluation.
Sicheng Wu, Minghui Liwang, Yangyang Gao, Deqing Wang, Wenbo Zhu, Yiguang Hong, Wei Ni, Seyyedali Hosseinalipour
In air-ground integrated networks (AGINs), unmanned aerial vehicles (UAVs) provide on-demand edge services to ground vehicles. Realizing this vision requires carefully designed incentives to coordinate interactions among self-interested participants. This is exacerbated by the dynamic nature of AGINs, where spatio-temporal variations introduce significant uncertainty in matching UAVs and vehicles. Existing real-time service provisioning typically relies on precise trajectory information, raising privacy concerns and incurring decision latency. To address these challenges, we propose look one-step ahead (LOSA), a novel framework for efficient and privacy-aware service provisioning. By exploiting predictable vehicle travel times between intersections, LOSA decomposes the process into two coupled phases: (i) a privacy-aware look-ahead phase and (ii) a lightweight real-time execution phase. The look-ahead phase allows vehicles to adaptively adjust privacy budgets based on historical utility, balancing trajectory exposure and matching accuracy. Leveraging this, a double auction mechanism establishes binding one-step-ahead agreements (OSAAs) through trajectory similarity clustering, while constructing preference lists to hedge against mobility uncertainty. The execution phase then enforces pre-established OSAAs and preference lists, resolving real-time resource conflicts without costly re-negotiations. This design reduces computational overhead and preserves robustness. We analytically establish that LOSA guarantees truthfulness, individual rationality, and budget balance. Experiments on real-world datasets (DAIR-V2X, HighD, and RCooper) demonstrate that LOSA achieves superior privacy protection while lowering transaction latency compared to baseline approaches.
Ahmed Sheta
Online multiplayer games are population-dependent systems whose playability depends on the continued presence of an active player base. We propose a formal framework for reasoning about viability collapse in such systems under explicit scope conditions. The framework introduces a conditional Critical Mass Threshold $\Phi$, below which queue times, match quality, or role balance render a game operationally non-viable under a fixed operational profile; an uninhabited runtime taxonomy spanning pre-launch and post-decline states; and a Nostalgia Inversion Point $\psi$, at which cultural memory exceeds active participation. We model post-peak decline using a threshold-sensitive hazard model and show how games in the modeled class can cross below viability under finite official-service horizons or bounded novelty under continuing exposure. Case studies based on public concurrent-player data are used illustratively rather than as formal validation. The contribution of the paper is not a universal law, but a formal vocabulary, a collapse model, and an empirical agenda for studying online game decline, preservation risk, and uninhabited virtual worlds.
Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, Yiling Chen
Mechanisms for continued self-improvement of language models without external supervision remain an open challenge. We propose Peer-Predictive Self-Training (PST), a label-free fine-tuning framework in which multiple language models improve collaboratively by leveraging a cross-model aggregated response as an internal training signal. Given a prompt question, the models generate responses sequentially; the final aggregated answer, often more reliable than individual responses in practice, serves as an internal target for learning. We measure how informative each intermediate response is about the aggregate using pointwise mutual information (PMI), and use this signal to scale self-training updates. Responses already aligned with the aggregate are updated less, while less informative or misaligned responses are updated more. On mathematical reasoning benchmarks (SimulEq, Math500, and MultiArith), PST improves exact-match accuracy by 2.2 to 4.3 percentage points across Gemma-2-2B, LLaMA-3.2-1B, and Qwen-2.5-1.5B, and reduces the average generator-verifier gap (GV-Gap) by 26 to 40 percent, while requiring no external supervision or teacher-student hierarchy and relying solely on cross-model interactions. These results suggest that cross-model generations and peer-predictive feedback can serve as an effective approach for self-supervised training.
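The PMI-based scaling described here can be sketched in a few lines; the specific mapping from PMI to an update weight below (an exponential decay) is our illustrative choice, not the paper's exact rule, and the probabilities are placeholders for model-derived quantities:

```python
import math

def pmi(p_joint, p_x, p_y):
    """Pointwise mutual information log p(x, y) / (p(x) * p(y)):
    how much more often a response and the aggregate co-occur than
    they would under independence."""
    return math.log(p_joint / (p_x * p_y))

def update_scale(pmi_value, beta=1.0):
    """Map PMI to a self-training weight: responses that are already
    highly informative about the aggregate (large positive PMI)
    receive smaller updates; uninformative ones get full weight."""
    return math.exp(-beta * max(pmi_value, 0.0))

# A response strongly predictive of the aggregate is updated less;
# an independent (uninformative) response keeps the full weight.
print(update_scale(pmi(0.4, 0.5, 0.5)))   # < 1.0
print(update_scale(pmi(0.25, 0.5, 0.5)))  # 1.0
```

The key property, matching the abstract, is monotonicity: the more aligned a response already is with the aggregated answer, the less it is pulled toward it.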
Nguyen Kim Thang
The rise of automated bidding strategies in online advertising presents new challenges in designing and analyzing efficient auction mechanisms. In this paper, we focus on proportional mechanisms within the context of auto-bidding and study the efficiency of pure Nash equilibria, specifically the price of anarchy (PoA), under the liquid welfare objective. We first establish a tight PoA bound of 2 for the standard proportional mechanism. Next, we introduce a modified version with an alternative payment scheme that achieves a PoA bound of $1 + \frac{O(1)}{n-1}$ where $n \geq 2$ denotes the number of bidding agents. This improvement surpasses the existing PoA barrier of 2 and approaches full efficiency as the number of agents increases. Our methodology leverages duality and the Karush-Kuhn-Tucker (KKT) conditions from linear and convex programming. Despite its conceptual simplicity, our approach proves powerful and may offer broader applications for establishing PoA bounds.
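The standard proportional mechanism analyzed above is simple enough to state in code (a bare sketch of the baseline allocation rule only; the paper's modified payment scheme is not reproduced here):

```python
def proportional_allocation(bids):
    """Standard proportional mechanism for a single divisible item:
    each agent receives a share proportional to its bid and pays
    its bid."""
    total = sum(bids)
    return [b / total for b in bids]

bids = [3.0, 1.0, 1.0]
print(proportional_allocation(bids))  # [0.6, 0.2, 0.2]
```

The paper's result is that this baseline has a tight liquid-welfare PoA of 2 at pure Nash equilibrium, while an alternative payment scheme drives the PoA down to $1 + O(1)/(n-1)$.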