Learning to Play Against Unknown Opponents
cs.GT
/ Abstract
We consider the problem of a learning agent who has to repeatedly play a general sum game against a strategic opponent who acts to maximize their own payoff by optimally responding against the learner's algorithm. The learning agent knows their own payoff function, but is uncertain about the payoff of their opponent (knowing only that it is drawn from some distribution $\mathcal{D}$). What learning algorithm should the agent run in order to maximize their own total utility, either in expectation or in the worst-case over $\mathcal{D}$? When the learning algorithm is constrained to be a no-regret algorithm, we demonstrate how to efficiently construct an optimal learning algorithm (asymptotically achieving the optimal utility) in polynomial time for both the in-expectation and worst-case problems, independent of any other assumptions. When the learning algorithm is not constrained to no-regret, we show how to construct an $\varepsilon$-optimal learning algorithm (obtaining average utility within $\varepsilon$ of the optimal utility) for both the in-expectation and worst-case problems in time polynomial in the size of the input and $1/\varepsilon$, when either the size of the game or the support of $\mathcal{D}$ is constant. Finally, for the special case of the maximin objective, where the learner wishes to maximize their minimum payoff over all possible optimizer types, we construct a learner algorithm that runs in polynomial time in each step and guarantees convergence to the optimal learner payoff. All of these results make use of recently developed machinery that converts the analysis of learning algorithms to the study of the class of corresponding geometric objects known as menus.