Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability
Natalie Collina, Varun Gupta, Aaron Roth
[arXiv]
We study a repeated contracting setting in which a Principal adaptively chooses amongst k Agents at each of T rounds. The Agents are non-myopic, so a mechanism for the Principal induces a T-round extensive-form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory: the game induced when choosing an Agent to contract with. First, we show that this game admits a pure-strategy non-responsive equilibrium amongst the Agents; informally, an equilibrium in which each Agent's actions depend on the history of realized states of nature but not on the history of the other Agents' actions, and which therefore avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a monotone bandit algorithm, then for any concave contract, in any such equilibrium, the Principal incurs no regret relative to contracting with the best Agent in hindsight: not only given that Agent's realized actions, but also relative to the counterfactual world in which the Principal had offered a guaranteed T-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm that additionally guarantees no swap-regret, then the Principal can offer only limited liability contracts (in which the Agent never needs to pay the Principal) while still obtaining no regret relative to the counterfactual world in which she offered a linear contract to the best Agent in hindsight, despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.
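To make the setting concrete, the sketch below shows the shape of the repeated contracting loop the abstract describes: at each round a bandit rule picks an Agent, an outcome is realized, and the Principal pays out a linear contract clipped at zero to enforce limited liability. This is a toy illustration, not the paper's construction; in particular, the `EpsGreedyBandit` stand-in is a hypothetical placeholder and is not claimed to satisfy the monotonicity or no swap-regret properties the theorems require, and the outcome model is synthetic.

```python
import random

class EpsGreedyBandit:
    """Toy agent-selection rule; a placeholder for the monotone
    (no swap-regret) bandit algorithm the paper requires."""
    def __init__(self, k, eps=0.1):
        self.k, self.eps = k, eps
        self.counts = [0] * k
        self.totals = [0.0] * k

    def select(self):
        # Explore with probability eps, else exploit the best empirical mean.
        if random.random() < self.eps:
            return random.randrange(self.k)
        means = [t / c if c else float("inf")
                 for t, c in zip(self.totals, self.counts)]
        return max(range(self.k), key=lambda i: means[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward

def linear_contract_payment(outcome, alpha=0.5):
    """Linear contract: the Agent receives an alpha share of the outcome.
    If the outcome is negative, so is the payment, i.e. the Agent would
    owe the Principal -- linear contracts are not limited liability."""
    return alpha * outcome

def limited_liability(payment):
    """Limited liability clip: the Agent never pays the Principal."""
    return max(payment, 0.0)

def run(T=1000, k=3):
    bandit = EpsGreedyBandit(k)
    principal_utility = 0.0
    for _ in range(T):
        agent = bandit.select()
        # Synthetic stand-in for the chosen Agent's hidden effort and the
        # realized state of nature; not part of the paper's model.
        outcome = random.gauss(mu=1.0 + agent, sigma=1.0)
        pay = limited_liability(linear_contract_payment(outcome))
        principal_utility += outcome - pay
        bandit.update(agent, outcome - pay)
    return principal_utility

if __name__ == "__main__":
    print(run())
```

The point of the sketch is structural: the Principal's mechanism is just a bandit over Agents plus a payment rule, and the limited-liability guarantee is a property of the payment rule alone, while the paper's results concern which bandit properties (monotonicity, no swap-regret) make such a mechanism no-regret against the counterfactual benchmarks described above.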