Research Post

Approximate exploitability: Learning a best response in large games


A standard metric used to measure the approximate optimality of policies in imperfect information games is exploitability, i.e. the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that calculates an analogous metric using an approximate best response; the approximation is done by using search and reinforcement learning. This is a generalization of local best response, a domain specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that our method converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, learning to exploit an AlphaZero agent.

Latest Research Papers

Connect with the community

Get involved in Alberta's growing AI ecosystem! Speaker, sponsorship, and letter of support requests welcome.

Explore training and advanced education

Curious about study options under one of our researchers? Want more information on training opportunities?

Harness the potential of artificial intelligence

Let us know about your goals and challenges for AI adoption in your business. Our Investments & Partnerships team will be in touch shortly!