Single-Agent Policy Tree Search With Guarantees

We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to prove an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for “needle-in-a-haystack” problems. The second algorithm is based on sampling and we prove an upper bound on the expected number of nodes it expands before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide the search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.

Acknowledgements

The authors wish to thank Peter Sunehag, Andras Gyorgy, Rémi Munos, Joel Veness, Arthur Guez, Marc Lanctot, André Grahl Pereira, and Michael Bowling for helpful discussions pertaining this research. Financial support for this research was in part provided by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Single-Agent Policy Tree Search With Guarantees

Latest Research Papers

Jump Point Search with Temporal Obstacles

Conflict-Based Increasing Cost Search

Iterative-deepening Bidirectional Heuristic Search with Restricted Memory

Let us help you

Connect with the community

Explore training and advanced education

Harness the potential of artificial intelligence

Connect with the community

Explore training and advanced education

Harness the potential of artificial intelligence