Alberta Machine Intelligence Institute

Confident Natural Policy Gradient for Local Planning in q_pi-realizable Constrained MDPs, Tian Tian

Published

Mar 24, 2025

The AI Seminar is a weekly meeting at the University of Alberta where researchers interested in artificial intelligence (AI) can share their research. Presenters include both local speakers from the University of Alberta and visitors from other institutions. Topics can be related in any way to artificial intelligence, from foundational theoretical work to innovative applications of AI techniques to new fields and problems.

Abstract: Constrained Markov decision processes (CMDPs) are a key reinforcement learning framework for handling safety and other critical objectives while maximizing reward. Yet efficient learning in CMDPs with infinitely many states and function approximation remains challenging. We address this problem with linear function approximation under q_pi-realizability, where the value function of every policy is linearly representable with a known feature map. Given a local-access model, we propose a primal-dual algorithm that, with polynomial sample complexity, returns a near-optimal policy while strictly satisfying the constraints.
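To give a flavor of the primal-dual idea behind algorithms like the one in the talk, here is a minimal sketch on a hypothetical one-state CMDP (a constrained bandit). It is an illustration of the generic Lagrangian scheme only, not the paper's Confident Natural Policy Gradient: the primal player best-responds to the reward minus the multiplier-weighted cost, while the dual player adjusts the Lagrange multiplier by gradient steps on the constraint violation; all numbers are made up.

```python
import numpy as np

# Hypothetical constrained bandit: two actions with known reward and cost.
r = np.array([1.0, 0.5])   # per-action reward
c = np.array([1.0, 0.0])   # per-action cost
b = 0.3                    # constraint: average cost must not exceed b

lam, eta, T = 0.0, 0.01, 10_000
rewards, costs = [], []
for _ in range(T):
    # Primal step: best response to the Lagrangian reward r - lam * c.
    a = int(np.argmax(r - lam * c))
    rewards.append(r[a])
    costs.append(c[a])
    # Dual step: raise the multiplier when the constraint is violated,
    # lower it (but keep it nonnegative) when there is slack.
    lam = max(0.0, lam + eta * (c[a] - b))

# The *averaged* play approximates the optimal mixed policy, which takes
# the costly action just often enough to keep the average cost near b.
avg_reward = float(np.mean(rewards))
avg_cost = float(np.mean(costs))
```

On this toy instance the averaged iterates mix the two actions so that the average cost settles near the budget b while the average reward approaches the best value attainable under the constraint; the actual paper additionally handles infinite state spaces, linear function approximation, and strict (not just average) constraint satisfaction.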

Presenter Bio: Tian Tian is a PhD student in Computing Science at the University of Alberta, supervised by Rich Sutton and collaborating with Lin F. Yang (UCLA) and Csaba Szepesvári. She completed her master's degree at the University of Alberta under Rich Sutton's guidance, after earning a bachelor's degree in Computer Engineering and Statistics from the same institution. Her primary research interest is reinforcement learning theory.