Presenter: Dr. Yangchen Pan, Department of Engineering Science, University of Oxford
Title: Unifying supervised learning and reinforcement learning via an MRP formulation: generalized TD learning
Abstract: This presentation challenges the traditional i.i.d. assumption in statistical learning by modeling data points as interconnected through a Markov reward process (MRP). We reformulate supervised learning as an on-policy policy evaluation problem in reinforcement learning (RL) and propose a generalized temporal difference (TD) learning algorithm. Our theoretical analysis connects linear TD solutions to ordinary least squares (OLS) and shows TD’s advantage when noise is correlated. We also prove convergence under linear function approximation. Empirical studies validate the approach on tasks such as regression and deep learning-based image classification.
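For readers unfamiliar with the idea, the following is a minimal sketch of how regression might be cast as TD-style policy evaluation. It is not taken from the talk: the uniform transition between data points, the reward construction r = y - gamma * y', the discount `gamma`, the step size `alpha`, and the synthetic noise-free data are all assumptions chosen for illustration. Under these assumptions the value of each "state" x equals its label y, and linear TD(0) should recover the same weights as OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free linear regression data (illustrative assumption).
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

gamma = 0.5   # discount factor (assumed)
alpha = 0.02  # step size (assumed)
w = np.zeros(d)

for _ in range(20000):
    i = rng.integers(n)  # current "state": a training point
    j = rng.integers(n)  # next "state": sampled uniformly (assumed transition)
    # Assumed reward construction: chosen so the value of x_i equals y_i.
    r = y[i] - gamma * y[j]
    # Standard linear TD(0) update with bootstrapped target r + gamma * v(x_j).
    td_error = r + gamma * (X[j] @ w) - (X[i] @ w)
    w += alpha * td_error * X[i]

# OLS solution on the same data, for comparison.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("TD weights: ", w)
print("OLS weights:", w_ols)
```

In this noise-free setting the two weight vectors should agree closely. The correlated-noise case highlighted in the abstract would need a different transition design and is not covered by this sketch.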
Presenter Bio: Dr. Yangchen Pan is a Lecturer in Machine Learning at the Department of Engineering Science, University of Oxford. He previously earned his Ph.D. from the University of Alberta under the supervision of Prof. Martha White and Prof. Amir-massoud Farahmand. His research focuses on achieving sample-efficient generalization with scalable computation, with particular interest in learning settings involving distribution shifts, including robust learning, reinforcement learning, and continual learning.
Hosted by: Dr. Martha White
About AI Seminar
Hosted by Amii, AI Seminar is a weekly meeting where students, developers, and professors in the AI field share their current research. Presenters include speakers from the University of Alberta, local industry, and other institutions from across Canada and abroad. Once a month, Technology Alberta participates as a co-host, featuring new and exciting Alberta Tech companies.
Learn more and see all upcoming seminars on the AI Seminar website.