Now that the 2020 Tea Time Talks are on YouTube, you can always make time for tea with Amii and the RLAI Lab! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, these 20-minute talks on technical topics are delivered by students, faculty and guests. The talks are a relaxed and informal way to hear leaders in AI discuss future lines of research they may explore, with topics ranging from ideas just taking root to fully finished projects.
Week three of the Tea Time Talks features:
Kris De Asis: Inverse Policy Evaluation for Value-based Decision-making
In the reinforcement learning setting, the problem of policy evaluation is to estimate the value function of a given policy. In this talk, Kris explores inverse policy evaluation, the process of solving for a likely policy given a value function, as a method for deriving behaviour directly from a value function.
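To make the idea concrete, here is a minimal sketch of inverse policy evaluation on a made-up two-state, two-action MDP: given a target value function, it searches the deterministic policies and returns the one whose exact evaluation comes closest. The MDP dynamics, rewards and target values below are all illustrative assumptions, not from the talk.

```python
import numpy as np
from itertools import product

gamma = 0.9
# Hypothetical MDP: P[a, s, s'] is the transition probability, r[a, s] the reward.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.1, 0.9], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
v_target = np.array([8.0, 10.0])   # value function we want to "invert"

def evaluate(pi):
    """Exact policy evaluation: v_pi = (I - gamma * P_pi)^-1 r_pi."""
    P_pi = np.einsum('sa,ast->st', pi, P)   # state-to-state matrix under pi
    r_pi = np.einsum('sa,as->s', pi, r)     # expected reward per state under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

# Enumerate deterministic policies and keep the one whose values best match.
best_pi, best_err = None, np.inf
for actions in product(range(2), repeat=2):
    pi = np.eye(2)[list(actions)]           # one-hot row per state
    err = np.linalg.norm(evaluate(pi) - v_target)
    if err < best_err:
        best_pi, best_err = pi, err

print(best_pi)   # the "likely" policy for v_target
```

Brute-force search is only feasible in tiny problems; the point is just that the evaluation map can be inverted approximately to recover behaviour from values.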
Andy Patterson: Objective Function Geometry for Learning Values
Andy discusses the distribution of prediction error when learning value functions that minimize several popular objective functions in RL, using a geometric perspective on these objectives.
Junfeng Wen: Batch Stationary Distribution Estimation
In his talk, Junfeng considers the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, he considers an alternative setting where a fixed set of transitions has been collected beforehand by a separate (possibly unknown) procedure. The goal is still to estimate properties of the stationary distribution, but without additional access to the underlying system. He proposes a consistent estimator that is based on recovering a correction ratio function over the given data. In particular, he introduces a variational power method (VPM) that provides provably consistent estimates under general conditions. In addition to unifying a number of existing approaches from different subfields, VPM yields significantly better estimates across a range of problems, including queueing, stochastic differential equations, post-processing MCMC and off-policy evaluation.
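For context, the classical power method the abstract contrasts with looks like the sketch below: repeatedly applying the transition matrix of a known ergodic chain drives any initial distribution to the stationary one. The three-state chain is a made-up example; VPM's contribution is doing something analogous from a fixed batch of sampled transitions, by learning a correction-ratio function instead of assuming access to the full chain.

```python
import numpy as np

# Made-up ergodic 3-state Markov chain (rows sum to 1).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

d = np.ones(3) / 3            # start from the uniform distribution
for _ in range(200):
    d = d @ P                 # one power-method step: d <- d P
    d /= d.sum()              # renormalize to keep a proper distribution

print(d)                      # approximate stationary distribution: d P = d
```

With access only to sampled transitions, the matrix-vector product above is unavailable, which is exactly the gap the batch setting and VPM address.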
Vincent Liu: Towards a practical measure of interference for reinforcement learning
Catastrophic interference is common in many network-based learning systems and many proposals exist for mitigating it. However, to overcome interference, we must understand it better. In this talk, Vincent provides a definition of interference for control in reinforcement learning. His group systematically evaluates their new measures by assessing correlation with several measures of learning performance including stability, sample efficiency, and online and offline control performance across a variety of learning architectures. Their new interference measure allows them to ask novel scientific questions about commonly used deep learning architectures. In particular, they show that target network frequency is a dominating factor for interference, and that updates on the last layer result in significantly higher interference than updates internal to the network. This new measure can be expensive to compute; they conclude with motivation for an efficient proxy measure and empirically demonstrate it is correlated with their definition of interference.
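As a rough illustration of the underlying idea (not the specific measure defined in the talk), interference can be probed by checking how a gradient update on one sample changes the loss on the other samples. The linear model and random data below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear predictor standing in for a value function.
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
w = np.zeros(5)
alpha = 0.1

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

i = 0                                    # update on sample i only
others = np.arange(len(y)) != i
before = loss(w, X[others], y[others])

grad = 2 * (X[i] @ w - y[i]) * X[i]      # gradient of squared error on sample i
w_new = w - alpha * grad
after = loss(w_new, X[others], y[others])

interference = after - before            # positive: the update hurt the other samples
print(interference)
```

Measuring this for every update is expensive, which mirrors the talk's motivation for a cheaper proxy measure.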
Watch the Tea Time Talks live online this year, Monday through Thursday from 4:15 – 4:45 p.m. MT. Each talk will be conducted here (please note that if you are accessing the chat from an email address outside the ualberta.ca domain, you may have to wait a few seconds for someone inside the meeting to let you in). You can take a look at the full schedule to find talks that interest you, subscribe to the RLAI mailing list or catch up on previous talks on the YouTube playlist.