The Tea Time Talks 2020: Week Ten

Now that the 2020 Tea Time Talks are on YouTube, you can always have time for tea with Amii and the RLAI Lab! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, these 20-minute talks on technical topics are delivered by students, faculty and guests. The talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore, with topics ranging from ideas just starting to take root to fully finished projects.

Week ten of the Tea Time Talks features:

Abhishek Naik: Learning and Planning in Average-Reward MDPs

In this talk, Abhishek presents a family of new learning and planning algorithms for average-reward Markov decision processes. Key to these algorithms is the use of the temporal-difference (TD) error to update the reward-rate estimate instead of the conventional error, enabling proofs of convergence in the general off-policy case without recourse to any reference states. Empirically, this generally results in faster learning, while reliance on a reference state can result in slower learning and risks divergence. Abhishek also presents a general technique to estimate the actual ‘centered’ value function rather than the value function plus an offset.
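As a rough illustration of the idea (a minimal sketch, not Abhishek's actual algorithm), here is tabular differential TD learning on a toy two-state cyclic Markov reward process, invented for this example. The same TD error that updates the value estimates also updates the reward-rate estimate, with no reference state involved:

```python
def differential_td(steps=20000, alpha=0.05, eta=1.0):
    """Tabular differential TD on a toy 2-state cyclic MRP.

    State 0 -> state 1 yields reward 1; state 1 -> state 0 yields reward 0,
    so the true reward rate is 0.5.
    """
    v = [0.0, 0.0]   # differential value estimates
    r_bar = 0.0      # reward-rate estimate
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        delta = r - r_bar + v[s_next] - v[s]  # differential TD error
        v[s] += alpha * delta
        r_bar += eta * alpha * delta          # the TD error drives the rate update
        s = s_next
    return r_bar, v

r_bar, v = differential_td()
```

On this toy chain, the reward-rate estimate approaches the true rate of 0.5 without ever designating a reference state.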

Ashley Dalrymple: Pavlovian Control of Walking

Spinal cord injury can cause paralysis of the legs. In this talk, Ashley introduces a spinal cord implant that her lab used to generate walking in a cat model. She then describes how they used general value functions (GVFs) and Pavlovian control to produce highly adaptable over-ground walking behaviour.

Alex Ayoub: Model-Based Reinforcement Learning with Value-Targeted Regression

In this talk, Alex discusses a model-based RL algorithm based on the optimism principle: in each episode, a set of models that are “consistent” with the data collected so far is constructed. The criterion of consistency is the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem over the constructed set of models.
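A heavily simplified, action-free sketch of that consistency check might look as follows. The value estimate, the observed transitions, the two candidate transition models, and the threshold `beta` below are all invented for illustration and are not from the talk:

```python
import numpy as np

V = np.array([0.0, 1.0])         # last value estimate over 2 states (assumed)
data = [(0, 1), (0, 1), (1, 0)]  # observed (s, s') transitions (assumed)
candidates = [                   # candidate transition matrices P[s, s']
    np.array([[0.1, 0.9], [0.8, 0.2]]),
    np.array([[0.9, 0.1], [0.2, 0.8]]),
]
beta = 0.5                       # consistency threshold (assumed)

def vtr_loss(P):
    # Total squared error between the observed next-state values V(s')
    # and the model's predicted expected next value E[V(s')].
    return sum((V[s2] - P[s] @ V) ** 2 for s, s2 in data)

# Keep only the models whose value predictions are consistent with the data.
consistent = [P for P in candidates if vtr_loss(P) <= beta]
```

An optimistic planner would then choose, among the surviving models, the one whose planning solution promises the highest value, and act accordingly in the next episode.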

Shivam Garg: Log-likelihood Baseline for Policy Gradient

Policy gradient methods commonly use a critic baseline to reduce the variance of their estimates. In this talk, Shivam discusses a simple idea for an analogous baseline for the log-likelihood part of the policy gradient. First, Shivam shows that the softmax policy gradient in the bandit case can be written as two different but equivalent expressions, which motivates the log-likelihood baseline. While one of these expressions is the widely used standard form, the other does not seem to be common in the literature. Shivam then shows how these expressions can be extended to the full Markov decision process (MDP) case under certain assumptions.
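The specific log-likelihood baseline from the talk is not reproduced here, but the discussion builds on two standard facts about softmax bandits that a short numerical check can confirm: the closed-form gradient is π_a(q_a − J), and subtracting a constant (critic-style) baseline from the returns leaves the expected score-function gradient unchanged. All numbers below are arbitrary illustrative values:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=4)   # hypothetical action preferences
q = rng.normal(size=4)       # hypothetical true action values
pi = softmax(theta)
J = pi @ q                   # expected reward under the policy

# Expression 1: closed form of the softmax gradient, dJ/dtheta_a = pi_a (q_a - J).
g_direct = pi * (q - J)

# Expression 2: expectation of the score-function estimator q(A) * grad log pi(A),
# using grad_theta log pi(a) = e_a - pi for a softmax policy.
g_score = sum(pi[a] * q[a] * (np.eye(4)[a] - pi) for a in range(4))

# Subtracting a constant baseline b from the returns leaves the expectation
# unchanged, because E[grad log pi(A)] = 0.
b = 0.7
g_base = sum(pi[a] * (q[a] - b) * (np.eye(4)[a] - pi) for a in range(4))

assert np.allclose(g_direct, g_score)
assert np.allclose(g_direct, g_base)
```

The log-likelihood baseline in the talk plays an analogous role for the ∇ log π(A) factor itself, rather than for the return.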

The Tea Time Talks have now concluded for the year, but stay tuned as we will be uploading the remaining talks in the weeks ahead. In the meantime, you can rewatch or catch up on previous talks on our YouTube playlist.
