The Tea Time Talks 2021: Week One

The Tea Time Talks are back! Throughout the summer, take in 20-minute talks on early-stage ideas, prospective research and technical topics delivered by students, faculty and guests. Presented by Amii and the RLAI Lab at the University of Alberta, the talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore.

Watch select talks from the first week of the series now:

RLAI Panel

The first Tea Time Talk of 2021 features a panel of reinforcement learning (RL) researchers -- all Amii Fellows, Canada CIFAR AI Chairs and UAlberta professors. Martha White moderates this panel featuring Adam White, Csaba Szepesvári, Matthew E. Taylor and Michael Bowling.

Richard S. Sutton: Gaps in the Foundations of Planning with Approximation

Abstract: Planning, a computational process widely thought essential to intelligence, consists of imagining courses of action and their consequences, and deciding ahead of time which ones to do. In the standard RLAI agent architecture, the component that does the imagining of consequences is called the model of the environment, and the deciding in advance is via a change in the agent’s policy. Planning and model learning have been studied for seven decades and yet remain largely unsolved in the face of genuine approximation—models that remain approximate (do not become exact) in the high-data limit. In this talk, Richard Sutton briefly assesses the challenges of extending RL-style planning (value iteration) in the most important ways: average reward, partial observability, stochastic transitions, and temporal abstraction (options). His assessment is that these extensions are straightforward until they are combined with genuine approximation in the model, in which case we have barely a clue how to proceed in a scalable way. Nevertheless, we do have a few clues; Rich suggests the ideas of expectation models, ‘meta data’, and search as general strategies for learning approximate environment models suitable for use in planning.
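For readers unfamiliar with the "RL-style planning" the abstract refers to, here is a minimal sketch of tabular value iteration on a toy two-state MDP. The MDP and all names below are illustrative assumptions, not material from the talk, and this is the exact tabular setting the talk argues breaks down once the model is genuinely approximate.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Tabular value iteration (a hypothetical minimal sketch).

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the expected immediate reward for taking action a in state s.
    Returns the optimal state-value function V as a NumPy array.
    """
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Backed-up value of each action: reward plus discounted
            # expected value of the successor states under the model.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once no state changed by more than theta
            return V

# Toy MDP: action 0 stays in place (reward 0); action 1 moves to the
# other state, paying reward 1 only when leaving state 0.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 1.0],
     [0.0, 0.0]]
V = value_iteration(P, R)
```

The sketch assumes an exact model (P and R are the true dynamics); the talk's point is that when the model is learned and remains approximate even with unlimited data, it is far from clear how to make these backups scale.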

Rupam Mahmood: New Forms of Policy Gradients for Model-free Estimation

Abstract: Policy gradient methods are a natural choice for learning a parameterized policy, especially for continuous actions, in a model-free way. These methods update policy parameters with stochastic gradient descent by estimating the gradient of a policy objective. Many of these methods can be derived from or connected to a well-known policy gradient theorem that writes the true gradient in the form of the gradient of the action likelihood, which is suitable for model-free estimation. In this talk, Rupam Mahmood revisits this theorem and looks for other forms of writing the true gradient that may give rise to new classes of policy gradient methods.
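The "well-known policy gradient theorem" the abstract mentions is standardly written in its likelihood-ratio form; the notation below is the conventional textbook statement, not taken from the talk itself:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, q^{\pi_\theta}(s, a) \right]
```

Because the gradient appears only through $\nabla_\theta \log \pi_\theta(a \mid s)$, the expectation can be estimated from sampled states and actions without a model of the environment, which is what makes this form "suitable for model-free estimation." The talk asks whether other algebraic forms of the same true gradient exist and what methods they would yield.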

Like what you’re learning here? Take a deeper dive into the world of RL with the Reinforcement Learning Specialization, offered by the University of Alberta and Amii. Taught by Martha White and Adam White, this specialization explores how RL solutions help solve real-world problems through trial-and-error interaction, showing learners how to implement a complete RL solution from beginning to end. Enroll in this specialization now!
