News
The Tea Time Talks are back! Throughout the summer, take in 20-minute talks on early-stage ideas, prospective research and technical topics delivered by students, faculty and guests. Presented by Amii and the RLAI Lab at the University of Alberta, the talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore.
Watch select talks from the first week of the series now:
The first Tea Time Talk of 2021 features a panel of reinforcement learning (RL) researchers, all of them Amii Fellows, Canada CIFAR AI Chairs and UAlberta professors. Martha White moderates the discussion with Adam White, Csaba Szepesvári, Matthew E. Taylor and Michael Bowling.
Abstract: Planning, a computational process widely thought essential to intelligence, consists of imagining courses of action and their consequences, and deciding ahead of time which ones to do. In the standard RLAI agent architecture, the component that does the imagining of consequences is called the model of the environment, and the deciding in advance is via a change in the agent’s policy. Planning and model learning have been studied for seven decades and yet remain largely unsolved in the face of genuine approximation—models that remain approximate (do not become exact) in the high-data limit. In this talk, Richard Sutton briefly assesses the challenges of extending RL-style planning (value iteration) in the most important ways: average reward, partial observability, stochastic transitions, and temporal abstraction (options). His assessment is that these extensions are straightforward until they are combined with genuine approximation in the model, in which case we have barely a clue how to proceed in a scalable way. Nevertheless, we do have a few clues; Rich suggests the ideas of expectation models, ‘meta data’, and search as general strategies for learning approximate environment models suitable for use in planning.
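As background for the value-iteration planning Rich mentions, here is a minimal, illustrative sketch, assuming a tiny tabular MDP and a (possibly approximate) expectation-style model; none of the names or numbers below come from the talk.

```python
import numpy as np

# Illustrative sketch only (not from the talk): planning by value iteration
# with a learned/approximate tabular model of the environment.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# model_p[s, a, s'] approximates Pr(s' | s, a); model_r[s, a] approximates E[reward | s, a].
model_p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
model_r = rng.normal(size=(n_states, n_actions))

v = np.zeros(n_states)
for _ in range(1000):                      # repeated sweeps of the one-step backup
    # v(s) <- max_a [ r(s,a) + gamma * sum_s' p(s'|s,a) * v(s') ]
    q = model_r + gamma * model_p @ v      # (n_states, n_actions) action values
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:   # stop once a sweep barely changes v
        break
    v = v_new

greedy_policy = q.argmax(axis=1)           # plan: act greedily with respect to the model-based values
```

The model's only role here is to supply the imagined consequences inside the backup; once both the model and the value function are genuinely approximate, the clean convergence guarantees of tabular value iteration no longer apply, which is the regime the talk is concerned with.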
Abstract: Policy gradient methods are a natural choice for learning a parameterized policy, especially for continuous actions, in a model-free way. These methods update policy parameters with stochastic gradient descent by estimating the gradient of a policy objective. Many of these methods can be derived from, or connected to, the well-known policy gradient theorem, which writes the true gradient in terms of the gradient of the action likelihood, a form suitable for model-free estimation. In this talk, Rupam Mahmood revisits this theorem and looks for other ways of writing the true gradient that may give rise to new classes of policy gradient methods.
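For readers who want the formula the abstract alludes to: in standard notation (not taken from the talk itself), the log-likelihood form of the policy gradient theorem reads

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ Q^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right]
```

Because the expectation is over states and actions generated by the policy itself, it can be estimated from sampled experience without a model of the environment, which is what makes this form convenient for model-free methods such as REINFORCE and actor-critic.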
Like what you’re learning here? Take a deeper dive into the world of RL with the Reinforcement Learning Specialization, offered by the University of Alberta and Amii. Taught by Martha White and Adam White, this specialization explores how RL solutions help solve real-world problems through trial-and-error interaction, showing learners how to implement a complete RL solution from beginning to end. Enroll in this specialization now!
Mar 16th 2023
News
Learn how Amii made history as the first Official AI Partner at the JUNO Awards and how AI is being used in sound, music and creativity.
Mar 15th 2023
News
JUNOS week brought an indelible buzz to Edmonton. Amii is proud to have been a part of it, as we continue to push what’s possible in AI and celebrate Canadian ideas and innovation.
Mar 13th 2023
News
This AI Meetup discussed using federated learning in healthcare projects and beyond, featuring Amii Fellows Ross Mitchell and Randy Goebel.