News
Now that the 2020 Tea Time Talks are on YouTube, you can always have time for tea with Amii and the RLAI Lab! Hosted by Amii's Chief Scientific Advisor, Dr. Richard S. Sutton, these 20-minute talks on technical topics are delivered by students, faculty and guests. The talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore, with topics ranging from ideas just starting to take root to fully finished projects.
Week ten of the Tea Time Talks features:
In this talk, Abhishek presents a family of new learning and planning algorithms for average-reward Markov decision processes. Key to these algorithms is the use of the temporal-difference (TD) error, rather than the conventional error, to update the reward-rate estimate, which enables proofs of convergence in the general off-policy case without recourse to any reference states. Empirically, this generally results in faster learning, while reliance on a reference state can slow learning and risks divergence. Abhishek also presents a general technique to estimate the actual 'centered' value function rather than the value function plus an offset.
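For readers who want a concrete picture of the core idea, here is a minimal tabular sketch in which the TD error itself updates the reward-rate estimate. It is an on-policy simplification for illustration only, not the algorithms from the talk, and the environment interface and step sizes are assumptions.

```python
import numpy as np

# Minimal sketch of tabular differential TD(0) for average-reward prediction.
# Assumed toy interface: env.reset() returns a state index, env.step(s, a)
# returns (next_state, reward). Step sizes alpha and eta are illustrative.
def differential_td0(env, num_steps=10_000, alpha=0.1, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(env.num_states)   # differential value estimates
    reward_rate = 0.0              # estimate of the average reward
    s = env.reset()
    for _ in range(num_steps):
        a = rng.integers(env.num_actions)   # behaviour policy: uniform random
        s_next, r = env.step(s, a)
        # TD error in the average-reward setting
        delta = r - reward_rate + v[s_next] - v[s]
        v[s] += alpha * delta
        # Key idea from the talk: the TD error (not r - reward_rate)
        # drives the reward-rate update, so no reference state is needed.
        reward_rate += eta * alpha * delta
        s = s_next
    return v, reward_rate
```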
Spinal cord injury can cause paralysis of the legs. In this talk, Ashley introduces a spinal cord implant that her lab used to generate walking in a cat model. She then describes how they used general value functions (GVFs) and Pavlovian control to produce highly adaptable over-ground walking behaviour.
In this talk, Alex discusses a model-based RL algorithm based on the optimism principle: in each episode, the algorithm constructs the set of models that are "consistent" with the data collected so far. The criterion of consistency is the total squared error the model incurs when predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem over the constructed set of models.
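As a rough illustration of this optimism principle, the sketch below filters a finite set of candidate models by their total squared value-prediction error on observed transitions, then plans optimistically over the survivors. The finite model class, the threshold beta and the use of discounted value iteration are illustrative assumptions; this is not the algorithm analyzed in the talk.

```python
import numpy as np

def consistent_models(models, transitions, v_last, beta):
    """Keep models whose total squared value-prediction error is within beta.

    Each model is an array of shape (S, A, S): model[s, a] is the predicted
    next-state distribution for taking action a in state s (an assumption).
    """
    keep = []
    for model in models:
        err = 0.0
        for (s, a, s_next) in transitions:
            predicted = model[s, a] @ v_last        # model's expected next value
            err += (predicted - v_last[s_next]) ** 2
        if err <= beta:
            keep.append(model)
    return keep

def optimistic_plan(models, rewards, gamma=0.9, iters=200):
    """Value iteration that is optimistic over the surviving model set."""
    num_states, num_actions = rewards.shape
    v = np.zeros(num_states)
    for _ in range(iters):
        q = np.stack([rewards + gamma * (m @ v) for m in models])  # (|M|, S, A)
        v = q.max(axis=(0, 2))   # max over models (optimism) and actions
    return v
```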
Policy gradient methods often use a critic baseline to reduce the variance of their gradient estimate. In this talk, Shivam discusses a simple idea for an analogous baseline for the log-likelihood part of the policy gradient. First, Shivam shows that the softmax policy gradient in the bandit case can be written in two different but equivalent expressions, which motivates the log-likelihood baseline. One of these is the widely used standard expression; the other does not seem to be common in the literature. Shivam then shows how these expressions can be extended to the full Markov decision process (MDP) case under certain assumptions.
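For context, here is a minimal sketch of the familiar softmax policy-gradient (REINFORCE) update with a reward baseline in a bandit. It shows only the standard setting; the log-likelihood baseline and the alternative gradient expression Shivam describes are not reproduced here, and the arm rewards and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])   # expected reward of each arm (assumed)
theta = np.zeros(3)                        # softmax preferences over the arms
alpha, beta = 0.1, 0.05                    # policy and baseline step sizes (assumed)
r_bar = 0.0                                # running-average reward baseline

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    r = true_rewards[a] + rng.normal(scale=0.1)   # noisy reward sample
    grad_log = -pi.copy()
    grad_log[a] += 1.0                            # gradient of log pi(a) for softmax
    theta += alpha * (r - r_bar) * grad_log       # baseline-corrected policy-gradient step
    r_bar += beta * (r - r_bar)                   # update the baseline

print(softmax(theta))   # probability mass should concentrate on the best arm
```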
The Tea Time Talks have now concluded for the year, but stay tuned as we will be uploading the remaining talks in the weeks ahead. In the meantime, you can rewatch or catch up on previous talks on our YouTube playlist.
Sep 27th 2023
News
A new report by Deloitte Canada on Canada’s national AI ecosystem finds that Canada tops world rankings in talent concentration, with patent growth and per-capita VC investments among the world’s highest.
Sep 25th 2023
News
Amii's Chief Scientific Advisor announces partnership with John Carmack to bring greater focus and urgency to the creation of artificial general intelligence (AGI).
Sep 21st 2023
News
On August 18, Kristen Yu, a PhD Candidate at the University of Alberta, presented "Adventures of AI Directors Early in the Development of Nightingale" at the AI Seminar.