The Tea Time Talks 2020: Week Ten

Now that the 2020 Tea Time Talks are on YouTube, you can always have time for tea with Amii and the RLAI Lab! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, these 20-minute talks on technical topics are delivered by students, faculty and guests. The talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore, with topics ranging from ideas just starting to take root to fully finished projects.

Week ten of the Tea Time Talks features:

Abhishek Naik: Learning and Planning in Average-Reward MDPs

In this talk, Abhishek presents a family of new learning and planning algorithms for average-reward Markov decision processes. Key to these algorithms is the use of the temporal-difference (TD) error to update the reward-rate estimate instead of the conventional error, enabling proofs of convergence in the general off-policy case without recourse to any reference states. Empirically, this generally results in faster learning, while reliance on a reference state can result in slower learning and risks divergence. Abhishek also presents a general technique to estimate the actual ‘centered’ value function rather than the value function plus an offset.
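As a rough illustration of the idea (a minimal sketch, not Abhishek's actual algorithm), here is tabular differential TD learning on a toy two-state cyclic Markov reward process, invented for this example. The same TD error that updates the value estimates also updates the reward-rate estimate, with no reference state involved:

```python
def differential_td(steps=20000, alpha=0.05, eta=1.0):
    """Tabular differential TD on a toy 2-state cyclic MRP.

    State 0 -> state 1 yields reward 1; state 1 -> state 0 yields reward 0,
    so the true reward rate is 0.5.
    """
    v = [0.0, 0.0]   # differential value estimates
    r_bar = 0.0      # reward-rate estimate
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        delta = r - r_bar + v[s_next] - v[s]  # differential TD error
        v[s] += alpha * delta
        r_bar += eta * alpha * delta          # the TD error drives the rate update
        s = s_next
    return r_bar, v

r_bar, v = differential_td()
```

On this toy chain, the reward-rate estimate approaches the true rate of 0.5 without ever designating a reference state.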

Ashley Dalrymple: Pavlovian Control of Walking

Spinal cord injury can cause paralysis of the legs. In this talk, Ashley introduces a spinal cord implant that her lab used to generate walking in a cat model. She then describes how they used general value functions (GVFs) and Pavlovian control to produce highly adaptable over-ground walking behaviour.

Alex Ayoub: Model-Based Reinforcement Learning with Value-Targeted Regression

In this talk, Alex discusses a model-based RL algorithm based on the optimism principle: in each episode, a set of models that are “consistent” with the data collected so far is constructed. The criterion of consistency is the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem over the constructed set of models.
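A heavily simplified, action-free sketch of that consistency check might look as follows. The value estimate, the observed transitions, the two candidate transition models, and the threshold `beta` below are all invented for illustration and are not from the talk:

```python
import numpy as np

V = np.array([0.0, 1.0])         # last value estimate over 2 states (assumed)
data = [(0, 1), (0, 1), (1, 0)]  # observed (s, s') transitions (assumed)
candidates = [                   # candidate transition matrices P[s, s']
    np.array([[0.1, 0.9], [0.8, 0.2]]),
    np.array([[0.9, 0.1], [0.2, 0.8]]),
]
beta = 0.5                       # consistency threshold (assumed)

def vtr_loss(P):
    # Total squared error between the observed next-state values V(s')
    # and the model's predicted expected next value E[V(s')].
    return sum((V[s2] - P[s] @ V) ** 2 for s, s2 in data)

# Keep only the models whose value predictions are consistent with the data.
consistent = [P for P in candidates if vtr_loss(P) <= beta]
```

An optimistic planner would then choose, among the surviving models, the one whose planning solution promises the highest value, and act accordingly in the next episode.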

Shivam Garg: Log-likelihood Baseline for Policy Gradient

Policy gradient methods commonly use a critic baseline to reduce the variance of their estimates. In this talk, Shivam discusses a simple idea for an analogous baseline for the log-likelihood part of the policy gradient. First, Shivam shows that the softmax policy gradient in the bandit case can be written as two different but equivalent expressions, which motivates the log-likelihood baseline. While one of these expressions is the widely used standard form, the other does not seem to be common in the literature. Shivam then shows how these expressions can be extended to the full Markov decision process (MDP) case under certain assumptions.
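The specific log-likelihood baseline from the talk is not reproduced here, but the discussion builds on two standard facts about softmax bandits that a short numerical check can confirm: the closed-form gradient is π_a(q_a − J), and subtracting a constant (critic-style) baseline from the returns leaves the expected score-function gradient unchanged. All numbers below are arbitrary illustrative values:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=4)   # hypothetical action preferences
q = rng.normal(size=4)       # hypothetical true action values
pi = softmax(theta)
J = pi @ q                   # expected reward under the policy

# Expression 1: closed form of the softmax gradient, dJ/dtheta_a = pi_a (q_a - J).
g_direct = pi * (q - J)

# Expression 2: expectation of the score-function estimator q(A) * grad log pi(A),
# using grad_theta log pi(a) = e_a - pi for a softmax policy.
g_score = sum(pi[a] * q[a] * (np.eye(4)[a] - pi) for a in range(4))

# Subtracting a constant baseline b from the returns leaves the expectation
# unchanged, because E[grad log pi(A)] = 0.
b = 0.7
g_base = sum(pi[a] * (q[a] - b) * (np.eye(4)[a] - pi) for a in range(4))

assert np.allclose(g_direct, g_score)
assert np.allclose(g_direct, g_base)
```

The log-likelihood baseline in the talk plays an analogous role for the ∇ log π(A) factor itself, rather than for the return.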

The Tea Time Talks have now concluded for the year, but stay tuned as we will be uploading the remaining talks in the weeks ahead. In the meantime, you can rewatch or catch up on previous talks on our YouTube playlist.
