Amii supports cutting-edge research and helps turn scientific advances into industry adoption. Our world-leading researchers can focus on solving tough problems while our teams translate their knowledge, talent and technology for industry, creating an integrated system that allows both research and industry to thrive.

Such cutting-edge research is currently being featured at the Eighth International Conference on Learning Representations (ICLR), running online this year from April 26 to May 1. ICLR is the premier gathering of professionals dedicated to advancing the branch of AI called representation learning, also referred to as deep learning. The conference is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of AI, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.

Accepted papers from Amii researchers cover a range of topics, including the reduction of overestimation bias in Q-learning, training RNNs more effectively by reformulating the training objective, and the reduction of selection bias when estimating treatment effects from observational data.

Learn more below about how Amii Fellows and researchers (professors and students at the University of Alberta) are contributing to this year’s proceedings.

Several papers co-authored by Amii Fellows and students have been accepted for publication by ICLR in 2020:

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Qingfeng Lan and Yangchen Pan (Amii students), Alona Fyshe and Martha White (Amii Fellows)
Abstract: Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.

Learning Disentangled Representations for CounterFactual Regression
Negar Hassanpour (Amii student), Russell Greiner (Amii Fellow)
Abstract: We consider the challenge of estimating treatment effects from observational data, and point out that, in general, only some factors based on the observed covariates X contribute to selection of the treatment T, and only some to determining the outcomes Y. We model this by considering three underlying sources of {X, T, Y} and show that explicitly modeling these sources offers great insight to guide designing models that better handle selection bias. This paper is an attempt to conceptualize this line of thought and provide a path to explore it further. In this work, we propose an algorithm to (1) identify disentangled representations of the above-mentioned underlying factors from any given observational dataset D and (2) leverage this knowledge to reduce, as well as account for, the negative impact of selection bias on estimating the treatment effects from D. Our empirical results show that the proposed method achieves state-of-the-art performance in both individual and population-based evaluation measures.

Progressive Memory Banks for Incremental Domain Adaptation
Nabiha Asghar, Lili Mou (Amii Fellow), Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang
Abstract: This paper addresses the problem of incremental domain adaptation (IDA) in natural language processing (NLP). We assume each domain comes one after another, and that we can only access data in the current domain. The goal of IDA is to build a unified model that performs well on all the domains we have encountered. We adopt the recurrent neural network (RNN) widely used in NLP, but augment it with a directly parameterized memory bank, which is retrieved by an attention mechanism at each step of RNN transition. The memory bank provides a natural way to perform IDA: when adapting our model to a new domain, we progressively add new slots to the memory bank, which increases the number of parameters, and thus the model capacity. We learn the new memory slots and fine-tune existing parameters by back-propagation. Experimental results show that our approach achieves significantly better performance than fine-tuning alone. Compared with expanding hidden states, our approach is more robust for old domains, shown by both empirical and theoretical results. Our model also outperforms previous IDA work, including elastic weight consolidation and progressive neural networks, in the experiments.

Training Recurrent Neural Networks Online by Learning Explicit State Variables
Somjit Nath (Amii alum), Vincent Liu, Alan Chan, Xin Li (Amii students), Adam White and Martha White (Amii Fellows)
Abstract: Recurrent neural networks (RNNs) allow an agent to construct a state-representation from a stream of experience, which is essential in partially observable problems. However, there are two primary issues one must overcome when training an RNN: the sensitivity of the learning algorithm’s performance to truncation length and long training times. There are a variety of strategies to improve training in RNNs, most notably Backprop Through Time (BPTT) and Real-Time Recurrent Learning. These strategies, however, are typically computationally expensive and focus computation on computing gradients back in time. In this work, we reformulate the RNN training objective to explicitly learn state vectors; this breaks the dependence across time and so avoids the need to estimate gradients far back in time. We show that for a fixed buffer of data, our algorithm, called Fixed Point Propagation (FPP), is sound: it converges to a stationary point of the new objective. We investigate the empirical performance of our online FPP algorithm, particularly in terms of computation compared to truncated BPTT with varying truncation levels.

Frequency-based Search-control in Dyna
Yangchen Pan, Jincheng Mei (Amii students) and Amir-massoud Farahmand (Amii alum)
Abstract: Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. In particular, Dyna is an elegant model-based architecture integrating learning and planning that offers great flexibility in how a model is used. One of the most important components in Dyna is called search-control, which refers to the process of generating state or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical in improving learning efficiency. In this work, we propose a simple and novel search-control strategy by searching high frequency regions of the value function. Our main intuition is built on the Shannon sampling theorem from signal processing, which indicates that a high frequency signal requires more samples to reconstruct. We empirically show that a high frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its properties and effectiveness on benchmark domains.
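To make the Maxmin Q-learning idea more concrete, here is a minimal tabular sketch in Python. It assumes the update as the paper describes it: keep N independent action-value estimates, build the target from the maximum over next actions of the elementwise minimum across those estimates, and move one randomly chosen estimate toward that target. The function names, the list-of-lists table layout and the default step size and discount are illustrative assumptions, not the authors' code.

```python
import random


def q_min(q_tables, state):
    """Elementwise minimum over the N estimates: Q_min(s, a) = min_i Q_i(s, a)."""
    n_actions = len(q_tables[0][state])
    return [min(q[state][a] for q in q_tables) for a in range(n_actions)]


def greedy_action(q_tables, state):
    """Greedy action with respect to Q_min (an epsilon-greedy wrapper would sit on top)."""
    values = q_min(q_tables, state)
    return max(range(len(values)), key=lambda a: values[a])


def maxmin_update(q_tables, s, a, r, s_next, done,
                  alpha=0.1, gamma=0.99, rng=random):
    """One tabular Maxmin Q-learning step (illustrative sketch, hypothetical API).

    q_tables is a list of N tables, each indexed as table[state][action].
    Returns the index of the estimate that was updated and the target used.
    """
    # Target: r + gamma * max_a' min_i Q_i(s', a'); no bootstrapping at terminal states.
    target = r if done else r + gamma * max(q_min(q_tables, s_next))
    # Update a single, randomly selected estimate toward the shared target.
    i = rng.randrange(len(q_tables))
    q_tables[i][s][a] += alpha * (target - q_tables[i][s][a])
    return i, target


if __name__ == "__main__":
    # Two estimates (N = 2) over 2 states x 2 actions.
    tables = [[[0.0, 0.0], [1.0, 2.0]],
              [[0.0, 0.0], [3.0, 0.5]]]
    idx, y = maxmin_update(tables, s=0, a=0, r=1.0, s_next=1, done=False,
                           gamma=0.5, rng=random.Random(0))
    print(idx, y)  # y = 1.0 + 0.5 * max(min(1, 3), min(2, 0.5)) = 1.5
```

With N = 1 the minimum is taken over a single table, so the update reduces to ordinary Q-learning. Adding estimates pushes the target downward, which is the knob the abstract describes for trading overestimation against underestimation.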
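The frequency measure from the Dyna search-control paper can also be sketched. The abstract says local frequency is measured through gradient and Hessian norms of the value function. The Python below estimates both by central finite differences for a value function given as a plain callable, then ranks a pool of candidate states so that the highest-scoring ones would be used to query the model. Everything specific here is an assumption for illustration: the names `frequency_score` and `select_search_states`, summing the two norms, using the Frobenius norm for the Hessian, and ranking a fixed candidate pool rather than actively searching for high-frequency states.

```python
import math


def gradient_fd(v, s, h=1e-4):
    """Central-difference estimate of the gradient of v at state s (a list of floats)."""
    grad = []
    for j in range(len(s)):
        plus, minus = list(s), list(s)
        plus[j] += h
        minus[j] -= h
        grad.append((v(plus) - v(minus)) / (2 * h))
    return grad


def hessian_fd(v, s, h=1e-3):
    """Central-difference estimate of the Hessian of v at state s."""
    d = len(s)

    def shifted(j, dj, k, dk):
        x = list(s)
        x[j] += dj
        x[k] += dk  # when j == k this moves the same coordinate twice
        return v(x)

    hess = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(d):
            hess[j][k] = (shifted(j, h, k, h) - shifted(j, h, k, -h)
                          - shifted(j, -h, k, h) + shifted(j, -h, k, -h)) / (4 * h * h)
    return hess


def frequency_score(v, s):
    """Hypothetical local-frequency score: ||grad v(s)|| + ||Hess v(s)||_F."""
    g = gradient_fd(v, s)
    hess = hessian_fd(v, s)
    grad_norm = math.sqrt(sum(x * x for x in g))
    hess_norm = math.sqrt(sum(x * x for row in hess for x in row))
    return grad_norm + hess_norm


def select_search_states(v, candidates, k):
    """Return the k candidate states with the highest frequency score."""
    return sorted(candidates, key=lambda s: frequency_score(v, s), reverse=True)[:k]
```

As a quick sanity check of the intuition, the same state scores much higher under a fast-oscillating toy value function such as sin(10 s) than under sin(s), so a search-control rule built on this score would spend more model queries there.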

Amii is also organizing three socials during the conference:

Amii Chief Scientific Advisor Dr. Richard Sutton will host a session on what he calls the Bitter Lesson of AI research: that “general methods that leverage computation are ultimately the most effective, and by a large margin” and “[t]he eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.”
The RL Mixer brings together researchers interested in reinforcement learning for a sequence of randomly formed small-group discussions. Participants will have the opportunity to discuss a wide variety of topics with new people in Zoom breakout rooms, with each group discussion lasting 30 minutes.
The Amii Fellows Meet & Greet is a chance to meet and engage with Amii Fellows in conversations relevant to their research areas and experience.