Amii is proud to feature the work of our researchers at the 19th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). Amii supports cutting-edge research by leveraging scientific advancement into industry adoption, enabling our world-leading researchers to focus on solving tough problems while our teams translate knowledge, talent and technology – creating an integrated system that allows both research and industry to thrive.
Such cutting-edge research is currently being featured at AAMAS, running online this year from May 9 to 13. AAMAS is a globally renowned scientific conference for research in autonomous agents and multi-agent systems.
“Agents, entities that can interact with their environment or other agents, are an increasingly important field of artificial intelligence. Agents can learn, reason about others, adopt norms, and interact with humans in both virtual and physical settings,” explains Matthew E. Taylor, Amii Fellow at the University of Alberta, in a recent blog post. “This field includes contributions to many areas across artificial intelligence, including game theory, machine learning, robotics, human-agent interaction, modeling, and social choice.”
Accepted papers from Amii researchers cover a range of topics including: the interaction of online neural network training and interference in reinforcement learning; the introduction of deep anticipatory networks, which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference; and multi-agent deep reinforcement learning.
Learn more below about how Amii Fellows and researchers – professors and graduate students at the University of Alberta – are contributing to this year’s proceedings:
Dustin Morrill and Ryan D’Orazio (Amii researchers), James Wright and Michael Bowling (Amii Fellows)
Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a flexible and simple algorithm for approximately solving imperfect information games with policies parameterized by a normalized rectified linear unit (ReLU). In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and has a regret bound with a better dependence on the number of actions in the tabular case. We derive approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations ($f$-RCFR), including softmax. We provide exploitability bounds for $f$-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games, and examine empirically how the link function interacts with the severity of the approximation to determine exploitability performance in practice. Although a ReLU-parameterized policy is typically the best choice, a softmax parameterization can perform as well or better in settings that require aggressive approximation.
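As a rough illustration of the two policy parameterizations contrasted in this abstract, the sketch below turns a vector of estimated regrets into a policy using either a normalized ReLU or a softmax. The function names, the `temperature` parameter, and the uniform fallback when no regret is positive are assumptions made for illustration, not details taken from the paper.

```python
import math


def relu_policy(regrets):
    """Normalized ReLU: probability proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # Assumption: fall back to uniform when no action has positive regret.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def softmax_policy(regrets, temperature=1.0):
    """Softmax: probability proportional to exp(regret / temperature)."""
    peak = max(regrets)  # subtract the max for numerical stability
    weights = [math.exp((r - peak) / temperature) for r in regrets]
    total = sum(weights)
    return [w / total for w in weights]


estimated_regrets = [2.0, -1.0, 2.0]  # hypothetical regret estimates
print(relu_policy(estimated_regrets))
print(softmax_policy(estimated_regrets))
```

In this toy case the ReLU policy gives zero probability to the action with negative regret, while the softmax policy keeps every action's probability strictly positive.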
Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E. Taylor and Nidhi Hegde (Amii Fellows)
Abstract: Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field games, which is that all agents in the environment are playing almost similar strategies and have the same goal. We conduct experiments on three different testbeds for the field of many-agent reinforcement learning, based on the standard MAgent framework. We consider two different kinds of mean field games: a) games where agents belong to predefined types that are known a priori and b) games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state-of-the-art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.
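One way to picture the "virtual mean agent" idea with multiple types is to average the neighbours' actions separately for each type, instead of once over everyone. The sketch below is only an illustration under that assumption: the function name, the integer encoding of actions and types, and the zero vector used for an absent type are all hypothetical and are not the authors' algorithm.

```python
def mean_actions_by_type(neighbor_actions, neighbor_types, num_actions, num_types):
    """Return one empirical action distribution per agent type.

    neighbor_actions: action index chosen by each neighbour
    neighbor_types:   type index of each neighbour (known a priori here)
    """
    sums = [[0.0] * num_actions for _ in range(num_types)]
    counts = [0] * num_types
    for action, agent_type in zip(neighbor_actions, neighbor_types):
        sums[agent_type][action] += 1.0  # accumulate a one-hot action
        counts[agent_type] += 1
    # Assumption: a type with no neighbours present gets an all-zero vector.
    return [
        [s / counts[t] if counts[t] else 0.0 for s in sums[t]]
        for t in range(num_types)
    ]


# Four hypothetical neighbours: two of type 0 and two of type 1.
print(mean_actions_by_type([0, 1, 1, 2], [0, 0, 1, 1], num_actions=3, num_types=2))
```

A learner could then condition on one mean action per type rather than on a single mean over all agents, which is the relaxation the abstract describes.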
Yash Satsangi, Sungsu Lim (Amii researcher), Shimon Whiteson, Frans Oliehoek, Martha White (Amii Fellow)
Abstract: Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent’s uncertainty. For example, the reward can be the negative entropy of the agent’s belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent, by offering a simple insight that maximizing any convex function of the belief of the agent can be approximated by instead maximizing a prediction reward: a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields using prediction rewards (namely visual attention, question answering systems, and intrinsic motivation) and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight we present deep anticipatory networks (DANs), which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on fashion MNIST and MNIST digit classification.
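To make the two quantities in this abstract concrete, the sketch below computes the negative entropy of a belief and the expected value of a simple prediction reward for a few beliefs. It assumes a 0/1 prediction reward (1 if the agent's guess of the hidden variable is correct, 0 otherwise) and an agent that guesses the most likely value; both choices, and the function names, are illustrative assumptions rather than the paper's exact definitions.

```python
import math


def negative_entropy(belief):
    """Sum of p * log(p); zero-probability entries contribute nothing."""
    return sum(p * math.log(p) for p in belief if p > 0.0)


def expected_prediction_reward(belief):
    """Expected 0/1 reward when the agent guesses the most likely value."""
    return max(belief)


# Hypothetical beliefs over a binary hidden variable, from vague to certain.
for belief in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(belief, negative_entropy(belief), expected_prediction_reward(belief))
```

Both quantities increase as the belief becomes more concentrated, which gives some intuition for why one can stand in for the other as a reward signal.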
Sina Ghiassian and Banafsheh Rafiee (Amii researchers), Yat Long Lo (visitor), Adam White (Amii Fellow)
Abstract: Reinforcement learning systems require good representations to work well. For decades practical success in reinforcement learning was limited to small domains. Deep reinforcement learning systems, on the other hand, are scalable, not dependent on domain-specific prior knowledge and have been successfully used to play Atari, in 3D navigation from pixels, and to control high-degree-of-freedom robots. Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices. Even well-tuned systems exhibit significant instability both within a trial and across experiment replications. In practice, significant expertise and trial and error are usually required to achieve good performance. One potential source of the problem is known as catastrophic interference: when later training decreases performance by overriding previous learning. Interestingly, the powerful generalization that makes neural networks (NNs) so effective in batch supervised learning might explain the challenges when applying them in reinforcement learning tasks. In this paper, we explore how online NN training and interference interact in reinforcement learning. We find that simply re-mapping the input observations to a high-dimensional space improves learning speed and parameter sensitivity. We also show this preprocessing reduces interference in prediction tasks. More practically, we provide a simple approach to NN training that is easy to implement and requires little additional computation. We demonstrate that our approach improves performance in both prediction and control with an extensive batch of experiments in classic control domains.
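The abstract does not say which re-mapping is used, so the sketch below shows just one assumed possibility: splitting each observation dimension into bins and one-hot encoding the bin index, which turns a short real-valued observation into a longer sparse vector before it reaches the network. The function name, the bounds, and the number of bins are hypothetical.

```python
def one_hot_bins(observation, lows, highs, bins_per_dim):
    """Map each observation dimension to a one-hot vector over equal-width bins."""
    features = []
    for value, low, high in zip(observation, lows, highs):
        clipped = min(max(value, low), high)  # keep the value inside its range
        fraction = (clipped - low) / (high - low)
        # The upper bound itself falls into the last bin.
        index = min(int(fraction * bins_per_dim), bins_per_dim - 1)
        one_hot = [0.0] * bins_per_dim
        one_hot[index] = 1.0
        features.extend(one_hot)
    return features


# A hypothetical 2-dimensional observation becomes an 8-dimensional input.
print(one_hot_bins([0.5, 1.0], lows=[0.0, 0.0], highs=[1.0, 1.0], bins_per_dim=4))
```

With an encoding like this, observations that fall in different bins activate disjoint inputs, so an update for one observation touches fewer of the weights used by another.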
One Extended Abstract co-authored by an Amii Fellow has also been accepted for publication on the JAAMAS Track:
Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor (Amii Fellow)
Abstract: Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) We revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.