A History of Reinforcement Learning at Amii

For over 20 years, Amii and its Fellows and researchers have been at the centre of Reinforcement Learning research and development.

From defining the very foundation of modern AI to tackling some of the world's most complex challenges, the journey of Reinforcement Learning (RL) is a compelling story of relentless innovation.

Timeline of RL

Reinforcement Learning: A Brief History

Many of the key moments and discoveries in the history of Reinforcement Learning happened with Amii and its people at the centre.

1998

Richard Sutton literally writes the book on RL

The entire story of modern AI learning starts with Reinforcement Learning (RL), thanks to the groundwork laid by Richard S. Sutton and his long-time collaborator Andrew G. Barto. Their landmark book, Reinforcement Learning: An Introduction, established the core principles that define how an intelligent system should learn: by making decisions (actions) in an environment, receiving feedback (rewards), and optimizing its strategy to maximize long-term rewards.
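The learning loop described above – act, receive a reward, refine a strategy – can be sketched with tabular Q-learning, one of the core algorithms covered in Sutton and Barto's book. The corridor environment below is a hypothetical toy example (not from the book), but the update rule follows the standard Q-learning formulation:

```python
import random

random.seed(0)

# Toy environment (hypothetical, not from the book): a corridor of
# five cells. Action 0 moves left, action 1 moves right. Reaching
# the last cell ends the episode with reward +1; all other steps
# give reward 0, so the payoff for early moves is delayed.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def greedy(qvals):
    # Pick the highest-valued action, breaking ties at random.
    best = max(qvals)
    return random.choice([a for a, q in enumerate(qvals) if q == best])

alpha, gamma, epsilon = 0.5, 0.9, 0.1      # step size, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[s][a]: long-term value estimate

for _ in range(200):                       # 200 episodes of experience
    s, done = 0, False
    while not done:
        # Act: mostly exploit current estimates, occasionally explore.
        a = random.randrange(2) if random.random() < epsilon else greedy(Q[s])
        s2, r, done = step(s, a)
        # Learn: nudge Q(s, a) toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy should now move right in every non-goal state.
policy = [greedy(Q[s]) for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1]
```

Note how the +1 reward at the end of the corridor propagates backward through the Q-table over episodes, so early moves are eventually credited for a payoff that arrives only later – the "delayed reward" problem that recurs throughout this timeline.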

So what?!

This book remains the foundational text on Reinforcement Learning, not just as a field of academic study but in terms of its real-world applications.

2002

Amii is founded

Amii is founded in 2002 as the Alberta Ingenuity Centre for Machine Learning (AICML) by four visionary researchers. Established within the Computing Science department at the University of Alberta, AICML quickly becomes an early and influential centre for machine learning research.

So what?!

This put Edmonton on the map as a global hub for AI – long before "Artificial Intelligence" became a household term.

2003

Richard Sutton joins Amii

Professor Sutton is recruited to the University of Alberta and Amii (AICML), where he proceeds to launch the RLAI (Reinforcement Learning and Artificial Intelligence) Lab – and where he continues to teach, supervise, and conduct research today.

So what?!

This move solidified Edmonton's status as a world-leading destination for Reinforcement Learning research, attracting some of the world's brightest minds to Alberta.

2007

Amii's Jonathan Schaeffer solves checkers

Jonathan Schaeffer, Amii Fellow and University of Alberta Professor of Computing Science, led the team that developed Chinook, the software that solved the game of checkers. This landmark development in the history of AI helped address the delayed-reward challenge in RL: the difficulty of teaching an AI that a current action may not pay off until much later in the game.

So what?!

Comparable to Deep Blue's breakthrough in chess, checkers was at the time the most challenging game to be fully solved, and its solution represented a major milestone in AI and computing science.

2013

Amii's Michael Bowling develops the Atari Benchmark

Michael Bowling, an Amii Fellow, Canada CIFAR AI Chair, and Professor at the University of Alberta, leads a team that establishes the Atari Benchmark for measuring the effectiveness of Reinforcement Learning algorithms. This benchmark, based on gameplay of the original 57 Atari 2600 games, is officially known as the Arcade Learning Environment (ALE) – but is much more commonly known by its gaming-inspired name.

So what?!

The Atari Benchmark remains a widely cited standard used by researchers and institutions around the world – most notably by Google DeepMind, in their work on Deep Q-Networks (DQN).

2014

Amii's Patrick Pilarski begins Blinc Lab research

Since 2014, Amii Fellow Patrick Pilarski and other researchers at the Blinc Lab have been conducting groundbreaking research into prosthetic-limb control. This work has culminated in adaptive prosthetics that leverage Reinforcement Learning to learn from the people using them. As a result, instead of being stiff or hard to control, these prosthetics adapt to the user's movements and begin to feel more like a natural part of the body.

So what?!

This real-world application of Reinforcement Learning in prosthetics is restoring the independence – and transforming the lives – of its users.

2015

Michael Bowling solves poker – twice!

In 2015, Bowling and his Computer Poker Research Group solved Heads-Up Limit Texas Hold'em poker using Cepheus, the first AI to solve this "imperfect information" game. Then, in 2017, the group made the massive leap of beating professional players at No-Limit poker with their model DeepStack. Cepheus proved its methods could handle hidden information, making its AI algorithms directly relevant to complex real-world strategy in areas like finance and negotiations, while DeepStack showed that AI could handle the uncertainty of real-world gaming with imperfect information.

So what?!

Solving complex games is at the heart of AI research, with poker – a much 'messier' game – being the next major frontier after checkers and chess.

2016

DeepMind’s AlphaGo solves Go

In one of the most famous moments in the history of AI, DeepMind's AlphaGo defeats a world champion in the game of Go – a feat many thought was decades away. Many of the lead researchers on the AlphaGo team, such as David Silver, were graduates of the University of Alberta – and trained under Amii researchers such as Richard Sutton.

So what?!

Solving Go was considered a monumental challenge for computing and AI – and its solution served as "proof" of the effectiveness of Reinforcement Learning.

2017

DeepMind opens an Edmonton office

In 2017, DeepMind, by then owned by Google's parent company Alphabet, opens its first research base outside the UK – here in Edmonton, at Amii's headquarters. Richard Sutton, DeepMind's original scientific advisor, is named to lead the Edmonton-based office.

So what?!

As DeepMind's first office outside the UK, the Edmonton location put Edmonton and Amii on the map as a global centre of AI excellence.

2018

RL Core Technologies Launches

Amii researchers Martha White and Adam White launch their own startup, RL Core, to apply their expertise in Reinforcement Learning to the world of industrial controls. Their systems intelligently manage water quality and treatment, helping drive major efficiencies and maintain critical environmental infrastructure.

So what?!

This example brings RL out of the realm of the theoretical and into the real world, using intelligence to protect our environment and manage critical resources more efficiently.