We’ve covered what the ML Process Lifecycle (MLPL) is and why it’s important in Part 1, and looked at the framework and its key aspects in Part 2. In this third and final post of the MLPL series, we’ll take a look at what the framework looks like in action!

It’s a given that whenever we take on a project, we want it to go smoothly. Unfortunately, the reality is that most (if not all) projects will run into obstacles and challenges. This is especially true for exploration tasks. 

Expectations vs. reality

This is what we expect:

But this is usually what we get:

As illustrated above, we frequently have to pump the brakes and go back, either to one of the previous phases or to one of the modules in the same phase. We refer to this as a lifecycle switch. A lifecycle switch forces you to revisit past stages in order to address new information and challenges as they come up.

That’s also the reason why we call this a “cycle”; one step follows the next, and each step affects all the others, especially the ones downstream. This means that if you go back and change or re-do a step, you’re going to have to revisit the subsequent steps, because they’re probably going to change, too. 

Framework in action

The example below shows the lifecycle switches of a past Amii ML project:

While phase one went smoothly (the first time around), in phase two we identified that we had data for only one season of the year. Temperature affected the business question we wanted to answer, so in order to have an accurate year-round model, we needed data from all four seasons. Once the new data was collected, we had to do most of the data analysis over again, such as checking assumptions, aligning different files, and cleaning the data. 

You can see in the fifth line that after we had that dataset in a good place for building our machine learning model, we ran into another lifecycle switch. The organization decided that the original problem they had defined was no longer a business priority; while this may seem odd, it’s actually quite common for business objectives to change in exploratory projects, and in this case it made sense to start from the beginning to pursue a business objective that provided greater business value.

As a result, the whole project was switched back to phase one to iterate again, and went through several more lifecycle switches before eventually concluding with a desirable ML Solution.

Reasons for a lifecycle switch

As you can see in the example above, there are a number of reasons why a lifecycle switch might be necessary. 

For example, there may be business reasons, such as the defined business problem not aligning properly with business goals, a change in the business objectives of the organization, or insufficient business value to justify the expense of an ML project. 

Data issues often result in lifecycle switches as well, such as insufficient data quantity or quality, the dataset at hand not addressing the defined business problem, or the historical data not accurately modelling the current situation. And sometimes the ML development itself goes awry, or doesn’t give accurate enough answers. 

Lifecycle switches are part of the process of an ML project, inherent in its iterative nature. It is essential that organizations understand going into a project that these switches are going to happen.

Final words

We hope you have enjoyed our series on the MLPL! We know that ML projects are challenging; they’re iterative, exploratory, and often run into unexpected obstacles. That is why a framework such as our MLPL can be very helpful: it gives technical experts, non-technical managers, and other stakeholders such as clients and boards a clearer idea of what is involved, and it provides a common vocabulary for communicating about value, objectives, expectations and obstacles.


Amii’s MLPL Framework leverages already-existing knowledge from the teams at organizations like Microsoft, Uber, Google, Databricks and Facebook. The MLPL has been adapted by Amii teams to be a technology-independent framework that is abstract enough to be flexible across problem types and concrete enough for implementation. To fit our clients’ needs, we’ve also decoupled the deployment and exploration phases, provided process modules within each stage and defined key artifacts that result from each stage. The MLPL also ensures we’re able to capture any learnings that come about throughout the overall process but that aren’t used in the final model.

If you want to learn more about this and other interesting ML topics, we highly recommend Amii’s recently launched ML Technician Certificate Course. Visit the Amii Educates page to learn about all of our educational offerings.

This article was written by Amii’s Applied ML Scientists: Talat Iqbal, Luke Kumar, Shazan Jabbar and Sankalp Prabhakar; as well as Amii’s Director of Explores, Anna Koop, and Amii Educator Heather von Stackelberg.

Now that we understand what the ML Process Lifecycle (MLPL) is and why it’s important (if you haven’t yet, read Part 1 here), we will take a look at the framework itself and go through the key aspects of each stage.

Stages of the MLPL

There are four stages in the MLPL:

1. Business Understanding and Problem Discovery:
This stage identifies a business problem and a corresponding ML problem. For example, if the business problem is to get existing customers to consume more streaming content, a corresponding ML solution could be to implement an algorithm which recommends content they should consume based on their viewing history.

2. Data Acquisition and Understanding:
This stage explores the available data and identifies the possibilities and restrictions for its use in ML. This would involve an in-depth analysis of the data and its potential.

3. ML Modelling and Evaluation:
This stage is where the ML algorithms come in. Many organizations start at this stage, assuming it’s the only part of the process that needs to be done to arrive at a solution. However, the first two stages are critical to determining what ML algorithm(s) and configurations to use.

4. Delivery and Acceptance:
This stage is where we validate whether the ML solution addresses the initial business problem. An ideal project arrives at this stage only once, but given how quickly a project can evolve for various reasons, this stage may have to be revisited. Good communication among all the stakeholders and clarity in the problem definition will minimize the number of times this stage needs to be revisited.

There are several modules that fall under each of the four stages in the framework.

Business Understanding and Problem Discovery

A few key aspects to take care of during this stage are:

Objectives: Identify business objectives that ML techniques can address.

Problem Definition: Discover the ML problem(s) that would help solve the business problem. Sometimes a single ML problem addresses the business problem, and sometimes multiple ML problems together address it.

Data Sources: Identify existing data sources. In the real world, data typically comes from several different sources and must be combined across them. Identifying the data sources helps narrow down which data can be useful. Data sources can be proprietary in-house data, publicly available data, or data bought from third parties.

Current Practices: Identify what business process or practices are in place that are addressing the business problem in the current setting, if any. The business problem could be completely new or an existing one.

Development Environment: Define development and collaborative environment (code/data repos, programming languages, etc.).

Communication: Agree on methods of communication and the frequency of communication.

Milestones: Define milestones, timelines and deliverables. Sometimes it’s not feasible to arrive at definite milestones, given this is an exploration task. But thinking in that direction will help to add structure.

Resources: Identify the resources that will be required. The resources can be time, money, employees (e.g. data engineers, analysts, scientists) or computational resources. 

Stakeholders: Identify internal/external stakeholders and their roles. There are usually multiple stakeholders who should be a part of this process throughout: for example, the management team that decided an ML approach should be tried, the technical team actively exploring the solution, the teams that will own the different stages of exploration, and third parties associated with the development and deployment of the final solution. All the teams involved in each stage of the MLPL should be on the same page.

Constraints: Identify the constraints that are acceptable for the problem. Do we need ML solutions that are interpretable? Is there any part of data that should be removed due to privacy concerns?

By the end of this stage, we would have identified and defined our goals to help us understand the problem better and dive deeper into subsequent stages of MLPL. Worksheets and other tools can be helpful at this stage.

Data Acquisition and Understanding

Acquisition: Acquiring the data is an important task. After the data sources have been identified (in the Business Understanding and Problem Discovery stage), they must be combined into a single dataset. In some cases, aligning and combining the data sources may require in-depth domain knowledge and expertise.

Pre-Processing: Data that has been acquired may not be in a form readable by the tools and libraries used to create machine learning models. There are usually two steps. The first is to translate the data into a form related to the ML problem domain: for example, in text processing, if your original data is a set of scanned images, the first step is to convert those images of text documents into text that text algorithms can use. The second is to convert the data to suit specific algorithms (for example, changing categorical variables to numerical ones) or to apply other transformations (for example, standardization or scaling) that help improve results. 
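The second step can be sketched in a few lines of plain Python. This is an illustrative toy only (the column names and values are made up), not a recommendation of any particular tool:

```python
from statistics import mean, stdev

# Hypothetical records with one categorical and one numeric field
records = [
    {"season": "winter", "temp_c": -12.0},
    {"season": "summer", "temp_c": 24.5},
    {"season": "spring", "temp_c": 8.0},
]

# Categorical -> numerical: assign each category a stable integer code
categories = sorted({r["season"] for r in records})
season_code = {c: i for i, c in enumerate(categories)}

# Standardization: rescale the numeric field to zero mean, unit variance
temps = [r["temp_c"] for r in records]
mu, sigma = mean(temps), stdev(temps)

processed = [
    {"season": season_code[r["season"]], "temp_z": (r["temp_c"] - mu) / sigma}
    for r in records
]
```

In practice these transformations are usually handled by a data library rather than written by hand, but the idea is the same.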

Cleaning: In the real world, data is often corrupted for a variety of reasons. Inaccurate sensor readings, inconsistencies across readings and invalid values are some of the issues commonly found. A thorough analysis of how to fix these values, with the help of a domain expert and a data expert, should be carried out.
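As a minimal sketch of one repair strategy, the toy example below replaces a made-up failure code and missing values with the mean of the valid readings. Whether mean imputation is appropriate is exactly the kind of call that needs a domain expert:

```python
# Hypothetical sensor log; -999.0 and None stand in for corrupt readings
readings = [18.2, 19.1, -999.0, 20.3, None, 21.0]

def is_valid(r):
    return r is not None and r != -999.0

valid = [r for r in readings if is_valid(r)]
fallback = sum(valid) / len(valid)  # mean of the valid readings

# Replace each corrupt reading with the fallback value
cleaned = [r if is_valid(r) else fallback for r in readings]
```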

Pipeline: A pipeline is a sequence of steps used to automate repeated tasks: extracting data from different sources into a single place, pre-processing it into a form that can be stored and retrieved efficiently, and loading it into the format required by the machine learning algorithms.
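At its simplest, a pipeline is just an ordered sequence of callables applied to the data in turn. The sketch below is a toy with hypothetical step names; real pipelines add scheduling, logging and error handling on top of this basic shape:

```python
def extract(sources):
    # Gather rows from every source into one list
    return [row for source in sources for row in source]

def preprocess(rows):
    # Normalize and drop empty rows
    return [row.strip().lower() for row in rows if row.strip()]

def load(rows):
    # Hand the data back in the format the learning code expects
    return tuple(rows)

PIPELINE = (extract, preprocess, load)

def run(data, steps=PIPELINE):
    for step in steps:
        data = step(data)
    return data
```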

Exploratory Data Analysis: Engage in exploratory data analysis to gain an understanding of the data. Understanding the data is very important and can lead to better design and selection of the ML process; it also reveals which information could be useful in later steps.

Feature Engineering: Feature engineering is a continuous process that occurs at various stages of an ML project. In the Data Acquisition and Understanding phase, feature engineering might involve identifying features that are irrelevant and add no information. For example, in high-dimensional data, feature engineering may look to remove features whose variance is close to 0.
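For instance, the near-zero-variance filter mentioned above can be sketched with NumPy on toy data; the threshold value here is an arbitrary illustrative choice:

```python
import numpy as np

# Toy dataset: three samples, three features; the last two features
# never change, so they carry no information
X = np.array([
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.1],
])

variances = X.var(axis=0)
keep = variances > 1e-8       # keep only features with non-trivial variance
X_reduced = X[:, keep]
```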

Data Split: The data should be split into a portion called ‘training data’, which is used to train the QuAM, and a separate portion called ‘test data’, which is used to evaluate how good the QuAM is.
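A minimal sketch of such a split in plain Python; the 80/20 ratio and fixed seed are common illustrative choices, not prescriptions:

```python
import random

data = list(range(100))       # stand-in for (features, label) examples

rng = random.Random(42)       # fixed seed so the split is reproducible
shuffled = data[:]            # shuffle a copy, leaving the original intact
rng.shuffle(shuffled)

split = int(0.8 * len(shuffled))
train_data = shuffled[:split]  # used to train the QuAM
test_data = shuffled[split:]   # held out to evaluate the QuAM
```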


ML Modelling & Evaluation

Algorithm Selection: Algorithm selection is the process of narrowing down a suite of algorithms suited to the problem and data. With many algorithms across the various domains of ML, narrowing down helps us focus on a few selected algorithms and work with them to arrive at a solution.
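To make the narrowing-down concrete, the toy sketch below scores two deliberately simple candidate “algorithms” on held-out data and keeps the better one. Real projects would plug genuine learners into the same loop; everything here is made up for illustration:

```python
# Toy 1-D dataset of (value, label) pairs
train = [(0.2, 0), (0.4, 0), (0.5, 0), (0.8, 1)]
test = [(0.3, 0), (0.7, 1)]

def fit_majority(train):
    # Always predict the most common training label
    labels = [y for _, y in train]
    label = max(set(labels), key=labels.count)
    return lambda x: label

def fit_threshold(train):
    # Predict 1 when the value exceeds the mean training value
    cut = sum(x for x, _ in train) / len(train)
    return lambda x: int(x >= cut)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```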

Feature engineering: This part of feature engineering focuses on preparing the dataset to be compatible with the ML algorithm. Data transformation, dimensionality reduction, handling of outliers and handling of categorical variables are some examples of feature engineering techniques.

QuAM/Model Training: Once an algorithm has been selected and data is prepared for the algorithm, we need to build the Question and Answer Machine (QuAM) — a combination of an algorithm and data. In the ML world, a QuAM is also referred to as a model. QuAM training includes using the training data to learn a QuAM that can generalize well. 
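As a toy illustration of “QuAM = algorithm + data”, the sketch below combines ordinary least squares (the algorithm) with a made-up training set (the data) to produce a QuAM you can query. A real project would use an ML library rather than hand-rolled maths:

```python
# Made-up training data, roughly y = 2x
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

# Ordinary least squares for a one-feature line y = w*x + b
n = len(train_x)
mean_x = sum(train_x) / n
mean_y = sum(train_y) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / \
    sum((x - mean_x) ** 2 for x in train_x)
b = mean_y - w * mean_x

def quam(x):
    """The trained QuAM: ask it a question, get an answer."""
    return w * x + b
```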

Evaluation: Identifying the evaluation criteria is an important task. If your task is classification and the success of a model is defined by the number of correctly identified instances, then you can use accuracy as your evaluation metric. If there is a cost associated with false positives or false negatives, then other measures such as precision and recall can be used.
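These metrics come straight from simple counts over the predictions. A quick sketch with made-up labels (1 = positive class):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]   # hypothetical model predictions

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

accuracy = sum(t == p for t, p in pairs) / len(pairs)
precision = tp / (tp + fp)   # of the flagged instances, how many were right?
recall = tp / (tp + fn)      # of the true positives, how many were found?
```

Note how precision and accuracy differ here because of the two false positives.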

Refinement: Refine the model by identifying the best parameters for each of the algorithms on which you have trained the QuAM. This step, called hyperparameter tuning, finds the parameter settings that give the best model.
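A grid search over one hypothetical parameter, scored on a held-out validation set, is enough to show the shape of hyperparameter tuning; real sweeps cover learning rates, tree depths, regularization strengths and so on:

```python
# Made-up validation data for a 1-D threshold classifier
val_data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

def accuracy(cut, data):
    # Score the classifier "predict 1 when value >= cut"
    return sum(int(x >= cut) == y for x, y in data) / len(data)

grid = [0.1, 0.3, 0.5, 0.7]                      # candidate parameter values
scores = {cut: accuracy(cut, val_data) for cut in grid}
best_cut = max(scores, key=scores.get)
```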


Delivery and Acceptance

This is the stage where we confirm whether the ML solution addresses the business problem. Having a conversation with the employer or client is vital to understanding whether the business problem has been addressed. 

ML Solution: From the delivery perspective, an ML solution is delivered to the client. The solution could take one or more of the three forms below:

Prototype: Source code of the prototype is provided, along with readme and dependency files explaining how to use it. The prototype need not be production-level code, but it should be clean, commented and stable enough that engineering teams can use it to build a product.

Documentation: Good documentation always accompanies a prototype. Some of the technical details should be listed and explained. 

Project Report: This is a complete list of methodologies used and decisions taken along the lifetime of the project, as well as the reason(s) behind those decisions. This gives a high-level idea of what was achieved in the project.

Knowledge Transfer: Identify the in-house training required for understanding the ML solution and present it to the client. This is the appropriate time to clarify questions regarding the ML solution, and it acts as a feedback checkpoint before the solution is incorporated into full operation.

Handoff: Turn over all materials to the client so that they can run the solution themselves.

Total MLPL Framework

Viewed together, the entire framework looks like this:


In the third and final part of the MLPL Series, we will be taking a look at lifecycle switches and what you can realistically expect on your journey to an ML solution. Stay tuned!


If you want to learn more about this and other interesting ML topics, we highly recommend Amii’s recently launched online course Machine Learning: Algorithms in the Real World Specialization, taught by our Director of Amii Explores, Anna Koop. Visit the Amii Educates page to learn about all of our educational offerings.

This article was written by Amii’s Applied ML Scientists: Talat Iqbal, Luke Kumar, Shazan Jabbar and Sankalp Prabhakar; as well as Amii’s Director of Explores, Anna Koop.

Machine Learning (ML) has been experiencing explosive growth in popularity due to its ability to learn from data automatically with minimal human intervention. As ML is implemented and applied more in business settings, ML practitioners need to develop methods to describe the timing of their project work to their employers or clients.

One tool which is particularly useful in this regard is the ML Process Lifecycle, a process framework adapted by the Amii team (see note below). In this three-part blog series, we will be exploring what it is, why it’s important and how you can implement it.

What is the MLPL?

The ML Process Lifecycle (MLPL) is a framework that captures the iterative process of developing an ML solution for a specific problem. 

ML project development and implementation is an exploratory and experimental process where different learning algorithms and methods are tried before arriving at a satisfactory solution. The journey to reach an ML solution that meets business expectations is rarely linear – as an ML practitioner advances through different stages of the process and more information is generated or uncovered, they may need to go back to make changes or start over completely. 

The MLPL tries to capture this dynamic workflow between different stages and the sequence in which these stages are carried out. 

Where does the MLPL fit?

When business organizations develop new software systems or introduce new features to existing systems, they go through two major phases: 

  1. Business analysis: making assessments and business decisions regarding the value and feasibility of a new software product or feature; and 
  2. Product development: developing the solution (usually following one of the existing software development methodologies) and putting it into production. 

However, when an organization thinks about adopting ML – either to complement their current software products/services or to address a fresh business problem – there is an additional exploration phase between the business analysis and product development phases. The MLPL streamlines and defines this process.

The Machine Learning Process Lifecycle comprises Business Decisions, Machine Learning Exploration and Development, Deployment & Maintenance

The MLPL is an iterative methodology to execute ML exploration tasks, generalizing the process so that it is flexible and modular enough to be applied to different problems in different domains, while at the same time having enough modules to fully describe relevant decision points and milestones.

What does an ML Exploration Process entail?

Ideally, an organization would want to know all possibilities and consequences of an ML solution before introducing it into a system. The ML Exploration Process seeks to determine whether or not an ML solution is the best business decision by addressing the following questions:

  1. Can ML address my business problem?
  2. Is there supporting data?
  3. Can algorithms take advantage of the data?
  4. What is the value added by introducing ML?
  5. What is the technical feasibility of arriving at a solution with ML?

What doesn’t the MLPL capture?

When an organization first begins to think about adopting ML, often the first thing it will do is perform a business analysis. This involves identifying business workflows and business problems, assessing resources, and identifying tasks and decision points where ML solutions could fit in and return business value. The MLPL does not capture all aspects of this; it addresses only the pieces that directly impact the ML problem definition.

After ML Exploration is complete, an organization may decide to develop the ML solution into a product or service as a tangible component, deploying it into production and maintaining it. This phase is also not captured in the MLPL. 

The MLPL only deals with the exploration phase where different methods are tried to arrive at a proof-of-concept solution which can be later adapted to develop a complete ML system.

Why do we need the MLPL?

We have seen an overview of what the MLPL captures and what it does not. But why do we need a process to capture an exploration task? There are a few important reasons why an organization should use the MLPL:

  • Risk Mitigation: The MLPL standardizes the stages of an ML project and defines standard modules for each of those stages, thereby minimizing the risk of missing out on important ML practices. 
  • Standardization: Standardizing the workflow across teams through an end-to-end framework enables the users to easily build and operate ML systems while being consistent, and allows the inter-team tasks to be carried out smoothly.
  • Tracking: The MLPL allows you to track the different stages and the modules inside each stage. Because this is an exploration task, many attempts will never be used in the final ML solution but still require significant investment. The MLPL allows you to track the resources spent on these experiments and to evaluate them for future iterations.
  • Reproducibility: Having a standardized process enables an organization to build pipelines for creating and managing experiments, which can be compared and reproduced for future projects.
  • Scalability: A standard workflow also allows an organization to manage multiple experiments simultaneously.
  • Governance: Well-defined stages and modules for each stage will help in better audits to assess if the ML systems are designed appropriately and operating effectively.
  • Communication: A standard guideline helps set expectations and effectively facilitates communication between teams about the workflow of projects.

In Part 2 of the MLPL Series, we will be taking an in-depth look at the MLPL framework and go through the key aspects of each stage. Stay tuned!


This article was written by Amii’s Applied ML Scientists: Talat Iqbal, Luke Kumar, Shazan Jabbar and Sankalp Prabhakar; as well as Amii’s Director of Explores, Anna Koop.

We are incredibly excited to announce three new courses on artificial intelligence and machine learning, co-developed by Amii and the University of Alberta Faculty of Extension!

Produced in collaboration between UAlberta’s Faculty of Extension and Amii, this three-course series is ideal for technically-inclined participants who wish to build foundational knowledge in machine intelligence, develop an applied understanding for approaching machine learning projects, and gain an introduction to intermediate and advanced techniques.

Participants can expect to gain a working knowledge around important machine learning areas such as supervised learning, unsupervised learning, neural networks and reinforcement learning.

Prior knowledge of basic programming, linear algebra and statistics is expected. Experience with mathematics, statistics and analytics is strongly recommended. Participants will be expected to have the ability to read and code trace existing code; be comfortable with conditionals, loops, variables, lists, dictionaries and arrays; and should be able to produce “hello world.”

Once all three courses have been successfully completed, an official University of Alberta Notice of Completion will be issued. The courses can also be used towards the Amii Machine Learning Technician Certification program, beginning in September 2019. Students completing the Faculty of Extension series will be grandfathered into the Machine Learning Technician program with a prorated tuition.

Learn more about the individual courses below:

Introduction to Machine Learning and Artificial Intelligence

EXCPE4784

(21 hours)
April 24 – 26, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton
$1695

Students will gain an overview of machine learning and artificial intelligence, beginning with discussing supervised learning applied to a classification problem. Students will develop a working knowledge of this type of application, and how it might look in a project from start to finish. Prior knowledge of basic programming, linear algebra and statistics is expected.


Applied Machine Learning

EXCPE4785

(21 hours)
May 22 – 24, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton
$1695

This course will begin the discussion of problem definition in machine learning projects, and other issues with data acquisition, cleaning and exploratory data analysis. Students will also discuss unsupervised learning in the context of developing data for successful machine learning modelling. Prior knowledge of basic programming, linear algebra and statistics is expected.


Intermediate Machine Learning Techniques

EXCPE4786

(21 hours)
June 19 – 21, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton
$1695

This course continues from the previous, discussing more advanced techniques of machine learning, such as neural networks and support vector machines. Students will also get a brief introduction to reinforcement learning. Prior knowledge of basic programming, linear algebra and statistics is expected.


For more information, please visit the UAlberta Faculty of Extension – AI & ML Courses page: https://www.ualberta.ca/extension/continuing-education/programs/technology/ai

Part of the Alberta Machine Intelligence Institute, Marlos C. Machado is a 4th year Ph.D. student in the University of Alberta’s Department of Computing Science, supervised by Amii’s Michael Bowling.

Marlos’ research interests lie broadly in artificial intelligence with a particular focus on machine learning and reinforcement learning. Marlos is also a member of the Reinforcement Learning & Artificial Intelligence research group, led by Amii’s Richard S. Sutton.

In 2013, Amii researchers proposed the Arcade Learning Environment (ALE), a framework that poses the problem of general competency in AI. The ALE allows researchers and hobbyists to evaluate artificial intelligence (AI) agents in a variety of Atari games, encouraging agents to succeed without game-specific information. While this may not seem like a difficult feat, up to now, intelligent agents have excelled at performing a single task at a time, such as checkers, chess and backgammon – all incredible achievements!

The ALE, instead, asks the AI to perform well at many different tasks: repelling aliens, catching fish and racing cars, among others. Around 2011, Amii’s Michael Bowling began advocating in the AI research community for an Atari-based testbed and challenge problem. The community has since recognized the importance of arcade environments, shown by the release of other, similar platforms such as GVG-AI, the OpenAI Gym & Universe, and the Retro Learning Environment.

1. Atari 2600 games: Space Invaders, Bowling, Fishing Derby and Enduro

The ALE owes some of its success to a Google DeepMind algorithm called Deep Q-Networks (DQN), which recently drew world-wide attention to the learning environment and to reinforcement learning (RL) in general. DQN was the first algorithm to achieve human-level control in the ALE.

In this post, adapted from our paper, “State of the Art Control of Atari Games Using Shallow Reinforcement Learning,” published earlier this year, we examine the principles underlying DQN’s impressive performance by introducing a fixed linear representation that achieves DQN-level performance in the ALE.

The steps we took while developing this representation illuminate the importance of biases being encoded in neural networks’ architectures, which improved our understanding of deep reinforcement learning methods. Our representation also frees agents from necessarily having to learn representations every time an AI is evaluated in the ALE. Researchers can now use a good fixed representation while exploring other questions, which allows for better evaluation of the impact of their algorithms because the interaction with representation learning solutions can be put aside.

Impact of Deep Q-Networks

In reinforcement learning, agents must estimate how “good” a situation is based on current observations. Traditionally, we humans have had to define in advance how an agent processes the input stream based on the features we think are informative. These features can include anything from the position and velocity of an autonomous vehicle to the pixel values the agent sees in the ALE.

Before DQN, pixel values were frequently used to train AI in the ALE. Agents learned crude bits of knowledge like “when a yellow pixel appears on the bottom of the screen, going right is good.” While useful, knowledge represented in this way cannot encode certain pieces of information, such as in-game objects.

Because the goal of the ALE is to avoid extracting information particular to a single game, researchers faced the challenge of determining how an AI can succeed in multiple games without providing it game-specific information. To meet this challenge, the agent should not only learn how to act but also learn useful representations of the world.

DQN was one of the first RL algorithms capable of doing so with deep neural networks.

For our discussion, the important aspect of DQN is that its performance is due to the neural network’s estimation of how “good” each screen is, in other words how likely it is that a particular screen will result in a favourable outcome.

Importantly, the neural network has several convolutional layers with the ability to learn powerful internal representations. The layers are built around simple architectural biases such as position/translation invariance and the size of the filters used. We asked ourselves how much of DQN’s performance results from the internal representations learned and how much from the algorithm’s network architecture. We implemented, in a fixed linear representation, the biases encoded in DQN’s architecture and analyzed the gap between our bias-encoded performance and DQN’s performance.

To our surprise, our fixed linear representation performed nearly as well as DQN!

Basic & Blob-Prost Features

To create our representation, we first needed to define its building blocks. We used the method mentioned earlier of representing screens as “there is a yellow pixel at the bottom of the screen.”

As Figure 2 (inspired by the original article on the ALE) indicates, screens were previously defined in terms of the existence of colours in specific patches of the image. Researchers would divide the image into 14×16 patches and, for each patch, encode the colours present in that tile.

2. Left: Screenshot of the game Space Invaders; Centre: Tiling used in all games; Right: Representation of Basic Features

In this example, two colours are present in the tile in the top left corner of the screen: black and green. Thus, the agent sees the whole tile as black and green with the “amount” of each colour being unimportant. This representation, called Basic, was introduced in the original paper on the ALE. However, Basic features don’t encode the relation between tiles, that is, “a green pixel is above a yellow pixel.” BASS features, which are not discussed in this post, can be used as a fix but with less than satisfactory results.
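To make the tiling concrete, here is a minimal sketch of Basic-style feature extraction, assuming the screen arrives as a 2-D array of palette indices. The grid and palette sizes are illustrative, and the function name is ours, not the original implementation’s:

```python
import numpy as np

def basic_features(screen, rows=14, cols=16, n_colours=128):
    """One-hot encode which colours appear in each tile of the screen.

    screen: 2-D array of palette indices (assumed evenly divisible
    into rows x cols tiles). Returns a flat binary vector of length
    rows * cols * n_colours; the "amount" of each colour is ignored.
    """
    h, w = screen.shape
    th, tw = h // rows, w // cols
    feats = np.zeros((rows, cols, n_colours), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            tile = screen[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            feats[r, c, np.unique(tile)] = True  # mark every colour present
    return feats.ravel()
```

Note that only presence is recorded: a tile that is mostly black with a sliver of green gets exactly the same encoding as one that is mostly green.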

When DQN was proposed, it outperformed the state-of-the-art in the vast majority of games. But the question still remained: why?

One of our first insights was that convolutional networks apply the same filter in all different patches of the image, meaning observations aren’t necessarily encoded for a specific patch. In other words, instead of knowing “there is a green pixel in tile 6 and an orange pixel in tile 8,” the network knows “there is a green pixel one tile away from an orange pixel somewhere on the screen.”

This knowledge is useful because the agent no longer needs to observe an event at every specific location before it can generalize. That is, the agent doesn’t need to be hit by an alien projectile at every possible screen position to learn that projectiles are bad. It quickly learns that “a pixel above the green pixel (the player’s ship) is bad”, no matter where on the screen that happens. We modified Basic features to encode this kind of relative information as well, calling the new representation B-PROS.
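A minimal sketch of this pairwise-offset idea, assuming per-tile colour occupancy is available as a (rows, cols, colours) boolean tensor. The function name and feature encoding are ours, for illustration only:

```python
import numpy as np

def pairwise_offset_features(occupancy):
    """Position-invariant pairwise offsets between colours.

    occupancy: boolean array (rows, cols, n_colours), True where a tile
    contains a colour. Returns a set of (colour1, colour2, dr, dc)
    tuples meaning "a tile containing colour2 lies (dr, dc) tiles away
    from a tile containing colour1", regardless of absolute position.
    """
    offsets = set()
    occupied = np.argwhere(occupancy)  # (row, col, colour) triples
    for r1, c1, k1 in occupied:
        for r2, c2, k2 in occupied:
            offsets.add((k1, k2, r2 - r1, c2 - c1))
    return offsets
```

Because only relative offsets are kept, “a green pixel one tile away from an orange pixel” produces the same feature wherever the pair appears on screen.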

Figure 3. Representation of B-PROS features

B-PROS is limited in that it doesn’t encode object movement. If there is a projectile on the screen, is it moving upwards from the agent’s ship or downwards from an alien’s?

We can easily answer the question by using two consecutive screens to infer an object’s direction, which is what DQN does. Instead of only using offsets from the same screen, we also looked at the offsets between different screens, encoding things like: “there was a yellow pixel two blocks above where the green pixel is now.” We call this representation B-PROST.
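The temporal half of this can be sketched the same way, assuming per-tile colour tensors for the previous and current screens (a simplified illustration in our own notation, not the paper’s implementation):

```python
import numpy as np

def temporal_offset_features(occ_prev, occ_now):
    """Offsets between colours across two consecutive screens.

    Each feature (k_prev, k_now, dr, dc) reads: "a tile that contained
    colour k_prev on the previous screen lies (dr, dc) tiles from a
    tile containing colour k_now on the current screen."
    """
    feats = set()
    for r1, c1, k1 in np.argwhere(occ_prev):
        for r2, c2, k2 in np.argwhere(occ_now):
            feats.add((k1, k2, r1 - r2, c1 - c2))
    return feats
```

Combined with the same-screen offsets, these cross-screen features let a linear agent distinguish a projectile falling toward the ship from one the ship just fired.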

Figure 4. Representation of B-PROST features

Finally, as with DQN, we needed a way to identify objects. DQN’s convolutional filter sizes bake in the typical size of objects in Atari games, so we made a simple change to our algorithm: instead of dividing the screen into tiles, we divided it into objects and examined the offsets between objects. But how do we find the objects?

We did the simplest thing possible: treat each contiguous segment of same-coloured pixels as an object. If one colour surrounded another, up to a certain threshold, we assumed the whole object had the surrounding colour and ignored the colour inside. By taking the offsets in space and time between these objects, we obtained a new feature set called Blob-PROST. Figure 5 is a simplification of what we ended up with.
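A simplified version of this blob-finding step, using a flood fill over 4-connected, same-coloured pixels. The colour-merging threshold mentioned above is omitted, and the function name is ours:

```python
import numpy as np
from collections import deque

def find_blobs(screen):
    """Group same-coloured, 4-connected pixels into blobs.

    screen: 2-D array of palette indices. Returns a list of
    (colour, centre_row, centre_col) triples, one per blob; blob
    centres can then replace tiles when computing offsets.
    """
    h, w = screen.shape
    seen = np.zeros((h, w), dtype=bool)
    blobs = []
    for i in range(h):
        for j in range(w):
            if seen[i, j]:
                continue
            colour = screen[i, j]
            queue, pixels = deque([(i, j)]), []
            seen[i, j] = True
            while queue:  # flood fill this blob
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and not seen[nr, nc] and screen[nr, nc] == colour):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            rs, cs = zip(*pixels)
            blobs.append((colour, sum(rs) / len(rs), sum(cs) / len(cs)))
    return blobs
```

Replacing the fixed tiling with blob centres is what lets the offsets refer to whole objects (a ship, a projectile) rather than arbitrary screen patches.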

Figure 5. Representation of objects identified for the Blob-PROST feature set

So how good are Blob-PROST features? Well, they score better than DQN in 21 out of 49 games (43 per cent), and in three of the remaining games the difference from DQN’s score is not statistically significant. Since even an algorithm compared against itself would be expected to win only 50 per cent of the time, our 43 per cent is a comparable result.

Conclusion

We started by asking how much of DQN’s original performance resulted from the representations it learns versus the biases already encoded in the neural network: position/translation invariance, movement information and object detection. To our surprise, the biases explain a big part of DQN’s performance. By simply encoding the biases without learning any representation, we were able to achieve similar performance to DQN.

The ability to learn representations is essential for intelligent agents: fixed representations, while useful, are an intermediate step on the path to artificial general intelligence. Although DQN’s performance may be explained by the convolutional network’s biases, the algorithm is a major milestone, and subsequent work has shown the importance of the principles introduced by the research team. The state-of-the-art is now derived from DQN, achieving even higher scores in Atari games and suggesting that better representations are now being learned.

For a more detailed discussion of each of the evaluated biases, as well as of DQN’s performance as compared to Blob-PROST, read our paper: “State of the Art Control of Atari Games Using Shallow Reinforcement Learning.”