Machine Learning (ML) has been experiencing explosive growth in popularity due to its ability to learn from data automatically with minimal human intervention. As ML is implemented and applied more in business settings, ML practitioners need to develop methods to describe the timing of their project work to their employers or clients.

One tool which is particularly useful in this regard is the ML Process Lifecycle. In this three-part blog series, we will be exploring what it is, why it’s important and how you can implement it.

What is the MLPL?

The ML Process Lifecycle (MLPL) is a framework that captures the iterative process of developing an ML solution for a specific problem. 

ML project development and implementation is an exploratory and experimental process where different learning algorithms and methods are tried before arriving at a satisfactory solution. The journey to reach an ML solution that meets business expectations is rarely linear – as an ML practitioner advances through different stages of the process and more information is generated or uncovered, they may need to go back to make changes or start over completely. 

The MLPL tries to capture this dynamic workflow between different stages and the sequence in which these stages are carried out. 

Where does the MLPL fit?

When business organizations develop new software systems or introduce new features to existing systems, they go through two major phases: 

  1. Business analysis: making assessments and business decisions regarding the value and feasibility of a new software product or feature; and 
  2. Product development: developing the solution (usually following one of the existing software development methodologies) and putting it into production. 

However, when an organization thinks about adopting ML – either to complement their current software products/services or to address a fresh business problem – there is an additional exploration phase between the business analysis and product development phases. The MLPL streamlines and defines this process.

The Machine Learning Process Lifecycle comprises Business Decisions, Machine Learning Exploration and Development, Deployment & Maintenance

The MLPL is an iterative methodology to execute ML exploration tasks, generalizing the process so that it is flexible and modular enough to be applied to different problems in different domains, while at the same time having enough modules to fully describe relevant decision points and milestones.

What does an ML Exploration Process entail?

Ideally, an organization would want to know all possibilities and consequences of an ML solution before introducing it into a system. The ML Exploration Process seeks to determine whether or not an ML solution is the best business decision by addressing the following questions:

  1. Can ML address my business problem?
  2. Is there supporting data?
  3. Can algorithms take advantage of the data?
  4. What is the value added by introducing ML?
  5. What is the technical feasibility of arriving at a solution with ML?

What doesn’t the MLPL capture?

When an organization first begins to think about adopting ML, often the first thing it will do is perform a business analysis. This involves identifying business workflows and business problems, assessing resources, and identifying the tasks and decision points where ML solutions could fit and return business value. The MLPL does not capture all aspects of this analysis; it addresses only those pieces which directly impact ML problem definition.

After ML Exploration is complete, an organization may decide to develop the ML solution into a product or service as a tangible component, deploying it into production and maintaining it. This phase is also not captured in the MLPL. 

The MLPL only deals with the exploration phase where different methods are tried to arrive at a proof-of-concept solution which can be later adapted to develop a complete ML system.

Why do we need the MLPL?

We have seen an overview of what the MLPL captures and what it does not. But why do we need a process to capture an exploration task? There are a few important reasons why an organization should use the MLPL:

  • Risk Mitigation: The MLPL standardizes the stages of an ML project and defines standard modules for each of those stages, thereby minimizing the risk of missing out on important ML practices. 
  • Standardization: Standardizing the workflow across teams through an end-to-end framework enables the users to easily build and operate ML systems while being consistent, and allows the inter-team tasks to be carried out smoothly.
  • Tracking: The MLPL allows you to track the different stages and the modules inside each of the stages. Because this is an exploration task, many attempts will never be used in the final ML solution even though they required significant investment. The MLPL allows you to track the resources that have been spent on these experiments and to evaluate them for future iterations.
  • Reproducibility: Having a standardized process enables an organization to build pipelines for creating and managing experiments, which can be compared and reproduced for future projects.
  • Scalability: A standard workflow also allows an organization to manage multiple experiments simultaneously.
  • Governance: Well-defined stages and modules for each stage will help in better audits to assess if the ML systems are designed appropriately and operating effectively.
  • Communication: A standard guideline helps set expectations and facilitates effective communication between teams about the workflow of the projects.

In Part 2 of the MLPL Series, we will be taking an in-depth look at the MLPL framework and going through the key aspects of each stage. Stay tuned!

If you want to learn more about this and other interesting ML topics, we highly recommend Amii’s recently launched online course Machine Learning: Algorithms in the Real World Specialization, taught by our Director of Amii Explores, Anna Koop. Visit the Amii Educates page to learn about all of our educational offerings, and keep an eye out for our ML Technician Program starting in the new year!

This article was written by Amii’s Applied ML Scientists: Talat Iqbal, Luke Kumar, Shazan Jabbar and Sankalp Prabhakar; as well as Amii’s Director of Explores, Anna Koop.

We are incredibly excited to announce three new courses on artificial intelligence and machine learning, co-developed by Amii and the University of Alberta Faculty of Extension!

Produced in collaboration between UAlberta’s Faculty of Extension and Amii, this three-course series is ideal for technically-inclined participants who wish to build foundational knowledge in machine intelligence, develop an applied understanding for approaching machine learning projects, and gain an introduction to intermediate and advanced techniques.

Participants can expect to gain a working knowledge around important machine learning areas such as supervised learning, unsupervised learning, neural networks and reinforcement learning.

Prior knowledge of basic programming, linear algebra and statistics is expected. Experience with mathematics, statistics and analytics is strongly recommended. Participants will be expected to be able to read and trace existing code; be comfortable with conditionals, loops, variables, lists, dictionaries and arrays; and be able to produce “hello world.”

Once all three courses have been successfully completed, an official University of Alberta Notice of Completion will be issued. The courses can also be used towards the Amii Machine Learning Technician Certification program, beginning in September 2019. Students completing the Faculty of Extension series will be grandfathered into the Machine Learning Technician program with a prorated tuition.

Learn more about the individual courses below:

Introduction to Machine Learning and Artificial Intelligence


(21 hours)
April 24 – 26, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton

Students will gain an overview of machine learning and artificial intelligence, beginning with discussing supervised learning applied to a classification problem. Students will develop a working knowledge of this type of application, and how it might look in a project from start to finish. Prior knowledge of basic programming, linear algebra and statistics is expected.

Applied Machine Learning


(21 hours)
May 22 – 24, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton

This course will begin the discussion of problem definition in machine learning projects, and other issues with data acquisition, cleaning and exploratory data analysis. Students will also discuss unsupervised learning in the context of developing data for successful machine learning modelling. Prior knowledge of basic programming, linear algebra and statistics is expected.

Intermediate Machine Learning Techniques


(21 hours)
June 19 – 21, 2019
8:30 a.m. – 5 p.m.
Enterprise Square, Edmonton

This course continues from the previous, discussing more advanced techniques of machine learning, such as neural networks and support vector machines. Students will also get a brief introduction to reinforcement learning. Prior knowledge of basic programming, linear algebra and statistics is expected.

For more information, please visit the UAlberta Faculty of Extension – AI & ML Courses page:

Part of the Alberta Machine Intelligence Institute, Marlos C. Machado is a 4th year Ph.D. student in the University of Alberta’s Department of Computing Science, supervised by Amii’s Michael Bowling.

Marlos’ research interests lie broadly in artificial intelligence with a particular focus on machine learning and reinforcement learning. Marlos is also a member of the Reinforcement Learning & Artificial Intelligence research group, led by Amii’s Richard S. Sutton.

In 2013, Amii researchers proposed the Arcade Learning Environment (ALE), a framework that poses the problem of general competency in AI. The ALE allows researchers and hobbyists to evaluate artificial intelligence (AI) agents in a variety of Atari games, encouraging agents to succeed without game-specific information. While this may not seem like a difficult feat, up to now, intelligent agents have excelled at performing a single task at a time, such as checkers, chess and backgammon – all incredible achievements!

The ALE, instead, asks the AI to perform well at many different tasks: repelling aliens, catching fish and racing cars, among others. Around 2011, Amii’s Michael Bowling began advocating in the AI research community for an Atari-based testbed and challenge problem. The community has since recognized the importance of arcade environments, as shown by the release of similar platforms such as GVG-AI, the OpenAI Gym & Universe, and the Retro Learning Environment.

Figure 1. Atari 2600 games: Space Invaders, Bowling, Fishing Derby and Enduro

The ALE owes some of its success to a Google DeepMind algorithm called Deep Q-Networks (DQN), which recently drew world-wide attention to the learning environment and to reinforcement learning (RL) in general. DQN was the first algorithm to achieve human-level control in the ALE.

In this post, adapted from our paper, “State of the Art Control of Atari Games Using Shallow Reinforcement Learning,” published earlier this year, we examine the principles underlying DQN’s impressive performance by introducing a fixed linear representation that achieves DQN-level performance in the ALE.

The steps we took while developing this representation illuminate the importance of biases being encoded in neural networks’ architectures, which improved our understanding of deep reinforcement learning methods. Our representation also frees agents from necessarily having to learn representations every time an AI is evaluated in the ALE. Researchers can now use a good fixed representation while exploring other questions, which allows for better evaluation of the impact of their algorithms because the interaction with representation learning solutions can be put aside.

Impact of Deep Q-Networks

In reinforcement learning, agents must estimate how “good” a situation is based on current observations. Traditionally, we humans have had to define in advance how an agent processes the input stream based on the features we think are informative. These features can include anything from the position and velocity of an autonomous vehicle to the pixel values the agent sees in the ALE.

Before DQN, pixel values were frequently used to train AI in the ALE. Agents learned crude bits of knowledge like “when a yellow pixel appears on the bottom of the screen, going right is good.” While useful, knowledge represented in this way cannot encode certain pieces of information such as in-game objects.

Because the goal of the ALE is to avoid extracting information particular to a single game, researchers faced the challenge of determining how an AI can succeed in multiple games without providing it game-specific information. To meet this challenge, the agent should not only learn how to act but also learn useful representations of the world.

DQN was one of the first RL algorithms capable of doing so with deep neural networks.

For our discussion, the important aspect of DQN is that its performance is due to the neural network’s estimation of how “good” each screen is; in other words, how likely it is that a particular screen will result in a favourable outcome.

Importantly, the neural network has several convolutional layers with the ability to learn powerful internal representations. The layers are built around simple architectural biases such as position/translation invariance and the size of the filters used. We asked ourselves how much of DQN’s performance results from the internal representations learned and how much from the algorithm’s network architecture. We implemented, in a fixed linear representation, the biases encoded in DQN’s architecture and analyzed the gap between our bias-encoded performance and DQN’s performance.

To our surprise, our fixed linear representation performed nearly as well as DQN!
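To make "fixed linear representation" concrete, here is a minimal sketch of how a linear agent can score a screen once the screen has been turned into a set of active binary features. With binary features, the estimate for an action is simply the sum of the weights of the features that are currently on. The names (`q_value`, `sarsa_update`) and the one-step Sarsa-style update are assumptions chosen for illustration; this post does not specify the learning rule, so this should not be read as the exact procedure used in the paper.

```python
from typing import Dict, Hashable, Set

# One weight table per action; each maps a feature to its learned weight.
Weights = Dict[Hashable, float]


def q_value(weights: Weights, active: Set[Hashable]) -> float:
    """Linear estimate: sum the weights of the features that are 'on'."""
    return sum(weights.get(feature, 0.0) for feature in active)


def sarsa_update(weights: Weights, active: Set[Hashable], reward: float,
                 next_q: float, step_size: float, discount: float) -> float:
    """One-step temporal-difference update (an illustrative assumption)."""
    td_error = reward + discount * next_q - q_value(weights, active)
    for feature in active:
        # Only the active features receive credit for the error.
        weights[feature] = weights.get(feature, 0.0) + step_size * td_error
    return td_error


# Hypothetical usage: two active features, a reward of 1 and a terminal next state.
weights_for_action: Weights = {}
screen_features = {"feature_a", "feature_b"}
sarsa_update(weights_for_action, screen_features, reward=1.0,
             next_q=0.0, step_size=0.1, discount=0.99)
```

The representation itself never changes here; only the weights do, which is what distinguishes this setting from DQN's learned internal representations.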

Basic & Blob-PROST Features

To create our representation, we first needed to define its building blocks. We used the method mentioned earlier of representing screens as “there is a yellow pixel at the bottom of the screen.”

As Figure 2 (inspired by the original article on the ALE) indicates, screens were previously defined in terms of the existence of colours in specific patches of the image. Researchers would divide the image into 14×16 patches (tiles) and, for each tile, encode the colours present in it.

Figure 2. Left: Screenshot of the game Space Invaders; Centre: Tiling used in all games; Right: Representation of Basic Features

In this example, two colours are present in the tile in the top left corner of the screen: black and green. Thus, the agent sees the whole tile as black and green with the “amount” of each colour being unimportant. This representation, called Basic, was introduced in the original paper on the ALE. However, Basic features don’t encode the relation between tiles, that is, “a green pixel is above a yellow pixel.” BASS features, which are not discussed in this post, can be used as a fix but with less than satisfactory results.
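The idea behind Basic features can be sketched in a few lines. The code below is an illustration only, not the ALE implementation: it assumes a screen is given as rows of integer colour indices (a hypothetical encoding), that the screen divides evenly into tiles, and the function name `basic_features` is ours.

```python
from typing import List, Set, Tuple

Screen = List[List[int]]  # rows of colour indices (hypothetical encoding)


def basic_features(screen: Screen, tile_rows: int,
                   tile_cols: int) -> Set[Tuple[int, int, int]]:
    """Return the active (tile_row, tile_col, colour) features.

    A feature is on if the colour appears anywhere in the tile;
    how many pixels have that colour is deliberately ignored.
    Assumes the screen divides evenly into tiles.
    """
    tile_h = len(screen) // tile_rows
    tile_w = len(screen[0]) // tile_cols
    active = set()
    for y, row in enumerate(screen):
        for x, colour in enumerate(row):
            active.add((y // tile_h, x // tile_w, colour))
    return active


# Toy 4x4 screen split into 2x2 tiles; 0 = black, 2 = green (made-up indices).
toy_screen = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 3],
]
print(sorted(basic_features(toy_screen, 2, 2)))
# [(0, 0, 0), (0, 0, 2), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 3)]
```

As in the Space Invaders example, the top-left tile reports both black and green, even though only one of its four pixels is green.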

When DQN was proposed, it outperformed the state-of-the-art in the vast majority of games. But the question still remained: why?

One of our first insights was that convolutional networks apply the same filter in all different patches of the image, meaning observations aren’t necessarily encoded for a specific patch. In other words, instead of knowing “there is a green pixel in tile 6 and an orange pixel in tile 8,” the network knows “there is a green pixel one tile away from an orange pixel somewhere on the screen.”

This knowledge is useful as we no longer need to observe events at specific locations and can generalize them at the moment they occur. That is, the agent doesn’t need to be hit by an alien projectile in every possible pixel space to learn it’s bad. The AI quickly learns “a pixel above the green pixel (the player’s ship) is bad”, no matter the screen position. We modified Basic features to also encode such information, calling the new representation B-PROS.
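A rough sketch of this idea: starting from active Basic features, we can record, for every pair of them, which two colours are involved and how far apart their tiles are, dropping the absolute position. The function name and the tuple layout are our own choices for illustration, and this simple version keeps both orderings of every pair rather than removing redundant ones.

```python
from itertools import permutations
from typing import Set, Tuple

BasicFeature = Tuple[int, int, int]        # (tile_row, tile_col, colour)
OffsetFeature = Tuple[int, int, int, int]  # (colour_a, colour_b, d_row, d_col)


def pairwise_offset_features(basic: Set[BasicFeature]) -> Set[OffsetFeature]:
    """Encode 'colour_b is (d_row, d_col) tiles away from colour_a'."""
    offsets = set()
    for (r1, c1, k1), (r2, c2, k2) in permutations(basic, 2):
        offsets.add((k1, k2, r2 - r1, c2 - c1))
    return offsets


# The same two-tile pattern at two different places on the screen
# (colour indices 1 and 2 are made up for the example).
left = {(5, 2, 1), (6, 2, 2)}
right = {(5, 9, 1), (6, 9, 2)}
print(pairwise_offset_features(left) == pairwise_offset_features(right))  # True
```

Because only offsets are stored, the pattern produces the same features wherever it appears, which is the position invariance described above.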

Figure 3. Representation of B-PROS features

B-PROS is limited in that it doesn’t encode object movement. If there is a projectile on the screen, is it moving upwards from the agent’s ship or downwards from an alien’s?

We can easily answer the question by using two consecutive screens to infer an object’s direction, which is what DQN does. Instead of only using offsets from the same screen, we also looked at the offsets between different screens, encoding things like: “there was a yellow pixel two blocks above where the green pixel is now.” We call this representation B-PROST.
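The time offsets can be sketched the same way as the spatial ones, except that each pair combines a feature from the previous screen with one from the current screen. This is an illustrative sketch under our own naming and tuple layout, with rows assumed to increase downwards.

```python
from itertools import product
from typing import Set, Tuple

BasicFeature = Tuple[int, int, int]      # (tile_row, tile_col, colour)
TimeFeature = Tuple[int, int, int, int]  # (colour_then, colour_now, d_row, d_col)


def time_offset_features(previous: Set[BasicFeature],
                         current: Set[BasicFeature]) -> Set[TimeFeature]:
    """Encode 'colour_then was (d_row, d_col) tiles from where colour_now is'."""
    features = set()
    for (r_then, c_then, k_then), (r_now, c_now, k_now) in product(previous, current):
        features.add((k_then, k_now, r_then - r_now, c_then - c_now))
    return features


YELLOW = 3  # made-up colour index for a projectile

# Projectile one tile higher on the previous screen: it is moving down.
moving_down = time_offset_features({(4, 7, YELLOW)}, {(5, 7, YELLOW)})
# Projectile one tile lower on the previous screen: it is moving up.
moving_up = time_offset_features({(5, 7, YELLOW)}, {(4, 7, YELLOW)})
print(moving_down, moving_up)  # {(3, 3, -1, 0)} {(3, 3, 1, 0)}
```

A single screen cannot tell these two cases apart, but the sign of the row offset between consecutive screens does.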

Figure 4. Representation of B-PROST features

Finally, as is the case with DQN, we needed a way to identify objects. The filter sizes in the convolutional network had the typical size of objects in Atari games built into the system, so we made a simple change to our algorithm: instead of dividing the screen into tiles, we divided it into objects to examine the offsets between objects. But how to find the objects?

We did the simplest thing possible: call all segments with the same coloured pixels an object. If one colour was surrounding another, up to a certain threshold, we assumed the whole object had the surrounding colour and ignored the colour inside. By taking the offsets in space and time of these objects, we obtained a new feature set called Blob-PROST. Figure 5 is a simplification of what we ended up with.
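The first step, grouping same-coloured pixels into segments, can be sketched with a standard flood fill. This is only an illustration of the idea: it assumes 4-connectivity (pixels touching at a corner are not joined), treats every colour including the background as a segment, and leaves out the rule for absorbing an enclosed colour into the surrounding one. The function name and the returned tuple layout are our own.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (min_row, min_col, max_row, max_col)


def find_blobs(screen: List[List[int]]) -> List[Tuple[int, int, Box]]:
    """Return (colour, pixel_count, bounding_box) for each same-colour segment."""
    height, width = len(screen), len(screen[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            colour = screen[y][x]
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:  # breadth-first flood fill over equal-coloured neighbours
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and screen[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            rows = [p[0] for p in pixels]
            cols = [p[1] for p in pixels]
            blobs.append((colour, len(pixels),
                          (min(rows), min(cols), max(rows), max(cols))))
    return blobs


toy_screen = [
    [0, 0, 0, 0, 0],
    [0, 2, 2, 0, 0],
    [0, 2, 2, 0, 3],
    [0, 0, 0, 0, 3],
]
print(find_blobs(toy_screen))
# [(0, 14, (0, 0, 3, 4)), (2, 4, (1, 1, 2, 2)), (3, 2, (2, 4, 3, 4))]
```

Once each blob is reduced to a colour and a position, the space and time offsets can be computed between blobs instead of between tiles.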

Figure 5. Representation of objects identified for the Blob-PROST feature set

So how good are Blob-PROST features? Well, they score better than DQN in 21 out of 49 games (43 per cent of the games) with the score of three of the remaining games having no statistically significant difference from that of DQN. Even when an algorithm is compared against itself, we would expect it to win 50 per cent of the time, making our 43 per cent a comparable result.
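For readers who want to check the arithmetic, the figures quoted above work out as follows. The variable names are ours; the counts come directly from the paragraph above.

```python
wins, no_significant_difference, games = 21, 3, 49

win_rate = wins / games
at_least_comparable = (wins + no_significant_difference) / games

print(f"{win_rate:.1%}")             # 42.9%
print(f"{at_least_comparable:.1%}")  # 49.0%
```

Counting the three statistically indistinguishable games alongside the wins brings the share of games where Blob-PROST is at least comparable to DQN to roughly half.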


We started by asking how much of DQN’s original performance resulted from the representations it learns versus the biases already encoded in the neural network: position/translation invariance, movement information and object detection. To our surprise, the biases explain a big part of DQN’s performance. By simply encoding the biases without learning any representation, we were able to achieve similar performance to DQN.

The ability to learn representations is essential for intelligent agents: fixed representations, while useful, are an intermediate step on the path to artificial general intelligence. Although DQN’s performance may be explained by the convolutional network’s biases, the algorithm is a major milestone, and subsequent work has shown the importance of the principles introduced by the research team. The state-of-the-art is now derived from DQN, achieving even higher scores in Atari games and suggesting that better representations are now being learned.

For a more detailed discussion of each of the evaluated biases, as well as of DQN’s performance as compared to Blob-PROST, read our paper: “State of the Art Control of Atari Games Using Shallow Reinforcement Learning.”