This article was written by Alona Fyshe, an Amii Fellow and Canada CIFAR AI Chair who serves as an Assistant Professor of Computing Science and Psychology at the University of Alberta. She combines her interests in computational linguistics, machine learning and neuroscience to study how the human brain processes language.

Why does AI predict what it does?

As AI has become more powerful, we have begun to depend on it in new ways. We also expect more of the AI in our lives – including wanting to know why it makes the predictions it does.

Examples range from the mundane “why is TikTok serving me videos about Dungeons & Dragons?” to the very serious “why did my self-driving car suddenly swerve to the right?” Knowing the ‘why of AI’ helps us better anticipate the future actions of the technology in our lives. This intuition builds trust between users and AI, allowing AI to fit more seamlessly into our lives.

Researchers use the term interpretability to refer to examining the inner workings of a model to see why it made a particular prediction.

Original attempts at interpretability appealed to folk psychology – we wanted to be able to tell a story about why a model made a specific prediction in a way that is understandable to humans.

There was a time when this seemed like a plausible goal, but with the advent of larger, more complex models, this may no longer be possible. The predictions our models make are no longer attributable to single factors, but rather multiple things combined in a way that is hard to describe or capture.

Why interpretability is hard

Think about the last time you tried to choose a place to eat out. Maybe you really wanted sushi – but why did you want sushi? Sometimes the reasons are obvious: perhaps you hadn’t had sushi for a while, maybe you saw someone else eating sushi at lunch.

But sometimes you just want sushi and you’re not sure why, and there are probably many things that contributed to your desire, some of which may have happened days ago. Maybe you can even conjure an explanation that is related to some of your recent experiences, but the true reason is that your body is sending you subtle signals that make sushi seem appealing.

Understanding AI models used to be like explaining a day when your reasons for craving sushi were pretty clear. Our models were simple, and the input to the models was also simple. But with current AI systems, explaining why a model makes a particular decision is much more like those weird days when you want sushi and you’re not exactly sure why.

Our models are complex, and the input to the models has also become more complex. In response to this, a new field has emerged called Explainable AI, or X-AI.

AI, but make it explainable

X-AI researchers note that even when the model itself is difficult to explain, there may be a way to introduce another model or method to assist in the explanation. This additional model is trained to look at the behaviour of the original model and create a human-understandable explanation for why the model made a particular decision.

One X-AI approach is to produce counterfactual explanations – to generate new data points by slightly altering an existing example.

In the case of a self-driving car suddenly swerving to the right, we could create counterfactual explanations by taking the visual input before the swerve and altering it slightly. For example, we can turn up or down the contrast, zoom in, or make some groups of pixels black.

This creates a set of examples, some of which will cause the model to produce a different prediction, allowing us to reason about why the model made the original prediction. These explanations don’t explain how the model made a certain prediction, but they do provide insight into why the prediction was made.
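To make the idea concrete, here is a minimal sketch of the perturb-and-compare loop described above. Everything in it is an assumption made for illustration: toy_model is a made-up stand-in for a real driving model, the 8x8 "frame" is invented data, and the helper names (change_contrast, mask_patch, counterfactuals, flipping_edits) are hypothetical rather than part of any library.

```python
import numpy as np

def toy_model(image):
    """Hypothetical stand-in for a driving model (an assumption): predicts
    "swerve" when the right half of the frame is much brighter than the left."""
    w = image.shape[1]
    left = image[:, : w // 2].mean()
    right = image[:, w // 2 :].mean()
    return "swerve" if right - left > 0.25 else "straight"

def change_contrast(image, factor):
    """Scale pixel values around the image mean, then clip to [0, 1]."""
    mean = image.mean()
    return np.clip(mean + factor * (image - mean), 0.0, 1.0)

def mask_patch(image, row, col, size):
    """Return a copy of the image with one square patch set to black."""
    out = image.copy()
    out[row : row + size, col : col + size] = 0.0
    return out

def counterfactuals(image, patch=4):
    """Yield (description, altered_image) pairs for a few simple edits."""
    for factor in (0.5, 1.5):
        yield f"contrast x{factor}", change_contrast(image, factor)
    h, w = image.shape
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            yield f"black patch at ({r}, {c})", mask_patch(image, r, c, patch)

def flipping_edits(model, image):
    """Return the original prediction and the edits that change it."""
    original = model(image)
    flips = [desc for desc, alt in counterfactuals(image) if model(alt) != original]
    return original, flips

# Invented 8x8 grayscale frame: dark everywhere except a bright object top right.
frame = np.full((8, 8), 0.1)
frame[0:4, 4:8] = 0.9

original, flips = flipping_edits(toy_model, frame)
print("original prediction:", original)
for desc in flips:
    print("prediction changes after:", desc)
```

In this toy setup, the prediction changes when the bright patch is blacked out and stays the same when any other patch is blacked out, which points to that region as the likely reason for the swerve. A real model and real camera input would call for far richer edits than these.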

Another X-AI approach harkens back to the simpler times of interpretable models. The general idea is that, if we consider only a small sample of very similar data points, the behaviour of a complex model can be captured with a simple model (Ribeiro et al., 2016).

This method uses a single data sample to generate a whole dataset of counterfactual examples. Then, we use the complex model to predict the category of the counterfactual examples. With this newly constructed dataset, we train a simpler model that is easier to interpret.

This simpler model only has to capture the patterns of a limited set of examples, and so it can often provide predictions that closely mirror the complex model on that limited set. This simpler model is easily interpretable and helps us to understand the behaviour of a complex model for a single data example.
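The recipe above can be sketched in a few lines. This is a rough illustration in the spirit of Ribeiro et al. (2016), not their implementation: complex_model is a made-up nonlinear function standing in for a real model, local_surrogate and its parameters are hypothetical names, and weighting the altered examples by how close they are to the original is an assumption of this sketch rather than something stated in the text.

```python
import numpy as np

def complex_model(X):
    """Hypothetical complex model (an assumption): a nonlinear score in (0, 1)."""
    z = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2]
    return 1.0 / (1.0 + np.exp(-z))

def local_surrogate(model, x, n_samples=2000, scale=0.1, seed=0):
    """Fit a simple linear model to the complex model's behaviour near x."""
    rng = np.random.default_rng(seed)
    # 1. Use the single example x to generate many slightly altered copies.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Ask the complex model for its prediction on each altered copy.
    y = model(X)
    # 3. Weight each copy by its closeness to x (assumed Gaussian kernel).
    dist2 = ((X - x) ** 2).sum(axis=1)
    weights = np.exp(-dist2 / (2.0 * x.size * scale**2))
    # 4. Weighted least squares: y is approximated by intercept + coef . (X - x).
    A = np.hstack([np.ones((n_samples, 1)), X - x])
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]

x = np.array([0.2, 1.0, -0.5])
intercept, coef = local_surrogate(complex_model, x)

print("complex model at x:", complex_model(x[None, :])[0])
print("surrogate intercept:", intercept)
for i, c in enumerate(coef):
    print(f"feature {i}: local effect {c:+.3f}")
```

The sign and size of each coefficient suggest how nudging that feature would move the complex model's output near this one example. The coefficients say nothing reliable about the model's behaviour far from it.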

Evolving our understanding

X-AI is not without its detractors. Some critics claim that X-AI approaches are kind of like when you conjure up an explanation for your sushi cravings on a day when there isn’t a clear reason: you can dream up a rationale, but there isn’t an easy way to ensure it’s correct.

Similarly, critics of the X-AI approach point out that it’s very difficult to know whether externally generated explanations are trustworthy. So, we can use X-AI to help us explore and reason about our models, but it’s not a silver bullet and we should take its explanations with a grain of salt.

It’s also true that understanding a model by exploring counterfactual examples requires a fair amount of thinking from the user. It’s still not as simple and intuitive as reading a one-sentence explanation.

But X-AI will continue to evolve, developing new ways to help us understand AI more naturally. Building intuition about AI helps us trust the AI in our lives, which in turn allows us to leverage AI for our greatest benefit.

References

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Model-Agnostic Interpretability of Machine Learning.” arXiv, June 16, 2016. https://doi.org/10.48550/arXiv.1606.05386.