Model-Agnostic Methods for Interpreting any Machine Learning Model

More and more companies are using complex machine learning models, like neural networks and gradient boosting machines. They use complex models because these often outperform traditional models, like decision trees or logistic regression. The downside of complex models is that you can’t interpret them directly. You don’t want a biased model, or a model that makes choices based on strange or unrelated information. Past experiences, like what happened at Amazon¹ or with teachers², show the importance of interpreting complex models. Knowing why your model makes its decisions also has positive side effects: you can discover new patterns the model finds and learn more about your data.

A lot of research has been done on the interpretability of machine learning models, and there are different ways to interpret them. The simplest split is between interpretable models and model-agnostic methods. Interpretable models are models that explain themselves; from a decision tree, for instance, you can easily extract decision rules. Model-agnostic methods are methods you can apply to any machine learning model, from support vector machines to neural networks. This article focuses on model-agnostic methods; there’s a separate article about interpretable models.

Dataset

A field where you can use model interpretability is health care. To find out how a model decides whether or not a person has heart disease, we use a dataset from the Cleveland database with the following features: age, sex, chest pain type (cp), resting blood pressure (trestbps), serum cholesterol (chol), fasting blood sugar (fbs), resting ECG results (restecg), maximum heart rate achieved (thalach), exercise-induced angina (exang), ST depression (oldpeak), the slope of the ST segment (slope), the number of major vessels (ca), thalassemia (thal) and the target (1 = heart disease, 0 = no heart disease).

We will try to predict the target with a random forest and interpret this model with model-agnostic methods. You can find the Heart Disease UCI dataset on Kaggle.
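If you want to follow along, the sketch below shows a minimal setup: loading the data, splitting it and fitting a random forest. The file name heart.csv, the split and the hyperparameters are illustrative assumptions, not necessarily the settings used for the plots in this article; the exact code is in the repository mentioned in the next section.

```python
# Minimal setup sketch. The file name, split and hyperparameters are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")            # Heart Disease UCI dataset from Kaggle
X = df.drop(columns="target")            # the clinical features
y = df["target"]                         # 1 = heart disease, 0 = no heart disease

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```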

Code

The code for this article about model-agnostic methods (and for the article about interpretable models) can be found on GitHub.

Model-agnostic methods

Unfortunately, it’s not possible to directly interpret most machine learning models. For popular models like random forests, gradient boosted machines and neural networks you need model-agnostic methods. At the moment there are several interesting methods available, like permutation feature importance³, Partial Dependence Plots (PDPs), Individual Conditional Expectation (ICE) plots, global surrogate models, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). We will dive into these methods and discuss their advantages and disadvantages.

Permutation Feature Importance

Do you use feature importances from scikit-learn⁴? These feature importances are based on the mean decrease in a criterion, like Gini impurity (for decision trees and random forests). It’s better to use permutation feature importance. With this method, the importances are based on the increase in prediction error when you permute a feature’s values. So you compute the prediction error twice, before and after permuting the feature. The bigger the difference between the two prediction errors, the more important the feature.
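As a rough sketch of how the two measures are obtained (assuming the fitted `model`, `X_test` and `y_test` from the setup sketch above, and scikit-learn ≥ 0.22):

```python
# Compare impurity-based importances with permutation importances.
import pandas as pd
from sklearn.inspection import permutation_importance

# Impurity-based importances, computed on the training data during fitting
impurity_imp = pd.Series(model.feature_importances_, index=X_test.columns)

# Permutation importances: drop in score when a feature's values are shuffled
perm = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=42)
perm_imp = pd.Series(perm.importances_mean, index=X_test.columns)

print(impurity_imp.sort_values(ascending=False))
print(perm_imp.sort_values(ascending=False))
```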

Now we’re going to compare the scikit-learn feature importances for the random forest with the permutation feature importance:

Scikit-learn feature importances for the Random Forest
Permutation feature importance for the Random Forest

Wow! The order of the features is completely shuffled if you compare the scikit-learn feature importances with the permutation feature importances! According to the second plot we should even try to exclude chol and exang, because the model performs better when these features are permuted! And for the features fbs, trestbps and age nothing happens (if we ignore the variance).

Partial Dependence Plots (PDPs)

These plots visualize the average partial relationship between the predicted target and one or more features. The plots are created by forcing all the instances to have the same feature value, making predictions for these modified instances and averaging them; this gives the average prediction for that feature value. Because the result has to be visualized, usually only one or two features are investigated at a time.
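The averaging described above is simple enough to sketch by hand. The helper below is purely illustrative and assumes the fitted `model` and `X_test` from the setup sketch earlier:

```python
# Manual partial dependence: force every instance to a grid value, predict, average.
import numpy as np

def partial_dependence(model, X, feature, grid_points=20):
    """Average predicted probability of heart disease over a grid of feature values."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_points)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                    # force every instance to this value
        preds = model.predict_proba(X_mod)[:, 1]  # probability of class 1
        averaged.append(preds.mean())             # average over all instances
    return grid, np.array(averaged)

grid, pd_values = partial_dependence(model, X_test, "thalach")
```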

To get an idea of what PDPs look like, below you can see some examples on the heart disease dataset. The first two images have one feature on the x-axis and the probability of having a heart disease on the y-axis. The third image shows two features (one on the x-axis and one on the y-axis).

In general, the probability of having a heart disease increases when a higher maximum heart rate is achieved.
Chest pain type is a categorical variable. If this value is equal to one, two or three, the chance of having a heart disease is higher than when this value is equal to zero.
This plot shows how oldpeak and cp are related to the average prediction. When chest pain type is equal to one, two or three and oldpeak has a low value, the probability of having a heart disease is much higher (> 0.63) than when cp is equal to zero and oldpeak has a high value (< 0.37).

In PDPs, you force all the instances to have the same feature value. The plots can be misleading if only a small number of instances actually have a certain feature value. It’s better to include the data distribution in your plot, so you can see whether the data is evenly distributed.

Watch out with PDPs! The features for which you compute the partial dependence are assumed to be independent, so they shouldn’t be correlated with other features. You can also easily miss complexity in the model, because the predictions are averaged.

Individual Conditional Expectation (ICE)

A way to deal with the complexity that PDPs hide is to show them in combination with ICE plots⁵. ICE plots are more detailed: they show how the prediction of each individual instance changes when a feature value is changed. In the following images, every blue line represents one instance.
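Combined PDP/ICE plots like the ones below can also be produced with scikit-learn’s built-in display. This is a minimal sketch that assumes scikit-learn ≥ 1.0 and reuses `model` and `X_test` from the setup sketch earlier:

```python
# kind="both" draws one thin ICE line per instance plus the thick average (PDP) line.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(model, X_test, features=["sex", "chol"], kind="both")
plt.show()
```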

Combined PDP and ICE plot for the ‘sex’ variable.

The ICE plot for the sex variable (female = 0, male = 1) is shown above. The average is the thick line in the middle (the same line as in the PDPs). You can see that for some instances the prediction changes a lot when sex is changed to male, while for other instances it stays almost the same, although being female always has a negative effect. At the bottom of the image you see the data distribution.

Combined PDP and ICE plot for the ‘chol’ variable.

This is interesting! The cholesterol variable shows that the pattern is more complicated than you would expect from the PDP, because the instances are spread out all over the place and often they don’t follow the pattern of the thick line. Sometimes a higher cholesterol has a (small) positive effect and sometimes the effect is negative. There are not that many instances that have a cholesterol value above 400 (check out the distribution on the bottom) so we should be careful here!

With ICE plots we solved the problem of PDPs by showing more complexity, but what about the independent features problem? That problem isn’t solved with ICE plots, because the feature you plot still needs to be uncorrelated with the other features.

Global Surrogate Models

Workflow for a global surrogate model.

Global surrogates are really easy to understand, which is an advantage of this method. First you build a black box model on the training data with the real labels. Then you let this model predict labels for the same data and build an interpretable model on the data with those predicted labels. Because the surrogate model is interpretable and built on the predictions of the black box model, you learn how the black box model makes its predictions.

This method is nice because it’s intuitive, but there are some disadvantages. The interpretable model will perform worse than the black box model (otherwise you could simply replace the black box model). You need to decide what an acceptable level of performance is for the interpretable model. Besides that, the interpretable model draws conclusions about the black box model, not about the data.
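A minimal sketch of this workflow, using a shallow decision tree as the surrogate (the tree depth and the fidelity check are illustrative choices; `model` and `X_train` come from the setup sketch earlier):

```python
# Global surrogate: fit an interpretable model on the black box model's predictions.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# 1. Let the black box model label the training data
black_box_labels = model.predict(X_train)

# 2. Fit an interpretable model on those predicted labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X_train, black_box_labels)

# 3. Check how faithfully the surrogate mimics the black box (not the true labels!)
fidelity = accuracy_score(black_box_labels, surrogate.predict(X_train))
print(f"Fidelity to the black box: {fidelity:.2f}")
print(export_text(surrogate, feature_names=list(X_train.columns)))
```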

Local Interpretable Model-Agnostic Explanations (LIME)

When you want to explain an individual prediction, you can use LIME. With LIME, a local surrogate model is trained. This interpretable surrogate model can be used to explain the individual prediction. You can use LIME not only on tabular data, but also on images or text⁶.

The following image shows in an intuitive way how LIME works. The red pluses and the blue dots are samples from the two classes. The border between the pink and the blue area is the decision boundary of the black box model. If you want to explain the big red plus, you create other instances close to it and fit an interpretable model on them, which gives a local decision boundary (the dotted line). This local decision boundary is a lot easier to explain than the boundary between the pink and the blue area (the decision boundary of the black box model).

Let’s take a new record from the test set:

And now, let’s use LIME to explain the prediction of this record:
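A minimal sketch of producing such an explanation with the lime package (the class names, the chosen record and num_features are illustrative assumptions; `model`, `X_train` and `X_test` come from the setup sketch earlier):

```python
# Local surrogate explanation for one record with LIME (pip install lime).
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["no heart disease", "heart disease"],
    mode="classification",
)

# Explain a single record from the test set
record = X_test.iloc[0]
explanation = explainer.explain_instance(record.values, model.predict_proba, num_features=6)
print(explanation.as_list())   # feature contributions for this one prediction
```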

Close call! We see that the prediction probability is slightly higher for true (heart disease = 1) than for false (heart disease = 0). On the right we can see which features contributed most to the prediction.

Just like with other interpretation methods, you need to be careful with LIME. If you explain the same record twice, the explanations can be different! Another disadvantage is that you can only explain one instance at a time, so it’s not possible to interpret the whole black box model this way.

Shapley Additive Explanations (SHAP)

If you want a really fancy and good way to display how feature values contribute to a prediction, you should use SHAP. This method combines Shapley values (from game theory) with LIME⁷. In short, Shapley values use coalitions of features to determine the contribution of each feature value to the final prediction.
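A minimal sketch of generating the plots discussed below with the shap package (pip install shap). Note that the return shape of the SHAP values differs between shap versions; the older list-per-class API is assumed here, along with `model` and `X_test` from the setup sketch earlier:

```python
# SHAP values for the random forest, plus a force plot and a summary plot.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)   # list with one array per class (older shap API)

# Force plot for a single record (class 1 = heart disease)
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1][0, :], X_test.iloc[0, :])

# Summary plot over the whole test set
shap.summary_plot(shap_values[1], X_test)
```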

The record being investigated is the same record we used for LIME. The predicted probability is 0.53, which is slightly below the base value of 0.5486. The red features, like cp and oldpeak, increase the probability of having a heart disease, while ca and thal decrease it.

If we turn all the samples from the test set 90 degrees, stack them and order them by similarity, we get the image above.

The summary plot above shows the SHAP values for high and low feature values. If you look at the ca feature, you see that when this feature has a low value, the SHAP value is high, which means a higher probability of having a heart disease. The plot also orders the features from most important to least important (top to bottom).

SHAP values have some advantages: we can use them for both local and global explanations, and they have a strong theoretical foundation. In the beginning they couldn’t handle dependent features, but research shows that it’s possible to use them with dependent features⁸. A drawback of SHAP values is computation speed: when you have many features, the computation time increases significantly.

Hopefully you can use these methods to investigate your data and models!

[1] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women (2018), Reuters

[2] C. O’Neil, Weapons Of Math Destruction (2016), Crown New York

[3] A. Altmann and L. Toloşi, Permutation importance: a corrected feature importance measure (2010), Bioinformatics

[4] T. Parr, K. Turgutlu, C. Csiszar and J. Howard, Beware Default Random Forest Importances (2018), explained.ai

[5] A. Goldstein, A. Kapelner, J. Bleich and E. Pitkin, Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (2014), Journal of Computational and Graphical Statistics

[6] M. T. Ribeiro, S. Singh and C. Guestrin, “Why Should I Trust You?” Explaining the Predictions of Any Classifier (2016), ResearchGate

[7] S. Lundberg and S. Lee, A Unified Approach to Interpreting Model Predictions (2017), NIPS

[8] K. Aas and M. Jullum, Explaining Individual Predictions when Features are Dependent: More Accurate Approximations to Shapley Values (2019), ArXiv
