Publication details
- Journal: Journal of Open Source Software (JOSS), vol. 5, 2020
- ISSN (electronic): 2475-9066
A common task within machine learning is to train a model to predict an unknown outcome
(response variable) based on a set of known input variables/features. When using such models
for real-life applications, it is often crucial to understand why a certain set of features leads to
a specific prediction. Most machine learning models are, however, complicated and hard to
understand, so they are often viewed as “black boxes” that produce some output from
some input.
Shapley values (Shapley, 1953) are a concept from cooperative game theory used to fairly
distribute a joint payoff among the cooperating players. Štrumbelj & Kononenko (2010) and later
Lundberg & Lee (2017) proposed to use the Shapley value framework to explain predictions by
distributing the prediction value among the input features.
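For reference, and with generic notation not taken from the paper, the Shapley value assigned to feature $j$ takes the standard form

$$\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|!\,(|\mathcal{M}| - |S| - 1)!}{|\mathcal{M}|!}\,\bigl(v(S \cup \{j\}) - v(S)\bigr),$$

where $\mathcal{M}$ is the set of features and $v(S)$ is the contribution of the feature subset $S$, typically taken to be the expected model prediction conditional on the observed values of the features in $S$.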
& Kononenko, 2014), SHAP/Kernel SHAP (Lundberg & Lee, 2017), and to some extent
TreeSHAP/TreeExplainer (Lundberg et al., 2020; Lundberg, Erion, & Lee, 2018), assume
that the features are independent when approximating the Shapley values. The R-package
shapr, however, implements the methodology proposed by Aas, Jullum, & Løland (2019),
where predictions are explained while accounting for the dependence between the features,
resulting in significantly more accurate approximations to the Shapley values.
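To make the workflow concrete, the following is a minimal sketch of how a set of predictions might be explained with shapr. It loosely follows the package's documented 0.x interface (the shapr() and explain() functions, the approach argument, and prediction_zero); exact function and argument names may differ across versions, and the xgboost model and Boston housing data are purely illustrative.

```r
# Minimal sketch: dependence-aware Shapley value explanations with shapr.
# Assumes the 0.x interface of shapr; adjust names for newer releases.
library(xgboost)
library(shapr)

data("Boston", package = "MASS")
x_var <- c("lstat", "rm", "dis", "indus")
y_var <- "medv"

x_train <- as.matrix(Boston[-(1:6), x_var])
y_train <- Boston[-(1:6), y_var]
x_test  <- as.matrix(Boston[1:6, x_var])

# Fit the model whose predictions we want to explain
model <- xgboost(data = x_train, label = y_train, nrounds = 20, verbose = FALSE)

# Set up the explainer (prepares the feature subsets and Shapley kernel weights)
explainer <- shapr(x_train, model)

# Explain the test predictions while accounting for feature dependence;
# "empirical" is one of the dependence-aware approaches ("gaussian" and
# "copula" are alternatives)
explanation <- explain(
  x_test,
  explainer = explainer,
  approach = "empirical",
  prediction_zero = mean(y_train)  # reference value, i.e. phi_0
)

# Shapley value decomposition of each prediction
print(explanation$dt)
```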