Efficient and simple prediction explanations with groupShapley: A practical perspective


Shapley values have established themselves as one of the most appropriate and theoretically sound frameworks
for explaining predictions from complex machine learning models. The popularity of Shapley values
in the explanation setting is probably due to their unique theoretical properties. The main
drawback with Shapley values, however, is that the computational complexity grows exponentially in
the number of input features, making computation infeasible in many real-world situations where there could be
hundreds or thousands of features. Furthermore, with many (dependent) features, presenting, visualizing,
and interpreting the computed Shapley values also become challenging. The present paper introduces
and showcases a method that we call groupShapley. The idea of the method is to group features and then
compute and present Shapley values for these groups instead of for all individual features. Reducing
hundreds or thousands of features to half a dozen or so feature groups makes precise computations
practically feasible and greatly simplifies presentation and knowledge extraction. We give practical
advice for using the approach and illustrate its usability in three different real-world examples. The
examples vary in data type (regular tabular data and time series), feature dimension (medium to
high), and application (insurance, genetics, and banking).
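
To make the grouping idea concrete, the following is a minimal Python sketch of exact Shapley values computed over feature groups rather than individual features. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `group_shapley`, its arguments, and the toy linear model are hypothetical, and the contribution function simply replaces out-of-coalition features with rows from a background data set (a marginal expectation that ignores feature dependence).

```python
from itertools import combinations
from math import factorial

import numpy as np


def group_shapley(predict, x, background, groups):
    """Exact Shapley values for feature groups (hypothetical helper).

    predict    : function mapping an (n, d) array to (n,) predictions
    x          : (d,) array, the instance to explain
    background : (m, d) array of reference data
    groups     : dict mapping group name -> list of column indices
    """
    names = list(groups)
    g = len(names)

    def value(coalition):
        # Assumption: features outside the coalition keep their background
        # values, i.e. a marginal expectation that ignores dependence.
        data = background.copy()
        for name in coalition:
            cols = groups[name]
            data[:, cols] = x[cols]
        return float(np.mean(predict(data)))

    phi = {}
    for name in names:
        others = [n for n in names if n != name]
        total = 0.0
        for size in range(g):
            # Standard Shapley weight, with groups playing the role of players.
            weight = factorial(size) * factorial(g - size - 1) / factorial(g)
            for subset in combinations(others, size):
                total += weight * (value(subset + (name,)) - value(subset))
        phi[name] = total
    return phi


# Toy usage: six features collapsed into three illustrative groups.
rng = np.random.default_rng(0)
background = rng.normal(size=(200, 6))
weights = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])


def predict(data):
    return data @ weights


x = np.ones(6)
groups = {"a": [0, 1], "b": [2, 3, 4], "c": [5]}
phi = group_shapley(predict, x, background, groups)
```

With g groups the loop evaluates on the order of 2^g coalitions, so half a dozen groups require only a few dozen evaluations regardless of how many individual features each group contains. The group values sum to the difference between the prediction for `x` and the mean prediction over the background data.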