Causal Post-hoc Explainable AI

Publication details

Supervised by: Langseth, Helge; Strumke, Inga; Bach, Kerstin
Publisher: Norges teknisk-naturvitenskapelige universitet
International Standard Numbers:
- Printed: 9788235300225
Link:
- ARKIV: hdl.handle.net/11250/5514043

Highly complex machine learning models are becoming integral components of advanced decision-making processes across many domains of society.
These models achieve impressive accuracy at the cost of transparency, which hinders understanding of the decision logic underlying the automated process.
To address this challenge, the field of Explainable Artificial Intelligence (XAI) has emerged, aiming to provide reliable explanations for model predictions. One of the main approaches is post-hoc XAI, where explanations are generated for pre-trained, non-interpretable models without affecting model performance.

The field of causality, meanwhile, provides a formal approach to explanation based on causal modelling of relevant variables. Causal explanations explain effects by highlighting their causes, contrasting actual events with counterfactual events, and are considered especially favourable for facilitating understanding because they accommodate an implicit understanding of how the world works.

This thesis explores the intersection of post-hoc XAI and causal explanation. A comprehensive systematic literature review is conducted to provide an overview of this subfield. A categorisation of post-hoc XAI methods is presented, and two categories of causal post-hoc XAI are defined that differ in their choice of explanation vocabulary and in the abstraction of the causal model used for explanation: internally causal methods, which focus on the direct causal effect of model input features on model output; and externally causal methods, which focus on the indirect causal effect of high-level, real-world concepts on model output.

Counterfactual explanation is also central to this thesis. The modelling requirements necessary to ensure causally sound generation of counterfactual explanations are discussed, and steps are taken towards generalising the applicability of methods for counterfactual post-hoc explanations. A new method is presented that calculates approximate bounds for counterfactual queries given partially specified causal models under limited access to domain knowledge. Compared to existing techniques, the method is shown to improve approximation accuracy and to lower computational cost.
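
To illustrate what bounding a counterfactual query means when the causal model is only partially known, the sketch below computes the classical Tian-Pearl bounds on the probability of necessity and sufficiency (PNS) for a binary cause X and binary effect Y, using interventional probabilities only. This is a standard textbook result, not the method presented in the thesis; the function name `pns_bounds` and the input values are assumptions chosen for illustration.

```python
def pns_bounds(p_y_do_x, p_y_do_not_x):
    """Tian-Pearl bounds on PNS for binary X and Y.

    p_y_do_x:     P(Y=1 | do(X=1)), assumed known from experiments
    p_y_do_not_x: P(Y=1 | do(X=0)), assumed known from experiments
    Returns (lower, upper). Without further assumptions about the
    causal mechanism, PNS is only identified up to this interval.
    """
    for p in (p_y_do_x, p_y_do_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Lower bound: max(0, P(y_x) - P(y_x'))
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    # Upper bound: min(P(y_x), P(y'_x'))
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper


# Illustrative (made-up) interventional probabilities.
lower, upper = pns_bounds(0.75, 0.25)
print(lower, upper)  # 0.5 0.75
```

The width of the interval reflects what the available knowledge cannot determine; additional assumptions or domain knowledge about the causal model would narrow it.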

Finally, building on the previous results, a conceptual method for causal and counterfactual post-hoc XAI is presented. The approach follows the definition of externally causal XAI, employing a meaningful concept-based explanation vocabulary. Different types of explanations that build on counterfactual queries are considered, and a proof-of-concept model is implemented, for which causal and counterfactual explanations generated under both complete and partial domain knowledge are evaluated. These results indicate potential for wider applicability of causal post-hoc XAI in a meaningful vocabulary.