DIB-X: Formulating Explainability Principles for a Self-Explainable Model Through Information Theoretic Learning

  • Changkyu Choi
  • Shujian Yu
  • Michael Christian Kampffmeyer
  • Arnt-Børre Salberg
  • Nils Olav Handegard

Publication details

  • Journal: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7170–7174, 2024
  • Publisher: IEEE (Institute of Electrical and Electronics Engineers)
  • International standard numbers (ISSN):
    • Print: 1520-6149
    • Electronic: 2379-190X
  • Link:

The recent development of self-explainable deep learning approaches has focused on integrating well-defined explainability principles into the learning process, with the goal of achieving these principles through optimization. In this work, we propose DIB-X, a self-explainable deep learning approach for image data that adheres to the principles of minimal, sufficient, and interactive explanations. The minimality and sufficiency principles are rooted in the trade-off relationship within the information bottleneck framework. Distinctively, DIB-X quantifies the minimality principle directly using the recently proposed matrix-based Rényi's α-order entropy functional, circumventing the need for variational approximation and distributional assumptions. The interactivity principle is realized by incorporating existing domain knowledge as prior explanations, fostering explanations that align with established domain understanding. Empirical results on MNIST and two marine environment monitoring datasets with different modalities show that our approach primarily provides improved explainability, with the added advantage of enhanced classification performance.
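To make the entropy term concrete: the matrix-based Rényi's α-order entropy the abstract refers to is computed from the eigenvalues of a normalized kernel Gram matrix rather than from an estimated probability density, which is why no variational approximation or distributional assumption is needed. The sketch below illustrates that definition with an RBF kernel in NumPy; the function name, kernel choice, and bandwidth are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Illustrative matrix-based Rényi α-order entropy of samples X (n, d).

    Builds an RBF Gram matrix, normalizes it to unit trace, and evaluates
    S_α(A) = (1 / (1 - α)) * log2( Σ_i λ_i(A)^α ) over its eigenvalues.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))               # RBF kernel Gram matrix
    A = K / np.trace(K)                                # eigenvalues now sum to 1
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)    # guard tiny negative values
    return (1.0 / (1.0 - alpha)) * np.log2(np.sum(lam ** alpha))
```

For intuition: n identical samples give a rank-one A (entropy ≈ 0), while n well-separated samples give A ≈ I/n (entropy ≈ log2 n), matching the role of this term as a compression penalty in the information bottleneck trade-off.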