Publication details
- Publisher: Norsk Regnesentral
This study investigates the applicability of the Mammo-CLIP Dissect framework from Salahuddin et al. for concept-based explainability in deep learning models for mammography. We first reproduced key results from the original paper using the Mammo-CLIP image encoder and a curated mammography concept vocabulary, confirming the expected layer-wise emergence of clinically meaningful concepts. We then applied the framework to an in-house ResNet101 classifier developed within the AIforscreening project. Compared with Mammo-CLIP, the ResNet101 model exhibited lower semantic alignment and a higher prevalence of non-mammography concepts, a difference we attribute to its lack of multimodal training. These findings suggest that models trained solely on images may provide less interpretable explanations for clinicians than multimodal vision-language models. We highlight the importance of jointly considering accuracy and interpretability, noting that model performance was not evaluated on the probe set in this study. Future work includes applying the framework to Cancer Registry data and exploring multimodal training for improved clinical relevance.
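To make the dissection step concrete, the sketch below illustrates the general CLIP-Dissect-style concept matching that Mammo-CLIP Dissect builds on: each neuron in a chosen layer is assigned the vocabulary concept whose image-text similarity profile over a probe set best matches the neuron's activation profile. This is a minimal illustration only; the tensors are random placeholders, the names (`match_concepts`, `activations`, `clip_similarity`) are not from the paper's code, and the actual framework may use a different similarity function than the plain correlation used here.

```python
# Minimal sketch of CLIP-Dissect-style concept matching (illustrative, not the
# authors' implementation). Random tensors stand in for real model outputs.
import torch

torch.manual_seed(0)

n_probe_images = 200   # size of the probe image set
n_neurons = 64         # neurons (channels) in the dissected layer
n_concepts = 50        # entries in the mammography concept vocabulary

# 1) Activation of each neuron on every probe image, e.g. spatially pooled
#    feature maps from the target layer of the classifier being dissected.
activations = torch.rand(n_probe_images, n_neurons)

# 2) Image-text similarity from the vision-language model (e.g. Mammo-CLIP):
#    cosine similarity between every probe image and every concept text.
clip_similarity = torch.rand(n_probe_images, n_concepts)


def match_concepts(act: torch.Tensor, sim: torch.Tensor):
    """Assign each neuron the concept whose similarity profile over the probe
    set correlates best with the neuron's activation profile. Plain Pearson
    correlation is used here as the matching score; other choices are common."""
    act_z = (act - act.mean(0)) / (act.std(0, unbiased=False) + 1e-8)  # per-neuron z-score
    sim_z = (sim - sim.mean(0)) / (sim.std(0, unbiased=False) + 1e-8)  # per-concept z-score
    corr = act_z.T @ sim_z / act.shape[0]        # (n_neurons, n_concepts) correlations
    scores, best = corr.max(dim=1)               # best-matching concept per neuron
    return best, scores


best_concept, alignment_score = match_concepts(activations, clip_similarity)
print(best_concept[:5], alignment_score[:5])
```

The per-neuron alignment scores produced this way are what allows the layer-wise comparison reported in the study: aggregating them over a layer indicates how strongly that layer's neurons align with mammography concepts versus unrelated vocabulary entries.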