Opening the Black Box: Explainable Spatio-Temporal Graph Convolutional Networks for Human Movement Analysis and Early Cerebral Palsy Detection

Publication details

  • Supervised by: Ihlen, Espen Alexander Furst; Ramampiaro, Heri; Adde, Lars; Strümke, Inga
  • Publisher: NTNU Norges teknisk-naturvitenskapelige universitet
  • International Standard Numbers:
    • Printed: 9788232692279

This thesis investigates the explainability of deep learning models for human movement analysis, with a specific focus on early detection of cerebral palsy (CP) in infants. As artificial intelligence (AI) increasingly assists in medical diagnostics, understanding the reasoning behind AI predictions becomes crucial, especially in high-stakes applications. The research focuses on spatio-temporal graph convolutional networks (ST-GCNs), which analyze skeletal pose data over time but typically function as “black boxes,” offering high-accuracy predictions without transparent explanations.
The work consists of three interconnected studies. The first study establishes a methodological foundation by evaluating Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) on human action recognition tasks using the NTU RGB+D dataset. It implements mathematical metrics for faithfulness (how accurately an explanation reflects a model’s decision process) and stability (how consistently an explanation behaves under minor input variations). Results show that while both methods demonstrate good stability, they perform worse than a random feature attribution baseline when tested on the EfficientGCN-B4 architecture.
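To make the two metric families concrete, the following is a minimal sketch of a deletion-style faithfulness score and a noise-based stability score. The abstract does not give the thesis's exact metric definitions, so the formulas, the function names, and the toy linear model used here are all assumptions for illustration; they are not the EfficientGCN-B4 setup or the thesis implementation, and the toy numbers do not reproduce any reported result.

```python
import numpy as np

def toy_model(x, w):
    """Hypothetical stand-in for a classifier: sigmoid of a weighted sum."""
    return 1.0 / (1.0 + np.exp(-(x * w).sum()))

def toy_attribution(x, w):
    """Hypothetical attribution map: input times weight, per feature."""
    return x * w

def faithfulness_drop(model, x, attribution, k, baseline=0.0):
    """Absolute change in model output after replacing the k features with
    the largest |attribution| by a baseline value (assumed formulation)."""
    top = np.argsort(-np.abs(attribution).ravel())[:k]
    x_pert = x.copy().ravel()
    x_pert[top] = baseline
    return abs(model(x) - model(x_pert.reshape(x.shape)))

def stability_ratio(explain, x, noise_scale, rng):
    """Relative change in the explanation divided by the relative change in
    the input under small Gaussian noise; lower means more stable
    (assumed formulation)."""
    noise = rng.normal(scale=noise_scale, size=x.shape)
    e, e_noisy = explain(x), explain(x + noise)
    num = np.linalg.norm(e - e_noisy) / (np.linalg.norm(e) + 1e-12)
    den = np.linalg.norm(noise) / (np.linalg.norm(x) + 1e-12)
    return num / den

# Toy "sequence": 20 frames x 5 joints, with three features that matter.
rng = np.random.default_rng(0)
x = np.ones((20, 5))
w = np.zeros((20, 5))
w[3, 1] = w[7, 2] = w[12, 4] = 2.0
model = lambda inp: toy_model(inp, w)
explain = lambda inp: toy_attribution(inp, w)

# Compare the explanation against a random attribution baseline.
drop_explained = faithfulness_drop(model, x, explain(x), k=3)
drop_random = faithfulness_drop(model, x, rng.random(x.shape), k=3)
ratio = stability_ratio(explain, x, noise_scale=0.01, rng=rng)
```

In this sketch an explanation counts as more faithful than the baseline when masking its top-ranked features changes the prediction more than masking randomly ranked features does.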
The second study extends this evaluation framework to early CP prediction using infant movement data from the DeepInMotion project. It applies the same XAI methods to an ensemble model trained to detect CP risk from infant pose data. Both CAM and Grad-CAM consistently outperform random attribution baselines across all metrics, though they show different strengths. This validates their potential for clinical applications.
The third study moves beyond metric evaluation toward practical clinical application. It employs XAI-guided perturbation techniques to identify potential movement biomarkers for CP. By systematically perturbing movement features in body joints identified as significant by CAM and Grad-CAM, it reveals that velocity characteristics of limbs strongly influence CP risk assessment. Specifically, faster limb movements strongly correlate with low-risk predictions, while decreased velocity in extremities raises the risk assessment. Additionally, angular range of motion shows a less pronounced impact.
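A velocity perturbation of this kind can be sketched as follows, assuming pose data stored as an array of shape (frames, joints, coordinates). The function names, the array layout, and the choice to scale frame-to-frame displacements while keeping the first frame fixed are assumptions for illustration; the abstract does not describe how the thesis implements its perturbations, and the joint indices below are hypothetical rather than joints identified by CAM or Grad-CAM.

```python
import numpy as np

def scale_joint_velocity(pose, joints, factor):
    """Scale the frame-to-frame displacement of the selected joints by
    `factor`, keeping their first-frame position fixed.
    pose has shape (frames, joints, coords)."""
    out = pose.astype(float).copy()
    vel = np.diff(out[:, joints, :], axis=0)  # (frames - 1, n_selected, coords)
    # Rebuild the trajectory by integrating the scaled displacements.
    out[1:, joints, :] = out[0, joints, :] + np.cumsum(factor * vel, axis=0)
    return out

def mean_speed(pose, joints):
    """Mean per-frame displacement magnitude of the selected joints."""
    vel = np.diff(pose[:, joints, :], axis=0)
    return np.linalg.norm(vel, axis=-1).mean()

# Toy data: random-walk trajectories for 4 joints in 2D over 50 frames.
rng = np.random.default_rng(0)
pose = np.cumsum(rng.normal(size=(50, 4, 2)), axis=0)

selected = [1, 3]  # hypothetical joints flagged as important by an XAI method
slowed = scale_joint_velocity(pose, selected, factor=0.5)
```

In a full pipeline, the original and perturbed sequences would both be passed to the trained model and the change in predicted risk compared; that step is omitted here because the model is not part of this sketch.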
Together, these studies demonstrate that ST-GCNs, when coupled with appropriate XAI methods, can provide reliable and meaningful explanations for human movement analysis. The work establishes a quantitative framework for evaluating XAI methods in skeleton-based models and takes initial steps toward identifying potential movement biomarkers for early CP detection. These findings contribute to the development of more transparent and trustworthy AI systems in healthcare.
The thesis acknowledges several limitations and suggests future directions. The ultimate goal is to bridge computational insights with clinical knowledge to create AI-assisted diagnostic tools that clinicians can confidently use, while understanding the reasoning behind machine predictions.