Integrating Integrated Gradients and Attention Visualization for Explainable ClinicalBERT in Depression Detection
Early and accurate detection of depression from clinical narratives is crucial for timely intervention and improved patient outcomes. However, most existing Natural Language Processing (NLP) models for Electronic Health Records (EHRs) operate as black boxes, which limits their clinical trustworthiness. This study proposes an Explainable ClinicalBERT framework that combines Integrated Gradients (IG) with attention visualization to provide interpretable predictions for depression detection from EHR text. The model was fine-tuned on anonymized clinical notes labeled by psychiatrists according to DSM-5 diagnostic criteria. Experimental results show that the framework achieves high predictive performance (Accuracy = 0.87, F1-score = 0.85, AUC = 0.91), comparable to that of non-explainable models, while exposing its reasoning through token-level attributions and attention maps. Expert evaluations confirmed that the features highlighted by the model align with clinically meaningful depressive symptoms, which improves interpretability and professional confidence. The findings underscore the potential of explainable NLP systems to improve the transparency, reliability, and ethical deployment of AI-based mental health diagnostics in real-world healthcare environments.
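To make the token-level attribution idea concrete, the following is a minimal sketch of Integrated Gradients in plain Python. It is not the study's implementation: a toy logistic classifier stands in for the fine-tuned ClinicalBERT, and the token list, feature values, weights, bias, and all-zero baseline are hypothetical values chosen only for illustration. IG averages the model's gradient along a straight path from the baseline to the input and scales it by the input difference, so the attributions should sum approximately to the change in the model's output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy stand-in for a classifier head: logistic regression over features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x, w, b):
    """Analytic gradient of the model output with respect to each input feature."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight-line path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the path from the baseline (alpha = 0) to the input (alpha = 1).
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grads = gradient(point, w, b)
        total = [t + g for t, g in zip(total, grads)]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical example: one scalar feature per token (assumption, not study data).
tokens = ["patient", "reports", "hopelessness", "insomnia"]
x = [0.2, 0.1, 1.0, 0.8]
w = [0.1, 0.0, 2.0, 1.5]   # hypothetical weights
b = -1.0                   # hypothetical bias
baseline = [0.0] * len(x)  # all-zero baseline, a common but not universal choice

attributions = integrated_gradients(x, baseline, w, b)
for token, score in zip(tokens, attributions):
    print(f"{token:>14s}  {score:+.4f}")

# Completeness check: attributions should sum to F(x) - F(baseline).
print("sum of attributions:", sum(attributions))
print("output difference:  ", model(x, w, b) - model(baseline, w, b))
```

With a real transformer, the same procedure would typically be applied to the input embeddings, with gradients obtained by automatic differentiation and the per-dimension attributions summed for each token.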