Deep Learning-Based Detection of Neonatal Intracranial Hemorrhage Using Cranial Ultrasound
Abstract
Purpose
Intracranial hemorrhage in premature infants is a frequent and clinically significant finding, with cranial ultrasound serving as the first-line imaging modality. Interpretation can be challenging due to subtle and variable imaging appearances. This study aimed to develop and evaluate a deep learning model for detection of neonatal intracranial hemorrhage on cranial ultrasound.
Methods
Cranial ultrasound images were retrospectively collected from 2,127 neonatal patients, including 124 hemorrhage-positive cases (5.8% prevalence). A total of 1,488 examinations were used for model development. A ResNet-50-based deep learning model was pretrained on public ultrasound datasets and fine-tuned on the study cohort. Data were split into training, validation, and test sets (70:15:15) with patient-level separation, and the model was evaluated with five-fold cross-validation. A multiple instance learning framework was applied, representing each examination as a set of ultrasound frames to enable patient-level classification. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (PR-AUC), and sensitivity and specificity at selected decision thresholds.
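As an illustration of the multiple instance learning step, the sketch below shows how per-frame hemorrhage probabilities might be collapsed into a single examination-level score. The abstract does not state which pooling operator was used, so the three options here (max, mean, noisy-OR) and the function name `pool_exam_score` are assumptions for illustration only, not the study's implementation.

```python
import math
from typing import Sequence


def pool_exam_score(frame_probs: Sequence[float], method: str = "max") -> float:
    """Collapse per-frame probabilities into one examination-level score.

    Hypothetical helper: the pooling operator used in the study is not
    reported, so each option below is an assumption.
    """
    if not frame_probs:
        raise ValueError("an examination must contain at least one frame")
    if method == "max":
        # The exam is as suspicious as its most suspicious frame.
        return max(frame_probs)
    if method == "mean":
        # Average evidence across all frames.
        return sum(frame_probs) / len(frame_probs)
    if method == "noisy_or":
        # Probability that at least one frame is positive,
        # treating frames as independent.
        return 1.0 - math.prod(1.0 - p for p in frame_probs)
    raise ValueError(f"unknown pooling method: {method}")


# Example: three frames from one examination.
frames = [0.1, 0.8, 0.3]
exam_scores = {m: pool_exam_score(frames, m) for m in ("max", "mean", "noisy_or")}
```

Max pooling follows the standard multiple instance assumption that one positive frame makes the examination positive. Mean pooling is less sensitive to a single spurious frame, and learned attention pooling is a common alternative to both.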
Results
Across five cross-validation folds, AUROC ranged from 0.68 to 0.75 (mean 0.72 ± 0.03), and PR-AUC ranged from 0.12 to 0.24 (mean 0.18 ± 0.06). At thresholds determined by Youden’s J statistic, mean sensitivity was 0.64 ± 0.18 and mean specificity was 0.66 ± 0.16. When thresholds were adjusted to prioritize high sensitivity (target 0.90), mean sensitivity increased to 0.77 ± 0.16, short of the target, while mean specificity decreased to 0.48 ± 0.21.
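The two threshold rules mentioned above can be sketched as follows. This is a minimal illustration on made-up scores, not the study's code: the helper names and the toy data are hypothetical. Youden's J is defined as sensitivity + specificity - 1, and the sensitivity-targeted rule picks the highest threshold whose sensitivity still meets the target. The abstract does not state which data split was used to select thresholds. If they were chosen on validation data and then applied to held-out data, the realized sensitivity could fall below the target, which would be consistent with the reported 0.77.

```python
from typing import Sequence, Tuple


def sens_spec(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive if score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold maximizing J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(y_true, scores, t)) - 1.0)


def threshold_for_sensitivity(y_true: Sequence[int], scores: Sequence[float],
                              target: float = 0.90) -> float:
    """Highest threshold whose sensitivity is at least the target."""
    for t in sorted(set(scores), reverse=True):
        if sens_spec(y_true, scores, t)[0] >= target:
            return t
    return min(scores)  # fall back to calling every case positive


# Toy examination-level labels and scores (illustrative only).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]

t_youden = youden_threshold(labels, scores)
t_high_sens = threshold_for_sensitivity(labels, scores, target=0.90)
```

On this toy data the sensitivity-targeted threshold is lower than the Youden threshold, so more negatives are flagged. This is the same trade-off reflected in the reported drop in specificity.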
Conclusion
Despite substantial class imbalance, the deep learning model demonstrated moderate discrimination (mean AUROC 0.72) for detection of neonatal intracranial hemorrhage on cranial ultrasound. Raising sensitivity came at the expected cost in specificity. These results support the feasibility of deep learning-based decision support for neonatal cranial ultrasound and motivate further optimization to improve clinical utility.