A Visually Interpretable Deep Learning Framework for Colorectal Cancer Detection in Contrast-Enhanced CT Images
Abstract
Purpose
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide. This study investigates an automated, visually interpretable framework for the intelligent identification and localization of primary CRC in portal venous phase contrast-enhanced CT scans. Accurate localization of the primary tumor is a crucial first step for CRC staging, pre-operative evaluation, and predicting the outcomes of surgical resection and neoadjuvant chemoradiotherapy.
Methods
We propose a deep learning-based framework that uses YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. We also designed a scale-adaptive loss function that introduces an adaptive coefficient and a scaling factor to adaptively weight the Intersection over Union (IoU) and the center-point distance, improving bounding-box regression and overall detection performance. In addition, we applied gradient-weighted class activation mapping (Grad-CAM) to the head component of the YOLOv11 network to visualize the regions of interest (ROIs), improving the framework's transparency and interpretability.
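The abstract does not give the exact form of the scale-adaptive loss. As a minimal sketch, assuming a DIoU-style formulation in which a hypothetical adaptive coefficient `alpha` and scaling factor `gamma` re-weight the normalized center-point distance penalty (both parameter names and the combination rule are illustrative assumptions, not the authors' published formula):

```python
def iou_and_center_term(box1, box2):
    """Return IoU and the squared center distance normalized by the
    diagonal of the smallest enclosing box (DIoU-style), for boxes
    given as (x1, y1, x2, y2)."""
    # Intersection area
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter + 1e-9)
    # Squared distance between box centers
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    center_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    # Diagonal of the smallest box enclosing both, used as normalizer
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou, center_sq / diag_sq


def scale_adaptive_loss(pred, target, alpha=1.0, gamma=0.5):
    """Hypothetical scale-adaptive box-regression loss:
        L = 1 - IoU + alpha * (d^2 / c^2) ** gamma
    where alpha (adaptive coefficient) and gamma (scaling factor)
    re-weight the normalized center-distance penalty."""
    iou, dist = iou_and_center_term(pred, target)
    return 1.0 - iou + alpha * dist ** gamma
```

In this sketch, `gamma < 1` inflates small normalized center distances, so the loss still provides a useful regression gradient for near-aligned boxes, while `alpha` trades off the distance penalty against the IoU term.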
Results
The proposed framework achieved a recall of 0.8092, a precision of 0.8187, and an F1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation), and a recall of 0.9949, a precision of 0.9894, and an F1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our framework, when trained on the patient-level in-house dataset, obtained a recall of 0.8283, a precision of 0.8414, and an F1 score of 0.8348 and, when trained on the slice-level in-house dataset, achieved a recall of 0.6897, a precision of 0.7888, and an F1 score of 0.7358, outperforming existing representative detection methods.
Conclusion
The framework's excellent visual interpretability and state-of-the-art generalization on the external dataset, including an 18.97-percentage-point improvement in detection sensitivity (recall) over the baseline method, highlight its potential translational value for CRC clinical decision support.