From Phantom Measurements to Clinical Cohorts: Predicting CT Image Quality at Scale
Abstract
Purpose
Accurate characterization of image quality (IQ) enables cohort stratification and optimization of downstream tasks in imaging and image analysis. A recent multi-institutional study curated CT images of an IQ phantom across diverse scanners and sites, capturing variations in chest CT acquisition and reconstruction. This report details and evaluates an approach that leverages such rigorously assessed phantom measurements to predict corresponding IQ metrics in large-scale clinical CT imaging datasets.
Methods
The phantom dataset comprised >250 CT images of the Corgi phantom from 6 institutions. Automatically computed IQ metrics were used to drive a DICOM metadata-based protocol mapping between phantom scans and 1,083 clinical chest CT images from the MIDRC dataset. Mapping performance was evaluated by comparing predictions against measurements of image noise and spatial resolution. Noise was measured using the Global Noise algorithm, and spatial resolution was characterized using edge spread function (ESF) profiles extracted along the skin-air boundary. Agreement between predicted and measured metrics was assessed via correlation analysis, and the accuracy of IQ-based cohort stratification was quantified using quadratic weighted kappa (QWK).
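As an illustration only, the metadata-based mapping can be pictured as a lookup from acquisition and reconstruction attributes to phantom-derived IQ values. The sketch below is a minimal version under stated assumptions: the matching keys (`Manufacturer`, `ConvolutionKernel`, `SliceThickness`), the `noise_hu` field, the median aggregation, and the function names are hypothetical choices made for this example, not the mapping used in the study.

```python
from collections import defaultdict
from statistics import median

# Hypothetical matching keys (standard DICOM attribute keywords); the study's
# actual choice of metadata fields is not stated in this abstract.
KEYS = ("Manufacturer", "ConvolutionKernel", "SliceThickness")


def build_lookup(phantom_scans):
    """Group phantom noise measurements by protocol key and take the median."""
    groups = defaultdict(list)
    for scan in phantom_scans:
        groups[tuple(scan[k] for k in KEYS)].append(scan["noise_hu"])
    return {key: median(values) for key, values in groups.items()}


def predict_noise(clinical_meta, lookup):
    """Return the phantom-derived noise for a clinical scan, or None if unmatched."""
    return lookup.get(tuple(clinical_meta[k] for k in KEYS))


# Made-up records for illustration; these are not values from the study.
phantom_scans = [
    {"Manufacturer": "A", "ConvolutionKernel": "soft", "SliceThickness": 1.0, "noise_hu": 10.0},
    {"Manufacturer": "A", "ConvolutionKernel": "soft", "SliceThickness": 1.0, "noise_hu": 14.0},
    {"Manufacturer": "B", "ConvolutionKernel": "sharp", "SliceThickness": 2.5, "noise_hu": 30.0},
]
lookup = build_lookup(phantom_scans)
matched = predict_noise(
    {"Manufacturer": "A", "ConvolutionKernel": "soft", "SliceThickness": 1.0}, lookup
)
unmatched = predict_noise(
    {"Manufacturer": "C", "ConvolutionKernel": "soft", "SliceThickness": 1.0}, lookup
)
```

An exact-match lookup such as this one leaves unmatched clinical protocols without a prediction; a practical mapping would need a rule for those cases.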
Results
Predicted IQ metrics agreed well with direct clinical measurements. Predicted and measured noise showed a strong linear relationship (R² = 0.64). Noise-based cohort stratification was stable across the number of quantile bins, K; for K = 3 or 4, within-one-bin accuracy exceeded 80% with QWK = 0.6. ESF-based spatial resolution predictions similarly achieved 84% accuracy for stratification into K = 3 bins.
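The stratification metrics reported above can be sketched as follows. This is a minimal NumPy illustration, assuming that predicted and measured values are each assigned to K quantile bins and that QWK uses the standard quadratic weights; the function names are hypothetical, and the study's exact binning procedure may differ.

```python
import numpy as np


def quantile_bins(values, k):
    """Assign each value to one of k quantile bins, labelled 0..k-1."""
    values = np.asarray(values, dtype=float)
    # Interior quantile edges, e.g. the 1/3 and 2/3 quantiles for k = 3.
    edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")


def within_one_bin_accuracy(a, b):
    """Fraction of cases whose two bin labels differ by at most one."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b)) <= 1))


def quadratic_weighted_kappa(a, b, k):
    """Cohen's kappa with quadratic disagreement weights over k ordered bins."""
    a = np.asarray(a)
    b = np.asarray(b)
    observed = np.zeros((k, k))
    np.add.at(observed, (a, b), 1)  # confusion matrix of bin labels
    # Expected counts under independence of the two label sets.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    # Undefined (division by zero) if all labels fall into a single bin.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Made-up values for illustration; these are not data from the study.
measured = quantile_bins([1, 2, 3, 4, 5, 6], k=3)
predicted = quantile_bins([1.5, 2.5, 3.5, 4.5, 5.5, 6.5], k=3)
agreement = quadratic_weighted_kappa(measured, predicted, k=3)
```

Because the quadratic weights penalize a two-bin error four times as heavily as a one-bin error, QWK complements within-one-bin accuracy, which treats exact and adjacent-bin agreement alike.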
Conclusion
The study demonstrates the feasibility of using phantom-derived measurements and metadata-driven mapping to stratify CT cohorts by IQ. While validated here for noise and spatial resolution, the approach can also predict advanced 3D IQ metrics that are difficult to measure directly in routine clinical images. The findings support automated, IQ-informed cohort selection and quality control for applications including the training of data-driven (AI) models and the evaluation of IQ-dependent model behavior.