Infrastructure for Building a Population-Level Atlas of the Spatial Distribution of Lung Nodules Using the National Lung Screening Trial Dataset
Abstract
Purpose
The spatial distribution of lung nodules is non-homogeneous, yet location-specific risk factors remain poorly understood. This study aims to develop a computational pipeline for constructing a 3D lung atlas that identifies population-level spatial trends to guide clinical diagnosis and treatment.
Methods
This study used low-dose CT scans from the National Lung Screening Trial (NLST; N = 26,722 participants, three annual screening rounds). Scans were segmented with TotalSegmentator, a pretrained multi-organ anatomical segmentation model, to delineate the whole lung and its five lobes. The atlas space was defined by a standardized lung template, created through population-based image registration and statistical averaging of 30 CT scans from the NLST dataset. Individual scans were then mapped to the atlas space by image registration with the UniGradICON foundation model. Registration accuracy was quantified using the Dice Similarity Coefficient (DSC), Mean Squared Error (MSE), and the relative volume difference of each lung region before and after registration.
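The three accuracy metrics can be made concrete with a short Python sketch. It is illustrative only and rests on assumptions the abstract does not state: both label maps have already been resampled onto the atlas voxel grid, lobes are encoded with hypothetical integer IDs 1 to 5 (not TotalSegmentator's own output convention), and MSE is computed between binary masks. All helper names are hypothetical.

```python
import numpy as np

# Hypothetical lobe encoding for a label map (NOT TotalSegmentator's own IDs):
# 0 = background, 1-5 = the five lung lobes.
LOBES = {1: "left upper", 2: "left lower", 3: "right upper",
         4: "right middle", 5: "right lower"}


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient 2|A n B| / (|A| + |B|) between two masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def mask_mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two binary masks with values 0/1
    (assumption: MSE is taken on masks, i.e. the fraction of disagreeing voxels)."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))


def volume_ml(mask: np.ndarray, spacing_mm) -> float:
    """Mask volume in millilitres, given voxel spacing in millimetres."""
    return float(mask.astype(bool).sum() * np.prod(spacing_mm) / 1000.0)


def relative_volume_difference(before_ml: float, after_ml: float) -> float:
    """|V_after - V_before| / V_before for one lung region."""
    return abs(after_ml - before_ml) / before_ml


def region_scores(atlas_labels: np.ndarray, subject_labels: np.ndarray, label_ids):
    """DSC and MSE for the union of label_ids (e.g. all lobes = whole lung).
    Both label maps must share one voxel grid."""
    a = np.isin(atlas_labels, label_ids)
    s = np.isin(subject_labels, label_ids)
    return dice(a, s), mask_mse(a, s)


if __name__ == "__main__":
    # Toy 4x4x4 atlas with two "lobes"; the warped subject loses one voxel.
    atlas = np.zeros((4, 4, 4), dtype=int)
    atlas[:2] = 1
    atlas[2:] = 3
    warped = atlas.copy()
    warped[0, 0, 0] = 0
    print("whole lung (DSC, MSE):", region_scores(atlas, warped, list(LOBES)))
    print("left upper lobe (DSC, MSE):", region_scores(atlas, warped, [1]))
```

Averaging `region_scores` over all subjects, once with every lobe ID (whole lung) and once per lobe, would yield summary values of the kind reported in the Results; computing the pre-registration scores this way additionally assumes the unregistered scans are resampled onto the atlas grid.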
Results
Image registration substantially improved anatomical alignment across the dataset. For the whole lung, the mean DSC increased from 0.43 to 0.97 and the MSE decreased from 0.11 to 0.01 after registration. Across individual lobes, the mean lobar DSC improved from 0.38 to 0.91. The mean relative volume difference remained low, at 4.4% for the whole lung and 7.9% averaged across all lobes, indicating minimal geometric distortion during warping. These metrics indicate that the pipeline reliably maps diverse patient morphologies to a standardized coordinate system.
Conclusion
This study assembles advanced tools into an automated pipeline, integrating state-of-the-art segmentation and registration models with a population-derived 3D template. The registration results demonstrate the framework's high fidelity in standardizing diverse anatomical data, which forms a critical foundation for the next stage, which employs a novel vision-language model to perform prompt-guided nodule segmentation. Together, these methods enable a high-throughput approach to uncovering population-level associations between the spatial distribution of lung nodules and clinical phenotypes.
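Once nodule locations are available in atlas space, the population-level tallies that an atlas of nodule distribution relies on could be sketched as below. This is a hypothetical illustration, not the study's method. It assumes nodule centroids have already been warped into atlas voxel coordinates using the registration transform, that the atlas lobe label map uses hypothetical IDs 1 to 5, and that each nodule carries an arbitrary phenotype group key. The function names are illustrative.

```python
from collections import Counter, defaultdict

import numpy as np

# Hypothetical lobe IDs in the atlas label map (0 = background).
LOBES = {1: "left upper", 2: "left lower", 3: "right upper",
         4: "right middle", 5: "right lower"}


def in_bounds(idx: np.ndarray, shape) -> np.ndarray:
    """Boolean mask over integer voxel indices (N x 3) that fall inside the grid."""
    return np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)


def frequency_map(shape, centroids_vox) -> np.ndarray:
    """Voxel-wise count of nodule centroids already expressed in atlas voxel space."""
    idx = np.rint(np.asarray(centroids_vox, dtype=float)).astype(int).reshape(-1, 3)
    idx = idx[in_bounds(idx, shape)]  # drop centroids warped outside the grid
    grid = np.zeros(shape, dtype=np.int64)
    np.add.at(grid, tuple(idx.T), 1)  # accumulates repeated voxels correctly
    return grid


def lobe_counts_by_group(atlas_labels: np.ndarray, records):
    """records: iterable of (group, (i, j, k)) in atlas voxel coordinates.
    Returns {group: Counter(lobe name -> nodule count)}."""
    out = defaultdict(Counter)
    for group, centroid in records:
        idx = np.rint(np.asarray(centroid, dtype=float)).astype(int).reshape(1, 3)
        if not in_bounds(idx, atlas_labels.shape)[0]:
            continue  # centroid lies outside the atlas grid
        label = int(atlas_labels[tuple(idx[0])])
        out[group][LOBES.get(label, "outside lobes")] += 1
    return dict(out)
```

`np.add.at` is used instead of `grid[tuple(idx.T)] += 1` because fancy-index assignment counts a repeated voxel only once, which would undercount locations shared by several nodules.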