Transformer-Based Multi-Channel Target Decomposition for Markerless Lung Tumor Tracking
Abstract
Purpose
This study proposes a transformer-based deep learning framework for markerless lung tumor tracking. The goal is to improve the localization accuracy, robustness, and computational efficiency of real-time intrafraction motion management in a form that can be integrated into clinical workflows.
Methods
We developed a transformer-based framework that maps raw kV projection images into multiple decomposed target images (DTIs), each encoding a distinct spatial context relative to the tumor location in the lung. The DTIs were synthesized by digitally reconstructing thin volumetric slabs with varying thicknesses and spatial offsets relative to the target. The model was trained on paired digitally reconstructed radiographs (DRRs) and DTIs generated from the simulation CT, and it was evaluated on kV images acquired with a Varian TrueBeam On-Board Imager (OBI). For each kV image, the trained model produced multiple synthetic DTIs, which were processed with parallel template matching to generate candidate tumor positions. These candidates were then fused using an adaptive extended Kalman filter, which combines the current measurements with one-step motion history to estimate the most probable tumor location and the associated localization uncertainty. The method was validated on a chest motion phantom with known ground-truth tumor motion and on clinical data comprising 4,312 images from nine patients, in whom Calypso beacons implanted adjacent to the tumor served as ground truth.
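The candidate-fusion step can be illustrated with a minimal sketch. It is not the authors' adaptive extended Kalman filter: it substitutes a plain scalar Kalman filter with a random-walk motion model for the SI coordinate, and the class name, function name, and noise values are hypothetical choices made only for illustration. Each candidate is assumed to arrive with its own measurement variance.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1D random-walk Kalman filter for the SI tumor position (illustrative only)."""
    x: float          # current position estimate (mm)
    p: float          # variance of the estimate (mm^2)
    q: float = 1.0    # assumed process noise added per frame (mm^2)

    def predict(self) -> None:
        # Random-walk model: the previous estimate carries over, uncertainty grows.
        self.p += self.q

    def update(self, z: float, r: float) -> None:
        # Standard scalar Kalman update for one measurement z with variance r.
        k = self.p / (self.p + r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)


def fuse_candidates(kf, candidates):
    """Fuse (position_mm, variance_mm2) candidates from one kV frame.

    Returns the fused position and its standard deviation (mm).
    """
    kf.predict()
    for z, r in candidates:
        # Sequential updates are equivalent to inverse-variance weighting.
        kf.update(z, r)
    return kf.x, kf.p ** 0.5


# Usage: prior at 0 mm, three hypothetical template-matching candidates.
kf = ScalarKalman(x=0.0, p=1.0, q=0.5)
position, sigma = fuse_candidates(kf, [(1.8, 1.0), (2.1, 0.5), (2.4, 2.0)])
```

The returned standard deviation plays the role of the localization uncertainty in this simplified setting; an adaptive filter would additionally adjust the noise terms from the data.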
Results
In the phantom study, the method achieved a maximum tracking error of 1.14 mm. In the patient study, it achieved a tracking success rate of 94.5%, where success was defined as a localization error of < 2.0 mm in the superior–inferior (SI) direction.
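The success-rate definition can be written out as a short sketch. The function name and input layout (per-image SI positions in mm) are assumptions for illustration, not the authors' evaluation code.

```python
def tracking_success_rate(estimated_si_mm, reference_si_mm, tolerance_mm=2.0):
    """Percentage of images whose absolute SI error is strictly below the tolerance."""
    if len(estimated_si_mm) != len(reference_si_mm) or not estimated_si_mm:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(est - ref) < tolerance_mm
        for est, ref in zip(estimated_si_mm, reference_si_mm)
    )
    return 100.0 * hits / len(estimated_si_mm)
```

Note that an error of exactly 2.0 mm counts as a failure under the strict inequality used here.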
Conclusion
This study demonstrates that transformer-based spatial decomposition substantially enhances the localization of low-contrast lung tumors in kV projection images. The high accuracy, robustness, and built-in uncertainty estimation achieved by the proposed framework indicate strong potential for real-time intrafraction motion management in high-precision radiotherapy.