Computer Vision Surface Guidance (CV-SGRT): A Novel Occlusion-Robust Multi-View Markerless SGRT Framework with Automated Semantic ROI Definition
Abstract
Purpose
To evaluate whether a multi-view computer vision–based surface-guided radiotherapy framework (CV-SGRT) can overcome occlusion-related limitations of conventional SGRT systems and improve surface reconstruction and respiratory motion accuracy.
Methods
In a study of 20 healthy volunteers, we evaluated the accuracy of a novel automated surface-guidance approach that uses a synchronized multi-view RGB camera array. The approach estimates depth from standard RGB cameras using a state-of-the-art image-to-depth model and pairs it with a real-time semantic segmentation network that dynamically masks background artifacts (e.g., treatment couches, immobilization devices), thereby isolating the patient anatomy prior to reconstruction. The masked output is then fed to a deep learning-based volumetric reconstruction algorithm to generate a high-fidelity, time-resolved patient surface model without requiring manual definition of the Region of Interest (ROI). System performance was evaluated in terms of surface coverage, geometric reconstruction accuracy, and respiratory motion tracking, using infrared marker–based abdominal and thoracic breathing measurements as ground truth.
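The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a per-view depth map and a binary patient mask are already available, and it uses a simple pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) to back-project only the patient pixels into 3D points. The function name `masked_backproject` is likewise an assumption made for illustration.

```python
import numpy as np

def masked_backproject(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to 3D camera coordinates.

    depth : (H, W) array of depth values (assumed metric, along the optical axis)
    mask  : (H, W) array, truthy where the pixel belongs to the patient
    fx, fy, cx, cy : pinhole intrinsics (hypothetical values, in pixels)
    Returns an (N, 3) array of [x, y, z] points.
    """
    depth = np.asarray(depth, dtype=float)
    # Keep only patient pixels that also have a valid, positive depth.
    keep = np.asarray(mask, dtype=bool) & np.isfinite(depth) & (depth > 0)
    v, u = np.nonzero(keep)          # row (v) and column (u) indices
    z = depth[v, u]
    x = (u - cx) * z / fx            # pinhole model: x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

# Toy example: a 2x2 depth map where only two pixels belong to the patient.
depth = np.full((2, 2), 2.0)
mask = np.array([[1, 0], [0, 1]])
points = masked_backproject(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Under these assumptions, couch and immobilization-device pixels never enter the point set, so the downstream reconstruction sees only patient anatomy; point clouds from several views would still need to be transformed into a common frame using each camera's extrinsics.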
Results
CV-SGRT achieved 100% surface reconstruction coverage within the clinically relevant region of interest, effectively mitigating the line-of-sight occlusions common in standard SGRT setups, with a geometric reconstruction error ranging from 0.5% to 3%. The semantic segmentation module demonstrated robust artifact suppression, segmenting the patient surface with a Dice Similarity Coefficient (DSC) of 0.93. CV-SGRT quantified both abdominal and thoracic excursions with high accuracy, yielding a Mean Absolute Error (MAE) of 8.9% and a Peak Ratio of 1.018 ± 0.083, where the Peak Ratio quantifies peak amplitude agreement (CV-SGRT detected peak / ground-truth peak).
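For readers unfamiliar with these metrics, the following sketch shows one way they could be computed. It rests on assumptions the abstract does not state: the MAE is normalized by the peak-to-peak amplitude of the ground-truth trace to express it as a percentage, and the Peak Ratio is taken as the ratio of peak-to-peak excursions. The function names are hypothetical.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def mae_percent(estimate, reference):
    """MAE as a percentage of the reference peak-to-peak amplitude
    (normalization is an assumption, not taken from the source)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    amplitude = reference.max() - reference.min()
    return 100.0 * np.mean(np.abs(estimate - reference)) / amplitude

def peak_ratio(estimate, reference):
    """Detected peak excursion divided by ground-truth peak excursion
    (peak-to-peak definition is an assumption)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (estimate.max() - estimate.min()) / (reference.max() - reference.min())

# Toy data: a small mask pair and a single synthetic breathing cycle.
pred_mask = np.array([[1, 1], [0, 0]])
true_mask = np.array([[1, 0], [0, 0]])
reference = np.array([0.0, 1.0, 2.0, 1.0, 0.0])   # marker-based trace (arbitrary units)
estimate = np.array([0.0, 1.1, 2.2, 1.1, 0.0])    # camera-based trace

dsc = dice_coefficient(pred_mask, true_mask)
mae = mae_percent(estimate, reference)
ratio = peak_ratio(estimate, reference)
```

A Peak Ratio above 1 indicates that the estimated trace overshoots the ground-truth amplitude, and a value below 1 indicates undershoot; in practice the ratio would be computed per breathing cycle and then averaged to give a mean ± standard deviation.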
Conclusion
CV-SGRT provides accurate, occlusion-resilient 4D surface and respiratory motion information without manual ROI definition, addressing a key limitation of current SGRT systems and supporting more reliable motion management in complex clinical setups.