Poster Program · Therapy Physics

rosamllib: An Open-Source Software Package for Large-Scale Radiotherapy DICOM Ingestion, Indexing, Visualization, and Preprocessing

Abstract
Purpose

Large-scale radiotherapy research increasingly relies on heterogeneous DICOM datasets containing complex cross-references across imaging and radiotherapy objects. Beyond ingestion, downstream preprocessing for analysis and machine learning requires consistent organization, validation, and transformation of these data. Efficient frameworks that unify ingestion, relationship resolution, visualization, and preprocessing remain limited. This work presents rosamllib, an open-source Python framework designed to support scalable radiotherapy DICOM processing, visualization, and data preparation for big-data research workflows.

Methods

rosamllib provides complementary in-memory and database-backed ingestion pathways optimized for different data scales. DICOM objects are either read from local filesystems or retrieved via standard DICOM query/retrieve operations, then organized into a hierarchical, graph-based data model representing patients, studies, series, and instances. Cross-object relationships are resolved explicitly using DICOM identifiers and frame-of-reference associations. To support large datasets, the database-backed workflow employs a streaming producer–consumer architecture that decouples reading, parsing, and database writes, combined with selective tag extraction and value-representation–aware normalization. Built-in querying and visualization utilities enable cohort-level filtering and graphical inspection of series-to-series relationships. Together, these capabilities support reproducible preprocessing workflows, including structure mask generation, dose alignment, data normalization, and export of model-ready datasets for deep learning training.
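The hierarchical model and identifier-based relationship resolution described in the Methods can be illustrated with plain Python dictionaries. This is a minimal sketch, not rosamllib's API: each instance is assumed to be a flat dict of DICOM keywords already extracted from the file, and the helper names (`build_index`, `series_edges`) and the pre-flattened `ReferencedSOPInstanceUIDs` list are hypothetical simplifications of the nested reference sequences found in real RT objects.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative only: each parsed instance is a flat dict keyed by DICOM keywords.
# "ReferencedSOPInstanceUIDs" is a HYPOTHETICAL pre-flattened list standing in for
# UIDs that real objects carry inside nested sequences (e.g. ContourImageSequence).


def build_index(instances):
    """Group instances into a patient -> study -> series -> [SOPInstanceUID] tree."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for ds in instances:
        tree[ds["PatientID"]][ds["StudyInstanceUID"]][ds["SeriesInstanceUID"]].append(
            ds["SOPInstanceUID"]
        )
    return tree


def series_edges(instances):
    """Return (source_series, target_series, kind) edges between series."""
    sop_to_series = {ds["SOPInstanceUID"]: ds["SeriesInstanceUID"] for ds in instances}
    edges = set()
    # Explicit links: an instance references an instance stored in another series.
    for ds in instances:
        for ref in ds.get("ReferencedSOPInstanceUIDs", []):
            target = sop_to_series.get(ref)  # None if the referenced object was not ingested
            if target is not None and target != ds["SeriesInstanceUID"]:
                edges.add((ds["SeriesInstanceUID"], target, "references"))
    # Implicit links: series sharing a FrameOfReferenceUID share a patient coordinate system.
    by_frame = defaultdict(set)
    for ds in instances:
        if "FrameOfReferenceUID" in ds:
            by_frame[ds["FrameOfReferenceUID"]].add(ds["SeriesInstanceUID"])
    for members in by_frame.values():
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b, "same_frame_of_reference"))
    return edges


instances = [
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "CT1",
     "SOPInstanceUID": "ct-1", "Modality": "CT", "FrameOfReferenceUID": "FOR1"},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "CT1",
     "SOPInstanceUID": "ct-2", "Modality": "CT", "FrameOfReferenceUID": "FOR1"},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "RS1",
     "SOPInstanceUID": "rs-1", "Modality": "RTSTRUCT",
     "ReferencedSOPInstanceUIDs": ["ct-1", "ct-2"]},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "RP1",
     "SOPInstanceUID": "rp-1", "Modality": "RTPLAN",
     "ReferencedSOPInstanceUIDs": ["rs-1"]},
    {"PatientID": "P1", "StudyInstanceUID": "S1", "SeriesInstanceUID": "RD1",
     "SOPInstanceUID": "rd-1", "Modality": "RTDOSE", "FrameOfReferenceUID": "FOR1",
     "ReferencedSOPInstanceUIDs": ["rp-1"]},
]

for edge in sorted(series_edges(instances)):
    print(edge)
```

A set of (source, target, kind) tuples like this is the kind of structure a graph-drawing tool could render for the series-to-series inspection mentioned above, and a `None` lookup in `sop_to_series` is a natural hook for flagging references to objects that were never ingested.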
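The streaming producer–consumer idea, in which reading, parsing, and database writes run as separate stages, can be sketched with the standard library alone. This is an assumption-laden illustration rather than rosamllib's implementation: the stage functions, the `KEEP_TAGS` whitelist used for selective tag extraction, and the `instances` table are hypothetical, and the reader stage receives pre-built dicts instead of parsing DICOM files.

```python
import queue
import sqlite3
import threading

# HYPOTHETICAL whitelist for selective tag extraction; everything else is dropped.
KEEP_TAGS = ("PatientID", "StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID", "Modality")
SENTINEL = None  # marks the end of the stream


def reader(records, raw_q):
    """Producer: stands in for reading files from disk or a DICOM retrieve."""
    for rec in records:
        raw_q.put(rec)  # blocks when the bounded queue is full (backpressure)
    raw_q.put(SENTINEL)


def parser(raw_q, row_q):
    """Middle stage: keep only whitelisted tags and emit one tuple per instance."""
    while (rec := raw_q.get()) is not SENTINEL:
        row_q.put(tuple(rec.get(tag) for tag in KEEP_TAGS))
    row_q.put(SENTINEL)


def writer(row_q, conn, batch_size=500):
    """Consumer: batch inserts so database writes do not stall the other stages."""
    sql = f"INSERT INTO instances VALUES ({', '.join('?' * len(KEEP_TAGS))})"
    batch = []
    while (row := row_q.get()) is not SENTINEL:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            batch.clear()
    if batch:
        conn.executemany(sql, batch)
    conn.commit()


def ingest(records, conn, maxsize=1000):
    conn.execute(f"CREATE TABLE IF NOT EXISTS instances ({', '.join(KEEP_TAGS)})")
    raw_q, row_q = queue.Queue(maxsize), queue.Queue(maxsize)
    stages = [
        threading.Thread(target=reader, args=(records, raw_q)),
        threading.Thread(target=parser, args=(raw_q, row_q)),
    ]
    for t in stages:
        t.start()
    writer(row_q, conn)  # runs in the thread that created the sqlite3 connection
    for t in stages:
        t.join()


conn = sqlite3.connect(":memory:")
records = [
    {"PatientID": f"P{i % 3}", "StudyInstanceUID": f"S{i % 5}",
     "SeriesInstanceUID": f"SE{i % 7}", "SOPInstanceUID": f"I{i}",
     "Modality": "CT", "PixelData": b"\x00\x00"}
    for i in range(1200)
]
ingest(records, conn)
print(conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0])
```

Bounded queues give backpressure, so a fast reader cannot exhaust memory while the writer catches up. In CPython, threads mainly help when stages wait on I/O; CPU-heavy parsing at scale may benefit from processes instead.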
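VR-aware normalization means converting each element's string encoding according to its DICOM value representation before it is stored. The sketch below covers only a few VRs (DA, TM, DS, IS) plus backslash-separated multi-values; the function names and output types are illustrative choices, not rosamllib's behavior, and edge cases such as leap seconds or legacy colon-separated times are deliberately ignored.

```python
from datetime import date, time


def _parse_tm(s):
    """TM: HH[MM[SS[.FFFFFF]]]; missing components default to zero."""
    main, _, frac = s.partition(".")
    hh = int(main[0:2])
    mm = int(main[2:4]) if len(main) >= 4 else 0
    ss = int(main[4:6]) if len(main) >= 6 else 0
    us = int(frac.ljust(6, "0")[:6]) if frac else 0
    return time(hh, mm, ss, us)


# Illustrative converter table; VRs not listed are kept as stripped strings.
CONVERTERS = {
    "DA": lambda s: date(int(s[0:4]), int(s[4:6]), int(s[6:8])),  # YYYYMMDD
    "TM": _parse_tm,
    "DS": float,  # decimal string
    "IS": int,    # integer string
}


def normalize(vr, raw):
    """Convert a raw DICOM string value to a Python value based on its VR."""
    if raw is None or raw.strip() == "":
        return None
    convert = CONVERTERS.get(vr, str)
    # Multiple values are separated by a backslash in the DICOM encoding.
    values = [convert(v.strip()) for v in raw.split("\\")]
    return values[0] if len(values) == 1 else values


print(normalize("DA", "20240115"))
print(normalize("DS", "0.9766\\0.9766"))
print(normalize("CS", "ORIGINAL\\PRIMARY\\AXIAL"))
```

Normalizing at ingestion time lets later cohort queries compare dates and numbers directly instead of parsing strings on every query.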
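On a database-backed index, cohort-level filtering reduces to ordinary SQL. As a hypothetical example (the `series` table and its columns are assumptions, not rosamllib's schema), the query below selects patients who have at least one CT, RTSTRUCT, RTPLAN, and RTDOSE series, the kind of completeness check one might run before assembling a dose-modeling dataset.

```python
import sqlite3

# HYPOTHETICAL default: modalities a patient must have to enter the cohort.
REQUIRED = ("CT", "RTSTRUCT", "RTPLAN", "RTDOSE")


def complete_patients(conn, required=REQUIRED):
    """Return PatientIDs that have at least one series of every required modality."""
    placeholders = ", ".join("?" * len(required))
    sql = f"""
        SELECT PatientID
        FROM series
        WHERE Modality IN ({placeholders})
        GROUP BY PatientID
        HAVING COUNT(DISTINCT Modality) = ?
        ORDER BY PatientID
    """
    return [row[0] for row in conn.execute(sql, (*required, len(required)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (PatientID TEXT, SeriesInstanceUID TEXT, Modality TEXT)")
conn.executemany(
    "INSERT INTO series VALUES (?, ?, ?)",
    [
        ("P1", "1", "CT"), ("P1", "2", "RTSTRUCT"), ("P1", "3", "RTPLAN"), ("P1", "4", "RTDOSE"),
        ("P2", "5", "CT"), ("P2", "6", "RTSTRUCT"),  # no plan or dose
        ("P3", "7", "CT"), ("P3", "8", "CT"), ("P3", "9", "RTSTRUCT"),
        ("P3", "10", "RTPLAN"), ("P3", "11", "RTDOSE"), ("P3", "12", "REG"),
    ],
)
print(complete_patients(conn))  # ['P1', 'P3']
```

Filtering with `WHERE` before grouping means unrelated modalities (such as the REG series) and duplicate CT series do not affect the count.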
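Structure mask generation typically rasterizes each RTSTRUCT contour, stored in ContourData as a flat list of x, y, z triplets in millimetres, onto the pixel grid of the referenced image slice. The minimal pure-Python sketch below uses the even-odd (ray-casting) point-in-polygon rule at pixel centres. It assumes an axial slice with no rotation, so ImagePositionPatient (x, y) and PixelSpacing map directly onto columns and rows, and it handles a single contour without holes; these simplifications, and the helper names, are not taken from the source.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: count crossings of a ray cast from (x, y) in the +x direction."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contour_to_mask(contour_data, origin, spacing, shape):
    """Rasterize one closed contour (flat x, y, z list in mm) onto a rows x cols grid.

    origin:  (x0, y0) of the first pixel centre (ImagePositionPatient x, y)
    spacing: (row_spacing, col_spacing) in mm, i.e. DICOM PixelSpacing order
    """
    poly = [(contour_data[i], contour_data[i + 1]) for i in range(0, len(contour_data), 3)]
    rows, cols = shape
    x0, y0 = origin
    row_spacing, col_spacing = spacing
    return [
        [1 if point_in_polygon(x0 + c * col_spacing, y0 + r * row_spacing, poly) else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# A 3 mm square contour on a 5 x 5 grid with 1 mm pixels.
square = [0.5, 0.5, 0.0, 3.5, 0.5, 0.0, 3.5, 3.5, 0.0, 0.5, 3.5, 0.0]
mask = contour_to_mask(square, origin=(0.0, 0.0), spacing=(1.0, 1.0), shape=(5, 5))
for row in mask:
    print(row)
```

Testing pixel centres is a common convention; production code would also need to handle oblique orientations, multiple contours per slice, and partial-volume effects.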
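Dose alignment usually means resampling the RTDOSE grid onto the CT voxel grid so that every image voxel receives a dose value. One common way to do this, assumed here rather than taken from the source, is trilinear interpolation at each CT voxel centre. The sketch below works on axis-aligned grids described only by an origin and a spacing per axis, assumes uniform slice spacing (real RTDOSE objects use GridFrameOffsetVector) and dose values already multiplied by DoseGridScaling, and returns 0 outside the dose grid; the function names are hypothetical.

```python
def trilinear(dose, origin, spacing, point):
    """Interpolate dose[z][y][x] at a physical point (x, y, z) in mm; 0.0 outside the grid."""
    nz, ny, nx = len(dose), len(dose[0]), len(dose[0][0])
    # Continuous voxel indices along each axis.
    fx = (point[0] - origin[0]) / spacing[0]
    fy = (point[1] - origin[1]) / spacing[1]
    fz = (point[2] - origin[2]) / spacing[2]
    if not (0 <= fx <= nx - 1 and 0 <= fy <= ny - 1 and 0 <= fz <= nz - 1):
        return 0.0
    x0, y0, z0 = int(fx), int(fy), int(fz)
    # Clamp the upper neighbour so points on the last plane stay in range.
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    tx, ty, tz = fx - x0, fy - y0, fz - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    c00 = lerp(dose[z0][y0][x0], dose[z0][y0][x1], tx)
    c10 = lerp(dose[z0][y1][x0], dose[z0][y1][x1], tx)
    c01 = lerp(dose[z1][y0][x0], dose[z1][y0][x1], tx)
    c11 = lerp(dose[z1][y1][x0], dose[z1][y1][x1], tx)
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz)


def resample_to_ct(dose, dose_origin, dose_spacing, ct_origin, ct_spacing, ct_shape):
    """Evaluate the dose at every CT voxel centre; returns a nested [z][y][x] list."""
    nz, ny, nx = ct_shape
    return [
        [
            [trilinear(dose, dose_origin, dose_spacing,
                       (ct_origin[0] + i * ct_spacing[0],
                        ct_origin[1] + j * ct_spacing[1],
                        ct_origin[2] + k * ct_spacing[2]))
             for i in range(nx)]
            for j in range(ny)
        ]
        for k in range(nz)
    ]


# Toy 2 x 2 x 2 dose grid (2 mm spacing) holding the linear field x + 10*y + 100*z.
dose = [[[2 * i + 20 * j + 200 * k for i in range(2)] for j in range(2)] for k in range(2)]
aligned = resample_to_ct(dose, (0, 0, 0), (2, 2, 2), (0, 0, 0), (1, 1, 1), (3, 3, 3))
print(aligned[1][1][1])
```

Because trilinear interpolation reproduces linear fields exactly, the toy grid makes the result easy to check by hand: the CT voxel at (1, 1, 1) mm should read 1 + 10 + 100.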

Results

The framework was applied to approximately one year of institutional radiotherapy DICOM data, comprising 1,869 patients, 12,406 studies, 349,246 series, and 6,370,243 instances. rosamllib enabled scalable ingestion, relationship validation, visualization, and automated preprocessing across large cohorts, supporting both exploratory analysis and machine learning–oriented data preparation.
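To put the scale of this cohort in perspective, the reported totals imply the following average fan-out at each level of the hierarchy. These are simple rounded ratios of the counts above and say nothing about how studies, series, or instances are distributed across individual patients.

```latex
\begin{aligned}
\frac{12{,}406\ \text{studies}}{1{,}869\ \text{patients}} &\approx 6.64\ \text{studies per patient} \\
\frac{349{,}246\ \text{series}}{12{,}406\ \text{studies}} &\approx 28.15\ \text{series per study} \\
\frac{6{,}370{,}243\ \text{instances}}{349{,}246\ \text{series}} &\approx 18.24\ \text{instances per series} \\
\frac{6{,}370{,}243\ \text{instances}}{1{,}869\ \text{patients}} &\approx 3{,}408\ \text{instances per patient}
\end{aligned}
```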

Conclusion

rosamllib provides a scalable and extensible foundation for big-data radiotherapy research. By unifying ingestion, metadata indexing, relationship resolution, visualization, and preprocessing within a single framework, it enables efficient large-scale data preparation and supports downstream analytics and machine learning workflows in radiation oncology.
