Abstract
Objectives: The Robust and Optimized Biomarker Identifier (ROBI) feature selection pipeline is introduced to improve the identification of informative biomarkers that encode information not already captured by existing features. It aims to maximize the number of discoveries while minimizing and estimating the number of false positives (FP), with an adjustable selection stringency.
Methods: Validation used 500 synthetic datasets and retrospective data from 378 Diffuse Large B Cell Lymphoma (DLBCL) patients. On the DLBCL data, two established radiomic biomarkers, TMTV and Dmax, were measured from the 18F-FDG PET/CT scans, and 10,000 random features were generated. Selection was performed and verified on each dataset. The efficacy of ROBI was compared with methods controlling for multiple testing and with a Cox model with an elastic net penalty.
Results: On synthetic datasets, ROBI selected significantly more true positives (TP) than FP (p < 0.001), and for 99.3% of datasets, the number of FP was within the estimated 95% confidence interval. ROBI significantly increased the number of TP compared with usual feature selection methods (p < 0.001). On retrospective data, ROBI selected the two established biomarkers and one random biomarker, and estimated a 95% chance of selecting 0 or 1 FP and a probability of 0.0014 of selecting only FP. Bonferroni correction selected no features, and the elastic net selected 101 spurious features and discarded TMTV.
Conclusion: ROBI selected relevant biomarkers while effectively controlling for FP, outperforming conventional selection methods. This underscores its potential as a valuable asset for biomarker discovery.
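To illustrate the kind of multiple-testing baselines ROBI is compared against, the following is a minimal sketch of Bonferroni correction and of the Benjamini-Hochberg step-up procedure applied to a vector of per-feature p-values. It is not the ROBI pipeline, and Benjamini-Hochberg is used here as a simpler stand-in for the two-stage linear step-up procedure (TST). The function names, the `alpha` level, and the simulated p-values are assumptions made for illustration only.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Rank-dependent thresholds: alpha * k / m for k = 1..m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Hypothetical setting loosely mirroring the abstract: 2 informative features
# (made-up p-values) mixed with 10,000 random features (uniform null p-values).
rng = np.random.default_rng(0)
p_all = np.concatenate([[1e-7, 2e-5], rng.uniform(size=10_000)])
print("Bonferroni selections:", int(bonferroni(p_all).sum()))
print("Benjamini-Hochberg selections:", int(benjamini_hochberg(p_all).sum()))
```

With thousands of candidate features, both thresholds become very strict, which is one way a correction of this kind can end up selecting few or no features.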
Highlights
ROBI is a feature selection tool capable of screening thousands of features.
It enables systematic evaluation of numerous radiomic features while minimizing false discoveries.
ROBI was validated on synthetic and real datasets, outperforming other selection techniques.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Louis Rebaud, Nicolo Capobianco and Bruce Spottiswoode are Siemens Healthineers employees. This project is part of Louis Rebaud's PhD, which is funded by Siemens Healthineers and the French Government through the CIFRE agreement.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Patient data came from two clinical trials registered at www.clinicaltrials.gov: REMARC (NCT01122472) and LNH073B (NCT00498043), aggregating data from multiple countries. The studies were approved by local and country-specific ethics review committees.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Abbreviations
- ROBI: Robust and Optimized Biomarker Identifier
- FP: False Positive
- TP: True Positive
- CB: Candidate biomarker
- TST: two-stage linear step-up procedure
- FDR: False discovery rate
- CCO: Correlation Clustering Optimization
- DLBCL: Diffuse Large B Cell Lymphoma
- TMTV: Total Metabolic Tumor Volume
- Dmax: maximum distance between two lesions