Abstract
Objective To identify and develop a machine learning (ML)-based classifier that uses the metabolic profiles of serum samples to accurately identify individuals with ovarian cancer (OC).
Methods Serum samples collected from 431 OC patients and 133 normal women at four geographic locations were analyzed by mass spectrometry. Reliable metabolites were identified using recursive feature elimination (RFE) coupled with repeated cross-validation (CV) and were used to develop a consensus classifier able to distinguish cancer from non-cancer. The probabilities that the model assigned to individuals were used to create a clinical tool that assigns a likelihood that an individual patient's sample comes from a woman with cancer or from a normal control.
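The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the small gradient-descent logistic regression, the synthetic data, the 80% selection-frequency cutoff, and every function name below are assumptions made purely for illustration. The idea shown is that RFE is rerun inside each fold of a repeated, stratified CV, and features that survive elimination in most runs are treated as reliable.

```python
import math
import random
from collections import Counter


def fit_logistic(X, y, cols, epochs=100, lr=0.1):
    """Fit logistic regression on the given columns by batch gradient descent."""
    w = {c: 0.0 for c in cols}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {c: 0.0 for c in cols}
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(w[c] * row[c] for c in cols)
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for c in cols:
                grad_w[c] += err * row[c]
        b -= lr * grad_b / n
        for c in cols:
            w[c] -= lr * grad_w[c] / n
    return w


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the smallest |weight|, repeat."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = fit_logistic(X, y, cols)
        cols.remove(min(cols, key=lambda c: abs(w[c])))
    return cols


def stratified_folds(y, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def selection_frequency(X, y, n_keep, k=5, repeats=2, seed=0):
    """Run RFE on the training part of every fold; count how often each feature survives."""
    rng = random.Random(seed)
    counts = Counter()
    runs = 0
    for _ in range(repeats):
        for held_out in stratified_folds(y, k, rng):
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            counts.update(rfe([X[i] for i in train], [y[i] for i in train], n_keep))
            runs += 1
    return {c: counts[c] / runs for c in range(len(X[0]))}


# Synthetic stand-in data (hypothetical): 60 samples, 6 "metabolites";
# only features 0 and 1 differ between the two classes.
data_rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(3.0 * label if j < 2 else 0.0, 1.0) for j in range(6)]
     for label in y]

freq = selection_frequency(X, y, n_keep=2)
consensus = sorted(c for c, f in freq.items() if f >= 0.8)  # assumed cutoff
print(freq, consensus)
```

Running the selection only on training folds keeps the held-out samples from influencing which features are retained. In practice an established library implementation would likely replace the hand-written model used here.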
Results Our consensus classification model distinguishes cancer from control samples with 93% accuracy. The frequency distribution of individual patient scores was used to develop a clinical tool that assigns a likelihood that an individual patient does or does not have cancer.
Conclusions An integrative approach using metabolomic profiles and ML-based classifiers has been employed to develop a clinical tool that assigns a probability that an individual patient does or does not have OC. This personalized/probabilistic approach to cancer diagnostics is more clinically informative and accurate than traditional binary (yes/no) tests and represents a promising new direction in the early detection of OC.
HIGHLIGHTS
Predictive models derived from machine learning (ML) analyses of serum metabolic profiles can accurately (PPV 93%) detect ovarian cancer (OC).
Only a minority (7%) of the most predictively informative metabolites are currently annotated.
Lipids predominate among the most predictively informative metabolites currently annotated.
The frequency distribution of model-derived patient scores can be used to develop a useful clinical tool for the diagnosis of OC.
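One way to read the last highlight is as a lookup table built from the distribution of model scores. The sketch below is an assumption about how such a tool might work, not the authors' construction: the equal-width bins, the toy scores and labels, and the function names are all hypothetical. Each score bin records how many samples fell into it and what fraction of them were cancer, and a new patient's score is then mapped to the fraction for its bin.

```python
def bin_likelihoods(scores, labels, n_bins=5):
    """For each equal-width score bin on [0, 1], return (n_samples, cancer fraction)."""
    totals = [0] * n_bins
    cancers = [0] * n_bins
    for s, lab in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        totals[b] += 1
        cancers[b] += lab
    return [(t, c / t if t else None) for t, c in zip(totals, cancers)]


def likelihood_for(score, table):
    """Look up the observed cancer fraction for the bin containing `score`."""
    b = min(int(score * len(table)), len(table) - 1)
    return table[b][1]


# Toy, hand-made model scores with labels (1 = cancer, 0 = normal); illustrative only.
scores = [0.05, 0.10, 0.15, 0.30, 0.35, 0.45, 0.50,
          0.55, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1,
          1, 1, 1, 0, 1, 1, 1]

table = bin_likelihoods(scores, labels)
for i, (n, frac) in enumerate(table):
    print(f"bin {i}: n={n}, cancer fraction={frac}")
```

Reporting the sample count next to each fraction matters because sparsely populated bins give unstable estimates; a bin with no samples returns `None` rather than a made-up value.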
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded by the Ovarian Cancer Institute (Atlanta), the Laura Crandall Brown Foundation, the Deborah Nash Endowment Fund, Northside Hospital (Atlanta), and the Mark Light Integrated Cancer Research Student Fellowship.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of Georgia Institute of Technology gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes