Abstract
Cancers are highly heterogeneous diseases, and large molecular datasets are increasingly part of describing an individual’s unique experience. Gene expression is particularly attractive because it captures both genetic and environmental consequences. Our new approach, SPECTRA, provides a framework of agnostic multi-gene linear equations to calculate variables tuned to the needs of genomic epidemiology studies. SPECTRA variables are not supervised to an outcome. They are quantitative, linearly uncorrelated variables that retain integrity to the original data and cumulatively explain the majority of the global population variance. Together, these variables represent a deep dive into the transcriptome, including both large and small sources of variance. The latter are often overlooked but hold potential for identifying smaller groups of individuals with large effects, which is important for developing precision strategies. Each SPECTRA variable is a quantitative tissue phenotype that can be treated as a phenotypic outcome, providing new avenues to explore disease risk. As a set, SPECTRA variables are also ideal for modeling alongside other variables as predictors for any clinical outcome of interest. We demonstrate the flexibility of SPECTRA variables for multiple endpoints using RNA sequencing data from 767 myeloma patients in the CoMMpass study. Quantitative transcriptome SPECTRA variables enhance the tools available to researchers for incorporating expression in studies to advance precision screening, prevention, intervention, and survival.
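To make the idea of unsupervised, linearly uncorrelated multi-gene variables concrete, the following is a minimal sketch and not the SPECTRA implementation. It assumes a principal-component-style decomposition as a stand-in, since the abstract does not specify how the linear equations are derived. The function names, the 50% variance target, and the random expression matrix are all hypothetical and used only for illustration.

```python
import numpy as np

def fit_linear_variables(expr, var_target=0.5):
    """Derive gene weights from a samples x genes matrix (assumed already normalized).

    Assumption: PCA via SVD stands in for the unspecified SPECTRA derivation.
    """
    gene_means = expr.mean(axis=0)
    centered = expr - gene_means
    # Rows of vt are multi-gene weight vectors; no outcome is used (unsupervised).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Keep the smallest number of variables whose cumulative variance reaches the target.
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    k = min(k, len(s))
    return gene_means, vt[:k], explained[:k]

def apply_linear_variables(expr, gene_means, weights):
    """Evaluate the fixed linear equations on any samples measured on the same genes."""
    return (expr - gene_means) @ weights.T

# Synthetic stand-in data (NOT CoMMpass data): 60 samples, 200 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(60, 200))

gene_means, weights, explained = fit_linear_variables(expr, var_target=0.5)
scores = apply_linear_variables(expr, gene_means, weights)

# Each column of `scores` is one quantitative variable; columns are uncorrelated.
corr = np.corrcoef(scores, rowvar=False)
print(scores.shape, float(explained.sum()))
```

Because the weights are fixed once fitted, the same equations can be evaluated on new samples, and the resulting columns can be used either as outcomes or as predictors in a downstream model.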
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the National Cancer Institute (Award Numbers F99CA234943, K00CA234943, and P30CA042014-29S9), the National Center for Advancing Translational Sciences (Award Number UL1TR002538), and the National Library of Medicine (Award Number T15LM007124), all of the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Publicly available RNAseq data from the Multiple Myeloma Research Foundation (MMRF) CoMMpass trial (NCT145429) were used in this project. To be eligible, a patient must have read, understood, and signed informed consent. RNAseq data were downloaded directly from the MMRF Researcher Gateway (https://research.themmrf.org) after creating an account. Permission was also obtained to access these data via dbGaP under accessions phs000348.v2.p1 and phs000748.v4.p3.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data are publicly available through the MMRF Researcher Gateway (https://research.themmrf.org).