Abstract
We describe SPECTRA, a novel approach for measuring variation in the transcriptome that provides unsupervised quantitative variables to model with any clinical, demographic, or biological endpoint. Complex diseases, including cancer, are highly heterogeneous, and large molecular datasets are increasingly part of describing an individual’s unique experience. Gene expression is particularly attractive because it captures both genetic and environmental consequences. SPECTRA provides a framework of agnostic multi-gene linear transformations to calculate variables tuned to the needs of complex disease studies. SPECTRA variables are not supervised to an outcome; they are quantitative, linearly uncorrelated variables that retain integrity to the original data and cumulatively explain the majority of the global population variance. Together, these variables represent a deep dive into the transcriptome, including both large and small sources of variance. Small sources of variance are often overlooked but hold the potential to identify smaller groups of individuals with large effects, which is important for developing precision strategies. Each spectrum is a quantitative variable that can also be considered a phenotypic outcome, providing new avenues to explore disease risk. As a set, SPECTRA variables are well suited to modeling alongside other predictors for any clinical outcome of interest. Using 767 myeloma patients in the CoMMpass study, we demonstrate the flexibility of SPECTRA variables for multiple endpoints and their potential to outperform existing methods. SPECTRA provides an approach to incorporate deep transcriptome variability in studies to advance research in precision screening, prevention, intervention, and survival.
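The abstract does not spell out the transformation itself. As a minimal illustrative sketch, principal component analysis is one standard way to obtain unsupervised, linearly uncorrelated multi-gene linear combinations that cumulatively explain a chosen share of the variance. The function name, the 0.5 variance threshold, and the synthetic data below are assumptions made for illustration only and should not be read as the SPECTRA implementation.

```python
import numpy as np


def uncorrelated_linear_variables(expr, var_threshold=0.5):
    """Illustrative PCA sketch (an assumption, not the authors' code).

    expr: samples x genes array of expression values.
    Returns (scores, loadings, explained) for the leading components whose
    cumulative share of variance first reaches var_threshold.
    """
    centered = expr - expr.mean(axis=0)  # center each gene across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # variance share per component
    # smallest number of components whose cumulative share reaches the threshold
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))
    scores = u[:, :k] * s[:k]            # one quantitative variable per component
    loadings = vt[:k].T                  # multi-gene weights defining each variable
    return scores, loadings, explained[:k]


# Hypothetical synthetic data: 100 samples, 20 genes, 3 latent sources of variance.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))
weights = rng.normal(size=(3, 20))
expr = latent @ weights + 0.1 * rng.normal(size=(100, 20))

scores, loadings, explained = uncorrelated_linear_variables(expr, var_threshold=0.5)
```

Each column of `scores` could then be entered as a covariate alongside other predictors, or treated as a quantitative outcome, without any endpoint having been used to construct it.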
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the National Cancer Institute (Award Numbers F99CA234943, K00CA234943, and P30CA042014-29S9), the National Center for Advancing Translational Sciences (Award Number UL1TR002538), and the National Library of Medicine (Award Number T15LM007124) of the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Publicly available RNAseq data from the Multiple Myeloma Research Foundation (MMRF) CoMMpass trial (NCT145429) were used in this project. To be eligible, a patient must have read, understood, and signed informed consent. RNAseq data were downloaded directly from the MMRF Researcher Gateway (https://research.themmrf.org) after creating an account. Permission was also obtained to access these data via dbGaP under accessions phs000348.v2.p1 and phs000748.v4.p3.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data are publicly available through the MMRF Researcher Gateway (https://research.themmrf.org).