Abstract
Background Copy number aberrations (CNAs) have proved to be of clinical and therapeutic significance in many diseases, including breast cancer, because they drive key biological processes by regulating molecular phenotypes such as gene expression. To comprehensively assess the effect of CNAs, it is not sufficient to identify significant CNA-gene expression pairs; one must also identify the gene networks and regulatory structures that CNAs influence and that subsequently produce changes in outcomes.
Methods In this article, we adopt a two-step analysis approach to identify CNA-regulated genes whose expression levels affect breast cancer related outcomes. (1) We identify gene modules that are regulated by CNAs through sparse canonical correlation analysis (sCCA), which selects a set of closely located CNAs that regulate the expression levels of selected genes. (2) We then use a generalized linear model to identify which genes within the gene modules are associated with breast cancer related outcomes.
Results Analyzing clinical and genomic data on 1904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. The identification of gene modules was further validated using independent data on individuals in a study of breast invasive carcinoma from The Cancer Genome Atlas (TCGA). Association analysis on 7 different breast cancer related outcomes identified several novel and interpretable regulatory associations, which highlight how CNAs can impact key biological pathways and processes in the context of breast cancer. Through downstream analysis of two example outcomes, estrogen receptor status and overall survival, we show that the identified genes were enriched in relevant biological pathways; a key advantage of our method is that it additionally identifies the CNAs that regulate these genes. Because multiple types of outcomes were available, we further meta-analyzed the results to identify genes with potential associations with multiple outcomes.
Conclusions Overall, we present a generalizable analysis approach that identifies genes that are regulated by sets of CNAs and associated with different outcomes, and that can further be used to combine results across various types of outcomes. The results show that our method can identify novel and interpretable associations, providing mechanistic insights into how the effects of CNAs cascade via gene expression to impact breast cancer and related outcomes.
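The abstract does not state which meta-analysis procedure was used to combine results across outcomes. As one common possibility, the sketch below combines per-outcome p-values for a single gene with Fisher's method, using only the Python standard library. The function name `fisher_combine` and the example p-values are hypothetical. Fisher's method assumes independent tests, which outcomes measured on the same patients will not strictly satisfy, so this should be read as an illustration of the idea rather than the authors' procedure.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    Returns (statistic, combined_p), where the statistic -2 * sum(log p)
    follows a chi-square distribution with 2k degrees of freedom under the
    null hypothesis, assuming the k tests are independent.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the chi-square survival function has a
    # closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values for one gene across three outcomes.
gene_p_values = [0.04, 0.20, 0.01]
statistic, combined_p = fisher_combine(gene_p_values)
print(f"Fisher statistic = {statistic:.3f}, combined p = {combined_p:.5f}")
```

Methods that account for correlation between outcomes would be more appropriate when the same individuals contribute to every test.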
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
DD was supported by an R01 grant from the National Human Genome Research Institute [1 R01 HG010480-01; PI: Dr. Nilanjan Chatterjee]. AS and JS were supported by an R01 grant from the National Cancer Institute [7 R01 CA197402-05; PI: Jaya Satagopan].
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
No new data was generated.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
We used clinical and genomic data that are publicly available from the cBioPortal catalog. Website: https://www.cbioportal.org/