Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations

Diptavo Dutta; Ananda Sen; Jaya Satagopan

doi:10.1101/2021.08.29.21262811

Abstract

Background Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expressions, that drive important biological processes. To gain comprehensive insights into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene networks, regulatory modules, and their downstream effect on outcomes.

Methods In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to effectively identify the ensemble of CNAs, gene networks and regulatory modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules which are highly correlated with sets of CNA and then identifies the genes within these modules that are associated with the outcome.

Results Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis on 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for early estrogen response pathway, DNA repair pathways as well as targets of transcription factors such as E2F4, MYC and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints further identified several genes through aggregation of weaker associations.

Conclusions Our findings suggest that sCCA analysis can aggregate weaker associations to identify interpretable and important genes, networks and pathways that are clinically consequential.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

DD was supported by R01 grant from the National Human Genome Research Institute [1 R01 HG010480-01; PI Dr. Nilanjan Chatterjee]. AS and JS were supported by R01 grant from the National Cancer Institute [7 R01 CA197402-05; PI: Jaya Satagopan].

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

No new data was generated.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

We used clinical and genomic data publicly available at cBioPortal catalog. Website: https://www.cbioportal.org/

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.