Abstract
While numerous common variants have been linked to breast cancer (BCa) risk, they explain only part of the total BCa heritability. Inference from the Nordic population-based twin data indicates that rare high-risk loci are the chief determinant of BCa risk. Here, we use haplotypes, rather than single variants, to identify rare high-risk loci for BCa. With computationally phased genotypes from 181,034 white British women in the UK Biobank, we conducted a genome-wide haplotype-BCa association analysis using sliding windows of 5-500 consecutive array-genotyped variants. In the discovery stage, haplotype associations with BCa risk were evaluated retrospectively in the pre-study-enrollment portion of the data, which included 5,487 BCa cases. BCa hazard ratios (HRs) for additive haplotypic effects were estimated using Cox regression. Our replication analysis included women free of BCa at enrollment, of whom 3,524 later developed BCa. This two-stage analysis detected 13 rare loci (frequency <1%), each associated with an appreciable increase in BCa risk (discovery: HRs=2.84-6.10, P-value<5×10⁻⁸; replication: HRs=2.08-5.61, P-value<0.01). In contrast, the variants that formed these rare haplotypes individually exhibited much smaller effects. Functional annotation revealed extensive cis-regulatory DNA elements in BCa-related cells underlying the replicated rare haplotypes. Using phased, imputed genotypes from 30,064 cases and 25,282 controls in the DRIVE OncoArray case-control study, six of the 13 rare-locus associations generalized (odds ratio estimates: 1.48-7.67, P-value<0.05). This study demonstrates the complementary advantage of using rare haplotypes to capture novel risk loci, and it suggests that more genetic elements contributing to BCa heritability may be discovered once large germline whole-genome sequencing datasets become available.
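To illustrate the sliding-window idea described in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes phased genotypes are encoded as a pair of allele strings per individual, and the function name `rare_haplotype_dosages`, the string encoding, and the toy data are all hypothetical. It scans fixed-size windows, finds haplotypes below a frequency threshold, and codes each individual's copy count (0, 1, or 2), which is the kind of additive covariate a Cox regression could then take. The regression step itself is not shown.

```python
from collections import Counter


def rare_haplotype_dosages(phased, window_size, max_freq=0.01):
    """Scan sliding windows and return additive dosages for rare haplotypes.

    phased: list of (copy1, copy2) allele strings, one pair per individual,
            e.g. ("0101", "0111"); all strings must have equal length.
    Returns {(window_start, haplotype): [copies carried by each individual]}.
    """
    n_variants = len(phased[0][0])
    n_chromosomes = 2 * len(phased)
    dosages = {}
    for start in range(n_variants - window_size + 1):
        end = start + window_size
        # Count every haplotype observed in this window across both copies.
        counts = Counter()
        for copy1, copy2 in phased:
            counts[copy1[start:end]] += 1
            counts[copy2[start:end]] += 1
        for hap, count in counts.items():
            # Keep only haplotypes whose frequency is below the threshold.
            if count / n_chromosomes < max_freq:
                # Additive coding: number of copies carried (0, 1, or 2).
                dosages[(start, hap)] = [
                    (c1[start:end] == hap) + (c2[start:end] == hap)
                    for c1, c2 in phased
                ]
    return dosages


# Toy data (hypothetical): 99 non-carriers and 1 carrier over 3 variants.
toy = [("000", "000")] * 99 + [("011", "000")]
result = rare_haplotype_dosages(toy, window_size=2)
```

The study scans windows of 5-500 variants genome-wide; the window of 2 here is only to keep the toy example readable.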
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by National Cancer Institute Grant R01 CA216354, the American Lebanese Syrian Associated Charities, and the Alberta Machine Intelligence Institute.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
1. UK Biobank: https://www.ukbiobank.ac.uk/
2. DRIVE study: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001265.v1.p1
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The data generated in this study are available within the article and its supplemental data files. Individual-level genotype and phenotype data can be obtained by submitting a request to the UK Biobank and dbGaP (study accession: phs001265.v1.p1).