Abstract
Background Homologous Recombination Deficiency (HRD) is a pan-cancer predictive biomarker that identifies patients who benefit from therapy with PARP inhibitors (PARPi). However, testing for HRD is highly complex. Here, we investigated whether Deep Learning can predict HRD status solely based on routine Hematoxylin & Eosin (H&E) histology images in ten cancer types.
Methods We developed a fully automated deep learning pipeline with attention-weighted multiple instance learning (attMIL) to predict HRD status from histology images. A combined genomic scar HRD score, which integrated loss of heterozygosity (LOH), telomeric allelic imbalance (TAI) and large-scale state transitions (LST), was calculated from whole genome sequencing data for n=4,565 patients from two independent cohorts. The primary statistical endpoint was the Area Under the Receiver Operating Characteristic curve (AUROC) for the prediction of genomic scar HRD with a clinically used cutoff value.
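To make the endpoint concrete, the following minimal Python sketch sums the three genomic scar components into a combined score, binarizes it at a cutoff, and computes the AUROC through its rank (pairwise comparison) formulation. It is an illustration only and not the pipeline used in this study: the cutoff of 42, the function names and all patient values are assumptions, since the abstract states only that a clinically used cutoff was applied.

```python
from typing import Sequence

# Assumption: the abstract does not state the cutoff; 42 is used here
# purely as an illustrative value.
ASSUMED_CUTOFF = 42


def hrd_score(loh: int, tai: int, lst: int) -> int:
    """Combined genomic scar score as the plain sum of LOH, TAI and LST."""
    return loh + tai + lst


def hrd_label(score: int, cutoff: int = ASSUMED_CUTOFF) -> int:
    """Binarize the score: 1 = HRD high, 0 = HRD low."""
    return 1 if score >= cutoff else 0


def auroc(labels: Sequence[int], preds: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative prediction count as half.
    """
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical (LOH, TAI, LST) counts for four patients.
scars = [(20, 15, 18), (5, 3, 7), (14, 12, 16), (2, 1, 4)]
labels = [hrd_label(hrd_score(*s)) for s in scars]

# Hypothetical image-based prediction scores for the same patients.
preds = [0.9, 0.2, 0.4, 0.6]

print(labels, auroc(labels, preds))
```

The pairwise formulation is quadratic in the number of patients; for cohorts of realistic size, a sorted-rank implementation from an established statistics library would be the usual choice.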
Results We found that HRD status is predictable in tumors of the endometrium, pancreas and lung, reaching cross-validated AUROCs of 0.79, 0.58 and 0.66, respectively. Predictions generalized well to an external cohort with AUROCs of 0.93, 0.81 and 0.73, respectively. Additionally, an HRD classifier trained on breast cancer yielded an AUROC of 0.78 in internal validation and was able to predict HRD in endometrial, prostate and pancreatic cancer with AUROCs of 0.87, 0.84 and 0.67, indicating that a shared HRD-like phenotype exists across tumor entities.
Conclusion In this study, we show that HRD is directly predictable from H&E slides using attMIL within and across ten different tumor types.
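The general idea behind attention-weighted multiple instance learning can be sketched as follows: each tile of a slide contributes a feature vector, an attention score per tile is turned into a weight by a softmax, and the weighted sum of tile features is classified at slide level. The Python sketch below illustrates that pooling step only. The scoring function, the parameter names (`att_w`, `coef`, `bias`) and all numbers are hypothetical, and the code is not taken from the marugoto implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(tiles: Sequence[Sequence[float]],
                     att_w: Sequence[float]) -> List[float]:
    """One scalar score per tile: att_w . tanh(tile features).

    Simplification (assumption): real models usually apply a learned
    projection before the tanh; it is omitted here.
    """
    return [sum(w * math.tanh(x) for w, x in zip(att_w, t)) for t in tiles]


def attention_pool(tiles: Sequence[Sequence[float]],
                   att_w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the attention-weighted slide embedding and the tile weights."""
    weights = softmax(attention_scores(tiles, att_w))
    dim = len(tiles[0])
    pooled = [sum(w * t[d] for w, t in zip(weights, tiles)) for d in range(dim)]
    return pooled, weights


def slide_probability(pooled: Sequence[float],
                      coef: Sequence[float], bias: float) -> float:
    """Logistic slide-level prediction from the pooled embedding."""
    logit = sum(c * x for c, x in zip(coef, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical two-dimensional features for three tiles of one slide.
tiles = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pooled, weights = attention_pool(tiles, att_w=[0.5, -0.5])
print(weights, pooled, slide_probability(pooled, coef=[1.0, 1.0], bias=-1.0))
```

Because the weights sum to one, the pooled embedding stays on the same scale regardless of how many tiles a slide contains, and the per-tile weights can be inspected to see which regions drove the prediction.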
Competing Interest Statement
JNK reports consulting services for Owkin, France, Panakeia, UK and DoMore Diagnostics, Norway and has received honoraria for lectures by MSD, Eisai and Fresenius. JSRF reports a leadership (board of directors) role at Grupo Oncoclinicas, stock or other ownership interests at Repare Therapeutics and Paige.AI, and a consulting or Advisory Role at Genentech/Roche, Invicro, Ventana Medical Systems, Volition RX, Paige.AI, Goldman Sachs, Bain Capital, Novartis, Repare Therapeutics, Lilly, Saga Diagnostics, Swarm and Personalis. No other potential conflicts of interest are reported by any of the authors.
Funding Statement
JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111) and the Max-Eder-Programme of the German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C), and the German Academic Exchange Service (SECAI, 57616814). This research was supported by the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. JSRF is funded in part by the Breast Cancer Research Foundation, a Susan G Komen Leadership Grant, the NIH/NCI P50 CA247749 01 grant and by the NIH/NCI Cancer Center Core Grant P30-CA008748.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
https://portal.gdc.cancer.gov/ https://www.cbioportal.org/ https://www.cancerimagingarchive.net/
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
* shared last authorship
Data Availability
The WSI, molecular and clinical data for the TCGA and CPTAC cohorts are publicly accessible at https://portal.gdc.cancer.gov/ and https://www.cbioportal.org/ (accessed 08 March 2022). The script for calculating the HRD score is available at https://github.com/sztup/scarHRD (accessed 06 June 2022). All other source code can be downloaded from https://github.com/KatherLab/marugoto. Our calculated HRD scores are publicly available in Supplementary Table 2. Moreover, our custom TCGA-BRCA HRD-H and HRD-L groups can be accessed for the PanCancer Atlas cohort at https://www.cbioportal.org/ (Supplementary 3).
List of Abbreviations
- AI: artificial intelligence
- ASCAT: Allele-Specific Copy number Analysis of Tumors
- attMIL: attention-weighted multiple instance learning
- AUROC: Area Under the Receiver Operating Characteristic curve
- BRCA: breast invasive carcinoma
- BRCA1/2: Breast Cancer genes 1 and 2
- CI: confidence interval
- CIOMS: Council for International Organizations of Medical Sciences
- CPTAC: Clinical Proteomic Tumor Analysis Consortium
- CRC: colorectal cancer
- DL: Deep Learning
- DSB: DNA double-strand breaks
- ER-: estrogen receptor negative
- ER+: estrogen receptor positive
- FDA: U.S. Food and Drug Administration
- GBM: glioblastoma
- GDC: Genomic Data Commons
- GIS: genomic instability score
- H&E: Hematoxylin & Eosin
- HR: Homologous recombination
- HRD-H: HRD high
- HRD-L: HRD low
- HRD: Homologous Recombination Deficiency
- HRR: Homologous recombination repair
- LIHC: liver hepatocellular carcinoma
- LOH: loss of heterozygosity
- LSCC: squamous cell carcinoma of the lung
- LST: large-scale state transitions
- LUAD: adenocarcinoma of the lung
- LUSC: squamous cell carcinoma of the lung
- OV: ovarian cancer
- PAAD: pancreatic adenocarcinoma
- PDA: pancreatic adenocarcinoma
- PARP: Poly(ADP-Ribose)-polymerase
- PARPi: Poly(ADP-Ribose)-polymerase inhibitor
- PRAD: prostate adenocarcinoma
- PRC: precision-recall curve
- ROC: receiver operating characteristic curve
- SBS3: single base substitution 3
- SNP: single nucleotide polymorphism
- SSDBs: single-strand DNA breaks
- SSL: self-supervised learning
- TAI: telomeric allelic imbalance
- TCGA: The Cancer Genome Atlas
- TRIPOD: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis
- UCEC: endometrial carcinoma
- WSI: whole slide images