Abstract
Background Deep Learning (DL) has emerged as a powerful tool to predict genetic biomarkers directly from digitized Hematoxylin and Eosin (H&E) slides in colorectal cancer (CRC). However, few studies have systematically investigated the predictability of biomarkers beyond routinely available alterations such as microsatellite instability (MSI), and BRAF and KRAS mutations.
Methods Our primary dataset comprised H&E slides of CRC tumors across five cohorts totaling 1,376 patients who underwent comprehensive panel sequencing, with an additional 536 patients from two public datasets for validation. We developed a DL model using a single transformer model to predict multiple genetic alterations directly from the slides. The model’s performance was compared against conventional single-target models, and potential confounders were analyzed.
Findings The multi-target model was able to predict numerous biomarkers from pathology slides, matching and partly exceeding single-target transformers. The Area Under the Receiver Operating Characteristic curve (AUROC, mean ± std) on the primary external validation cohorts was: BRAF (0·78 ± 0·01), hypermutation (0·88 ± 0·01), MSI (0·93 ± 0·01), RNF43 (0·86 ± 0·01); this biomarker predictability was mirrored across metrics and co-occurrence analyses. However, biomarkers with high AUROCs largely correlated with MSI, with model predictions depending considerably on MSI-associated morphology upon pathological examination.
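As an illustration of how a figure such as "AUROC (mean ± std)" can be obtained, the following minimal sketch computes the AUROC for one biomarker from several models scored on the same validation cohort and then summarizes the values. It assumes that the standard deviation is taken across models from repeated training runs or cross-validation folds, which the abstract does not state. The function names and toy data are hypothetical and are not the study's code or results.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC via the pairwise (Mann-Whitney U) formulation; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Count positive/negative pairs where the positive case is ranked higher.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(labels, scores_per_model):
    """Mean and sample standard deviation of AUROC across several models."""
    values = [auroc(labels, scores) for scores in scores_per_model]
    return mean(values), stdev(values)


# Hypothetical toy data: 6 patients, binary biomarker status (1 = altered),
# and prediction scores from 3 separately trained models (assumption).
labels = [1, 1, 0, 0, 0, 1]
scores_per_model = [
    [0.9, 0.8, 0.2, 0.4, 0.1, 0.7],
    [0.8, 0.3, 0.2, 0.4, 0.1, 0.9],
    [0.7, 0.6, 0.5, 0.65, 0.1, 0.9],
]

m, s = summarize(labels, scores_per_model)
print(f"AUROC: {m:.2f} ± {s:.2f}")
```

For a multi-target model, the same summary would be repeated once per biomarker, using that biomarker's labels and the corresponding output scores.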
Interpretation Our study demonstrates that multi-target transformers can predict the biomarker status for numerous genetic alterations in CRC directly from H&E slides. However, their predictability is mainly associated with MSI phenotype, despite indications of slight biomarker-inherent contributions to a phenotype. Our findings underscore the need to analyze confounders in AI-based oncology biomarkers. To enable this, we developed a validated model applicable to other cancers and larger, diverse datasets.
Funding The German Federal Ministry of Health, the Max-Eder-Programme of German Cancer Aid, the German Federal Ministry of Education and Research, the German Academic Exchange Service, and the EU.
Competing Interest Statement
JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, and Synagen GmbH, Germany; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. MG has received honoraria for lectures sponsored by Techniker Krankenkasse (TK) and AstraZeneca. SF has received honoraria for lectures from BMS and MSD. UP declares consulting services for AbbVie, and her husband holds individual stocks in the following companies: BioNTech SE - ADR, Amazon, CureVac BV, NanoString Technologies, Google/Alphabet Inc Class C, NVIDIA Corp, Microsoft Corp. No other potential conflicts of interest are reported by any of the authors.
Funding Statement
JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111), the Max-Eder-Programme of German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (Transplant.KI, 01VSF21048), the European Union's Horizon Europe and innovation programme (ODELIA, 101057091; GENIAL, 101096312) and the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. SF is supported by the German Federal Ministry of Education and Research (SWAG, 01KD2215C), the German Cancer Aid (DECADE, 70115166 and TargHet, 70115995) and the German Research Foundation (504101714). The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is funded by: National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088, R01 CA488857, P20 CA252733). Genotyping/Sequencing services were provided by the Center for Inherited Disease Research (CIDR) contract number HHSN268201700006I. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA015704. Scientific Computing Infrastructure at Fred Hutch is funded by ORIP grant S10OD028685. The CORSA study was funded by the Austrian Research Funding Agency (FFG) BRIDGE (grant 829675, to Andrea Gsur), the Herzfeldersche Familienstiftung (grant to Andrea Gsur) and was supported by COST Action BM1206. CRA was supported by the National Institutes of Health grant R01 CA068535.
The coordination of EPIC is financially supported by the International Agency for Research on Cancer (IARC) and also by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts are supported by: Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Generale de l'Education Nationale, Institut National de la Sante et de la Recherche Medicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucia, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Swedish Cancer Society, Swedish Research Council and Region Skane and Region Vaesterbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk; MR/M012190/1 to EPIC-Oxford) (United Kingdom). The IWHS study was supported by NIH grants CA107333 (R01 grant awarded to P.J. Limburg) and HHSN261201000032C (N01 contract awarded to the University of Iowa). The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts 75N92021D00001, 75N92021D00002, 75N92021D00003, 75N92021D00004, 75N92021D00005.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was performed in accordance with the Declaration of Helsinki. This study is a retrospective analysis of scanned images of anonymized tissue samples of various cohorts of cancer patients. Data were collected and anonymized and ethical approval was obtained. The overall analysis was approved by the Ethics board of the Medical Faculty of Technical University Dresden under the ID BO-EK-444102022.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
List of abbreviations
- AUROC: Area Under the Receiver Operating Characteristic Curve
- AUPRC: Area Under the Precision-Recall Curve
- CIN: Chromosomal instability
- CPTAC: Clinical Proteomic Tumor Analysis Consortium
- CRC: Colorectal cancer
- DL: Deep Learning
- H&E: Hematoxylin and eosin
- Mb: Megabases
- MSI: Microsatellite instability
- MSS: Microsatellite stable
- MUT: Mutated
- NOS: Not otherwise specified
- px: Pixel
- ROC: Receiver Operating Characteristic Curve
- TCGA: The Cancer Genome Atlas
- TILs: Tumor infiltrating lymphocytes
- ViT: Vision Transformer
- WSI: Whole Slide Image
- WT: Wild type