Abstract
Allogeneic hematopoietic cell transplantation (HCT) is used to treat many blood-based disorders and malignancies. While this is an effective treatment, it can result in serious adverse events, such as the development of acute graft-versus-host disease (aGVHD). This study aimed to develop a donor-specific epigenetic classifier that could be used in donor selection in HCT to reduce the incidence of aGVHD.
The discovery cohort of the study consisted of 288 donors from a population receiving HLA-A, -B, -C and -DRB1 matched unrelated donor HCT with T cell replete peripheral blood stem cell grafts for treatment of acute leukaemia or myelodysplastic syndromes after myeloablative conditioning. Donors were selected based on recipient aGVHD outcome: the cohort consisted of 144 cases with aGVHD grades III-IV and 144 controls with no aGVHD who survived at least 100 days post-HCT, with cases and controls matched for sex, age, disease and GVHD prophylaxis.
Genome-wide DNA methylation was assessed using the Infinium MethylationEPIC BeadChip (Illumina), which measures CpG methylation at >850,000 sites across the genome. Following quality control, pre-processing and exploratory analyses, we applied a machine learning algorithm (Random Forest) to identify CpG sites predictive of aGVHD. Receiver operating characteristic (ROC) curve analysis of these sites resulted in a classifier with an encouraging area under the ROC curve (AUC) of 0.91.
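For readers less familiar with the metric, the following is a minimal sketch of how an ROC curve and its AUC can be computed from case/control labels and classifier scores. It is not the study's code: the function names `roc_curve_points` and `auc_trapezoid` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for a case (e.g. aGVHD grades III-IV), 0 for a control.
    scores: classifier output, higher meaning "more likely a case".
    The decision threshold is swept from the highest score to the lowest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data (hypothetical): two cases and two controls.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(auc_trapezoid(roc_curve_points(toy_labels, toy_scores)))
```

An AUC of 1.0 corresponds to perfect separation of cases from controls, and 0.5 to a classifier that ranks them no better than chance.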
To test this classifier, we used an independent validation cohort (n=288) selected using the same criteria as the discovery cohort. Several attempts to validate the classifier in this cohort failed, with the AUC falling to 0.51, which is close to the chance level of 0.5. These results indicate that donor DNA methylation may not be a suitable predictor of aGVHD in an HCT setting involving unrelated donors, despite the initial promising results in the discovery cohort.
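The gap between a high discovery AUC and a near-chance validation AUC can be illustrated with a small simulation. The sketch below is a generic illustration and makes no claim about why this particular classifier failed to validate: it uses pure random noise rather than methylation data, and it swaps in a simple mean-difference score for the Random Forest so that it runs with the Python standard library alone. The names `rank_auc` and `simulate_selection_bias`, and all parameter values, are hypothetical.

```python
import random


def rank_auc(labels, scores):
    """AUC as the fraction of case/control pairs in which the case
    has the higher score (ties count as half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def simulate_selection_bias(n_per_group=50, n_features=2000,
                            n_selected=20, seed=0):
    """Select features on a noise-only discovery cohort, then score
    both the discovery cohort and an independent noise-only cohort."""
    rng = random.Random(seed)

    def make_cohort():
        labels = [1] * n_per_group + [0] * n_per_group
        # Every feature is standard normal noise, unrelated to the label.
        data = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in labels]
        return labels, data

    disc_y, disc_x = make_cohort()
    val_y, val_x = make_cohort()

    # Case-minus-control mean difference, from the discovery cohort only.
    diffs = []
    for j in range(n_features):
        case_mean = sum(r[j] for r, y in zip(disc_x, disc_y) if y == 1) / n_per_group
        ctrl_mean = sum(r[j] for r, y in zip(disc_x, disc_y) if y == 0) / n_per_group
        diffs.append(case_mean - ctrl_mean)

    # Keep the features with the largest absolute difference.
    top = sorted(range(n_features), key=lambda j: abs(diffs[j]),
                 reverse=True)[:n_selected]

    def score(row):
        # Signed sum, so that a higher score points towards "case".
        return sum(row[j] if diffs[j] > 0 else -row[j] for j in top)

    disc_auc = rank_auc(disc_y, [score(r) for r in disc_x])
    val_auc = rank_auc(val_y, [score(r) for r in val_x])
    return disc_auc, val_auc


if __name__ == "__main__":
    discovery_auc, validation_auc = simulate_selection_bias()
    print("discovery AUC:", round(discovery_auc, 2))
    print("validation AUC:", round(validation_auc, 2))
```

Because the features carry no signal by construction, any apparent discrimination in the discovery cohort comes from selecting and evaluating on the same samples, and it is expected to disappear in the independent cohort.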
Our work highlights the importance of independent validation of machine learning classifiers, particularly when developing classifiers intended for clinical use.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This project was funded by the National Institute for Health Research (NIHR) Blood & Transplant Research Unit (BTRU) (NIHR-BTRU-2014-10074). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The CIBMTR is supported primarily by Public Health Service U24CA076518 from the National Cancer Institute (NCI), the National Heart, Lung and Blood Institute (NHLBI) and the National Institute of Allergy and Infectious Diseases (NIAID); HHSH250201700006C from the Health Resources and Services Administration (HRSA); and N00014-20-1-2705 and N00014-20-1-2832 from the Office of Naval Research. Support is also provided by Be the Match Foundation, the Medical College of Wisconsin, and the National Marrow Donor Program.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The ethics committee of UCL gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The participants involved in the study were recruited under different consents, which require different levels of data access. In accordance with the consent given, the corresponding data are being made available through a three-tiered data access approach:
1. Processed data (beta matrix) for all individuals (n=570) are available from the open access 'Gene Expression Omnibus' under accession number GSE196696. To reduce the chance of re-identification, all non-cg probes, including SNP-targeting rs probes, have been removed. The data are provided in both raw (unnormalized) and SWAN-normalised formats.
2. Raw data (IDAT files) are available for individuals with appropriate consent (n=403 in total) from the controlled access 'European Genome-Phenome Archive' under accession number EGAxxxxx.
3. Raw data (IDAT files) and associated phenotype information are available for all individuals included in this study (n=570) directly from CIBMTR. Data are available under controlled access release upon reasonable request and execution of a data use agreement. Requests should be submitted to CIBMTR at info-request@mcw.edu and include the study reference IB17-04.
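The probe filtering described for the open access tier can be sketched as follows. This is an assumption-laden illustration rather than the authors' processing code: it assumes the beta matrix is held as a mapping from probe ID to a list of per-sample beta values, the function name `keep_cg_probes` is hypothetical, and the probe IDs and values shown are placeholders, not data from the study.

```python
def keep_cg_probes(beta_by_probe):
    """Keep only probes whose ID starts with 'cg' (CpG-targeting probes).

    This drops, for example, SNP-targeting 'rs' probes and non-CpG 'ch'
    probes, which is the filtering rule stated in the data availability
    text. The input layout (probe ID -> list of beta values) is an
    assumption made for this sketch.
    """
    return {probe_id: betas
            for probe_id, betas in beta_by_probe.items()
            if probe_id.startswith("cg")}


if __name__ == "__main__":
    # Placeholder probe IDs and beta values for three samples.
    example = {
        "cg00000001": [0.12, 0.15, 0.10],
        "cg00000002": [0.88, 0.91, 0.86],
        "rs00000001": [0.02, 0.51, 0.97],
        "ch.1.0000001F": [0.05, 0.04, 0.06],
    }
    print(sorted(keep_cg_probes(example)))
```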