RT Journal Article
SR Electronic
T1 Integration of DNA methylation datasets for individual prediction
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.03.22.23287572
DO 10.1101/2023.03.22.23287572
A1 Merzbacher, Charlotte
A1 Ryan, Barry
A1 Goldsborough, Thibaut
A1 Hillary, Robert F
A1 Campbell, Archie
A1 Murphy, Lee
A1 McIntosh, Andrew M
A1 Liewald, David
A1 Harris, Sarah E
A1 McRae, Allan F
A1 Cox, Simon R
A1 Cannings, Timothy I
A1 Vallejos, Catalina
A1 McCartney, Daniel L
A1 Marioni, Riccardo E
YR 2023
UL http://medrxiv.org/content/early/2023/03/22/2023.03.22.23287572.abstract
AB Background: Epigenetic scores (EpiScores) can provide blood-based biomarkers of lifestyle and disease risk. Projecting a new individual onto a reference panel would aid precision medicine and risk communication but is challenging because technical and biological sources of variation are difficult to separate in array data. Normalisation methods can standardise data distributions but may also remove population-level biological variation. Methods: We compared two independent birth cohorts (Lothian Birth Cohorts of 1921 and 1936; n = 387 for LBC1921 and n = 498 for LBC1936) with DNA methylation assessed at the same chronological age (79 years) and processed in the same lab but in different years and experimental batches. We examined the effect of 15 normalisation methods on a BMI EpiScore (trained in an external cohort of 18,413 individuals) when the cohorts were normalised separately and together. Results: The BMI EpiScore explained a maximum of R² = 24.5% of the variance in BMI, in LBC1936 after SWAN normalisation. Although the variance explained differed across cohorts, the choice of normalisation method made minimal difference to the estimates within cohorts. Conversely, a range of absolute differences was seen for individual-level EpiScore estimates when cohorts were normalised separately versus together.
While within-array methods resulted in identical BMI EpiScores whether a cohort was normalised on its own or together with the second dataset, a range of differences was observed for between-array methods. Conclusions: Using normalisation methods that give similar EpiScores whether cohorts are analysed separately or together will minimise technical variation when projecting new data onto a reference panel. These methods are especially important for cases where access to raw data and joint normalisation of cohorts is not possible or is computationally expensive. Competing Interest Statement: REM is a scientific advisor to the Epigenetic Clock Development Foundation and Optima Partners. RFH is a scientific advisor to Optima Partners. LM has received payment from Illumina for presentations and consultancy. Funding Statement: This research was funded in whole, or in part, by the Wellcome Trust (104036/Z/14/Z and 216767/Z/19/Z). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006] and is currently supported by the Wellcome Trust [216767/Z/19/Z]. Genotyping of the GS samples was carried out by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, University of Edinburgh, Scotland and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award "STratifying Resilience and Depression Longitudinally" (STRADL) Reference 104036/Z/14/Z). The authors thank all LBC study participants and research team members who have contributed, and continue to contribute, to the ongoing LBC study.
The LBC1936 is supported by the Biotechnology and Biological Sciences Research Council and the Economic and Social Research Council [BB/W008793/1], Age UK (Disconnected Mind project), the Milton Damerel Trust, and the University of Edinburgh. The LBC1921 was supported by the Biotechnology and Biological Sciences Research Council [SR176], the Chief Scientist Office of the Scottish Government [CZB/4/505; ETM/55], the Royal Society and the Medical Research Council [R42550]. Methylation typing was supported by the Centre for Cognitive Ageing and Cognitive Epidemiology (Pilot Fund award), Age UK, The Wellcome Trust Institutional Strategic Support Fund, The University of Edinburgh, and The University of Queensland. CM, TG, and BR are supported by the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics. REM is supported by Alzheimer's Society major project grant AS-PG-19b-010. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: All components of GS received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). GS has also been granted Research Tissue Bank status by the East of Scotland Research Ethics Service (REC Reference Number: 20-ES-0021), providing generic ethical approval for a wide range of uses within medical research. Ethical approval for the LBC1921 and LBC1936 studies was obtained from the Multi-Centre Research Ethics Committee for Scotland (MREC/01/0/56) and the Lothian Research Ethics committee (LREC/1998/4/183; LREC/2003/2/29). In both studies, all participants provided written informed consent.
These studies were performed in accordance with the Helsinki declaration. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. According to the terms of consent for Generation Scotland (GS) participants, access to data must be reviewed by the GS Access Committee. Applications should be made to access{at}generationscotland.org. Lothian Birth Cohort data are available on request from the Lothian Birth Cohort Study, University of Edinburgh (https://www.ed.ac.uk/lothian-birth-cohorts/data-access-collaboration). Lothian Birth Cohort data are not publicly available because they contain information that could compromise participant consent and confidentiality. All code is available with open access at the following GitHub repository: https://github.com/marioni-group/DNAm_EpiScore_Projections
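
The abstract's contrast between within-array and between-array normalisation can be illustrated with a minimal sketch. This is not the authors' code (see the repository above for that) and it uses none of the 15 methods evaluated. It assumes simulated beta values, random EpiScore weights, a toy per-sample standardisation as a stand-in for a within-array method, and plain quantile normalisation as a stand-in for a between-array method. All function and variable names are hypothetical.

```python
import numpy as np


def within_array_normalise(betas):
    """Toy within-array method: scale each sample using only its own probes."""
    mean = betas.mean(axis=1, keepdims=True)
    sd = betas.std(axis=1, keepdims=True)
    return (betas - mean) / sd


def quantile_normalise(betas):
    """Toy between-array method: map every sample onto the mean sorted profile."""
    ranks = betas.argsort(axis=1).argsort(axis=1)    # rank of each probe within its sample
    reference = np.sort(betas, axis=1).mean(axis=0)  # depends on every sample supplied
    return reference[ranks]


def episcore(betas, weights):
    """An EpiScore here is a weighted sum of probe values for each sample."""
    return betas @ weights


# Simulated data (assumption): rows are samples, columns are CpG probes.
rng = np.random.default_rng(0)
n_probes = 200
cohort_a = rng.beta(2.0, 5.0, size=(30, n_probes))
cohort_b = rng.beta(2.5, 4.0, size=(40, n_probes))  # shifted distribution mimics a batch effect
weights = rng.normal(size=n_probes)                  # hypothetical EpiScore weights


def max_score_shift(normalise):
    """Largest change in cohort A's scores between separate and joint normalisation."""
    separate = episcore(normalise(cohort_a), weights)
    joint_all = normalise(np.vstack([cohort_a, cohort_b]))
    joint = episcore(joint_all[: len(cohort_a)], weights)
    return float(np.max(np.abs(separate - joint)))


within_shift = max_score_shift(within_array_normalise)
between_shift = max_score_shift(quantile_normalise)
print(f"within-array shift:  {within_shift:.3g}")
print(f"between-array shift: {between_shift:.3g}")
```

In this sketch the within-array step only ever looks at one sample at a time, so adding a second cohort cannot change the first cohort's scores. The quantile reference profile is an average over all samples supplied, so it moves when the second cohort is included.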