Abstract
The Long-read POG dataset comprises a cohort of 189 patient tumours and 41 matched normal samples sequenced using the Oxford Nanopore Technologies PromethION platform. This dataset from the Personalized Oncogenomics (POG) program and the Marathon of Hope Cancer Centres Network includes accompanying DNA and RNA short-read sequence data, analytics, and clinical information. We show the potential of long-read sequencing for resolving complex cancer-related structural variants, viral integrations, and extrachromosomal circular DNA. Long-range phasing of variants facilitates the discovery of allelically differentially methylated regions (aDMRs) and allele-specific expression, including recurrent aDMRs in the cancer genes RET and CDKN2A. Germline promoter methylation in MLH1 can be directly observed in Lynch syndrome. Promoter methylation in BRCA1 and RAD51C is a likely driver behind patterns of homologous recombination deficiency where no driver mutation was found. This dataset demonstrates applications for long-read sequencing in precision medicine, and is available as a resource for developing analytical approaches using this technology.
Competing Interest Statement
The following authors disclose relevant potential competing interests: Kieran O'Neill, Vanessa Porter, Luis F Paulin, Katherine Dixon, Janessa Laskin and Steven J.M. Jones received travel funding from Oxford Nanopore Technologies to present at conferences in 2022 and 2023.
Funding Statement
This work would not be possible without the participation of our patients and families, the POG team, Canada's Michael Smith Genome Sciences Centre technical platforms, the generous support of the BC Cancer Foundation and their donors, the Terry Fox Research Institute Marathon of Hope Cancer Centres Network, and Genome British Columbia (project B20POG). We acknowledge contributions from Genome Canada and Genome BC (projects 202SEQ M.A.M & S.M.J, 212SEQ M.A.M & S.M.J, 12002 GBC M.A.M, S.M.J & J.L), Canada Foundation for Innovation (projects 20070 M.A.M & S.M.J, 30981 M.A.M, S.M.J & J.L, 30198 M.A.M, 33408 M.A.M & S.M.J, 40104 M.A.M & S.M.J, 42362 S.M.J) including the CGEN platform (35444 S.M.J) and the BC Knowledge Development Fund. We acknowledge the generous support of the CIHR Foundation Grants program (FDN 143288, M.A.M). The results published here are in part based upon analyses of data generated by the following projects and obtained from dbGaP (http://www.ncbi.nlm.nih.gov/gap): Genotype-Tissue Expression (GTEx) Project, supported by the Common Fund of the Office of the Director of the National Institutes of Health (https://commonfund.nih.gov/GTEx).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The POG program, registered under clinical trial number NCT02155621, was approved by the University of British Columbia - BC Cancer Research Ethics Board (H12-00137, H14-00681), and approved by the institutional review board.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Genomic and transcriptomic sequence datasets for long-read and short-read platforms have been deposited at the European Genome-phenome Archive (EGA, https://ega-archive.org/) as part of the study EGAS00001001159 with accession numbers as listed in Supplementary Table 1. Processed data from Long-read POG, figure source data and accompanying short-read variants can be downloaded from https://www.bcgsc.ca/downloads/nanopore_pog/. Data on mutations, copy changes and expression from tumour samples in the POG program are also accessible from https://www.personalizedoncogenomics.org/cbioportal/. Code used to generate figures in this manuscript is available in containerized, reproducible form at https://github.com/bcgsc/long_read_pog. WGBS data, ENCODE accessions and samples from GSE186458 that were used as normal tissues are included in Supplementary Table 7.