PT - JOURNAL ARTICLE
AU - Gorzynski, John E.
AU - Marwaha, Shruti
AU - Reuter, Chloe
AU - Jensen, Tanner D.
AU - Ferrasse, Alexis
AU - Raja, Archana Natarajan
AU - Fernandez, Liliana
AU - Kravets, Elijah
AU - Carter, Jennefer
AU - Bonner, Devon
AU - Sutton, Shirley
AU - Undiagnosed Diseases Network
AU - Ruzhnikov, Maura
AU - Hudgins, Louanne
AU - Fisher, Paul G
AU - Bernstein, Jonathan A.
AU - Wheeler, Matthew T.
AU - Ashley, Euan A.
TI - Clinical application of Complete Long Read genome sequencing identifies a 16kb intragenic duplication in EHMT1 in a patient with suspected Kleefstra syndrome
AID - 10.1101/2024.03.28.24304304
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.03.28.24304304
4099 - http://medrxiv.org/content/early/2024/03/29/2024.03.28.24304304.short
4100 - http://medrxiv.org/content/early/2024/03/29/2024.03.28.24304304.full
AB - Long read sequencing offers benefits for the detection of structural variation in Mendelian disease. Here, we applied a new technology that generates contiguous long reads via tagmentation and sequencing by synthesis to a small cohort of patients with undiagnosed disease from the Undiagnosed Diseases Network. We first compared sequencing of the HG002 benchmark sample from Genome in a Bottle using nanopore sequencing (R10.4.1, duplex reads, Oxford Nanopore), single molecule real time sequencing (Revio SMRT cell, Pacific Biosciences) and complete long read (CLR) sequencing (S4 flowcell, NovaSeq, Illumina). Coverage was 33-35x across platforms. Read length N50 was 6.5kb (CLR), 16.9kb (SMRT), and 33.8kb (ONT). Single nucleotide variant F1 scores differed only slightly across long read technologies, and single nucleotide variant scores (0.985-0.999) exceeded indel scores (0.78-0.99) and structural variant scores (0.74-0.96). We applied CLR sequencing to seven undiagnosed patients. In one patient, we detected and prioritized a novel 16kb intragenic duplication encompassing exons 5 and 6 in EHMT1.
Resolution of the breakpoints and examination of flanking sequences revealed that the duplication was present in tandem and was predicted to result in a frameshift of the amino acid sequence and an early termination codon. This finding resulted in a diagnosis of Kleefstra syndrome. The variant was confirmed with targeted EHMT1 clinical testing and detected via nanopore and SMRT sequencing. In summary, we report the early clinical application of complete long read sequencing to a small cohort of undiagnosed patients.
Competing Interest Statement: Euan A. Ashley has received support in kind from Illumina, PacBio, and Nanopore.
Funding Statement: This study was funded by the Undiagnosed Diseases Network.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Participants sequenced in this study were enrolled in the Undiagnosed Diseases Network (UDN) at Stanford Medicine and provided informed consent. The study was granted ethical approval by the central IRB at the National Institutes of Health.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
The data on the HG002 Genome in a Bottle sample are publicly available and referenced in the Methods section. The Illumina complete long read sequencing data will be available in dbGaP in accordance with Undiagnosed Diseases Network data sharing policies. Because data are deposited from the Undiagnosed Diseases Network Data Management Coordinating Center Gateway Database to dbGaP, the complete long read sequencing data are not immediately available. Aggregate short read genome sequencing data on all UDN participants can be accessed via dbGaP Study Accession: phs001232.v5.p2.