Summary
Clinical short-read exome and genome sequencing approaches have positively impacted diagnostic testing for rare diseases. Yet, technical limitations associated with short reads challenge their use for detection of disease-associated variation in complex regions of the genome. Long-read sequencing (LRS) technologies may overcome these challenges, potentially qualifying as a first-tier test for all rare diseases. To test this hypothesis, we performed LRS (30x HiFi genomes) for 100 samples with 145 known clinically relevant germline variants that are challenging to detect using short-read sequencing and necessitate a broad range of complementary test modalities in diagnostic laboratories.
We show that relevant variant callers readily re-identify the majority of variants (120/145, 83%), including ∼90% of structural variants, SNVs/InDels in homologous sequences, and expansions of short tandem repeats. Another 10% (n=14) were visually apparent in the data but not automatically detected. Our analyses also identified systematic challenges for the remaining 7% (n=11) of variants, such as the detection of AG-rich repeat expansions. Titration analysis showed that 89% of all automatically called variants could also be identified using 15-fold coverage.
Thus, long-read genomes identified 93% of pathogenic variants that are most challenging to detect using short-read technologies. Even with reduced coverage, the vast majority of variants remained detectable, possibly enhancing cost-effective diagnostic implementation. Most importantly, we show the potential to use a single technology to accurately identify all types of clinically relevant variants.
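The proportions quoted in this summary follow directly from the variant counts. The sketch below only reproduces that arithmetic; the category labels and the `share` helper are illustrative names, not part of the study's analysis pipeline. Because the percentages in the text are rounded to whole numbers, values computed directly from the counts can differ from them by up to one percentage point.

```python
# Variant counts as reported in the summary.
# The dictionary keys are hypothetical labels chosen for this illustration.
TOTAL_VARIANTS = 145
counts = {
    "called_automatically": 120,   # re-identified by variant callers
    "visible_on_inspection": 14,   # apparent in the data, not called
    "not_detected": 11,            # systematic challenges
}


def share(n, total=TOTAL_VARIANTS):
    """Return n as a percentage of total."""
    return 100.0 * n / total


# The three categories should account for every variant in the cohort.
assert sum(counts.values()) == TOTAL_VARIANTS

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_VARIANTS} = {share(n):.1f}%")

# Variants found either automatically or by visual inspection.
detected = counts["called_automatically"] + counts["visible_on_inspection"]
print(f"detected overall: {detected}/{TOTAL_VARIANTS} = {share(detected):.1f}%")
```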
Competing Interest Statement
TM, ED, XC and MAE are employees and shareholders of Pacific Biosciences, a company commercializing DNA sequencing technologies. Pacific Biosciences also kindly provided part of the reagents required for this study. The remaining authors declare that they have no competing interest.
Funding Statement
Technical and financial support was received from the Klinisch Genetisch Centrum Nijmegen, Radboud Genome Technology Center and the Netherlands X-omics Initiative NWO (project 184.034.019). The aims of this study contribute to the Solve-RD project (to AH, CG and LELMV), which has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 779257.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study is in agreement with the principles of the Declaration of Helsinki. In addition, the study was performed as part of a local validation study for the implementation of LRS under ISO 15189 accreditation and assessed as a diagnostic innovation by the Medical Ethics Review Committee Arnhem-Nijmegen under dossier number 2020-7142.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All relevant data supporting the conclusions of this study, including anonymized variant calls, are contained in the manuscript. The raw sequencing data are not publicly available due to patient confidentiality, but can be made available upon reasonable request to the authors, in accordance with institutional and ethical guidelines.
Abbreviations
- FISH: Fluorescence In Situ Hybridization
- IGV: Integrative Genomics Viewer
- NGS: Next Generation Sequencing
- LRS: Long-read Sequencing
- MEI: Mobile Element Insertion
- MLPA: Multiplex Ligation-dependent Probe Amplification
- ONT: Oxford Nanopore Technologies
- RD: Rare Disease
- ROH: Region of Homozygosity
- SNV: Single Nucleotide Variant
- SRS: Short-read Sequencing
- SR-WGS: Short-read Whole Genome Sequencing
- STR: Short Tandem Repeat
- SV: Structural Variant
- UPD: Uniparental Disomy
- WGS: Whole Genome Sequencing