Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to detect and interpret accurately. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) for whom short-read sequencing had not identified a diagnostic mutation. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected an average of 716 rare (MAF < 0.01) SV alleles per genome with long reads, a 2.4x increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression in blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. These included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
Competing Interest Statement
SBM is an advisor to BioMarin, Myome and Tenaya Therapeutics. AB is a co-founder of CellCipher, Inc, is a shareholder in Alphabet, Inc, and has consulted for Third Rock Ventures, LLC. EAA is the founder of Personalis, Deepcell, Svexa, RCD Co, and Parameter Health; an advisor for SequenceBio, Foresite Labs, and PacBio; and a non-executive director at AstraZeneca. EAA holds stock in Oxford Nanopore, Pacific Biosciences, and AstraZeneca, and offers collaborative support in kind to Illumina, Pacific Biosciences, and Oxford Nanopore.
Funding Statement
We would like to thank Benjamin Strober and Taibo Li for suggestions on the Watershed-SV model, and Joshua Weinstock and Rebecca Keener for editing this manuscript and for helpful conversations related to this work. Research reported in this manuscript was supported in part through the Undiagnosed Diseases Network (Award U01HG010218) and the GREGoR Consortium (Award U01HG011762). TDJ is supported by U01HG011762, T32HG000044. BN is supported by R35GM139580, U24HG010263, OT2OD034190, U01CA253481, R03CA272952, U01HG012069. JEG is supported by U01HG010218, U01HG011762. SF is supported by 1R21HG013397, 5R01NS072248. CMR is supported by U01HG010218, U01HG011762. DEB is supported by U01HG011762, U01NS134358. RAU is supported by U01HG011762. PCG is supported by U01HG011762. ANR is supported by U01HG010218. EAA is supported by U01HG010218, U01HG011762. JAB is supported by U01HG011762, U01NS134358. SZ is supported by 1R21HG013397, 5R01NS072248. MDG is supported by R35AG072290, P30AG066515, R01AG074339, R01AG048076. SBM is supported by U01HG011762, U01AG072573, R01AG066490, R01MH125244, U01HG012069. MCS is supported by U24HG010263, OT2OD034190, U01CA253481, R03CA272952. MW is supported by U01HG010218, U01HG011762. AB is supported by R35GM139580, U01HG012069. This work utilized computing resources provided by the Stanford Genetics Bioinformatics Service Center, supported by NIH Instrumentation Grant S10OD025082, and would not have been possible without the support of the Stanford SCG cluster system administrators.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of the National Human Genome Research Institute (NHGRI; Protocol 15-HG-0130) gave ethical approval for this work. The IRB of Stanford University (Protocols 23066, 32641, 38046) gave ethical approval for this work. Consent was obtained from patients or their parent/guardian as appropriate for use of their samples.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All UDN data produced in the present study are being uploaded to dbGaP under the accession phs001232. All GTEx data are under dbGaP accession phs000424.