Abstract
Background Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown.
Methods We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8,040 undiagnosed individuals in the Genomics England 100,000 Genomes Project; these variants were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7,862 unrelated probands against matched unaffected controls.
Results We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (82%) are in genes that are a strong match to the individual’s phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations.
Conclusions Overall, we demonstrate the value of screening promoters and UTRs to uncover additional diagnoses for previously undiagnosed individuals with rare disease and provide a framework for doing so without dramatically increasing interpretation burden.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
NWhiffin and ACM-G are supported by a Sir Henry Dale Fellowship awarded to NWhiffin, jointly funded by the Wellcome Trust and the Royal Society (220134/Z/20/Z). NW, END, and MF are supported by research grant funding from the Rosetrees Trust (PGL19-2/10025). AJMB is supported by a Wellcome PhD Training Fellowship for Clinicians and the 4Ward North PhD Programme for Health Professionals (223521/Z/21/Z). JL is supported by a University of Southampton Anniversary Fellowship. DB is supported by a National Institute for Health Research (NIHR) Research Professorship (RP-2016-07-011). SJS is supported by grant funding from the National Institutes of Health (NIH) (R01 MH116999 and U01 MH122681).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This research was approved by the relevant Genomics England ethics committee. For all cases, written informed consent for research use of clinical and genetic data was obtained from patients, their parents, or legal guardians in the case of those with intellectual disability.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All publicly accessible data produced are available online in the near_coding_annotation GitHub repository. Any additional data are located in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care).
https://github.com/Computational-Rare-Disease-Genomics-WHG/Near_coding_annotation