Abstract
Background 5’ untranslated regions (5’UTRs) are essential modulators of protein translation. Predicting the impact of 5’UTR variants is challenging and is typically not performed in routine diagnostics. Here, we present an approach that combines a comprehensive prioritization strategy with subsequent functional assays to evaluate 5’UTR variation in two large cohorts of patients with inherited retinal diseases (IRDs).
Methods We performed an isoform-level re-analysis of retinal RNA-seq data to identify, for each of 378 IRD genes, the protein-coding transcript with the highest expression in the retina. We evaluated the coverage of these 5’UTRs by different whole exome sequencing (WES) capture kits. The selected 5’UTRs were analyzed in whole genome sequencing (WGS) and WES data from IRD sub-cohorts of the 100,000 Genomes Project (n = 2,417 WGS) and an in-house database (n = 1,682 WES), respectively. Identified variants were annotated for 5’UTR-relevant features and classified into 7 distinct categories based on their predicted functional consequence. We developed a variant prioritization strategy that integrates population frequency, category-specific criteria, and family and phenotypic data. A selection of candidate variants underwent functional validation using diverse experimental approaches.
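The abstract does not give the thresholds or name the seven categories, so the following Python sketch only illustrates how population frequency, a category-specific check, and family and phenotypic data could be chained as successive filters. The `Variant` fields, the four category labels, and the MAF cutoffs (1e-4 for dominant, 1e-2 for recessive genes) are hypothetical assumptions, not the criteria used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    hgvs_c: str
    category: str                    # hypothetical consequence label
    population_maf: Optional[float]  # None when absent from population databases
    inheritance: str                 # "AD" or "AR" for the gene (assumed known)
    segregates: bool                 # consistent with available family data
    phenotype_match: bool            # patient phenotype fits the gene-associated IRD


# Hypothetical cutoffs: a dominant disease tolerates a lower population frequency.
MAX_MAF = {"AD": 1e-4, "AR": 1e-2}

# Hypothetical stand-ins for the study's seven 5'UTR categories.
PRIORITY_CATEGORIES = {"uAUG_gain", "uORF_stop_loss", "TSS_overlap", "Kozak_change"}


def passes_category(v: Variant) -> bool:
    # Placeholder for category-specific criteria (e.g. uORF length, Kozak strength).
    return v.category in PRIORITY_CATEGORIES


def prioritize(variants: List[Variant]) -> List[Variant]:
    kept = []
    for v in variants:
        maf = v.population_maf or 0.0  # treat variants absent from the database as MAF 0
        if maf > MAX_MAF[v.inheritance]:
            continue  # too frequent for the assumed inheritance mode
        if not passes_category(v):
            continue  # predicted consequence is not in a prioritized category
        if not (v.segregates and v.phenotype_match):
            continue  # family or phenotypic data argue against causality
        kept.append(v)
    return kept
```

Ordering the filters from cheapest (frequency) to most case-specific (segregation and phenotype) mirrors the order in which the abstract lists the evidence types, but any order yields the same final set.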
Results Isoform-level re-quantification of retinal gene expression revealed 76 IRD genes with a non-canonical retina-enriched isoform, 20 of which have a 5’UTR fully distinct from that of their canonical isoform. Depending on the probe design, 3–20% of IRD genes have their 5’UTRs fully captured by WES. After analyzing these regions in both IRD cohorts, we prioritized 11 (likely) pathogenic variants in 10 genes (ARL3, MERTK, NDP, NMNAT1, NPHP4, PAX6, PRPF31, PRPF4, RDH12, RD3), 8 of which were novel. Functional analyses further supported the pathogenicity of 2 variants. The MERTK:c.-125G>A variant, overlapping a transcription start site, was shown to significantly reduce both luciferase mRNA levels and luciferase activity. The RDH12:c.-123C>T variant was found in cis with the reported hypomorphic RDH12:c.701G>A (p.Arg234His) variant in 11 patients. This 5’UTR variant, predicted to introduce an upstream open reading frame, was shown to reduce RDH12 protein levels while leaving mRNA levels unaltered.
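To illustrate what "predicted to introduce an upstream open reading frame" means at the sequence level, the Python sketch below applies a 5’UTR SNV written in HGVS c.-N notation, lists upstream ATGs absent from the reference, and classifies each by where its reading frame first meets a stop codon. It is a minimal sketch that assumes the spliced 5’UTR and CDS are supplied as plain DNA strings, ignores Kozak context and non-ATG start codons, and uses toy sequences rather than the RDH12 transcript; all function names are hypothetical.

```python
from typing import Dict, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_utr_snv(utr: str, c_pos: int, ref: str, alt: str) -> str:
    """Apply a 5'UTR SNV written as c.-{c_pos}{ref}>{alt} to a 5'UTR string (5'->3', DNA)."""
    i = len(utr) - c_pos  # c.-1 is the last UTR base, immediately 5' of the main ATG
    if not 0 <= i < len(utr) or utr[i] != ref:
        raise ValueError(f"reference mismatch or out of range at c.-{c_pos}")
    return utr[:i] + alt + utr[i + 1:]


def upstream_atgs(utr: str) -> Set[int]:
    """0-based start indices of every ATG lying fully inside the 5'UTR."""
    return {i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"}


def classify_uatg(utr: str, cds: str, start: int) -> str:
    """Follow the reading frame of an upstream ATG until it meets a stop or the main ATG."""
    seq = utr + cds
    cds_start = len(utr)
    in_frame = (cds_start - start) % 3 == 0
    for j in range(start, len(seq) - 2, 3):
        if in_frame and j >= cds_start:
            return "N-terminal extension"  # reached the main ATG in frame with no stop on the way
        if seq[j:j + 3] in STOP_CODONS:
            # Stop codon ends inside the UTR: a self-contained uORF; otherwise it overlaps the CDS.
            return "uORF" if j + 3 <= cds_start else "overlapping uORF"
    return "overlapping uORF (no stop in supplied sequence)"


def gained_uorfs(utr: str, cds: str, c_pos: int, ref: str, alt: str) -> Dict[int, str]:
    """Map each newly created upstream ATG (key: c. position of its A, e.g. -11) to its ORF class."""
    alt_utr = apply_utr_snv(utr, c_pos, ref, alt)
    new_starts = upstream_atgs(alt_utr) - upstream_atgs(utr)
    return {i - len(utr): classify_uatg(alt_utr, cds, i) for i in sorted(new_starts)}


if __name__ == "__main__":
    # Toy 13-nt UTR: c.-10C>T turns "ACG" into "ATG" at c.-11, followed by an in-frame TAA.
    print(gained_uorfs("GCACGTAAGCCGC", "ATGGCCTAA", 10, "C", "T"))  # {-11: 'uORF'}
```

Classifying by where the first in-frame stop falls separates self-contained uORFs, uORFs that overlap the main CDS, and in-frame N-terminal extensions; a real annotation would additionally weigh start-codon context.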
Conclusions This study demonstrates the importance of 5’UTR variants implicated in IRDs and provides a systematic approach for 5’UTR annotation and validation that is applicable to other inherited diseases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the Ghent University Special Research Fund (BOF20/GOA/023; BOF/STA/201909/016) (EDB, BPL, FCP); H2020 Marie Sklodowska-Curie Innovative Training Networks (ITN) StarT (grant No. 813490) (ADR, EDB, FC); Ghent University Hospital under the NucleUZ Grant (EDB, FC); Foundation Fighting Blindness (TA-GT-0621-0810-UGENT) (FCP); EJPRD19-234 Solve-RET (EDB); Fundacion Alfonso Martin Escudero (MDPV); Instituto de Salud Carlos III (ISCIII) of the Spanish Ministry of Health (CA; FIS: PI22/00321); University Chair UAM-IIS-FJD of Genomic Medicine (CA). GA is funded by a Fight For Sight UK Early Career Investigator Award (5045/46), the National Institute for Health Research Biomedical Research Centre (NIHR-BRC) at Moorfields Eye Hospital and UCL Institute of Ophthalmology, and the NIHR-BRC at Great Ormond Street Hospital Institute for Child Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The 100,000 Genomes Project Protocol has ethical approval from the HRA Committee East of England - Cambridge South (REC Ref 14EE1112). This study was registered with Genomics England within the Hearing and sight domain under Research Registry Project 465. This study was approved by the ethics committee of Ghent University Hospital (B6702021000312) and performed in accordance with the tenets of the Declaration of Helsinki and its subsequent revisions.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
* Shared second co-authors
Data Availability
The data that support the findings of this study are available within the Genomics England (protected) Research Environment but restrictions apply to the availability of these data, as access to the Research Environment is limited to protect the privacy and confidentiality of participants. Likewise, healthcare and genomic data derived from individuals included in the Center for Medical Genetics Ghent (CMGG) cohort are not publicly available to comply with the consent given by those participants. De-identified data as well as analysis scripts are available from the authors upon reasonable request. Extended data generated in this study are available in the supplementary materials.
Abbreviations
- 5’UTR: 5’ untranslated region
- AD: Autosomal dominant
- AR: Autosomal recessive
- CAGE-seq: Cap analysis of gene expression sequencing
- CDS: Coding sequence
- CMGG: Center for Medical Genetics Ghent
- DMEM: Dulbecco’s modified Eagle medium
- ERDC: European Retinal Disease Consortium
- FC: Fold change
- GE: Genomics England
- IRD: Inherited retinal disease
- IRES: Internal ribosome entry site
- MAF: Minor allele frequency
- OD/OS: Oculus dexter/sinister (right/left eye)
- qPCR: Quantitative polymerase chain reaction
- SNV: Single-nucleotide variant
- SV: Structural variant
- TE: Translational efficiency
- TPM: Transcripts per million
- TSS: Transcription start site
- uORF: Upstream open reading frame
- VUS: Variant of uncertain significance
- WES/WGS: Whole exome/genome sequencing