Abstract
Background RNA-sequencing of patient biosamples is a promising approach to delineate the impact of genomic variants on splicing, but variable gene expression between tissues complicates selection of appropriate tissues. Relative expression level is often used as a metric to predict RNA-sequencing utility. Here, we describe a gene- and tissue-specific metric to inform the feasibility of RNA-sequencing, overcoming some issues with using expression values alone.
Results We derive a novel metric, Minimum Required Sequencing Depth (MRSD), for all genes across three human biosamples: whole blood, lymphoblastoid cell lines (LCLs) and skeletal muscle. MRSD estimates the RNA-sequencing depth required to achieve a user-specified level of coverage for a gene, transcript or group of genes of interest. MRSD predicts levels of splice junction coverage with high precision (90.1-98.2%) and overcomes transcript region-specific sequencing biases. Applying MRSD scoring to established disease gene panels shows that, of the three biosamples investigated, LCLs are the optimal source of RNA for 69.3% of gene panels. Our approach demonstrates that up to 59.4% of variants of uncertain significance in ClinVar that are predicted to impact splicing could be functionally assayed by RNA-sequencing in at least one of the investigated biosamples.
Conclusions We demonstrate the power of MRSD as a metric to inform choice of appropriate biosamples for the functional assessment of splicing aberrations. We apply MRSD in the context of Mendelian genetic disorders and illustrate its benefits over expression-based approaches. We anticipate that the integration of MRSD into clinical pipelines will improve variant interpretation and, ultimately, diagnostic yield.
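To make the idea of a depth-based metric concrete, the following is a minimal illustrative sketch and not the authors' implementation, which is not given in this abstract. It assumes that splice junction read counts scale linearly with total sequencing depth, and it estimates the depth at which a chosen proportion of a gene's junctions would reach a desired read count, based on a single control sample. The function name `min_required_depth`, its parameters and the example values are all hypothetical.

```python
import math


def min_required_depth(junction_counts, control_depth_millions,
                       desired_reads=8, proportion=0.75):
    """Estimate the depth (millions of reads) needed so that `proportion`
    of junctions reach `desired_reads`.

    Assumption (not taken from the source): junction read counts scale
    linearly with total sequencing depth.
    """
    if not junction_counts:
        raise ValueError("junction_counts must not be empty")
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")

    # Rank junctions from best to worst covered in the control sample.
    ordered = sorted(junction_counts, reverse=True)

    # The k-th best-covered junction is the limiting one: if it reaches
    # the desired read count, at least `proportion` of junctions do.
    k = math.ceil(proportion * len(ordered))
    limiting_count = ordered[k - 1]

    if limiting_count == 0:
        # No reads observed at the limiting junction: under linear
        # scaling, no finite depth would be sufficient.
        return math.inf

    # Scale the control depth by the ratio of desired to observed reads.
    return desired_reads * control_depth_millions / limiting_count


# Hypothetical control sample: four junctions, 50 million reads in total.
depth = min_required_depth([40, 20, 10, 5], control_depth_millions=50)
print(depth)
```

In this sketch, a gene whose limiting junction has no reads in the control sample returns an infinite depth, which would flag it as unsuitable for analysis in that biosample regardless of how deeply it is sequenced.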
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
C.F.R. is funded by the Medical Research Council (MRC; 1926882) as part of a CASE studentship with QIAGEN. The Baralle lab is supported by an NIHR Research Professorship to D.B. (RP-2016-07-011). W.G.N. is supported by the NIHR Manchester Biomedical Research Centre (IS-BRC-1215-20007). We acknowledge funding from the Wellcome Trust Transforming Genomic Medicine Initiative (200990/Z/16/Z) and the Medical Research Foundation. J.M.E. is funded by a postdoctoral research fellowship from the Health Education England Genomics Education Programme (HEE GEP). The views expressed in this publication are those of the authors and not necessarily those of the HEE GEP.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
South Central-Hampshire A (ref: 17/SC/0026), South Central-Oxford B (ref: 11/SC/0269), South Manchester (ref: 11/H10003/3) and Scotland A (refs: 06/MRE00/76 and 16/SS/0201) Research Ethics Committees.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The control datasets used to generate the MRSD model are available through the dbGaP repository as part of the GTEx v8 release (accession phs000424.v8.p2). Publicly available muscle-derived RNA-seq datasets to test the model are available at dbGaP (accession phs000655.v3.p1.c1). Source code will be made available upon publication. All MRSD scores are available at http://mcgm-mrsd.github.io/.