Abstract
Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. Information from one trait can therefore be leveraged to improve the statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) that increases the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study that jointly analyzes data from congenital heart disease (CHD) and autism. In the joint analysis, our method identified 23 genes for CHD, including 12 novel genes, which is substantially more than single-trait analysis identified and leads to novel insights into CHD etiology.
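As a rough illustration of the kind of Expectation-Maximization procedure described above, the following minimal sketch fits a four-state latent model to per-gene DNM counts for two traits. It is not the M-DATA implementation: the Poisson model, the omission of functional annotations, and all names (`em_two_trait`, `base1`, `base2`, `gamma`, `pi`) are assumptions made for this example. Each gene is assumed to be in one of four latent states (risk gene for neither trait, trait 1 only, trait 2 only, or both), and the count for a trait is assumed to be Poisson with the null expectation multiplied by a relative risk `gamma` when the gene is a risk gene for that trait.

```python
import numpy as np

# Latent states per gene: columns are (risk for trait 1, risk for trait 2).
STATES = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])


def em_two_trait(y1, y2, base1, base2, n_iter=500, tol=1e-8):
    """EM for a simplified two-trait Poisson mixture (illustrative assumption).

    y1, y2       : observed DNM counts per gene for each trait
    base1, base2 : expected counts per gene under the null (must be > 0),
                   e.g. 2 * cohort size * per-gene mutation rate
    Returns (pi, gamma, post): state proportions, relative risks, and the
    per-gene posterior over the four states from the last E-step.
    """
    y = np.stack([y1, y2], axis=1).astype(float)           # (G, 2)
    base = np.stack([base1, base2], axis=1).astype(float)  # (G, 2)
    pi = np.full(4, 0.25)          # initial state proportions
    gamma = np.array([5.0, 5.0])   # initial relative risks

    for _ in range(n_iter):
        # E-step: posterior over the four states for every gene.
        log_rate = (np.log(base)[:, None, :]
                    + STATES[None, :, :] * np.log(gamma)[None, None, :])
        # Poisson log-likelihood up to the log(y!) term, which is the same
        # in every state and cancels when the posterior is normalized.
        log_lik = (y[:, None, :] * log_rate - np.exp(log_rate)).sum(axis=2)
        log_post = np.log(np.clip(pi, 1e-300, None))[None, :] + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: closed-form updates for proportions and relative risks.
        new_pi = post.mean(axis=0)
        w = post @ STATES                      # (G, 2) marginal risk probs
        denom = np.maximum((w * base).sum(axis=0), 1e-12)
        new_gamma = np.maximum((w * y).sum(axis=0) / denom, 1.0)

        delta = max(np.abs(new_pi - pi).max(), np.abs(new_gamma - gamma).max())
        pi, gamma = new_pi, new_gamma
        if delta < tol:
            break

    return pi, gamma, post
```

In this sketch, the gene association probability for trait 1 would be `post[:, 1] + post[:, 3]`, and for trait 2 `post[:, 2] + post[:, 3]`. A fitted `pi[3]` that exceeds the product of the two marginal risk proportions, `(pi[1] + pi[3]) * (pi[2] + pi[3])`, would suggest that risk genes are shared between the traits more often than expected under independence.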
Author Summary
Congenital heart disease (CHD) is the most common birth defect. With the development of next-generation sequencing technology, germline mutations with deleterious effects, such as de novo mutations (DNMs), can be identified to help discover the genetic causes of early-onset diseases such as CHD. However, statistical power is still limited by the small sample sizes of DNM studies, which result from the high cost of recruiting and sequencing samples, and by the rarity of DNMs. DNM analysis is even more challenging for CHD than for other diseases because of its genetic heterogeneity. Recent research based on DNM findings has suggested that early-onset neurodevelopmental diseases and CHD share disease mechanisms. Currently, few methods can jointly analyze DNM data on multiple traits. We therefore develop a framework that identifies risk genes for multiple traits simultaneously from DNM data. We apply the new method to CHD and autism as a case study to demonstrate its improved power to identify risk genes compared with single-trait analysis. Our results lead to new insights into the etiology of CHD and the etiological mechanisms shared between CHD and autism.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Supported in part by NIH grant R03HD100883-01A1.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study is approved by Yale Human Research Protection Program Institutional Review Boards (IRB protocol ID 2000028735). The IRB has determined that this protocol presents minimal risk to subjects.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
CHD data were downloaded from the supplement of Jin et al. (PMID: 28991257). Autism data were acquired from denovo-db.