PT - JOURNAL ARTICLE
AU - Stojšin, Rastko
AU - Chen, Xiangning
AU - Zhao, Zhongming
TI - A Model-agnostic Computational Method for Discovering Gene–Phenotype Relationships and Inferring Gene Networks via in silico Gene Perturbation
AID - 10.1101/2024.02.21.24303141
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.02.21.24303141
4099 - http://medrxiv.org/content/early/2024/02/23/2024.02.21.24303141.short
4100 - http://medrxiv.org/content/early/2024/02/23/2024.02.21.24303141.full
AB - Background: Deep learning architectures have advanced genotype‒phenotype mappings with precision but often obscure the roles of specific genes and their interactions. Our research introduces a model-agnostic computational methodology, capitalizing on the analytical strengths of deep learning models to serve as biological proxies, enabling interpretation of key gene interactions and their impact on phenotypic outcomes. The objective of this research is to refine the understanding of genetic networks in complex traits by leveraging the nuanced decision-making of advanced models. Results: Testing was conducted across several computational models representing varying levels of complexity trained on gene expression datasets for the prediction of the Ki-67 biomarker, which is known for its prognostic value in breast cancer. The methodology is capable of using models as proxies to identify biologically significant genes and to infer relevant gene networks from an entirely data-driven analysis. Notably, the model-derived biomarkers (p-values of 0.013 and 0.003) outperformed the conventional Ki-67 biomarker (0.021) in terms of prognostic efficacy. Moreover, our analysis revealed high congruence between model precision and the biological relevance of the genes and gene relationships identified.
Furthermore, we demonstrated that the complexity of the identified gene relationships was consistent with the decision-making intricacy of the model, with complex models capturing greater proportions of complex gene–gene interactions (61.2% and 31.1%) than simpler models (4.6%), reinforcing that the approach effectively captures biologically relevant in-model decision-making processes. Conclusions: This methodology offers researchers a powerful tool to examine the decision-making processes within their genotype–phenotype mapping models. It accurately identifies critical genes and their interactions, revealing the biological rationale behind model decisions. It also enables comparisons of decision-making between different models. Furthermore, by discovering in-model critical gene networks, our approach helps bridge the gap between research and clinical applications. It facilitates the translation of complex, model-driven genetic discoveries into actionable clinical insights. This capability is pivotal for advancing personalized medicine, as it leverages the precision of deep learning models to uncover biologically relevant genes and gene networks and opens pathways for discovering new gene biomarker combinations and previously unknown gene interactions.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was partially supported by National Institutes of Health grants (U01AG079847, R01LM012806, and R01LM012806-07S1). We are thankful for the technical support from the Cancer Genomics Core funded by the Cancer Prevention and Research Institute of Texas (CPRIT RP180734).
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study analyzed ONLY openly and publicly available, de-identified human gene expression datasets from the NCBI GEO database (accession numbers GSE96058 and GSE81538).
GSE96058 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96058
GSE81538 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81538
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
The datasets generated and analysed in the study are available in the Gene_Network_Project repository, https://github.com/ok-tsar/Gene_Network_Project.