Abstract
Gene expression profiles that connect drug perturbations, disease gene expression signatures, and clinical data are important for discovering potential drug repurposing indications. However, current approaches to gene expression reversal have several limitations. First, most methods validate expression reversal one gene at a time. Second, causal approaches for identifying drug repurposing candidates are lacking. Third, few methods for passing and summarizing information on a graph have been applied to drug repurposing analysis; classical network propagation and gene set enrichment analysis remain the most common. Fourth, graph-valued association analysis is lacking: current approaches use real-valued association analysis, one gene at a time, to assess whether a drug reverses abnormal gene expression toward normal levels.
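For orientation, the conventional gene-by-gene view of expression reversal can be sketched as a single real-valued score. The sketch below is an illustration under stated assumptions, not the method proposed here: it assumes each signature is a mapping from gene name to log fold change, and it uses a negated Pearson correlation as the score. The function name `reversal_score` and the gene labels are hypothetical.

```python
from math import sqrt


def reversal_score(disease, drug):
    """Negated Pearson correlation between two signatures.

    disease, drug: dict mapping gene name -> log fold change (assumed format).
    A score near +1 means the drug signature is the mirror image of the
    disease signature over the shared genes; near -1 means it mimics it.
    """
    genes = sorted(set(disease) & set(drug))  # compare shared genes only
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [disease[g] for g in genes]
    y = [drug[g] for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant signature has no defined correlation")
    return -cov / sqrt(var_x * var_y)


# Hypothetical toy signatures: the drug exactly inverts the disease changes.
disease_sig = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
drug_sig = {gene: -value for gene, value in disease_sig.items()}
score = reversal_score(disease_sig, drug_sig)
```

A score of this kind treats genes as independent coordinates and ignores any regulatory structure among them, which is the limitation the fourth point above describes.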
To overcome these limitations, we propose a novel causal inference and graph neural network (GNN)-based framework for identifying drug repurposing candidates. We formulated causal network reconstruction as a continuous constrained optimization problem and developed a new algorithm for reconstructing large-scale causal networks of up to 1,000 nodes. In large-scale simulations, the algorithm showed good false positive and false negative rates.
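The abstract does not state which constraint is used, so the following is only a sketch of one well-known way to make acyclicity a continuous constraint: a trace-based penalty on the weighted adjacency matrix that is zero exactly when the graph has no directed cycle. The truncated series, the function names, and the toy matrices are assumptions made for illustration.

```python
from math import factorial


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w):
    """Penalty h(W) = sum_{k=1..d} tr((W o W)^k) / k!, with o the elementwise product.

    W o W has non-negative entries, so tr((W o W)^k) is positive exactly when
    the graph contains a closed walk of length k. Any directed cycle has
    length at most d, hence h(W) == 0 if and only if W encodes a DAG.
    """
    d = len(w)
    squared = [[x * x for x in row] for row in w]
    power = [row[:] for row in squared]  # holds (W o W)^k, starting at k = 1
    h = 0.0
    for k in range(1, d + 1):
        h += sum(power[i][i] for i in range(d)) / factorial(k)
        power = matmul(power, squared)
    return h


# Toy example: edges 0 -> 1 -> 2 form a DAG; adding 1 -> 0 creates a cycle.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]
h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

Because the penalty is a smooth function of the edge weights, a constraint of the form h(W) = 0 can be handled by standard continuous solvers instead of a combinatorial search over graph structures.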
To aggregate and summarize both node and structural information from the spatial domain of the causal network, we used directed acyclic graph neural networks (DAGNN). We also developed a new method for graph regression in which both the dependent and the independent variables are graphs. We used graph regression to measure the degree to which drugs reverse disease-altered gene expression toward normal levels and to select potential drug repurposing candidates.
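The idea of summarizing a causal network with a network that respects edge direction can be sketched as message passing in topological order, so that every node is encoded only after all of its parents. The sketch below is not the DAGNN architecture itself: it replaces learned aggregation and update functions with a fixed mean and `tanh`, uses scalar node features, and reads out the graph by summing over sink nodes. All names and the toy graph are hypothetical.

```python
from graphlib import TopologicalSorter
from math import tanh


def dag_encode(features, parents):
    """Encode each node from its own feature and its parents' encodings.

    features: dict node -> float (every node must appear here).
    parents:  dict node -> list of parent nodes (missing key = no parents).
    """
    graph = {v: parents.get(v, []) for v in features}
    hidden = {}
    # static_order() yields each node only after all of its predecessors.
    for v in TopologicalSorter(graph).static_order():
        ps = graph[v]
        message = sum(hidden[p] for p in ps) / len(ps) if ps else 0.0
        hidden[v] = tanh(features[v] + message)
    return hidden


def dag_readout(hidden, parents):
    """Summarize the graph by summing the encodings of nodes with no children."""
    has_child = {p for ps in parents.values() for p in ps}
    return sum(h for v, h in hidden.items() if v not in has_child)


# Toy causal graph: A -> B, A -> C, B -> C.
toy_features = {"A": 0.0, "B": 0.0, "C": 1.0}
toy_parents = {"B": ["A"], "C": ["A", "B"]}
toy_hidden = dag_encode(toy_features, toy_parents)
toy_summary = dag_readout(toy_hidden, toy_parents)
```

Processing nodes in topological order lets information flow along the causal direction in a single pass, and the readout yields one graph-level value that a downstream regression could use.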
To illustrate the proposed methods, we applied them to phase I and II L1000 Connectivity Map perturbational profiles from the Broad Institute LINCS center, which contain gene expression profiles for thousands of perturbagens across a range of time points, doses, and cell lines, together with disease gene expression data for genes under-expressed and over-expressed in response to SARS-CoV-2.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Drs. Jinying Zhao and Tao Xu are supported by NIH grants R01DK107532 and 7RF1AG052476.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All datasets used in this work are publicly available from the following sources: The gene expression data for SARS-CoV-2 were obtained from GSE147507 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147507). The CMap and L1000 data (perturbational profiles from the Broad Institute LINCS center, phase I and II) were downloaded from GSE92742 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742) and GSE70138 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
All datasets used in this work are publicly available from the following sources: The gene expression data for SARS-CoV-2 were obtained from GSE147507 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147507). The CMap and L1000 data (perturbational profiles from the Broad Institute LINCS center, phase I and II) were downloaded from GSE92742 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742) and GSE70138 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138).