Abstract
Computer-aided drug discovery (CADD) is a widely used method for drug discovery with many successes. Meanwhile, CADD has the limitation of analyzing multi-level scores such as docking results of multiple proteins with multiple drugs. We propose a method of PageRank to solve the problem. This method can make a comprehensive ranking based on multi-level scores. Then we take an example of therapeutic hypothermia (TH). Three levels of TH data were used in the article: the log2 foldchange (logFC) of proteins, the relative expression values of mRNA, and the docking scores of proteins and molecules. After calculation, we get the comprehensive drug rank and drug combination rank of each group of TH, which means we can generate the rank of drug directly from bioinformatics. Based on this method, we raised the concept of bioinfo-pharmacology. Given the high rationality and compatibility of bioinfo-pharmacology, it can effectively enhance popular drug discovery techniques such as the docking or pharmacophore model. Besides, it could advance the application of precision medicine.
Introduction
Drug discovery is an expensive and time-demanding process that faces many challenges, including low hit discovery rates for high-throughput screening, among many others.[1,2] Methods of computer-aided drug discovery (CADD) can significantly speed up the pace of such screening and reduce the cost. Until now, CADD has achieved important results. [3–5]
Meanwhile, CADD also has limitations. Researchers can only get the best match for a particular target (drug development), or the best match for a particular molecule (network pharmacology[6]). As a result, most pharmacological studies currently work on a single target. However, according to bioinformatics databases, diseases/treatments exist multiple targets, which generate complex regulation functions, in the different stages of diseases/treatments.[7]
Take therapeutic hypothermia as an example. Therapeutic hypothermia (TH) can limit the degree of some kinds of injuries in randomized trials[8] and animal experiments[9], and is even the only effective method for some diseases especially hypoxic-ischemic encephalopathy (HIE). HIE often causes severe neurological sequelae, which is the main reason for the poor prognosis of patients with stroke, shock, carbon monoxide poisoning, cerebral hemorrhage, and cardiac arrest.[10–12] In the research based on TH, cold shock proteins especially cold-induced RNA binding protein (CIRP) show high expression [13] and rapid response [14]. CIRP has been shown to promote the translation of genes involved in DNA repair [15,16], telomerase maintenance[17], and genes associated with the translational machinery[18].
However, if CIRP leaks to the intercellular substance with cell swelling and rupture, it will become a harmful protein. Extracellular CIRP (eCIRP) showed a strong pro-inflammatory effect, leading to a heavier hypoxic injury.[19,20] Because of the habit of clinical medication, we cannot determine whether there are drugs that affect the therapeutic effect before and after the beginning of TH.
With the development of protein prediction technologies, especially AlphaFold2[21] and RoseTTAFold[22], we can obtain the three-dimensional structure of proteins more quickly and accurately. All target proteins’ structures can be predicted, and their best antagonists can be obtained by molecular docking. However, there is no technology for comprehensively ranking the cross-level data of numerical evaluation. To solve the complex function differences by temporal and spatial distribution differences of proteins, we use personalization-weight-PageRank to rank drugs targeting proteins predicted by AlphaFold2 and RoseTTAFold at different groups to predict the best drugs or drug combinations for each group. Based on these, we came up with the concept of bioinfo-pharmacology.
Method
Experiment design
As shown in Figure 1, the representative experiment of bioinfo-pharmacology is divided into 5 processes: 1. Protein or mRNA chosen by bioinformatics analysis; 2. Protein and drugs 3D structure acquisition and prediction; 3. Proteins’ active sites prediction; 4. Drug/molecular group evaluation with target proteins; 5. PageRank of docking results, protein logFC, and mRNA expression. The experiment of animals or cells is referred by authors, but not forced. The biggest difference from previous studies is PageRank.
The data source of bioinformatics analysis
We retrieved the original data of mRNA expression under hypothermia treatment from the website of The National Center for Biotechnology Information (NCBI) (GSE54229). The research was reported by Sten et.al.[14] In their research, mouse embryonic fibroblasts were exposed to mild hypothermia (32°C) or normothermia (37°C) to gain the transcription response induced by hypothermia.
Expression Profile Analysis
The log2 fold-change (log2FC) and p-value were calculated for the normothermia group. Top 3 log2FC mRNA with q-value < 0.05 were selected from each group to enter the next step. If there exists mRNA with failed protein structure prediction, the mRNA would be skipped.
R 3.6.1 was used to detect differential expressed compared to matched normothermia samples. The clustering of genes was calculated by the “dist” and “hclust” function of R. The visualization of gene expression and clustering is performed by the “dendextend” package.
3D Data of proteins and small molecular drugs
All proteins were first searched on PubMed to see if there was protein clipping like cleaved caspase-3[23].
Then the 3D structures of proteins were firstly searched from Protein Data Bank (PDB), which is used for biological-related ligand-protein interaction. In this article, no protein structure is listed on the PDB website. All the protein structures were predicted by AlphaFold2 and RoseTTAFold.
AlphaFold2 is developed by Google and is the champion of the 14th Critical Assessment of Structure Prediction (CASP14). In August 2021, AlphaFold submitted a structure prediction database for all proteins. RoseTTAFold is based on the Rosetta software which is designed for macromolecular modeling, docking, and design[24] RoseTTAFold also has good application[25] in the research of protein structure prediction. Finally, protein structures with fewer irregular regions will be selected for the next step.
The 3D structures of 8,697 drugs (DrugBank, 5.1.8) were downloaded from DrugBank Online (https://go.drugbank.com/). Approved, experimental, nutraceutical, and investigational drugs by Food and Drug Administration (FDA) are included. We split each drug molecule into a PDBQT-format file and minimized the energy separately for docking with proteins.
Visualize evolutionary conservation and active site prediction
Visualize evolutionary conservation was performed by the ConSurf server[26]. In a typical ConSurf application, through BLASTed[27] against the UNIREF-90 database[28] and aligning using MAFFT[29], the evolutionarily conserved positions are analyzed by the Rate4Site algorithm.
Then, the Consensus approach-D (COACH-D) [30] was used to predict the active site of target proteins. The COACH-D use five different methods to predict the binding sites of protein ligands. Four of these methods are COFAC-TOR[31], FINDSITE[32], TM-SITE[33], and S-SITE[33]. These methods predict binding sites by matching the query structure and sequence with the ligand-binding template in BioLiP[34], which is a semi-manual functional database[35] based on the PDB.
Virtual screening of potential compounds
To evaluate the hit compounds obtained from DrugBank and calculate their interaction and binding posture in the active site of target proteins, the molecular docking method was carried out through QuickVina 2[36]. QuickVina 2 uses the calculation of shape and electrostatic potential similarity of binding pockets to select molecules, which may exhibit binding patterns like those of binding pockets. 3D files of target proteins were dehydrated, hydrogenated. Then proteins were saved as PDBQT files using AutoDock. AutoDock assisted in assigning Gasteiger charges and adding polar hydrogen atoms to both the proteins and the compounds.
Molecular dynamics simulation
The molecular dynamics (MD) simulation was performed by Gromacs[37]. Firstly, a protein-drug complex was prepared, including adding hydrogenation and balancing charge. Then, we add a solvent so that the target protein and drug small molecules are coated. The forcefield was Chemistry at HARvard Macromolecular Mechanics 36 (CHARMm 36). The simulation time is set as 50ns for the speed of calculation. The simulation temperature is 309.15K (36°C) and the pressure is 1 atm. Root mean square deviation (RMSD) and root mean square fluctuation (RMSF) were calculated based on the first frame.
Personalization-weight-PageRank
We use personalization-weight-PageRank to rank cross level data. PageRank is a comprehensive rank algorithm designed by Google and named after Larry Page.[38] It is one of the most famous ranking algorithms of network nodes based on Markov process. PageRank has been applied in medical domains with success.[39,40] Personalization and weight represent 3 different levels of score data. The weight of PageRank allows all nodes to be initially assigned different weights/probabilities.[41] In this article, the weights of rank were set to docking values of proteins and drugs.
The higher the docking value, the higher the connection rate of the complex. Personalization of PageRank reinforces the connection intensity between the nodes, which makes the result more personalized and realistic[42]. In this article, personalization is influenced by protein functions. If the protein performs a negative influence such as promoting apoptosis, the personalization will be calculated by 2^(fold change) to ensure they are more than 1. Meanwhile, if the protein plays a positive role in the group, the personalization will be set as 1/(fold change + 1) to less than 1. The personalization values of all the drugs are set to 0 to prevent iterations of the drugs themselves from going wrong.
The calculation process is like putting all proteins and all drugs in the solution, then simulating the connections between all proteins and drugs by calculation. The damping factor is set to 0.85 to simulate the metabolism of proteins and drugs. The whole calculation is based on Python 3.8.10. The relating python libraries include NetworkX, Pandas, and NumPy. We use Pandas and NumPy to import all the docking data into a matrix for PageRank calculating. The protein expression value is then imported by the PageRank personalization parameter of NetworkX. Lastly, we can get a comprehensive ranking of drugs.
Prediction and Rank of combined pharmacotherapy
In addition to the comprehensive ranking of drugs, we also try to generate the rank of drug combinations. Similarly, the calculation places all drugs of combination and target proteins in a solution to bind free.
First, all drugs will be grouped according to the docking results of drugs in each combination. In this article, to reduce the amount of calculation, we selected the TOP20 drugs of each protein to include in the drug combination pool. Then, all the combinations were performed personalization-weight-PageRank against all protein targets. The sum of each score of all drugs in the combination is the final score of the combination. Lastly, we get the rank of combinations.
To make the distribution of combinations more clear, we propose drug-protein-expression fit score (DPEFS) to show the data distribution pattern. The calculation is as follows: The PageRank values of all proteins were summed by multiplying logFC, then divided by the total PageRank values of drugs, and finally divided by the PageRank values of specific proteins for standardized calculation. It is used for standardized calculation for comparing different combinations.
DPEFS evaluates the combination by referring to the protein expression trend. The higher the DPEFS, the better the fitness. In actual drug design, DPEFS is relatively high and PageRank score is relatively low, indicating that drugs of combination are relatively moderate, which suggests a negative outcome. All code can be found in GitHub (https://github.com/FeiLiuEM/PageRank-weight-drug).
Result
Expression analysis and clustering of hypothermia
Figure 2 shows the expressions of different mRNA of different groups after hypothermia. From the inside to the outside, the rings were divided into hypothermia 0.5h group, hypothermia 1H group, hypothermia 2H group, hypothermia 4H group, hypothermia 8h group, and hypothermia 18h group.
As shown in Table 1, in each group, we selected the top 3 expression protein targets. In the Hypothermia 0.5h group, the target proteins are circadian-associated transcriptional repressor (CIART), Glutathione-specific gamma-glutamylcyclotransferase 1 (CHAC1), and Uridine diphosphate glucose pyrophosphatase nudix hydrolase 22 (NUDT22). The target proteins of the Hypothermia 1h group are CHAC1, corneodesmosin (CDSN), and Nuclear receptor subfamily 1 group D member 1 (NR1D1). The target proteins of the Hypothermia 2h group are cold-induced RNA-binding protein (CIRP), armadillo repeat-containing X-linked protein 5 (ARMCX5), and coiled-coil domain-containing protein 122 (CCDC122). The target proteins of the Hypothermia 4h group are CIRP, receptor activity-modifying protein 3 (RAMP3), and carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1). The target proteins of the Hypothermia 4h group are the same: CIRP, RAMP3, and NAD(P)H dehydrogenase [quinone] 1 (NQO1). Within the targets, CHAC1 could enhance apoptosis[43]. NUDT22 is an Mg2+-dependent UDP-glucose and UDP-galactose hydrolase[44], while high glucose shows a negative effect in HIE like stroke[45]. CCDC122 potentially pro-inflammatory[46]. CIRP can effectively reduce cell death in the early stage of hypothermia therapy. However, it has a strong pro-inflammatory effect outside the cell, leading to cell killing. There is no definitive research on the timing of this shift. Referring to the previous article[47], we conservatively believed that CIRP could be identified as a negative protein from the 8H group. CEACAM1[48] and NQO1[49] promote apoptosis. All the other targets are shown protective effects or don’t have enough data. The personalization values were calculated in Table 1. All the structures of target proteins in Figure 3 were obtained by the rules in the section of Materials and Methods.
Visualize evolutionary conservation and Structure-Function Relationship-Based Binding Site Prediction
The conservation analysis of all the target proteins was listed in Figure 4A-K. The redder the amino acid, the higher possibility the amino acid sequence with function. Then we identified its structure-function relationship by the COACH-D server. The results showed a familiar result of conservation analysis listed in Figure 4L-V. As shown in Table 2, the range around 3-5 Å of the active site was used for the setting of the receptor pocket of the target proteins that were used for virtual screening.
Virtual Screening of target proteins’ Antagonists
We utilized the virtual screening technique to identify potential antagonists exhibiting an adequate binding affinity. We started with a chemical database consisting of 8,697 drug molecules and isolated a set of compounds satisfying the threshold of a high docking score. The results of the best match complexes are shown in Figure 5 and all the results are listed in the Additional file Table 1.
MD Simulations and Binding Free Energy Analysis
We performed MD simulation of 11 complexes to measure the stability of the protein-ligand complex. RMSD (root-mean-square deviation) profiles of the protein are shown in Figure 6A, which indicates that all systems were relatively stable during the entire simulation run. Moreover, RMSF profiles of protein are measured to evaluate the moving of each amino acid. All proteins are available for further analysis (Figure 6B).
The RMSD of drug atoms was also conducted to predict the stability of the atoms in docked complexes (Figure 6C). Most compounds exhibited a consistently low RMSD, suggesting that these compounds formed stable complexes.
Drug rank of TH in different groups
We rank all drugs by PageRank. First, we PageRank all the drugs and get the results in table3. 2-drug-combinations are ranked in Table 4 and 3-drug-combinations in the additional file Table 2. For comprehensive rank, the results of PageRank were listed. For drug-combination ranks, the percentages of each drug’s value in the combination were calculated. And DPEFS was calculated for analyzing the distribution differences of drug combinations.
Discussion
In this paper, a new pharmacological method — bioinfo-pharmacology is proposed, using therapeutic hypothermia as an example. By bioinformatics analysis, protein structure prediction, and PageRank, we provide a direct bridge between symptom/treatment and drug design.
AlphaFold2 and RoseTTAFold were used for protein structure prediction. And the number of proteins selected by AlphaFold2 in this research was close to that of RoseTTAFold. During the process of protein structure prediction, we found that for some proteins, the structures predicted by RoseTTAFold have less irregular structure than that of AlphaFold2. This may be due to the 2D distance map level transformed and integrated by RoseTTAFold during neural network training[22], while AlphaFold2 only paired structure database and genetic database. We also find a phenomenon that the predicted protein structures were relatively unstable under molecular dynamics simulation than preview reports of other protein structures detected by X-ray.
The application of PageRank is suitable. First, the combination of drug molecules is a memoryless stochastic process, which meets the qualifications of the Markov process. Second, our method aims to simulate the binding process in vivo. The comprehensive analysis involves free docking of proteins with all drugs. Drug combination analysis is to put proteins and related drugs into the solution for docking.
Besides, the method has good compatibility for the wide compatibility of PageRank. In theory, all the technologies with numerical results can be ranked by the method. In this paper, for the lack of bioinformatics data of Therapeutic hypothermia, we only do a basic analysis. If there is more data of the TH, the analysis of Weighted Gene Coexpression Network Analysis (WGCNA)[50] or Gene Regulatory Networks (GRN)[51] will be better because they could provide more plausible results of protein list.
Meanwhile, pharmacophore models[52] can use bioinfo-pharmacology for highly efficient drug design. After ranking, top-ranked pharmacophore fingerprints or alignments could be linked together for good pharmacological effects. And ultimately, improve the therapeutic effect of drugs, reduce toxic and side effects, improve the success rate of clinical trials of new drugs, save drug research and development costs. For the same reason, this method can also enhance network pharmacology and chrono-pharmacology. Network pharmacology[6] focuses on the application of protein network structures to improve drug discovery. By PageRank, the association between protein network structures and different drugs can be more accurately understood through comprehensive drug analysis of multiple targets rather than the previous single target. Thus, it has a good promotion effect on traditional herbal medicine research. In traditional herbal medicine, there may be multiple drug molecules in a single herb, and its complex multi-target problem can be efficiently analyzed by new methods. Another influenced area is chrono-pharmacology. Chrono-pharmacology[53] is expert in the adaptation and anticipation mechanisms of the body concerning clock system regulation of various kinetic and dynamic pathways, including absorption, distribution, metabolism, and excretion of drugs and nutrients. By bioinfo-pharmacology, researchers can develop drugs for different time groups, which will bring precision medicine to this kind of diseases.
Based on these potential improvements and high compatibility, we propose the concept of bioinfo-pharmacology for its ability to directly apply bioinformatics for drug discovery. Bioinfo-pharmacology is a method that uses bioinformatics, protein structure prediction, and PageRank for drug design. The main feature is that multiple targets target multiple molecules/pharmacophores. Overall, this approach builds a bridge between disease/treatment and drug development, bringing up more possibilities for future drug development.
This research has some defects. 1: For the speed of calculating, we only choose the top 3 mRNAs and use the top 1 complex for MD simulation. Furthermore, the duration of molecular dynamics simulation is set to 50ns. These operations mitigate the rationality of the results relatively; 2. Theoretically, pharmacophore modeling has a better improvement under PageRank. But considering the purpose of the article, we use AutoDock to dock all the marketing drugs.
In summary, this paper proposes a new method of pharmacology—bioinfo-pharmacology by PageRank. The results provide medical clues for the treatment of TH. Besides, it can help the functional research of proteins at the molecular level for experimental biologists. In addition, we can do drug combination analysis of drugs similarly. The new approach could have a huge impact on precision medicine, drug design, and traditional herbal medicine in the future.
Data Availability
Acknowledge
We thank Zaizai Cao, Xiangjie Lin, and Yuanyuan Hao for the algorithm discussion.
Footnotes
↵# The first 1 author is the first author.
Declaration
Ethics approval and consent to participate. Not applicable.
Consent for publication All authors allow the publication of this article.
Availability of data and materials All the data that support the findings of this study are available for email from authors.
Competing interests The authors have declared that they have no conflicts or interests.
Funding The authors have stated that no such relationships exist.