ABSTRACT
Background Drug repositioning is a key component of COVID-19 pandemic response, through identification of existing drugs that can effectively disrupt COVID-19 disease processes, contributing valuable insights into disease pathways. Traditional non in silico drug repositioning approaches take substantial time and cost to discover effect and, crucially, to validate repositioned effects.
Methods Using a novel in-silico quasi-quantum molecular simulation platform that analyzes energies and electron densities of both target proteins and candidate interruption compounds on High Performance Computing (HPC), we identified a list of FDA-approved compounds with potential to interrupt specific SARS-CoV-2 proteins. Subsequently we used 1.5M patient records from the National COVID Cohort Collaborative to create matched cohorts to refine our in-silico hits to those candidates that show statistically significant clinical effect.
Results We identified four drugs, Metformin, Triamcinolone, Amoxicillin and Hydrochlorothiazide, that were associated with reduced mortality by 27%, 26%, 26%, and 23%, respectively, in COVID-19 patients.
Conclusions Together, these findings provide support to our hypothesis that in-silico simulation of active compounds against SARS-CoV-2 proteins followed by statistical analysis of electronic health data results in effective therapeutics identification.
INTRODUCTION
There have been 529,301 US deaths as of March 12, 2021 due to COVID-19.I1 The Food and Drug Administration (FDA) has so far approved three COVID-19 vaccines.I2 However, a substantial time lag is expected between the start of vaccinations and effective herd immunity.3,4 Furthermore, vaccine hesitancy is high in the US with 51% to 72% of the population intending to be vaccinated.5,6,7 Additionally, a global race for vaccine acquisition continues.8 As 70% of the population must become immune to interrupt this pandemic9, COVID-19-related deaths will continue in the coming months.4 Therefore, drug repurposing is urgently needed to reduce COVID-19 mortality 5 while providing insight into disease pathways.10
In this study, we tested the hypothesis that in-silico quasi-quantum simulation of FDA-approved compounds against SARS-CoV-2 proteins followed by statistical analysis of 1.5M electronic health data can efficiently identify effective drug repositioning candidates.
METHODS
In-silico Quasi Quantum Simulation
We used ARIScience’s previously developed quasi-quantum (QQ) molecular simulation platform to disassemble and analyze the energy distribution of 11 SARS-CoV-2 proteins (Table 1) against 1,513 known FDA-approved active ingredients. This proprietary method uses electron density approximations, high probability conformations determinations, and multi-dimensional energy searches to determine intermolecular affinities. Using Java framework for highly parallel processing within a supercomputing node, and SLURM to spread load across nodes these proteins were simulated at neutral pH. Top candidates for each targeted protein were loaded into Jupyter11 for consolidated candidate interaction energy ranking. The resulting top candidates were chosen for pharmacological prevalence assessment. Next, top candidates were selected for statistically significant clinical effect validation using the National COVID Cohort Collaborative (N3C) repository.12 The goal of in-silico simulation was to identify small molecule drugs with strong affinity to SARS-CoV-2 proteins and potential to interrupt or delay viral activity. Control test for the QQ simulation framework used the human nicotine receptor (alpha 4 beta 2) with nicotine as positive control ligand and albuterol as negative control ligand. The resultant interaction energy from the QQ simulation for nicotine (−0.0026) was substantially lower than that of albuterol (energy: −0.0012) in line with the expectation that nicotine interacts substantially with the nicotine receptor, but albuterol does not. The energy units are in 0.0188 Hartrees.
Use of De-identified Patient Records
The N3C securely harmonizes Electronic Health Record (EHR) data from 36 medical centers dating from 01-01-2018. The data were securely transferred to a data enclave and harmonized into a single common model.13 As of 12-07-2020, N3C contained 26M total patients, 372k COVID+ patients, 1.1B lab results, 401M drug exposures, and 179M procedures. We used date-shifted data; all dates except for age were shifted +/-180 days. Drug exposure timing was calculated from the Earliest RNA-based SARS-CoV-2 Diagnosis (ERSD) for each patient. N3C data were crucial in assessing clinical significance of in-silico findings using actual clinical EHR data. The reason is interaction of a compound to a protein may result in one of (a) no-effect (b) increase or (c) decrease in protein activity. Item (c) is the desired effect.14,15
Death Endpoint Definition
We compiled OHDSI death concepts to define our endpoint.16 Deaths within 12 weeks after ERSD (excluding deaths via accidents, falls and burns) were classified as COVID-19 associated.
Statistical Validation of Effect against Death Endpoint
We performed cohort matching to account for the potential confounder bias in the compared cohorts.17,18 The 18 pre-COVID-19 diagnosis predictors we used to construct a propensity score model for cohort matching were: gender, age, race, geographical region, Charlson Comorbidity Index (CCI) categories (0, 1, 2-3, 4-5, 6+), prior medical disposition (smoker, diabetic, chronic respiratory disorder, hypertensive), prior access to medical care (through prior monthly outpatient, inpatient and ER visits, medication, procedure rates), BMI, data provider and COVID-19-related dexamethasone use. We used Bayesian logistic model19 to have numerically stable estimates. The propensity scores were used for nearest neighbor matching with replacement (Figure 1).17 To assess the treatment effect on the treated for the treatment of interest, we fit Bayesian logistic regression model with weighting to account for repeated sampling of control patients due to matching with replacement at 95% credible interval. We developed Java/Python/R routines to prepare and analyze N3C data respectively. With a sample size of 2,318 patients in each matched cohort, we can detect a difference of 1% increase in death between the two groups with 80% power.
Epidemiological Exclusions and Missing Data Handling
To minimize misclassification due to missing data, data-providers were excluded from the analysis if their data was deemed incomplete for key patient information.20 Of the 36 data-providers, four were excluded after assessing (a) usage frequency of common medications (Azithromycin, Metformin, Montelukast) (b) medicated patient percentage. This excluded 161,682 patients. We excluded two data-providers whose data quality was not assessed by us affecting 290,578 patients. We excluded six data-providers due to missing death data, excluding 632,614 patients. The final cohort included 1.52M patients.
RESULTS
Drug repositioning candidates by in-silico quasi-quantum simulation
The eleven SARS-CoV-2 proteins we chose for computational analyses are: nsp1, nsp9, nsp15, S, N, E, ORF3a, ORF7a, ORF8, and ORF9b (Table 1). While nsp1, nsp9, and nsp15 are essential components of viral replication, S, N, and E are the structural proteins needed for production of mature virions. ORF3a, ORF7a, ORF8, and ORF9b are virulence factors that enable the virus to create a favorable replication environment.21,22,23,24 After a pharmacological prevalence assessment of the top in-silico candidates and their affinity energies, 18 candidate compounds (Table 2) were selected for statistical validation using 1.5M patients.
Clinical effect validation using 1.5M patients’ records
The primary measured endpoint (EP) was death within 84 days of ERSD among 30-to-85 years old (yo) patients. This age range was chosen based upon mortality frequency by age of COVID+ patients in literature (Figure 1).25 Candidate drugs identified using our in-silico simulation were used to create multi-predictor-based matched cohorts to measure COVID-19 mortality statistical significance. The patients were stratified into three sets with matched cohorts created for each set for each assessed drug. These were: (a) all 30-85yo patients regardless of CCI26, (b) all 30-85yo diabetic patients with CCI<=3, and (c) all 30-85yo non-diabetic patients with CCI<=3. The results for each group are as follows:
All 30-85yo patients
Metformin, Triamcinolone, Amoxicillin, and Hydrochlorothiazide showed a reduction in mortality odds by 27%, 26%, 26%, and 23%, respectively (Table 3). Exposure to these drugs was based on whether they are generally taken chronically (Metformin, Hydrochlorothiazide, non-topical Triamcinolone) or acutely (Amoxicillin). Chronic and acute exposure to a drug considered (a) exposure 365 days prior to ERSD to 2 weeks after ERSD and (b) exposure 4 weeks prior to ERSD to 2 weeks after ERSD, respectively.
Our in-silico simulations showed Metformin’s affinity to N and ORF7a proteins with energies of − 0.0072 and −0.00309 respectively. Triamcinolone showed affinity to NSP1 (Figure 2) and S protein with energies of −0.001145 and −0.00162, respectively. Amoxicillin showed affinity to viral proteins NSP1 and N with energies of −.0034 and −.0024, respectively. Hydrochlorothiazide showed affinity to Spike and NSP1 proteins with energies of −0.0057 and −0.0053 (Table 2). The energy units are in 0.0188 Hartrees.
Diabetic (30-85yo, CCI<=3) and separately non-diabetic patients (30-85yo, CCI <= 3) Results for ‘only diabetic patients’ showed Hydrochlorothiazide and Metformin had statistically significant reduction in mortality odds by 49% and 34%, respectively. Results for ‘only non-diabetic patients’ showed Hydrochlorothiazide and Metformin had statistically significant reduction in mortality odds by 30% each (Table 3).
DISCUSSION
We identified four FDA-approved drugs as COVID-19 repositioning candidates using our novel in-silico quasi-quantum simulation methods followed by statistical analysis of 1.5M patients’ EHR data. We found that Metformin, Triamcinolone, Amoxicillin and Hydrochlorothiazide were associated with 27%, 26%, 26%, and 23% reduced mortality odds, respectively.
We highlight that among 1,513 drugs used in our in-silico simulations against specified 11 SARS-CoV-2 proteins, Metformin had the strongest in-silico signal (interaction energy − 0.007279) and also the highest reduction (by 27%) in COVID-19 mortality odds. It was followed by Hydrochlorothiazide’s signal (interaction energy of −0.005759) and 23% reduction in mortality odds.
The identified drugs Metformin (anti-diabetic), Triamcinolone (anti-inflammatory), Amoxicillin (anti-bacterial) and Hydrochlorothiazide (anti-hypertensive) are not members of a single class of drugs. Rather, each of these drugs have unique known clinical effect mechanisms, which may have played crucial roles in addition to their in-silico predicted SARS-CoV-2 protein interaction effect - both contributing to improved COVID-19 outcomes. The precise mechanisms by which these drugs exert their positive effects in COVID-19 will require further investigation and are beyond the scope of this study. For example, Metformin (a) inhibits gluconeogenesis thus reducing blood sugar27 and (b) helps activate pro-survival kinase AMPK which via mitochondria involved metabolic pathways results in cardiovascular health and lifespan improvement.28 Triamcinolone, a synthetic glucocorticoid, is used to treat autoimmune diseases, asthma, rheumatoid and arthritic conditions.29,30,31 The beta-lactam antibiotic, Amoxicillin, inhibits transpeptidation required for bacterial cell membrane synthesis.32 The loop diuretic, Hydrochlorothiazide, inhibits distal convoluted tubule renal sodium chloride transporter resulting in the loss of sodium and potassium, and reduction in blood pressure.33 Other researchers have also hypothesized Metformin’s positive effect on COVID-19.34
The identified drug candidates are usually well tolerated but caution must be followed for comorbid patients. Metformin, which has a 90% clearance via kidney tubular mechanism35, is contraindicated in patients with decreased creatinine clearance (CrCl) (< 45 ml/min) due to lactic acidosis risk.36 Amoxicillin is contraindicated for those with severe penicillin allergies. A CrCl based dose adjustment is necessary for ideal candidates with kidney disease. High risk patients with radiographically proven covid-19 pneumonia are selectively treated with antibiotics for possible superimposed bacterial infections with macrolides, cephalosporins or fluoroquinolones. A dose adjusted switch to amoxicillin for lower respiratory tract infection to treat community acquired pneumonia is feasible. Hydrochlorothiazide (HCTZ) is a diuretic used as a first line antihypertensive for essential hypertension patients. HCTZ can cause rare organ threatening complications like pancreatitis. Loop diuretics like Furosemide are being used in critically ill patients to maintain a negative fluid balance. Although not as potent as loop diuretics, adding HCTZ in select patients based on CrCl may prove to be beneficial to achieve diuresis37 and antiviral effect.
Triamcinolone can be administered systemically, orally, or by nebulization for direct pulmonary delivery. Thus it may act through both (a) interaction with SARS-CoV-2 proteins, and (b) pulmonary anti-inflammatory effects by stabilizing mast cells, a major cytokine storm source in COVID-19.31
Whether the doses and administration routes of these four drugs affect the primary outcome require further studies. Understanding the mechanisms by which these four drugs improve COVID-19 outcomes but not other drugs with similar functions (e.g. Fluticasone, or Clindamycin) will require further studies.
Limitations
The N3C dataset did not track whether a patient was involved in a COVID-19 vaccination trial which, while unlikely, may skew results as vaccinated individuals are less likely to die from COVID-19. Our statistical procedure’s uncertainty intervals did not take into account the selection procedure for the propensity model nor for the implicit multiple comparison post estimation. Given multiple treatments of interest, and varying sample sizes for each treatment, accounting for these factors is nontrivial and we are not aware of any currently available method to accurately account for them. This could potentially lead to optimistic uncertainty estimates potentially inflating the type I error. Finally, a patient’s diabetic disposition was solely based on clinical diagnosis and did not take HbA1c levels into consideration to compare between controlled versus uncontrolled diabetes.
Future work
Depending on funding we may look at the effect of these compounds on hypertensive and hospitalized patient subsets in addition to in vitro and in vivo antiviral assays. The novel simulation platform and the methodology to assess clinical effects we used have implications much beyond SARS-CoV-2.
Data Availability
National Institute of Health (NIH) National COVID Cohort Collaborative (N3C) has clear procedures for researchers to gain access to the de-identified patient data (1000+ researchers already have access to the data)
CONFLICTS OF INTEREST
Joy Alamgir is founder of ARIScience. Melissa Haendel is a co-founder of Pryzm Health.
Supplemental Appendix is available at: https://www.ariscience.org/p1_sc2_paper_01.html
ACKNOWLEDGEMENTS
We thank Massachusetts Green High Performance Computing Center, University of Maine System Advanced Computing Group, University of North Dakota Computational Research Center, Intel Corporation and ARI Internal HPC group for computing power contribution. The statistical analyses we performed were conducted with data or tools accessed through the NCATS N3C Data Enclave and supported by NCATS U24 TR002306. This research was possible because of the patients whose information is included within the data and the organizations and scientists who have contributed to the on-going development of this community resource.12 This work also used resources via the COVID-19 HPC Consortium, a private-public effort to bring together government, industry, and academia who are volunteering free COVID-19 research compute-time.39 ARIScience funded this project.
We also acknowledge reference data compilation assistance by Randa Bdair, Jeff Butler, Ievgen Duboriz and Haven López.
Footnotes
ruhul_abid{at}brown.edu