PT - JOURNAL ARTICLE
AU - Mathur, Piyush
AU - Sethi, Tavpritesh
AU - Mathur, Anya
AU - Maheshwari, Kamal
AU - Cywinski, Jacek B
AU - Khanna, Ashish K
AU - Dua, Simran
AU - Papay, Frank
TI - Explainable machine learning models to understand determinants of COVID-19 mortality in the United States
AID - 10.1101/2020.05.23.20110189
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.05.23.20110189
4099 - http://medrxiv.org/content/early/2020/05/26/2020.05.23.20110189.short
4100 - http://medrxiv.org/content/early/2020/05/26/2020.05.23.20110189.full
AB - COVID-19 is now the leading cause of death per day in the United States, ranking higher than heart disease and cancer. Multiple projection models have been built and used to understand the prevalence of disease and anticipated mortality. These models take into account various epidemiologic factors of disease spread and, more recently, some of the mitigation measures. The authors developed a dataset with many of the socioeconomic, demographic, travel, and health care features likely to impact COVID-19 mortality. The dataset was compiled using 20 variables for each of the fifty states in the United States. We subsequently developed two independent machine learning models using CatBoost regression and random forest. Both models showed a similar level of accuracy. The CatBoost regression model obtained an R² score of 0.99 on the training data set and 0.50 on the test set. The random forest model similarly obtained an R² score of 0.88 on the training data set and 0.39 on the test set.
To understand the relative importance of features on COVID-19 mortality in the United States, we subsequently used SHAP feature importance and the Boruta algorithm. Both models show that high population density, pre-existing need for medical care, and foreign travel may increase transmission and thus COVID-19 mortality, whereas the effect of geographic, climate, and racial disparities on COVID-19-related mortality is not clear. A location-based understanding of the key determinants of COVID-19 mortality is needed for focused targeting of mitigation and control measures. Explanatory models such as these are also critical to resource management and policy frameworks.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: No external funding was received.
Author Declarations: IRB approval not required. Data sources are listed in supplemental content.
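The training and test R² scores quoted in the abstract can be read against the standard definition of the coefficient of determination, R² = 1 - SS_res / SS_tot. The following is a minimal sketch of that calculation in plain Python. The function name `r2_score` and all numbers are hypothetical and chosen only for illustration; they are not the study's data, models, or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical outcome values and predictions (illustration only).
train_true = [10.0, 20.0, 30.0, 40.0]
train_pred = [11.0, 19.0, 31.0, 39.0]
test_true = [15.0, 25.0, 35.0]
test_pred = [22.0, 24.0, 27.0]

train_r2 = r2_score(train_true, train_pred)
test_r2 = r2_score(test_true, test_pred)

# A large positive gap means the fit is much better on seen data
# than on held-out data.
gap = train_r2 - test_r2
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}, gap = {gap:.3f}")
```

An R² of 1 means the predictions match the observations exactly, and 0 means the model does no better than predicting the mean. A training score well above the test score, as in the figures reported in the abstract, is commonly read as a sign of overfitting, which is plausible when a dataset has only fifty observations.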