PT - JOURNAL ARTICLE
AU - Gonçalves, Carlos Pedro
AU - Rouco, José
TI - Comparing Decision Tree-Based Ensemble Machine Learning Models for COVID-19 Death Probability Profiling
AID - 10.1101/2020.12.06.20244756
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.12.06.20244756
4099 - http://medrxiv.org/content/early/2020/12/08/2020.12.06.20244756.short
4100 - http://medrxiv.org/content/early/2020/12/08/2020.12.06.20244756.full
AB - We compare the performance of major decision tree-based ensemble machine learning models on the task of COVID-19 death probability prediction, conditional on three risk factors: age group, sex and underlying comorbidity or disease, using the US Centers for Disease Control and Prevention (CDC)’s COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show an exponential rise in COVID-19 death probability with age group, with males exhibiting a higher exponential growth rate than females, an effect that is stronger when an underlying comorbidity or disease is present, which also acts as an accelerator of COVID-19 death probability rise for both male and female subjects. The results are discussed in connection with healthcare and epidemiological concerns and in terms of the degree to which they reinforce findings from other studies on COVID-19.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: No funding or payment of any kind was received in the production of the current work.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: All ethical guidelines were followed.
The only data used for running and testing the machine learning algorithms was the Centers for Disease Control and Prevention (CDC)'s COVID-19 case surveillance data, which is available for public use at https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf. All patient data in the CDC's database is deidentified. No additional clinical trials requiring approval were performed.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The data used for running and testing the machine learning algorithms was the CDC's COVID-19 case surveillance data, made publicly available at https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf.