RT Journal Article
SR Electronic
T1 Comparing Decision Tree-Based Ensemble Machine Learning Models for COVID-19 Death Probability Profiling
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.12.06.20244756
DO 10.1101/2020.12.06.20244756
A1 Gonçalves, Carlos Pedro
A1 Rouco, José
YR 2020
UL http://medrxiv.org/content/early/2020/12/08/2020.12.06.20244756.abstract
AB We compare the performance of major decision tree-based ensemble machine learning models on the task of COVID-19 death probability prediction, conditional on three risk factors: age group, sex and underlying comorbidity or disease, using the US Centers for Disease Control and Prevention (CDC)’s COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show an exponential rise in COVID-19 death probability with age group, with males exhibiting a higher exponential growth rate than females, an effect that is stronger when an underlying comorbidity or disease is present, which also acts as an accelerator of the rise in COVID-19 death probability for both male and female subjects. The results are discussed in connection with healthcare and epidemiological concerns and in terms of the degree to which they reinforce findings from other studies on COVID-19. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: No funding or payment of any kind was received in the production of the current work. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: All ethical guidelines were followed. 
The only data used for running and testing the machine learning algorithms was the Centers for Disease Control and Prevention (CDC)'s COVID-19 case surveillance data, which is available for public use at https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf. All patient data in the CDC's database is deidentified. No additional clinical trials requiring approval were performed. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data used for running and testing the machine learning algorithms was the CDC's COVID-19 case surveillance data, made publicly available at https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf.