PT - JOURNAL ARTICLE
AU - Fernandes, Fernando Timoteo
AU - de Oliveira, Tiago Almeida
AU - Teixeira, Cristiane Esteves
AU - de Moraes Batista, Andre Filipe
AU - Costa, Gabriel Dalla
AU - Filho, Alexandre Dias Porto Chiavegatto
TI - A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil
AID - 10.1101/2020.08.26.20182584
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.08.26.20182584
4099 - http://medrxiv.org/content/early/2020/09/01/2020.08.26.20182584.short
4100 - http://medrxiv.org/content/early/2020/09/01/2020.08.26.20182584.full
AB - Introduction: The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve patient survival, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. Methods: A total of 1,040 patients with a positive RT-PCR diagnosis of COVID-19 at a large hospital in São Paulo, Brazil, were followed from March to June 2020, of whom 288 (28%) presented a severe prognosis, i.e. Intensive Care Unit (ICU) admission, use of mechanical ventilation, or death. Routinely collected laboratory, clinical, and demographic data were used to train five machine learning algorithms (artificial neural networks, extra trees, random forests, CatBoost, and extreme gradient boosting). A random sample of 70% of patients was used to train the algorithms and the remaining 30% was held out for performance assessment, simulating new unseen data. To assess whether the algorithms could capture general severe prognostic patterns, each model was trained by combining two of the three outcomes to predict the third. Results: All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82).
The three most important variables for the multipurpose algorithms were the lymphocyte to C-reactive protein ratio, C-reactive protein, and the Braden Scale. Conclusion: The results highlight the possibility that machine learning algorithms are able to predict nonspecific negative COVID-19 outcomes from routinely collected data.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the National Council for Scientific and Technological Development (CNPq) under Grant Number 402626/2020-6 and the Paraiba Research Foundation (FAPESQPB) under Grant Number 206/2020.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study was approved by the Institutional Review Board (IRB) of BP - A Beneficencia Portuguesa de Sao Paulo (CAAE: 31177220.4.3001.5421). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Data Availability: The data come from medical records of BP - A Beneficencia Portuguesa de Sao Paulo Hospital in Brazil and are not publicly available, as they contain sensitive patient information.
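The evaluation design summarized in the abstract (a random 70/30 split, training on two of the three outcomes to predict the third, and reporting AUROC, sensitivity, and specificity) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the field names `icu`, `ventilation`, and `death`, the 0.5 decision threshold, and the reading of "combining two outcomes" as a logical OR are all assumptions made for illustration, and no model is fitted here.

```python
import random

# Hypothetical field names for the three severe outcomes named in the abstract.
OUTCOMES = ("icu", "ventilation", "death")


def split_70_30(records, seed=0):
    """Randomly split records into 70% training and 30% held-out test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def multipurpose_labels(train, test, held_out):
    """Assumed reading of the multipurpose setup: the training label is
    positive if either of the two other outcomes occurred, and the test
    label is the held-out outcome itself."""
    others = [o for o in OUTCOMES if o != held_out]
    y_train = [int(any(r[o] for o in others)) for r in train]
    y_test = [int(r[held_out]) for r in test]
    return y_train, y_test


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    at an assumed decision threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    return tp / (tp + fn), tn / (tn + fp)
```

In use, a classifier would be fitted on the training features with `y_train`, and its predicted risk scores on the test set would be passed together with `y_test` to `auroc` and `sensitivity_specificity`, once for each choice of `held_out`.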