RT Journal Article SR Electronic T1 Uncovering clinical risk factors and prediction of severe COVID-19: A machine learning approach based on UK Biobank data JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2020.09.18.20197319 DO 10.1101/2020.09.18.20197319 A1 Wong, Kenneth C.Y. A1 SO, Hon-Cheong YR 2020 UL http://medrxiv.org/content/early/2020/09/22/2020.09.18.20197319.abstract AB Background COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with severe disease. Accurate prediction of those at risk of developing severe infections is also clinically important. Methods Using UK Biobank (UKBB) data, we built machine learning (ML) models to predict the risk of developing severe or fatal infections and to evaluate the major risk factors involved. We first restricted the analysis to infected subjects, then performed the analysis at a population level, treating those with no known infection as controls. Hospitalization was used as a proxy for severity. A total of 93 clinical variables (collected prior to the COVID-19 outbreak) were included as predictors, covering demographic variables, comorbidities, blood measurements (e.g. hematological, liver and renal function, and metabolic parameters), anthropometric measures and other risk factors (e.g. smoking/drinking habits). XGBoost (gradient boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values and accuracy gain. Shapley dependency and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes. Results A total of 1191 severe and 358 fatal cases were identified. For the analysis among infected individuals (N=1747), our prediction model achieved AUCs of 0.668 and 0.712 for severe and fatal infections respectively.
Since only pre-diagnostic clinical data were available, the main objective of this analysis was to identify baseline risk factors. The top five contributing factors for severity were age, waist-hip ratio (WHR), HbA1c, number of drugs taken (cnt_tx) and gamma-glutamyl transferase levels. For prediction of mortality, the top features were age, systolic blood pressure, waist circumference (WC), urea and WHR. In subsequent analyses involving the whole UKBB population (N for controls=489987), the corresponding AUCs for severity and fatality were 0.669 and 0.749. The same top five risk factors were identified for both outcomes, namely age, cnt_tx, WC, WHR and cystatin C. We also uncovered other features of potential relevance, including testosterone, IGF-1 levels, red cell distribution width (RDW) and lymphocyte percentage. Conclusions We identified a number of baseline clinical risk factors for severe/fatal infection using an ML approach. For example, age, central obesity, impaired renal function, multiple comorbidities and cardiometabolic abnormalities may predispose to poorer outcomes. The presented prediction models may be useful at a population level to help identify those susceptible to developing severe/fatal infections, hence facilitating targeted prevention strategies.
Further replications in independent cohorts are required to verify our findings. Competing Interest Statement The authors have declared no competing interest. Funding Statement This work was supported partially by the Lo Kwee Seong Biomedical Research Fund from The Chinese University of Hong Kong. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The UK Biobank study has received ethical approval from the NHS National Research Ethics Service North West (16/NW/0274). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The UK Biobank data is available to registered researchers.