Abstract
We apply a meta-clustering technique to discover age-gender unbiased COVID-19 patient subphenotypes based on phenotypical features recorded before admission, including pre-existing comorbidities, habits and demographic features. We then study the potential early severity stratification capabilities of the discovered subgroups by characterizing their severity patterns, including prognostic, ICU and morbimortality outcomes. We used the Mexican Government COVID-19 open data, comprising 778,692 population-based, patient-level SARS-CoV-2 records as of September 2020. The meta-clustering technique is a two-stage clustering approach combining dimensionality reduction and hierarchical clustering: 56 clusters from independent age-gender clustering analyses supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27-95.22%): healthy patients of all ages; children with comorbidities and priority in medical resources; and young obese smokers. MCs 4-5 showed moderate recovery rates (81.3-82.81%): patients of all ages with hypertension or diabetes; and obese patients with pneumonia, hypertension and diabetes. MCs 6-11 showed low recovery rates (53.96-66.94%): immunosuppressed patients with high comorbidity rate; CKD patients with poor survival length and recovery; elderly smokers with COPD; severe diabetic elderly with hypertension; and oldest obese smokers with COPD and mild cardiovascular disease. Group outcomes were consistent with the recent literature on dedicated age-gender groups. These results can potentially support the clinical understanding of patients and their stratification for automated early triage before further tests and laboratory results are available, or help decide priority in vaccination or resource allocation among vulnerable subgroups or in locations where additional tests are not available.
Code available at: https://github.com/bdslab-upv/covid19-metaclustering
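The two-stage idea described in the abstract can be illustrated with a minimal sketch on synthetic data. It is not the pipeline from the repository above. The function names (`pca_project`, `ward_labels`, `meta_cluster`), the Ward linkage, the use of PCA on binary indicators, and all parameter values are assumptions chosen for illustration only. Stage 1 clusters patients separately within each age-gender stratum; stage 2 clusters the resulting cluster centroids into meta-clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def ward_labels(X, n_clusters):
    """Ward hierarchical clustering, cut into at most n_clusters groups."""
    tree = linkage(X, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def meta_cluster(features, strata, k_per_stratum=3, n_meta=4, n_components=2):
    """Stage 1: cluster within each stratum. Stage 2: cluster stage-1 centroids."""
    centroids, members = [], []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        reduced = pca_project(features[idx], n_components)
        labels = ward_labels(reduced, k_per_stratum)
        for c in np.unique(labels):
            rows = idx[labels == c]
            # Centroid = prevalence of each binary feature in the cluster.
            centroids.append(features[rows].mean(axis=0))
            members.append(rows)
    centroids = np.array(centroids)
    meta = ward_labels(pca_project(centroids, n_components), n_meta)
    # Propagate each meta-cluster label back to its member patients.
    patient_meta = np.empty(len(features), dtype=int)
    for rows, m in zip(members, meta):
        patient_meta[rows] = m
    return centroids, meta, patient_meta


# Synthetic stand-in data: 400 patients, 6 binary features, 4 strata.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(400, 6)).astype(float)
strata = rng.integers(0, 4, size=400)  # hypothetical age-gender group codes

centroids, meta, patient_meta = meta_cluster(features, strata)
```

Clustering within each stratum first keeps age and gender from dominating the grouping, and the second stage then merges stratum-level clusters that share a similar feature profile. With categorical inputs, a method such as MCA may be a better fit than PCA for the reduction step.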
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by Universitat Politècnica de València contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant: Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Using Open Data from the Government of Mexico, terms available at: https://datos.gob.mx/libreusomx
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
† Senior authors
(1) Revise the article. (3) Supplemental files updated.
Data Availability
The studied sample is available in our GitHub repository.
Abbreviations
- COVID-19: coronavirus disease 2019
- SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
- ML: Machine Learning
- PCA: Principal Component Analysis
- MCA: Multiple Correspondence Analysis
- LOESS: locally estimated scatterplot smoothing
- COPD: Chronic Obstructive Pulmonary Disease
- CKD: Chronic Kidney Disease
- INMUSUPR: Immunosuppression
- ICU: Intensive Care Unit
- RR: Recovery Rate
- MC: Meta-Cluster
- TIC: Types of Clinical Institution
- DIF: National System for Integral Family Development
- IMSS: Mexican Institute of Social Security
- ISSSTE: Institute for Social Security and Services for State Workers
- PEMEX: Mexican Petroleum Institution
- SEDENA: Secretariat of the National Defense
- SEMAR: Secretariat of the Navy
- SSA: Secretariat of Health