Abstract
BACKGROUND Diabetes and hypertension are among the top public health priorities, particularly in low- and middle-income countries, where limited quality and accessibility of health care exacerbate their health and socioeconomic impact. Moreover, their association with severe or fatal COVID-19 illness has further increased their societal relevance. Tools for the early detection of these chronic diseases enable interventions that prevent high-impact complications, such as loss of sight and kidney failure. Similarly, prognostic tools for COVID-19 help stratify the population in order to prioritize the protection and vaccination of high-risk groups, optimize the use of medical resources and tests, and raise public awareness.
METHODS We developed and validated state-of-the-art risk models for the presence of undiagnosed diabetes and hypertension, for visual complications associated with diabetes and hypertension, and for the risk of severe COVID-19 illness (if infected). The models were estimated using modern statistical learning methods (e.g., gradient boosting trees) and were trained on publicly available data containing health and socioeconomic information representative of the Mexican population. Lastly, we assembled a short integrated questionnaire and deployed a free online tool to make risk assessment available on a massive scale.
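The abstract names gradient boosting trees as one of the statistical learning methods used. As a rough illustration of how such a model turns questionnaire answers into a risk probability, the sketch below fits a first-order gradient-boosting classifier built from depth-1 trees (stumps) on the logistic loss, using only the Python standard library. It is a hypothetical toy, not the authors' pipeline: the names `GradientBoostedStumps`, `fit_stump` and `predict_risk`, the hyperparameters `n_rounds` and `learning_rate`, and the toy features are all assumptions, and leaf values are mean residuals rather than the Newton steps many libraries use.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Stump:
    """Hypothetical helper: depth-1 regression tree splitting one feature once."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: list) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: list, residuals: list) -> Stump:
    """Least-squares stump fitted to the current residuals (negative gradients)."""
    best, best_sse = None, math.inf
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, lv, rv), sse
    if best is None:  # every feature is constant: fall back to a constant predictor
        mean = sum(residuals) / len(residuals)
        best = Stump(0, math.inf, mean, mean)
    return best


class GradientBoostedStumps:
    """Hypothetical toy risk model: boosting on the log-loss with stumps."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base_score = 0.0
        self.stumps = []

    def fit(self, X: list, y: list) -> "GradientBoostedStumps":
        # Start from the log-odds of overall prevalence, clamped away from 0 and 1.
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(p / (1 - p))
        scores = [self.base_score] * len(X)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Negative gradient of the log-loss w.r.t. the raw score is y - p.
            residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump.predict(row) for s, row in zip(scores, X)]
        return self

    def predict_risk(self, x: list) -> float:
        score = self.base_score + sum(self.learning_rate * st.predict(x) for st in self.stumps)
        return sigmoid(score)


# Toy data (hypothetical): feature 0 separates cases from non-cases.
X = [[float(i), float(i * 7 % 3)] for i in range(10)]
y = [1 if i >= 5 else 0 for i in range(10)]
model = GradientBoostedStumps().fit(X, y)
print(model.predict_risk([8.0, 0.0]), model.predict_risk([2.0, 0.0]))
```

Each round adds a small correction toward the observations the current model still gets wrong, which is the core idea of boosting; real tree-boosting libraries add deeper trees, second-order leaf values, regularization and missing-value handling.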
RESULTS Our results show substantial improvements in accuracy and algorithmic equity (the balance of accuracy across population subgroups) compared with established benchmarks. In particular, the models: i) reached state-of-the-art sensitivity and specificity of 90% and 56% (AUC 0.83) for diabetes, 80% and 64% (AUC 0.79) for hypertension, 90% and 56% (AUC 0.84) for visual diminution as a complication, and 90% and 60% (AUC 0.84) for the development of severe COVID-19 illness; and ii) achieved substantially higher equity in sensitivity across gender, indigenous/non-indigenous, and regional populations. In addition, the most relevant features used by the models were in line with risk factors commonly identified in previous studies. Finally, the online platform was deployed and made accessible to the public on a massive scale.
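The results pair a fixed sensitivity (for example, 90%) with the specificity it implies, report the threshold-free AUC, and summarize equity as the balance of sensitivity across subgroups. The sketch below shows one way to compute these quantities from predicted risks. The helper names, the rule of choosing the highest threshold that reaches the target sensitivity, and the best-minus-worst subgroup gap as the equity summary are assumptions for illustration; the abstract does not state the exact procedure.

```python
import math


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when risk >= threshold is flagged positive.

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(y_true, scores, target=0.90):
    """Highest threshold whose sensitivity is at least `target` (hypothetical rule)."""
    positives = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    # Number of cases that must be flagged; the epsilon guards against float error.
    k = min(len(positives), max(1, math.ceil(target * len(positives) - 1e-9)))
    return positives[k - 1]


def auc(y_true, scores):
    """ROC AUC as the probability that a random case outranks a random non-case."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_sensitivities(y_true, scores, groups, threshold):
    """Sensitivity within each subgroup that contains at least one case."""
    result = {}
    for g in sorted(set(groups)):
        case_scores = [s for y, s, gg in zip(y_true, scores, groups) if gg == g and y == 1]
        if case_scores:
            result[g] = sum(s >= threshold for s in case_scores) / len(case_scores)
    return result


def sensitivity_gap(y_true, scores, groups, threshold):
    """Equity summary (an assumption here): best minus worst subgroup sensitivity."""
    sens = group_sensitivities(y_true, scores, groups, threshold)
    return max(sens.values()) - min(sens.values())


# Toy example (hypothetical data): 4 cases, 4 non-cases, two subgroups "a" and "b".
y = [1, 1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.4, 0.35, 0.7, 0.3, 0.2, 0.1]
group = ["a", "a", "b", "b", "a", "a", "b", "b"]
thr = threshold_for_sensitivity(y, risk, 0.90)
print(thr, sensitivity_specificity(y, risk, thr), auc(y, risk))  # 0.35 (1.0, 0.75) 0.875
print(sensitivity_gap(y, risk, group, 0.5))  # 1.0
```

The toy run shows why the operating point matters for equity: at a threshold of 0.5, every case in subgroup "a" is detected and none in subgroup "b", whereas the threshold chosen for 90% overall sensitivity closes that gap at the cost of one false positive.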
CONCLUSIONS The use of large databases representative of the Mexican population, coupled with modern statistical learning methods, allowed the development of risk models with state-of-the-art accuracy and equity for two of the most relevant chronic diseases, their eye complications, and COVID-19 severity. These tools can have a meaningful impact by democratizing early detection, enabling large-scale preventive strategies in low-resource health systems, increasing public awareness, and ultimately improving social well-being.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research project was supported by the EmpatIA grant (https://empatia.la/).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This project is exempted from IRB requirements as it is limited exclusively to the analysis of publicly available datasets.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data used in the present work are publicly available, and references to the corresponding data sources are included in the manuscript.