Abstract
This study aimed to develop and validate a cardiovascular disease (CVD) risk prediction model for recurrent cardiovascular events, Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC), using machine-learning techniques.
Three cohorts of Chinese patients with established CVD in Hong Kong were included: the Hong Kong Island cohort served as the derivation cohort, and the Kowloon and New Territories cohorts served as validation cohorts. The 10-year CVD outcome was a composite of diagnostic or procedure codes for coronary heart disease, ischaemic or haemorrhagic stroke, peripheral artery disease, and revascularization. We estimated the incidence of recurrent CVD events in each cohort relative to that cohort's total person-years. Multivariate imputation with chained equations (MICE) and XGBoost were applied for model development. P-CARDIAC was compared with TRS-2°P and SMART2 in the validation cohorts using 1000 bootstrap replicates.
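As an illustration of the person-years calculation described above, the following minimal Python sketch computes a crude incidence rate. The function name `incidence_rate`, the per-1000 scaling, and the four follow-up records are hypothetical and are not taken from the study data.

```python
def incidence_rate(events, person_years, per=1000.0):
    """Crude incidence rate: number of events per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


# Hypothetical records: (years of follow-up, recurrent CVD event observed?)
cohort = [(2.5, True), (10.0, False), (4.0, True), (7.5, False)]

# Each patient contributes follow-up time until the event or censoring.
events = sum(1 for _, had_event in cohort if had_event)
person_years = sum(years for years, _ in cohort)

rate = incidence_rate(events, person_years)
print(f"{events} events over {person_years} person-years "
      f"= {rate:.1f} per 1000 person-years")
```

In this toy example, 2 events over 24 person-years give roughly 83.3 events per 1000 person-years.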
A total of 48,799, 119,672 and 140,533 patients were included in the derivation cohort and the two validation cohorts, respectively. A total of 125 risk variables were used to predict CVD risk; among them, eight classes of medications were treated as interactive drug use. In the derivation cohort, the model showed satisfactory discrimination and calibration, with a C-statistic of 0·69. Internal validation showed good discrimination and calibration, with C-statistics above 0·6. P-CARDIAC also performed better than TRS-2°P and SMART2.
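For readers unfamiliar with the C-statistic reported above, the following minimal Python sketch shows one common way to compute it for censored follow-up data (Harrell's concordance index). It is an illustration only: the study's exact estimator is not specified here, and the function name `harrell_c` and the four toy patients are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly earlier than patient j's follow-up time. The pair is
    concordant when the model gave i the higher predicted risk;
    tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up years, event indicator, predicted risk
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

c_stat = harrell_c(times, events, risks)
print(f"C-statistic: {c_stat:.2f}")
```

A value of 0·5 corresponds to chance-level ranking and 1·0 to perfect ranking of patients by risk.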
Compared with other risk scores, P-CARDIAC can identify patterns unique to Chinese patients with established CVD. We anticipate that P-CARDIAC can be applied in various settings to prevent recurrent CVD events, thereby reducing the related healthcare burden.
Condensed Abstract
A CVD risk prediction model for recurrent cardiovascular events among Chinese adults, Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC), was newly developed using machine-learning techniques. It predicts the 10-year incidence of recurrent CVD, defined as a composite of diagnostic or procedure codes for coronary heart disease, ischaemic or haemorrhagic stroke, peripheral artery disease, and revascularization. The model showed satisfactory discrimination and calibration, with a C-statistic of 0·69. P-CARDIAC also performed better than existing risk scores such as TRS-2°P and SMART2. P-CARDIAC could help predict recurrent CVD risk and reduce the healthcare burden.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This project is funded by Hong Kong Innovation and Technology Bureau (ref no: PRP/070/19FX) and Amgen Hong Kong Limited.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethical approval for this study was granted by the Institutional Review Board of the University of Hong Kong/HA Hong Kong West Cluster (UW20-073).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Abbreviations list
- CVD: Cardiovascular Disease
- P-CARDIAC: Personalized CARdiovascular DIsease risk Assessment for Chinese
- TRS-2°P: Thrombolysis in Myocardial Infarction (TIMI) Risk Score for Secondary Prevention
- SMART2: Secondary Manifestations of ARTerial disease
- ML: Machine-Learning
- EHR: Electronic Health Records
- HA: Hospital Authority
- ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
- BNF: British National Formulary
- MICE: Multivariate imputation with chained equations
- CPH: Cox proportional hazards model
- LASSO: Least Absolute Shrinkage and Selection Operator
- CHD: Coronary Heart Disease
- PAD: Peripheral Arterial Disease
- MI: Myocardial Infarction