Scalable cardiovascular risk assessment using artificial intelligence-enabled event adjudication and widely available hematologic predictors ============================================================================================================================================ * James G. Truslow * Shinichi Goto * Max Homilius * Christopher Mow * John M. Higgins * Calum A. MacRae * Rahul C. Deo ## Abstract **Introduction** Researchers routinely evaluate novel biomarkers for incorporation into clinical risk models. Although of potential benefit, such emerging markers, which are often costly or not yet commercially available, are unlikely to enable the scalable risk assessment needed for population health strategies. In contrast, the ideal inputs for population approaches would be those already widely available for most patients. We hypothesized that simple hematologic markers, available in an outpatient complete blood count without differential, would be useful to develop risk models for cardiovascular events. **Methods** Using routine laboratory measurements as predictors and neural network-based automated event adjudication of 1,072,348 discharge summaries, we developed and validated models for prediction of heart attack, ischemic stroke, heart failure hospitalization, revascularization, and all-cause mortality. **Results** Models with hematology indices alone showed Harrell’s concordance index ranging from 0.60–0.80 on an external validation set. Hematology indices added significantly in terms of discrimination and calibration performance compared to models using only demographic data and diagnostic codes for coronary artery disease, heart failure, and ischemic stroke, with the concordance index of resulting models in the range 0.75–0.85 on an external validation set. Predictive features varied by outcome, and included red blood cell, leukocyte, and platelet indices. **Conclusion** We conclude that low-cost ubiquitous inputs, if biologically informative, can provide population-level readouts of risk. ## Introduction Two approaches guide the development of clinical risk models1. One strategy focuses on evaluating innovative markers, typically biomolecular, in a research cohort where banked historical samples and adjudicated outcomes are available. Recent biomarker candidates include genetic variants2–4, protein biomarkers5, somatic mutations6 and serum metabolites7. The motivation for such an approach is to improve model performance, while proposing something new about disease mechanisms. One challenge for such a strategy is the lack of availability of such markers in other cohorts, making replication a challenge. The problem of predictor availability extends to model deployment, especially when considering the use of such models to guide population health initiatives. Moreover, given that contributors to disease may evolve over time, insights from a historical cohort may not be readily transportable to new settings8. An alternative approach involves training models using existing low-cost data readily available on a high percentage of patients within a healthcare system. Such an approach is pragmatic and exploits model training as the first step in a systematic program to enable risk assessment for the maximal number of individuals. The output of the models could then be used to guide population health initiatives, hospital- or payer-level financial planning, or clinical trial enrollment9. The use of widely available inputs also facilitates iteratively updating models to match changing environments. In this work, we focus on the latter approach, but use an unconventional choice of predictors: hematologic indices available in a complete blood count without differential (CBC). Moreover, in the spirit of enabling ease of re-training models in new settings, we use a machine-learning strategy for adjudication of acute coronary syndrome, coronary revascularization, heart failure hospitalization, and ischemic stroke, using discharge summaries from hospitalizations10. We describe discrimination and calibration performance of the models across two institutions. ## Methods ### Study design and source data All patients in the model-derivation cohort had a non-urgent outpatient CBC recorded on a Sysmex XE-5000 Automated Hematology System at the Massachusetts General Hospital (MGH) during March 2016 - May 2017. Among 494,654 result-log files stored by the XE-5000s during this period, there were 469,543 unique lab-order identifiers. Where identifiers were associated with more than one file, their timestamps varied by less than 3 hours in 99% of cases. We thus interpreted such multiple readouts as mapping to a single blood sample, and we discarded all but the most recent file. We discarded another 26 files which had internally inconsistent data. Among the 469,517 files we retained for later processing, there were no missing data. For each patient, we also accessed all diagnostic-code data from the Massachusetts General Brigham (MGB) Electronic Data Warehouse (EDW), including admit diagnoses, encounter diagnoses, medical history, and problem lists, as well as all discharge summaries and progress notes. Encounters, diagnostic codes, and notes could be from any of the institutions within MGB network of hospitals. We excluded all tests that occurred within 7 days prior to or 30 days following an emergency department visit or hospital admission, as defined by the date of a discharge summary note, so that the hematologic indices would more closely reflect the patient’s baseline values. From the remaining CBCs for each patient, only the earliest one was included, and this CBC defined the patient’s time of entry into the survival study (see below). We excluded patients younger than 30 years at the time of entry into study. Follow-up time was defined from the date of the CBC to the last encounter or note within the MGB-EDW under the following categories: encounters coded as “office visit”, “system generated”, or “hospital encounter”; notes coded as “progress note”, “telephone encounter”, “assessment & plan”, “discharge summary”, “ED provider note”, “consult”, “MR AVS snapshot”, “plan of care”, “patient instruction”, “H&P”. Because of the dramatic change in hospital population caused by COVID-19 in the winter of 2020, we did not consider follow-up or events after 2019 Dec. 31. When a patient’s follow-up extended past this date or if a study outcome occurred after this date, we right censored the patient at 2020 Jan. 1. Patients with no follow-up in the MGB-EDW system were excluded. The validation cohort consisted of all individuals who had a 10-parameter complete blood count (described below) collected at Brigham Women’s Hospital (BWH) between June 2015 and December 2016, and whose results were available within the MGB-EDW. As with the training data, we excluded laboratory values occurring close in time to emergency department or hospital discharges. ### Artificial intelligence-enabled adjudication of cardiovascular outcomes We recently trained neural network-based models to classify discharge summaries as associated with one the following four cardiovascular events: acute coronary syndromes (ACS), ischemic stroke (IS), heart-failure hospitalization (HF), and percutaneous coronary intervention/coronary artery bypass surgery (PCI/CABG)10. Briefly, after manually labeling 1,372 training notes + 592 validation notes + 1,003 testing notes across 17 institutions, we trained and validated models for event adjudication, with AUROC on the 1,003 test notes as follows: ACS, 0.967; HF, 0.965; IS, 0.980; PCI/CABG, 0.998. The fraction of all discharges in our test set which received labels for CVD events was as follows: ACS, 0.054; HF, 0.122; IS, 0.059; PCI/CABG, 0.054. The four types of events were not mutually exclusive, and a discharge summary could receive as many as four positive labels. For the current work, we selected classification thresholds to maximize *F**1* score (**Supplementary Table 1A**). The four event models were deployed on the 1,072,348 discharge summaries for the derivation and validation cohorts and a binary event status was determined by each model for each note. The four AI-adjudicated clinical events, plus all-cause mortality, constituted the five primary outcomes for our survival analysis. In addition to these primary outcomes, we also defined two composite outcomes: 1) acute coronary syndrome or heart-failure hospitalization or ischemic stroke (“ACS/HF/IS”); and 2) ACS/HF/IS or death (“ACS/HF/IS/death”). Date of death was obtained from MGB-EDW, without mediation by statistical methods. We did not employ competing-risk models or exclude a patient from the dataset for outcome *X* if they also experienced outcome *Y*. Prior history of CAD, IS, or HF was established using diagnostic codes, which we previously evaluated for performance on a set of manually labeled test notes10 (**Supplementary Table 1B)**. In contrast to the event models above, which map a single discharge summary to the probability of an event immediately associated with that hospital discharge, ICD codes were taken in aggregate across a patient’s history, to generate a binary status of extant disease up to a certain date. ### Predictors Predictors used in the sex-specific survival models were: * Patient’s age at entry into study. * Binary diagnostic codes for history of each of three conditions, at time of entry into study: heart failure; ischemic stroke; coronary artery disease or myocardial infarction. * Ten hematologic indices obtained as routine values using the Sysmex XE-5000, at time of entry into study: hematocrit, whole-blood hemoglobin concentration, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, platelet count, red blood cell count, red blood cell distribution width, mean platelet volume, white cell count. All predictors were centered and scaled except the three binary flags for disease. Some predictor sets included 1st order interactions with age. When main effects were centered and scaled, this transformation was applied before computing interaction terms **(Supplementary Table 3)**. ### Derivation and validation of the models For each of the seven outcomes, we used *L**1*-penalized Cox proportional hazards models to estimate coefficients for each risk factor. For each outcome, we developed separate models for male and female patients. Penalty strength for each model was selected by optimizing Harrell’s concordance index (C-index)11 through *k*-fold cross validation (3 ≤ k ≤ 5). Ties were handled with Breslow’s method. Regression and penalty tuning were performed with the python package scikit-survival v0.14.012. Each combination of outcome and sex was treated with five different models, each supported by a different set of predictors: set 1 consisted of the ten hematologic indices (“HEM”); set 2 added age to set 1 (“HEM-AGE”); set 3 added disease history to set 2 (“HEM-AGE-HX”); set 4 consisted of age and history (“AGE-HX”); set 5 consisted of just age (“AGE”). Whenever we included age in a predictor set, we also include first-order interactions between age and all main effects **(Supplementary Table 3B)**. To evaluate performance within the derivation set and provide a measure of uncertainty for the performance estimate, we repeated rounds of 5-fold cross validation, placing the penalty-tuning loop within the model-derivation step for each fold. We then trained final models on the entire derivation dataset. We externally validated each final model on the BWH dataset, having preprocessed the validation data in the same way as the training data. We removed any patients from the validation dataset who also appeared in the derivation dataset. ### Evaluation of model performance Each combination of outcome, predictors, and sex were evaluated by C-index and Brier score. Brier score was calculated using survival at 3 years post-CBC, using the IPCW method to address censoring**13**,**14**. Performance on the derivation dataset is presented with means and confidence intervals computed from repeated *k*-fold cross-validation (40 repeats of 5-fold cross validation). Performance on the validation dataset is presented with means and confidence intervals computed from bootstrapping the validation dataset (1000 bootstrapped samples). Coefficients for final models are estimated using the entire derivation dataset. Where the performance of two different predictor sets is compared for a single event model (e.g., compare predictor sets HEM-AGE-HX and AGE-HX, for the time-to-death model on the female validation cohort), the comparison is reported as the difference between model scores, as evaluated on a given set of patients. For the derivation dataset, this means that model scores are computed during each fold of cross-validation and compared for the patients in that fold. For the validation dataset, this means that model scores are computed on each bootstrapped sample and compared for the patients in that sample. Thus, a distribution of comparisons is compiled for each type of performance metric. We evaluate model calibration graphically by plotting deciles of observed event probability vs. predicted probability. Comparisons are made at 3 years of follow-up. Baseline hazard is estimated nonparametrically, using Breslow’s method15, and represents the hazard for a patient whose predictors are all zero. This is a patient with average hematologic indices, average age, and with no history of CAD or MI or heart failure or stroke. ### Random Survival Forest To explore the benefit of modeling high dimensional interactions, we employed an ensemble learning method, random survival forest (RSF)16 to train models for each combination of outcome, sex, and predictor set with the help of the Python package scikit-survival v0.14.0. We performed a single round of hyperparameter tuning using *k*-fold cross-validation (k = 5). Hyperparameters optimized included: trees per forest {50,150}; maximum depth of any tree {2,4,6,8}; number of features examined for splitting at each node {1,2,3,4}. We report the mean test-set C-index calculated over the *k* folds. The hyperparameter set represented by the reported C-index is the one that produced the highest mean C-index. ## Results ### Baseline characteristics The derivation cohort included 11,056 males and 12,182 females, all of whom received at least one outpatient, non-urgent CBC between 2016-Apr-2 and 2017-May-16 (**Table 1A, 1B**). The external validation cohort, at a second institution, consisted of 10771 males and 18900 females, all of whom received at least one outpatient elective CBC between 2015-Jun-1 and 2017-Jan-1. All patients in this study were at least 30 years old at the time the CBC was performed. Unlike other risk-model efforts17,18, which have focused on a population free of CVD events and not taking statins, our focus was on a broader set of patients, likely at higher risk. Our rationale was that such individuals are responsible for high resource healthcare utilization, and so could benefit from triggered therapeutic innovations to reduce risk, whether they be novel medications or population-health initiatives. View this table: [Table 1A.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T1) Table 1A. Baseline Demographic Characteristics of Derivation and Validation Cohorts View this table: [Table 1B.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T2) Table 1B. Baseline Clinical Characteristics of Derivation and Validation Cohorts Median follow-up time for the derivation cohort was 1,091 days, and for the validation cohort was 1,517 days (**Supplementary Figures 1 and 2**). The most notable differences between derivation and validation cohorts were a younger age in the female validation cohort compared to the derivation cohort (53.7 vs. 57.4 years) and greater racial diversity within the validation cohorts (e.g., 19.6% non-White in female validation vs. 12.3% in female derivation). Prevalence of cardiovascular disease in the male derivation cohort was similar to that in the male validation cohort, whereas prevalence in the female derivation cohort was higher than in the validation cohort. CONSORT-type diagrams showing the flow of patients and their data through our analysis are available in the Supplement (**Supplementary Figures 3 and 4)**. ### Rates of cardiovascular disease and death In the derivation cohort, the proportion of patients with one or more ACS events within the follow-up period was 0.019, with one or more HF hospitalizations was 0.046, with one or more ischemic strokes was 0.022, with one or more coronary revascularizations was 0.023 (**Tables 2A, 2B**). The fraction who died was 0.054. In the validation cohort, the proportion of patients with one or more ACS events was 0.013, with one or more HF hospitalizations was 0.045, with one or more ischemic strokes was 0.021, with one or more coronary revascularizations was 0.014. The fraction who died was 0.050. View this table: [Table 2A.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T3) Table 2A. Case Count in Derivation Cohort View this table: [Table 2B.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T4) Table 2B. Case Count in Validation Cohort Number of patients in **(A)** derivation and **(B)** validation cohorts, and number of each type of outcome observed, as well as the fraction of patients with each outcome. Fraction is reported as outcomes per 1000 patients, along with 95% confidence intervals for the fraction. ### Model performance Model performance was assessed in the derivation cohort using repeated 5-fold cross validation (**Table 3**). We refer to the held-out data defined by repeated *k*-fold cross validation as the internal test set (though, to be clear, the process of repeated *k*-fold cross validation operated on 100% of the derivation cohort). Results from the external validation cohort are obtained by the bootstrap method 19. View this table: [Table 3.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T5) Table 3. Model performance, according to C-index #### The relationship of age to cardiovascular outcomes: AGE predictor set Age is a powerful predictor for cardiovascular outcomes. On the internal test set, the AGE models produced C-indices between 0.65 and 0.74, whereas in the external validation cohort, the AGE models produced C-indices between 0.64 and 0.78 (**Table 3**). Brier scores were < 0.15 for all models in this study, their low magnitude due primarily to the low frequency of events in this dataset (**Supplementary Table 4**). #### The utility of hematologic predictors for predicting cardiovascular events: HEM, and HEM-AGE vs AGE Hematologic predictors also had utility for predicting cardiovascular outcomes. On the internal test set, C-indices for the HEM models ranged from 0.59 to 0.80 whereas on the validation set, C-indices for the HEM models ranged from 0.61 to 0.80 (**Table 3**). For both cohorts, the best discrimination was seen for heart failure hospitalization, composite cardiovascular outcomes, and death for both sexes, whereas the worst prediction was seen for ACS risk in men. In the internal test set, when we added hematologic predictors to the AGE models (thus making the HEM-AGE models), we observed superior discrimination for almost all of our outcomes compared to models predicting on age alone (**Table 4**). The only models that did not clearly benefit from adding the hematologic predictors were models for ACS and PCI/CABG, in which cases the improvement in C-index was small or not statistically significant. Out of 14 comparisons (7 outcomes × 2 sexes) for these two predictor sets, 11 showed a statistically significant change in C-index on the internal test set. Of those 11, 10 also showed a corroborating statistically significant change on the external validation set. View this table: [Table 4.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T6) Table 4. Comparison of pairs of predictor sets according to C-index The addition of hematologic predictors to the AGE models improved (made smaller) Brier scores for almost all outcomes in the internal test set, for both men and women (**Supplementary Table 4**). Out of 14 comparisons for these two predictor sets, 11 showed a statistically significant change in Brier score on the internal test set. Of those 11, 10 also showed a corroborating statistically significant change on the external validation set. Only the ACS models and the female IS model and did not show a statistically significant change. #### Incorporating disease history: AGE-HX vs AGE We hypothesized that history for some diseases would be available in most EHRs, albeit with varying degrees of completeness, and so may be considered for a scalable risk-assessment strategy. Adding disease histories to the AGE predictor set (i.e., making the AGE-HX predictor set) resulted in superior discrimination: AGE-HX models had higher C-index compared to AGE models for all outcomes, for both sexes, and in both derivation and validation cohort (**Table 4**). On the internal test set, increases in C-index were in the range 0.026 to 0.126. On the external validation set, increases in C-index were in the range 0.020 to 0.108. AGE-HX models also had better (lower) Brier scores than the AGE models (**Supplementary Table 4**). Out of 14 comparisons on the internal test set, all 14 showed statistically significant improvement, and all 14 improvements were corroborated by statistically significant improvements in the external validation set. #### Substitution of hematology for history: AGE-HEM vs AGE-HX To address the scenario where these highly predictive diagnostic codes are unavailable (e.g., a population-health program initiated with the help of a diagnostic lab provider), we compared the utility of the AGE-HEM models to that of the AGE-HX models. Equivalence between the two predictor sets depended on the outcome considered. For HF, IS, and the composite outcomes, the drop in C-index associated with using AGE-HEM instead of AGE-HX was < 0.03, for men and women, on both internal and external cohorts (**Table 4**). For modeling death, AGE-HEM was in fact superior to AGE-HX by as much as 0.077. For ACS and PCI/CABG, the drop in C-index was between 0.062 and 0.127 and was significant in both sexes and datasets. Calibration showed minimal difference, though for all outcomes except death and ACS/HF/IS/death, using AGE-HEM instead of AGE-HX produced minimally inferior Brier scores (**Supplementary Table 4**), with the HF model showing the lowest magnitude change in performance (increase of 0.003 on the external dataset). #### Incremental value of hematologic predictors to age + disease history: HEM-AGE-HX vs AGE-HX As was the case when we added hematologic predictors to the AGE models, we found that adding hematologic predictors to the AGE-HX models (thus making the HEM-AGE-HX models) often yielded significantly higher C-indices than did the AGE-HX predictor set, both for men and for women (**Table 4**). For example, C-index for the male HF model was higher by 0.067 under the larger predictor set, and C-index for the male death model was higher by 0.080 under the larger predictor set. The incremental effect of hematologic predictors was usually smaller than the difference between HEM-AGE and AGE, and so effects were less often significant in both validation and derivation datasets. Models for HF, death, and the composite outcomes showed statistically significant improvements in both cohorts for both sexes. Improvements in Brier score followed almost the same pattern (**Supplementary Table 4**). On the validation cohort, the HEM-AGE-HX models demonstrated C-indices between 0.75 and 0.85 (**Table 3**). #### Calibration curves We examined calibration curves for the HEM-AGE-HX model for all outcomes (**Figure 1)**, as measured on the validation cohort. For females, risk is calibrated well for the all but the highest-risk decile. Over-prediction of risk for high-risk patients is more pronounced among the male models than the female models, consistent with higher Brier scores for male models. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F1) Figure 1. Calibration curves for HEM-AGE-HX models on validation cohort. Calibration curves for the seven modeled outcomes, on female (**A**) and male (**B**) in the validation cohort. Predictor set is HEM-AGE-HX, which includes hematology and age and disease history, and interactions with age. Predicted risk is compared with observed outcomes at 3 years. #### Survival by predicted-risk quartile Discrimination was assessed visually using Kaplan-Meier curves for the seven outcomes, as modeled with the HEM-AGE-HX predictor set and measured on the external validation cohort (**Figure 2)**. In each subplot, the cohort is split into quartiles, according to the predicted relative risk of the outcome for each patient. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F2) Figure 2. Kaplan-Meier curves for HEM-AGE-HX models, by quartiles of predicted risk in validation cohort. Curves are shown for all seven outcomes in the validation cohort, females (**A**) and males (**B**). For each outcome, the cohort is split into quartiles, according to the risk model developed for that outcome on the derivation cohort. Lowest-risk quartile is shown in blue; highest-risk quartile is shown in red. For all seven outcomes among men, the risk groups predicted by the models maintain proper rank by observed risk over the whole observation period, at least for the highest risk 3 quartiles. For the HF model, survival curves for the low-risk and lowest-risk quartiles intersect at a follow-up time of around 4 years, which can indicate poor calibration of this model for the healthiest patients, though given the low event rate after 4 years (< 10/month for the entire male cohort), this is also readily attributable to noise in the Kaplan-Meier estimator. Performance for the female-cohort models is similar: survival curves for the riskiest three quartiles are properly ordered over the entire observation period, but the low-risk and lowest-risk curves are sometimes inverted for the HF model and for the two composite models. The only exception is the PCI/CABG model, for which the middle two risk groups are inverted at t < 2 years. ### Parameter estimates Proportional-hazards coefficients in the HEM-AGE-HX models were examined for the three outcomes (composites not included) where the HEM-AGE-HX predictor set had better discrimination than AGE-HX on the internal test set (**Table 5**). The *L**1* penalization selects approximately half of the ten hematologic indices for each of the final models. Among the highly correlated set {HCT, HGB, MCH, MCHC, MCV, RBC}, usually more than half are eliminated, with the most commonly retained being RBC and HGB. Two out of the ten hematologic indices appear in all six models in Table 8: platelet count (PLT) and red cell distribution width (RDW). White cell count (WBC) and red cell count (RBC) are retained in five out of six models. As expected, the disease-history terms and the age terms are extremely strong factors in most HEM-AGE-HX models. View this table: [Table 5.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T7) Table 5. HEM-AGE-HX parameter estimates for selected outcomes ### Random Survival Forest Discrimination of the Cox proportional hazards models was comparable to the discrimination of the random survival forests (**Supplementary Table 6**). Many of the optimized RSF models had slightly better discrimination than the Cox proportional hazards models, with advantages for RSF ranging from −0.009 to 0.050. The biggest advantage for RSF came when using the HEM predictor set, which does not include any explicit interaction terms, and which therefore might be harder for a linear model to learn from than for a tree-based model. Improvement was less apparent for the HEM-AGE-HX models. ## Discussion The objective of our work is fundamentally different from that of most efforts focused on training risk models: we sought to develop risk models for cardiovascular events based entirely on low-cost, widely available, well-calibrated20, and objective input features. Whereas most risk models have emphasized the utility of a diverse set of inputs, which will typically require providers to order specific laboratory tests, document an expansive set of diagnoses, and record candid responses of patients regarding such attributes as smoking status18, our approach deliberately attempts to minimize the burden on providers or dependence on costly elements of data collection. Our rationale is based in part on our own experience with EHR-based risk assessment: for example, the percentage of patients in our cohorts for which one can compute the Pooled-Cohort Equation risk score from structured data within the MGB-EDW is <30% (data not shown). Even when the relevant tests may have been ordered, they may not be available as structured data in the EHR, if, for example, they were collected outside of the system and were only available as scanned documents. Furthermore, most risk models focus on primary prevention – whereas our focus was on a broader patient population with anticipated high resource utilization. We have used hematologic predictors based on an assumption that many contributors to chronic diseases such as cardiovascular disease are systemic and may be reflected across multiple tissue types. This hypothesis is supported by a large amount of prior data, both experimental and observational, including the ability to estimate cardiovascular outcomes from digital retinograms21, the association of somatic mutations in leukocytes with coronary artery disease and heart failure risk6,22, the association of a complete-blood count-based score with mortality23–27, and the modulation of cardiovascular disease through leukocyte-restricted gene knockouts28,29. Others have shown association of hematologic indices with all-cause mortality and cardiovascular events, including a focus on red-cell distribution width23,30–34 and specific leukocyte subpopulations such as the neutrophil-to-lymphocyte ratio35. The utility of RDW for predicting outcomes may be related to its association with anemias of chronic disease or with clonal hematopoiesis of indeterminate potential6, which appear to predict similar patterns of events. In both cases, the biomarkers may primarily be providing a readout for underlying systemic pathway abnormalities, with the hematopoietic lineage representing a shared upstream pathway abnormality or an index of prior exposures. Although our work demonstrated success primarily for heart failure, major adverse cardiovascular events, all-cause mortality, and to a lesser extent ischemic stroke, we had less success in predicting incident acute coronary syndromes beyond what is possible with age and prior diagnoses. Nonetheless, a model using hematologic indices alone was able to modestly predict incident acute coronary syndromes (C-index = 0.62 [95% CI 0.57 to 0.67] and 0.60 [95% CI 0.56 to 0.63] for women and men, respectively, on validation dataset, comparable to polygenic risk scores2) suggesting that some signal exists, but there is likely redundancy with age and existing CVD. Nonetheless, a strategy combining age and hematologic predictors may still be useful in situations where structured diagnoses are not available or as complete as found within the MGB-EDW. The primary limitation of our work, which is also a strength, is a deliberate use of restricted numbers of inputs to the risk models. Our patient population may also exhibit some selection bias that could limit generalizability to other settings – though in such cases, our approach combining widely available inputs with automated adjudication should enable rescaling or retraining. We have also not looked at out how these measures of risk progress through time and in response to therapy in individual patients. Given the limited performance benefit of random survival forests for predicting outcomes with these same inputs, we chose not to explore other algorithms, though it is possible some would show superior performance. There is no shortage of risk models available for cardiovascular outcomes – and the challenge is motivating their use. We suspect that models such as the ones described here will primarily be attractive at an institutional or payer level, where better estimation of risk for a very large percentage of a population has clear economic value. We are not aware of any comparable effort to deliberately push the limits of models with minimal low-cost input features. We see our approach as complementary to other models which typically rely on individual or provider-entered data and have focused on patients not currently taking statins and free of cardiovascular disease17,18,36. These approaches also lend themselves to the creation of continuously updating risk trajectories where the rates of change may have additional predictive utility. Our goal is to maximize the number of patients for which a system-level estimation of risk is feasible, with minimal disruption to the existing workflow of providers, a place where we see model deployment having the greatest impact. ## Data Availability The data that support the findings of this study are available on request from the corresponding author R.C.D. upon approval of the data sharing committees of the respective institutions. The data are not publicly available due to the presence of information that could compromise research participant privacy. ## Supplementary Figures ![Supplementary Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F3.medium.gif) [Supplementary Figure 1.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F3) Supplementary Figure 1. Distribution of time to events in derivation cohort. Time to 1st outcome, also time to right-censoring, for patients without outcomes in the observation period. Results broken down by the seven outcomes for female (**A**) and male (**B**) in this study. Derivation cohort. ![Supplementary Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F4.medium.gif) [Supplementary Figure 2.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F4) Supplementary Figure 2. Distribution of time to events in validation cohort. Time to 1st outcome, also time to right-censoring, for patients without outcomes in the observation period. Results broken down by the seven outcomes for female (**A**) and male (**B**) in this study. ![Supplementary Figure 3A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F5.medium.gif) [Supplementary Figure 3A:](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F5) Supplementary Figure 3A: Flow diagram for modeling CVD events in derivation cohort MGH patients and their data, used to model time-to-event for ACS, HF hospitalization, stroke, PCI/CABG ![Supplementary Figure 3B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F6.medium.gif) [Supplementary Figure 3B:](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F6) Supplementary Figure 3B: Flow diagram for modeling CVD events in validation cohort BWH patients and their data, used to model time-to-event for ACS, HF hospitalization, stroke, PCI/CABG ![Supplementary Figure 4A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F7.medium.gif) [Supplementary Figure 4A:](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F7) Supplementary Figure 4A: Flow diagram for modeling death in derivation cohort MGH patients and their data, used to model time-to-event for death ![Supplementary Figure 4B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.12.21249662/F8.medium.gif) [Supplementary Figure 4B:](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/F8) Supplementary Figure 4B: Flow diagram for modeling death in validation cohort BWH patients and their data, used to model time-to-event for death ## Supplementary Tables View this table: [Supplementary Table 1A.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T8) Supplementary Table 1A. Performance of artificial-intelligence models for CVD-event adjudication. Performance of classifier algorithms used to adjudicate outcomes for our study. Classification thresholds were chosen to maximize F1 score. View this table: [Supplementary Table 1B.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T9) Supplementary Table 1B. Performance of ICD codes in predicting history of disease ICD codes can predict a patient’s history of disease at the time of their entry into the survival study. View this table: [Supplementary Table 2.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T10) Supplementary Table 2. Main effects, notation and description Main effects of the proportional-hazards models referred to in this study. All main effects are unitless in the proportional-hazards model. Except for the three history predictors, all predictors are mean-centered and scaled by their standard deviation, as measured in the derivation cohort. The history variables are binary (0 for no history; 1 for yes history of disease) and are not scaled or centered. View this table: [Supplementary Table 3.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T11) Supplementary Table 3. Terms used in the five predictor sets for survival models The elements in each list are the terms, *X**i*, in the linear predictor of the Cox proportional-hazards model. View this table: [Supplementary Table 4.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T12) Supplementary Table 4. Model performance, according to Brier score at 3 years View this table: [Supplementary Table 5.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T13) Supplementary Table 5. Comparison of pairs of predictor sets according to 3-year Brier Score View this table: [Supplementary Table 6.](http://medrxiv.org/content/early/2021/01/15/2021.01.12.21249662/T14) Supplementary Table 6. Comparison of random survival forest model to Cox proportional hazards on the basis of C-index * Received January 12, 2021. * Revision received January 12, 2021. * Accepted January 15, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. 1.Pencina MJ, Goldstein BA, D’Agostino RB. Prediction Models — Development, Evaluation, and Clinical Application. New Engl J Med. 2020;382:1583–1586. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMp2000589&link_type=DOI) 2. 2.Mosley JD, Gupta DK, Tan J, Yao J, Wells QS, Shaffer CM, Kundu S, Robinson-Cohen C, Psaty BM, Rich SS, Post WS, Guo X, Rotter JI, Roden DM, Gerszten RE, Wang TJ. Predictive Accuracy of a Polygenic Risk Score Compared With a Clinical Risk Score for Incident Coronary Heart Disease. Jama. 2020;323:627–635. 3. 3.Khan SS, Cooper R, Greenland P. Do Polygenic Risk Scores Improve Patient Selection for Prevention of Coronary Artery Disease? Jama. 2020;323:614–615. 4. 4.Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, Dehghan A, Muller DC, Elliott P, Tzoulaki I. Predictive Accuracy of a Polygenic Risk Score–Enhanced Prediction Model vs a Clinical Risk Score for Coronary Artery Disease. Jama. 2020;323:636– 645. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) 5. 5.Zethelius B, Berglund L, Sundström J, Ingelsson E, Basu S, Larsson A, Venge P, Ärnlöv J. Use of Multiple Biomarkers to Improve the Prediction of Death from Cardiovascular Causes. New Engl J Medicine. 2008;358:2107–2116. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa0707064&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18480203&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000255849300004&link_type=ISI) 6. 6.Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, Lindsley RC, Mermel CH, Burtt N, Chavez A, Higgins JM, Moltchanov V, Kuo FC, Kluk MJ, Henderson B, Kinnunen L, Koistinen HA, Ladenvall C, Getz G, Correa A, Banahan BF, Gabriel S, Kathiresan S, Stringham HM, McCarthy MI, Boehnke M, Tuomilehto J, Haiman C, Groop L, Atzmon G, Wilson JG, Neuberg D, Altshuler D, Ebert BL. Age-Related Clonal Hematopoiesis Associated with Adverse Outcomes. New Engl J Medicine. 2014;371:2488–2498. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1408617&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25426837&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) 7. 7.Wang TJ, Larson MG, Vasan RS, Cheng S, Rhee EP, McCabe E, Lewis GD, Fox CS, Jacques PF, Fernandez C, O’Donnell CJ, Carr SA, Mootha VK, Florez JC, Souza A, Melander O, Clish CB, Gerszten RE. Metabolite profiles and the risk of developing diabetes. Nat Med. 2011;17:448–453. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nm.2307&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21423183&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000289245100035&link_type=ISI) 8. 8.Pearl J, Bareinboim E. External Validity: From Do-Calculus to Transportability Across Populations. Stat Sci. 2014;29:579–595. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/14-STS486&link_type=DOI) 9. 9.DeSalvo KB, O’Carroll PW, Koo D, Auerbach JM, Monroe JA. Public Health 3.0: Time for an Upgrade. Am J Public Health. 2016;106:621–622. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2105/AJPH.2016.303063&link_type=DOI) 10. 10.Goto S, Homilius M, John JE, Truslow JG, Werdich AA, Blood AJ, Park BH, MacRae CA, Deo RC. Artificial intelligence-enabled event adjudication: estimating delayed cardiovascular effects of respiratory viruses. medRxiv doi 101101/2020111220230706. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=101101/2020111220230706&link_type=DOI) 11. 11.Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the Yield of Medical Tests. Jama. 1982;247:2543–2546. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.1982.03320430047030&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7069920&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1982NP06700024&link_type=ISI) 12. 12.Pölsterl S. scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. Journal of Machine Learning Research. 2020;21:1–6. 13. 13.Brier GW. Verification of Forecasts Expressed in Terms of Probability. Mon Weather Rev. 1950;78:1–3. 14. 14.Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999;18:2529–2545. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10474158&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000082507800027&link_type=ISI) 15. 15.Breslow N. Discussion on Professor Cox’s Paper. J Royal Statistical Soc Ser B Methodol. 1972;34:202–220. 16. 16.Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Statistics. 2008;2:841–860. 17. 17.Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’Donnell CJ, Robinson JG, Schwartz JS, Shero ST, Smith SC, Sorlie P, Stone NJ, Wilson PWF. 2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2014;63:2935–2959. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo0OiJhY2NqIjtzOjU6InJlc2lkIjtzOjE3OiI2My8yNV9QYXJ0X0IvMjkzNSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzE1LzIwMjEuMDEuMTIuMjEyNDk2NjIuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 18. 18.Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. Bmj. 2017;357:j2099. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNTcvbWF5MjNfNS9qMjA5OSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzE1LzIwMjEuMDEuMTIuMjEyNDk2NjIuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 19. 19.DiCiccio TJ, Efron B. Bootstrap confidence intervals. Stat Sci. 1996;11:189–228. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/ss/1032280214&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996WC62900002&link_type=ISI) 20. 20.Miller WG. The role of proficiency testing in achieving standardization and harmonization between laboratories. Clin Biochem. 2009;42:232–235. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.clinbiochem.2008.09.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19863911&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000264309400003&link_type=ISI) 21. 21.Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016;316:2402–2410. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2016.17216&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27898976&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) 22. 22.Dorsheimer L, Assmus B, Rasper T, Ortmann CA, Ecke A, Abou-El-Ardat K, Schmid T, Brüne B, Wagner S, Serve H, Hoffmann J, Seeger F, Dimmeler S, Zeiher AM, Rieger MA. Association of Mutations Contributing to Clonal Hematopoiesis With Prognosis in Chronic Ischemic Heart Failure. JAMA Cardiology. 2019;4:25–33. 23. 23.Horne BD, Anderson JL, Muhlestein JB, Ridker PM, Paynter NP. Complete blood count risk score and its components, including RDW, are associated with mortality in the JUPITER trial. Eur J Prev Cardiol. 2013;22:519–526. 24. 24.Horne BD, May HT, Muhlestein JB, Ronnow BS, Lappé DL, Renlund DG, Kfoury AG, Carlquist JF, Fisher PW, Pearson RR, Bair TL, Anderson JL. Exceptional Mortality Prediction by Risk Scores from Common Laboratory Tests. The American Journal of Medicine. 2009;122:550–558. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.amjmed.2008.10.043&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19486718&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000266503600013&link_type=ISI) 25. 25.Horne BD, Muhlestein JB, Bennett ST, Muhlestein JB, Jensen KR, Marshall D, Bair TL, May HT, Carlquist JF, Hegewald M, Knight S, Le VT, Bunch TJ, Lappé DL, Anderson JL, Knowlton KU. Extreme erythrocyte macrocytic and microcytic percentages are highly predictive of morbidity and mortality. Jci Insight. 2018;3:e120183. 26. 26.Anderson JL, Ronnow BS, Horne BD, Carlquist JF, May HT, Bair TL, Jensen KR, Muhlestein JB, Group I. Usefulness of a Complete Blood Count-Derived Risk Score to Predict Incident Mortality in Patients With Suspected Cardiovascular Disease. The American Journal of Cardiology. 2007;99:169–174. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.amjcard.2006.08.015&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17223413&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000243545900005&link_type=ISI) 27. 27.Brennan M-L, Reddy A, Tang WHW, Wu Y, Brennan DM, Hsu A, Mann SA, Hammer PL, Hazen SL. Comprehensive Peroxidase-Based Hematologic Profiling for the Prediction of 1-Year Myocardial Infarction and Death. Circulation. 2010;122:70–79. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjg6IjEyMi8xLzcwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDEvMTUvMjAyMS4wMS4xMi4yMTI0OTY2Mi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 28. 28.Eck MV, Bos IST, Kaminski WE, Orsó E, Rothe G, Twisk J, Böttcher A, Amersfoort ESV, Christiansen-Weber TA, Fung-Leung W-P, Berkel TJCV, Schmitz G. Leukocyte ABCA1 controls susceptibility to atherosclerosis and macrophage recruitment into tissues. Proc National Acad Sci. 2002;99:6298–6303. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czo5OiI5OS85LzYyOTgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wMS8xNS8yMDIxLjAxLjEyLjIxMjQ5NjYyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 29. 29.Duewell P, Kono H, Rayner KJ, Sirois CM, Vladimer G, Bauernfeind FG, Abela GS, Franchi L, Nuñez G, Schnurr M, Espevik T, Lien E, Fitzgerald KA, Rock KL, Moore KJ, Wright SD, Hornung V, Latz E. NLRP3 inflammasomes are required for atherogenesis and activated by cholesterol crystals. Nature. 2010;464:1357–1361. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature08938&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20428172&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277149000048&link_type=ISI) 30. 30.Perl M, Havakuk O, Finkelstein A, Halkin A, Revivo M, Elbaz M, Herz I, Keren G, Banai S, Arbel Y. High red blood cell distribution width is associated with the metabolic syndrome. Clinical Hemorheology and Microcirculation. 2016;63:35–43. 31. 31.Zalawadiya SK, Zmily H, Farah J, Daifallah S, Ali O, Ghali JK. Red Cell Distribution Width and Mortality in Predominantly African-American Population With Decompensated Heart Failure. J Card Fail. 2011;17:292–298. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cardfail.2010.11.006&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21440866&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) 32. 32.Felker MG, Allen LA, Pocock SJ, Shaw LK, McMurray J, Pfeffer MA, Swedberg K, Wang D, Yusuf S, Michelson EL, Granger CB, Investigators C. Red Cell Distribution Width as a Novel Prognostic Marker in Heart Failure Data From the CHARM Program and the Duke Databank. Journal of the American College of Cardiology. 2007;50:40–47. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo0OiJhY2NqIjtzOjU6InJlc2lkIjtzOjc6IjUwLzEvNDAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wMS8xNS8yMDIxLjAxLjEyLjIxMjQ5NjYyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 33. 33.Tonelli M, Sacks F, Arnold M, Moye L, Davis B, Pfeffer M, Investigators for the C and RE (CARE) T. Relation Between Red Blood Cell Distribution Width and Cardiovascular Event Rate in People With Coronary Disease. Circulation. 2008;117:163–168. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjk6IjExNy8yLzE2MyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzE1LzIwMjEuMDEuMTIuMjEyNDk2NjIuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 34. 34.Kofink D, Muller SA, Patel RS, Dorresteijn JA, Berkelmans GF, Groot MC de, Solinge WW van, Haitjema S, Leiner T, Visseren FL, Hoefer IE, Asselbergs FW, Group. Routinely measured hematological parameters and prediction of recurrent vascular events in patients with clinically manifest vascular disease. PLOS ONE. 2018;13:e0202682. 35. 35.Forget P, Khalifa C, Defour J-P, Latinne D, Pel M-C, Kock M. What is the normal value of the neutrophil-to-lymphocyte ratio? BMC Research Notes. 2017;10:12. 36. 36.Ridker PM, Buring JE, Rifai N, Cook NR. Development and Validation of Improved Algorithms for the Assessment of Global Cardiovascular Risk in Women: The Reynolds Risk Score. Jama. 2007;297:611–619. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.297.6.611&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17299196&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.12.21249662.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000244177400024&link_type=ISI)