Individualized risk trajectories for iron-related adverse outcomes in repeat blood donors
=========================================================================================

* W. Alton Russell
* David Schienker
* Brian Custer

## ABSTRACT

**Background** Despite a fingerstick hemoglobin requirement and 56-day minimum donation interval, repeat blood donation continues to cause and exacerbate iron deficiency.

**Study design and methods** Using data from the REDS-II Donor Iron Status Evaluation study, we developed multiclass prediction models to estimate the competing risk of hemoglobin deferral and collecting blood from a donor with sufficient hemoglobin but low or absent underlying iron stores. We compared models developed with and without two biomarkers not routinely measured in most blood centers: ferritin and soluble transferrin receptor. We generated and analyzed ‘individual risk trajectories’: estimates of how each donors’ risk developed as a function of the time interval until their next donation attempt.

**Results** With standard biomarkers, the top model had a multiclass area under the receiver operator characteristic curve (AUC) of 77.6% (95% CI 77.3% - 77.8%). With extra biomarkers, multiclass AUC increased to 82.8% (95% CI 82.5% - 83.1%). In the extra biomarkers model, ferritin was the single most important variable, followed by the donation interval. We identified three risk archetypes: ‘fast recoverers’ (<10% risk of any adverse outcome on post-donation day 56), ‘slow recoverers’ (>60% adverse outcome risk on day 56 that declines to <35% by day 250), and ‘chronic high-risk’ (>85% risk of adverse outcome on day 250).

**Discussion** A longer donation interval reduced estimated risk of iron-related adverse events for most donors, but risk remained high for some. Tailoring safeguards to individual risk estimates could reduce blood collections from donors with low or absent iron stores.

Key words
*   blood donation
*   iron deficiency
*   ferritin
*   hemoglobin

## INTRODUCTION

Repeat blood donation can cause or exacerbate iron deficiency, with higher incidence among teen donors and premenopausal women [1–6]. In the United States, potential donors are screened using fingerstick hemoglobin or hematocrit tests and deferred if levels are below a minimum cutoff. Currently, minimum hemoglobin levels are 12.5 g/dL for women and 13.0 g/dL for men. Because fingerstick hemoglobin is an unreliable indicator of iron stores, some donors with low or absent iron stores qualify to donate and are subjected to further iron loss [4]. In addition, low hemoglobin deferrals consume time and resources for both donors and blood centers, decrease donor satisfaction, and reduce the likelihood of future donations [7]. More reliable measures of iron status include ferritin, zinc protoporphyrin, soluble transferrin receptor, and hepcidin, but these are more costly to measure, and most are not yet available as point of care tests [8].

Past studies have identified several factors that increase risk of iron deficiency among blood donors. The Danish Blood Donor Study found that sex, menopause status, and donation history were the strongest predictors of iron deficiency among donors, and weight, age, vitamin use, and diet were also significant [5]. Similar results have been found for donors in the United States, Australia, and the Netherlands [1–4,6]. Other studies identified age, time since last donation, and donation history as strong predictors of a low hemoglobin deferral for repeat blood donors [9,10]. To our knowledge, no prediction model has been developed that considers the competing risks of hemoglobin deferral and of collecting blood from a donor with sufficient hemoglobin but low or absent underlying iron stores.

In this study, we used data for a cohort of donors from the REDS-II Iron Status Evaluation (RISE) study [11] to develop machine learning models that estimate the risks of hemoglobin deferral and collecting blood from a donor with low or absent iron stores as a function of the donation interval – the length of time from an index donation until the donor returns for a subsequent donation attempt. We analyzed the models’ predictive performance and variable importance, and we used the models to generate and assess donors’ risk trajectories.

## METHODS

Using data from the RISE study, we trained multiclass prediction models to predict the risk of three iron-related adverse outcomes at a subsequent donation attempt: hemoglobin deferral, donating with low iron stores, and donating with absent iron stores. We assessed the models’ predictive performance, compared performance with and without the inclusion of two non-routine biomarkers (ferritin and STfR) as features for prediction, and generated and analyzed individual risk profiles for each donor’s likelihood of iron-related adverse donation outcomes at their next visit as a function of their donation interval (how long until the donor returns). We have shared all code in a public repository [12] and provide the TRIPOD checklist (**Table S4**) [13].

### Data preprocessing and formatting

The RISE dataset contains data from six U.S. blood centers on donation attempts for 2,425 donors over a 2-year period December 2007 – December 2009 [11]. Enrolled participants completed a whole blood donation and agreed to donate frequently over the next two years. The study targeted equal numbers of male and female donors and about twice as many frequent donors compared to first time or reactivated donors. Collected data elements include donation history, biometrics for each visit, and questionnaire responses regarding demographics, diet, supplemental iron consumption, female reproductive health, and demographics. For the ‘standard biomarkers’ model, we used 46 variables available for donations in the RISE dataset together with the time interval until the donor returns to predict the outcome of a follow-up donation attempt. We assumed that donor characteristics measured at the baseline visit such as diet, vitamin use, smoking, and female reproductive health indicators would not change significantly over the study period, and we used them to predict outcomes following subsequent donations by the same donor. We also developed an ‘extra biomarkers’ model, for which we included ferritin, STfRr, and derived measures (log ferritin, ratio of STfR to log ferritin, and calculated body iron) as features for prediction. We re-coded or imputed missing values for some fields; **Table S1** contains these details for all features used for prediction. We also included a composite dietary iron consumption score that was generated for each donor in the RISE dataset as part of a prior secondary analysis of this dataset [14].

To generate the model development dataset, we considered donations with at least 150 mL of red blood cell loss as potential index donations, which included whole blood donations, mixed apheresis donations that included a single red cell unit, and some donations that were classified as ‘quantity not sufficient.’ We excluded potential index donations that were double red cell donations due to limited data, the altered iron recovery profiles that follows the large iron loss from double red collection, and the 112-day mandatory deferral period after such donations. We also excluded donations that were missing a measurement of ferritin and donations for which neither fingerstick hemoglobin nor hematocrit was recorded. If follow-up visits were recorded after potential index donations, we generated labels with the time until the follow-up visit (in days) and its outcome. For all index donations followed by a visit with significant iron loss, defined as a loss of at least 55 mL of red blood cells, we generated a label for the index donation based on the first such follow-up visit. Additionally, we generated labels for any follow-up visits that did not result in significant iron loss (i.e., visits resulting in a deferral or apheresis donations of platelets or plasma with <55 mL of red blood cell loss) if such visits occurred between the index donation and the first follow-up visit with significant iron loss. For each index donation *i*, the outcome of its follow-up visits (*z**i*) was classified as hemoglobin deferral (labeled as *z**i* = 1) if one were recorded; as a low iron donation (*z**i* = 2) if pre-donation ferritin was ≥ 12 mg/dl and < 20 mg/dl for women or ≥ 12 mg/dl and < 30 mg/dl for men; as an absent iron donation (*z**i* = 3) if pre-donation ferritin was <12 mg/dl; and as a ‘no adverse outcome’ donation otherwise (*z**i* = 0). Follow-up donations without ferritin measurements (*z**i* = −1) were not included in the model development dataset but were included in a ‘first return’ dataset. We used the first return dataset to calibrate the model and generate risk trajectories as described below.

### Prediction model development

#### Model selection

We evaluated several candidate model types: gradient boosted machines, random forest, regression trees, and generalized linear models with elastic net regularization with and without second order interaction terms. For each model type, to optimize performance while minimizing overfitting, we evaluated multiple parameter configurations via grid search with nested cross validation and resampling (**Table S2**) [15]. We generated 15 *model assessment partitions* which consisted of 3 resamples of 5 equal-sized partitions of the entire dataset, which we generated with stratified sampling to ensure the distribution of outcomes was balanced across partitions. For each model assessment partition, we defined all data not included in the partition as the corresponding *model tuning set*. Within the 15 tuning sets, we assessed all candidate model configurations (model type and hyperparameter setting) using 5-fold validation, assessing the multiclass area under the reliever operator characteristic curve (multiclass AUC) using the Hand and Till method [16]. We compared model configurations based on the average multiclass AUC across 5 cross validation folds averaged over all 15 tuning sets (assessing a total of 75 realizations of each candidate model configuration).

We also evaluated ensemble models, which combine the risk scores from multiple base models. We assessed two methods of combining risk scores from base models: a simple average and a weighted average, for which we weighted each model’s score proportionally to its accuracy raised to a power of four as suggested by Large et. al. [17]. We assessed AUC for each candidate ensemble configuration across the same 5 cross validation folds within each of the 15 tuning sets.

We selected the top model configuration based on multiclass AUC. To produce an unbiased assessment of the selected model configuration, we then assessed multiclass AUC on each of the 15 model assessment partitions. For each assessment partition, we trained the model configuration on all data not in the partition and used this model to generate risk scores on the assessment partition; we used those risk scores to calculate multiclass AUC. We completed this model development process both with ferritin, STfR, and derived measures as features (extra biomarkers model) and without (standard biomarkers model). We also computed one-vs-rest AUC for each feature, a measure of how well the model discriminates one outcomes from the other three.

#### Feature importance

For the top-performing “standard” and “extra biomarkers” model configurations, we assessed the importance of features for prediction using a random permutation method [18]. We trained the model on an altered version of each model tuning sets for which one feature column was randomly shuffled. We then generated risk scores for the corresponding model assessment partition and calculated the multiclass AUC. We calculated the percent decrease in multiclass AUC when a feature’s column was shuffled as compared to using the unaltered model tuning sets, which we used as a measure of the feature’s importance to the model.

#### Calibration

To generate the final model, we retrained the selected model configurations on the entire model development dataset, then we calibrated the predicted probabilities to the ‘first return’ dataset. In this dataset, index donations were labeled only once with the outcome of the first subsequent donation attempt, which included follow-up donations with no ferritin measurement. We estimated the distribution of outcomes in this dataset by assuming that followup donations with no ferritin measurement would have the same distribution of absent, low, and ‘no-adverse outcome’ donations as did the follow-up donations for which ferritin was measured. Mathematical details are provided in the supplemental methods.

### Risk trajectory analysis

For each index donation, we generated a risk trajectory using the calibrated ‘extra biomarkers’ model by predicting the likelihood of each outcome at the donor’s next donation attempt for each possible follow-up donation interval between 56 and 250 days. We generated graphical representations of individual donors’ risk trajectories showing how the estimated of each adverse outcome evolves depending on the number of days until the donor returns. To illustrate differences in risk trajectories, we created three recovery archetypes: ‘fast recoverers’ (<10% risk of any adverse outcome on post-donation day 56), ‘slow recoverers’ (>60% adverse outcome risk on day 56 that declines to <35% by day 250), and ‘chronic high-risk’ (>85% risk of adverse outcome on day 250). In a separate subgroup analysis, we compared the mean and 95% confidence interval for the estimated risk of each adverse outcome as a function of the donation interval for groups of donors stratified by selected parameters.

## RESULTS

### Data processing

In the RISE dataset, a total of 7817 donations from 1922 donors were followed by at least one follow-up visit. We excluded 520 index donations because hemoglobin was not recorded, and we excluded a further 18 index donations from the first return dataset because the first follow-up visit with significant iron loss was less than 56 days later. The first return dataset contained 7279 index donations labeled with the outcome of the first follow-up donation. That outcome was a hemoglobin deferral for 636 index donations; a low-iron donation for 754; an absent iron donation for 568; no adverse outcome for 1340; and a completed donation with unknown iron status for 3981. The model development dataset included 3529 unique index donations from 1543 donors. 3149 index donations were labeled with one follow-up donation, 289 were labeled twice, and 91 were labeled with 3 or more follow-up visit outcomes (maximum of 8).

### Prediction model

Separately for the standard and extra biomarker versions, we evaluated 2,006 non-ensembled model configurations (model type and hyperparameter setting) and four enemble models. For both versions, the top-performing non-ensembled model was a gradient boosted machine (**Figure S1, Table S2**). The top-performing standard biomarkers model configuration was an ensemble model that averaged the risk scores for three gradient boosted machine and three random forest models; the top extra biomarkers configuration was an ensemble model that averaged risk scores for two gradient boosted machines, a random forest model, and two penalized regression models, one with second order interaction terms. Multiclass AUC for the top ensemble models assessed on the model assessment partitions was 77.6% (95% CI 77.3% - 77.8%) for the standard biomarkers model and 82.8% (95% CI 82.5% - 83.1%) for the extra biomarkers model (**Table 1**). For both the standard and extra biomarkers model, the top ensemble model had a higher mean AUC with lower standard error than each of the base models that comprised it across the model tuning sets (**Figure S2**). Both models had the highest discriminative performance for predicting no adverse outcome donations and the lowest discriminative performance for predicting low iron donations (**Figure 1**). Inclusion of the extra biomarkers had the greatest improvement in distinguishing low and absent iron donations from the other outcomes (one-vs-rest AUC increased 6.9% for low iron donations and 8.9% for absent iron donations; **Table 1**).

View this table:
[Table 1:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/T1)

Table 1: 
Multiclass and one verses rest AUC by outcome for the top “standard” and “extra biomarker” model configuration as assessed on the model assessment partitions.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F1)

Figure 1: 
One-vs-rest ROC curves for the standard and extra biomarker models as assessed on the model assessment partitions. For each outcome, one ROC curve is plotted for each of the three resamples of the data, combining data from the corresponding 5 model assessment partitions. Black dot at 75% sensitivity and 75% specificity.

For the standard biomarkers model, the donation interval (time to return) was the most important feature for prediction (the median decrease in multiclass AUC when shuffling this feature was 4.9%), followed by venous hemoglobin and the number of red blood cell units donated in the last 24 months (median decreases in multiclass AUC of 3.1%, and 2.0%, respectively; **Figure 2** and **Figure S3**). For the extra biomarkers model, ferritin was by far the most important feature, followed by donation interval and fingerstick hemoglobin/hematocrit (median decreases in multiclass AUC of 3.6%, 1.5%, and 0.4%, respectively; **Figure 2** and **Figure S4**). For both versions, the calibration weights down-weighted relative likelihood of hemoglobin deferrals (**Table S3**).

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F2)

Figure 2: 
Relative variable importance for the top “standard” and “extra” biomarker models. Variables were included in this figure if among the top 15 most important variables for at least one of the models. Full variable importance plots shown in the supplement.

### Individual risk profiles

Using the calibrated extra biomarkers model on the first return dataset, the median risk of any adverse outcome with a 56-day donation interval was 69% (Interquartile range [IQR] 35% – 92%). For a 250-day interval, the median risk of any adverse outcome fell to 36% (IQR 11% – 66%). The median decrease in absolute risk from an interval of 56 to 250 days was 22% (IQR 9% – 32%).

Individual donor risk trajectories were highly heterogeneous (**Figure S5**). Across the first return dataset, 433 (12%) index donations were by a fast recoverer (<10% risk of any adverse outcome on post-donation day 56), 304 (8%) index donations were by a slow recoverer (>60% adverse outcome risk on day 56 and <35% on day 250), and 403 (11%) index donations were by a chronic high-risk donor (>85% risk of adverse outcome on day 250). Risk trajectories differed markedly across these three archetypes (**Figure 3**). For chronic high-risk donors, while overall adverse outcome risk slightly declined for longer donation intervals, risk of a low iron donation increased with donation interval for most donors (**Figure 4**).

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F3)

Figure 3: 
Individual risk profiles for four selected donors that represent each of the three donation archetypes. The donation interval (time to return donation attempt) is varied on the x axis from 56 to 250 days. Height of colored area indicates the risk of each adverse outcome and likelihood of a ‘no adverse outcome’ donation.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F4)

Figure 4: 
Risk trajectory for any adverse outcome (top plot) or a specific adverse outcome (bottom three plots) for 300 randomly selected donors. Five randomly selected donors fitting each of the three archetypes are highlighted in red, orange, and green. Other donors’ trajectories are shown in grey.

In cohort analysis, average risk of an adverse outcome was lowest for iron replete donors and highest for donors with absent iron at the index donation; average risk for low-iron donors was in between that of the other two cohorts (**Figure 5**). While overall adverse outcome risk declined with longer donation intervals for all three cohorts, risk of a low iron donation increased with longer intervals for donors with absent iron stores at the index donation. when defining cohorts based on the tertile of venous hemoglobin at the index donation, donors with venous hemoglobin in the lowest tertile (9.8-13 g/dL) had the highest risk of any adverse outcome (**Figure 6**). Whereas an absent iron donation was the most likely adverse outcome for a donor with absent iron stores at index donation, hemoglobin deferral was the most likely adverse outcome for a donor in the lowest tertile of venous hemoglobin at index donation. Average risk trajectory also differed across cohorts defined by gender (**Figure S6**), number of red blood cell units donated over the prior two years (**Figure S7**), self-reported iron supplementation use (**Figure S8**), and composite dietary heme iron intake (**Figure S9**).

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F5)

Figure 5: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by iron status at index donation, defined by the donor’s ferritin level.

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F6)

Figure 6: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by venous hemoglobin (HGB) measured at the index donation in g/dL.

## DISCUSSION

This analysis of 7279 index donations from the RISE study found that risk of iron-related adverse outcomes at follow-up donations can be estimated as a function of the interval before a follow-up donation attempt and that individual donors’ risk trajectories are highly heterogeneous. For most donors, estimated risk decreased precipitously if they waited longer to return, suggesting that longer minimum donation intervals would prevent some cases of donor-associated iron deficiency and hemoglobin deferrals. For some donors, risk remained high even with a 250-day donation interval. This heterogeneity in estimated risk trajectories suggests that uniform or sex-based intervals may be insufficient. Including ferritin as a predictor improved risk estimation, particularly with respect to estimating risk of absent iron donations. For some donors, estimated risk of an adverse outcome remained over 90% even for a 250-day donation interval. These donors may have underlying (and potentially undiagnosed) iron deficiency or a related condition, which may make them poor candidates for repeat blood donation. The heterogeneity in donor risk and the predictive power associated with the use of donor characteristics should be examined further in order to facilitate policy design such as personalized inter-donation intervals.

Our analysis has several limitations. Most notably, the RISE study population is not representative of a typical repeat blood donor population. RISE participants were asked to commit to frequent blood donation, and recruitment was targeted to achieve proportional representation based on sex and donation history [19]. We restricted our analysis to the subset of donations in the RISE study for which ferritin was measured, which may further bias our findings. Further study is needed to assess the generalizability of our prediction model’s performance to a more representative blood donor population. Many of the features we used for prediction are highly correlated (e.g., venous and fingerstick hemoglobin; 12- and 24-month donation history), which can cause feature importance to ‘spread’ over correlated features [20]. Due to this, our feature importance method should only be interpreted as which features the model relied on most (or was most sensitive to) rather than which features are most correlated with adverse outcome risk. To calibrate our model, we assumed the distribution of absent, low, and replete iron status for follow-up donations without a ferritin measurement mirrored the distribution across follow-up donations at which ferritin was measured, but this may not be the case.

We see several ways the approaches reported here can be used to gain further insights into tailored donation intervals for blood donors. Extension of this work to larger blood center operational datasets outside of specific clinical studies will provide information on the effectiveness of machine learning models when the quality and completeness of information may be more limited. Key features identified in this analysis are readily available such as donor hemoglobin/hematocrit, donation interval, and increasingly ferritin measurement. Other features such as venous hemoglobin and survey assessments of donor dietary habits and supplementation are not likely to be implemented as standard donor assessments.

Despite the limitations, our analysis demonstrates that repeat donors have heterogeneous risk of iron-related adverse outcomes as a function of their donation interval, and machine learning models can estimate individual donors’ risk trajectories. Such predictive models could be a valuable tool for managing risks to donors while ensuring a sufficient blood supply.

## Data Availability

The RISE dataset was accessed through the National Heart, Lung, and Blood Institute (NHLBI) Biolincc repository ([https://biolincc.nhlbi.nih.gov](https://biolincc.nhlbi.nih.gov)). Our Research Materials Distribution Agreement prohibits publication of the raw data, but other researchers can submit a data request to NHLBI at no charge.

[https://www.doi.org/10.5281/ZENODO.5247221](https://www.doi.org/10.5281/ZENODO.5247221) 

## DECLARATIONS

## Funding

WAR was funded by a Stanford Interdisciplinary Graduate Fellowship.

## Conflicts

The authors have no conflicts of interest to declare.

## Ethics/Consent

Because our analysis used fully de-identified human subjects data for a secondary analysis, this study was exempted from full IRB review by the Stanford University IRB.

## Data and materials

The RISE dataset was accessed through the National Heart, Lung, and Blood Institute (NHLBI) Biolincc repository ([https://biolincc.nhlbi.nih.gov](https://biolincc.nhlbi.nih.gov)). Our Research Materials Distribution Agreement prohibits publication of the raw data, but other researchers can submit a data request to NHLBI at no charge.

## Code availability

All code is uploaded to a public repository at [https://www.doi.org/10.5281/ZENODO.5247221](https://www.doi.org/10.5281/ZENODO.5247221)

## Authors’ contributions

All authors contributed to study design. WAR conducted the analysis and composed the manuscript; BC and DS edited the manuscript.

## SUPPLEMENTAL MATERIALS

### SUPPLEMENTAL METHODS

#### Calibration

Our calibration procedure was as follows: we totaled each follow-up outcome in the first return dataset as *n*(*k*), where *k* = −1,0,1,2,3 correspond to a donation with unknown iron status (no ferritin measurement); a no adverse outcome donation, a hemoglobin deferral, a low iron donation, and an absent iron donation, respectively. We then calculated *ñ*(*k*), an estimation of what the totals would have been if ferritin were measured for all follow-up donations assuming the distribution of outcomes was the same as for completed donations with ferritin measures. These were calculated as *ñ*(1) = *n*(1) (hemoglobin deferral) and ![Graphic][1]</img> for *k* = 0,2,3 (completed donations). We then used our top model configuration to generate the unnormalized probability vector ![Graphic][2]</img> for each index donation *i* in the first return dataset. We computed weights *W*(*k*) for the unnormalized probability of each outcome ![Graphic][3]</img> by solving the system of equations ![Graphic][4]</img> for each index donation *i ∈* 1,2, …, *l* and *k* = 0,1,2,3. The final calibrated model used parameters *w*(*k*) together with the uncalibrated scores from the model ![Graphic][5]</img> to produce the estimated likelihood of each outcome at a follow-up donation as ![Graphic][6]</img> This ensured that the expectation of the distribution of the predicted outcome for the first return dataset would correspond to our estimated totals *ñ*(*k*)

### SUPPLEMENTAL TABLES

View this table:
[Table S1:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/T2)

Table S1: 
List of features for prediction model with description and notes from feature engineering.

View this table:
[Table S2:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/T3)

Table S2: 
Model types and hyperparameters assessed as candidates. All hyperparameter combinations were assessed in 5-fold cross validation on each of 15 model validation sets defined by the nested cross validation scheme. SB = Standard biomarkers version, XB = extra biomarkers version

View this table:
[Table S3:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/T4)

Table S3: 
Calibration weights calculated for matching the expected distribution of the risk scores to the estimated distribution in the first return dataset. Compared to the raw risk predictions generated by the model trained in the model development dataset, calibration down-weighted risk of hemoglobin deferral (evidenced by a calibration weight less than 1) and up-weighted likelihood of the other three outcomes for both models.

View this table:
[Table S4:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/T5)

Table S4: 
Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) model reporting checklist.

### SUPPLEMENTAL FIGURES

![Figure S1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F7.medium.gif)

[Figure S1:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F7)

Figure S1: 
Average multiclass AUC for each evaluated model configuration as assessed using 5-fold cross validation and averaged across 15 tuning sets (excluding ensemble models). Each configuration (model type and hyperparameter set) is plotted as a dot. Distributions for each model type are shown.

![Figure S2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F8.medium.gif)

[Figure S2:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F8)

Figure S2: 
Distribution of multiclass AUC for across the 15 tuning sets for the top ensemble model configurations and the base model configurations that comprised them. For both the “standard” and “extra biomarkers” versions, the top ensemble was an average of the base models.

![Figure S3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F9.medium.gif)

[Figure S3:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F9)

Figure S3: 
Relative variable importance for the top “standard biomarkers” model.

![Figure S4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F10.medium.gif)

[Figure S4:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F10)

Figure S4: 
Relative variable importance for the top “extra biomarkers” model.

![Figure S5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F11.medium.gif)

[Figure S5:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F11)

Figure S5: 
Individual risk trajectory for sixty randomly selected index donations. X-axis indicates the donation interval (days until a return donation attempt) and the height of the colored areas indicate the risk of each possible outcome: no adverse outcome (cyan), hemoglobin deferral (yellow), low iron donation (orange), and absent iron donation (red).

![Figure S6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F12.medium.gif)

[Figure S6:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F12)

Figure S6: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by gender. Compared to men, women had higher estimated risk for absent iron donations and hemoglobin deferral but a similar average risk trajectory for low iron donations

![Figure S7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F13.medium.gif)

[Figure S7:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F13)

Figure S7: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by the number of red blook cell (RBC) units donated in the prior 24 months. Those who donated 2 or fewer units in the prior two years had lower risk of adverse outcomes, particularly absent iron donations.

![Figure S8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F14.medium.gif)

[Figure S8:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F14)

Figure S8: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by iron supplementation. Donors with ‘less than daily’ iron supplementation had lower risk of adverse outcomes, particularly hemoglobin deferral, whereas donors taking either no iron supplmeentation or daily iron supplementation had more similar risk trajectories. These results are not intuitive, but may be due to confounding variables, for which this analysis does not account. For example, donors with diagnosed anemia or a related condition may be more likely to take daily iron supplementation.

![Figure S9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/11/2021.10.09.21264792/F15.medium.gif)

[Figure S9:](http://medrxiv.org/content/early/2021/10/11/2021.10.09.21264792/F15)

Figure S9: 
Average risk trajectory with 95% confidence intervals for donors in the first return dataset stratified by heme dietary iron intake score, which is calculated from self-reported dietary data, at index donation. On average, donors in the lowest tertile of heme iron intake had a higher estimated risk of an absent iron donation but similar risk trajectory for hemoglobin deferral or a low iron donation.

## Acknowledgments

The authors thank the NHLBI Biolincc repository for making the RISE dataset available at no charge, and we thank Dr. Bryan Spencer for providing the dietary heme iron intake scores he generated for a separate analysis.

## Footnotes

*   **Funding:** WAR was funded by a Stanford Interdisciplinary Graduate Fellowship.

*   **Conflicts:** The authors have no conflicts of interest to declare.

*   Received October 9, 2021.
*   Revision received October 9, 2021.
*   Accepted October 11, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## REFERENCES

1.  1.Cable RG, Glynn SA, Kiss JE, et al. Iron deficiency in blood donors: the REDS-II donor iron status evaluation (RISE) study. Transfusion. 2012;52(4):702–711. doi:10.1111/j.1537-2995.2011.03401.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1537-2995.2011.03401.x&link_type=DOI) 

2.  2.Salvin HE, Pasricha SR, Marks DC, Speedy J. Iron deficiency in blood donors: A national cross-sectional study. Transfusion. 2014;54(10):2434–2444. doi:10.1111/trf.12647
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/trf.12647&link_type=DOI) 

3.  3.Spencer BR, Bialkowski W, Creel DV, et al. Elevated risk for iron depletion in high-school age blood donors. Transfusion. 2019;59(5):1706–1716. doi:10.1111/trf.15133
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/trf.15133&link_type=DOI) 

4.  4.Baart AM, Van Noord PAH, Vergouwe Y, et al. High prevalence of subclinical iron deficiency in whole blood donors not deferred for low hemoglobin. Transfusion. 2013;53(8):1670–1677. doi:10.1111/j.1537-2995.2012.03956.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1537-2995.2012.03956.x&link_type=DOI) 

5.  5.Rigas AS, Sørensen CJ, Pedersen OB, et al. Predictors of iron levels in 14,737 Danish blood donors: results from the Danish Blood Donor Study. Transfusion. 2014;54(3 Pt 2):789–796. doi:10.1111/trf.12518
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/trf.12518&link_type=DOI) 

6.  6.Patel EU, White JL, Bloch EM, et al. Association of blood donation with iron deficiency among adolescent and adult females in the United States: a nationally representative study. Transfusion. 2019;59(5):1723–1733. doi:10.1111/trf.15179
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/trf.15179&link_type=DOI) 

7.  7.Custer B, Chinn A, Hirschler NV, Busch MP, Murphy EL. The consequences of temporary deferral on future whole blood donation. Transfusion. 2007;47(8):1514–1523. doi:10.1111/j.1537-2995.2007.01292.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1537-2995.2007.01292.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17655597&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F11%2F2021.10.09.21264792.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000248540100025&link_type=ISI) 

8.  8.Kiss JE, Vassallo RR. How do we manage iron deficiency after blood donation? 2018;181:590–603. doi:10.1111/bjh.15136
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/bjh.15136&link_type=DOI) 

9.  9.Baart AM, De Kort WLAM, Moons KGM, Vergouwe Y. Prediction of low haemoglobin levels in whole blood donors. Vox Sanguinis. 2011;100(2):204–211. doi:10.1111/j.1423-0410.2010.01382.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1423-0410.2010.01382.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20726956&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F11%2F2021.10.09.21264792.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000286146300005&link_type=ISI) 

10. 10.Baart AM, De Kort WLAM, Atsma F, Moons KGM, Vergouwe Y. Development and validation of a prediction model for low hemoglobin deferral in a large cohort of whole blood donors. Transfusion. 2012;52(12):2559–2569. doi:10.1111/j.1537-2995.2012.03655.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1537-2995.2012.03655.x&link_type=DOI) 

11. 11.Cable RG, Brambilla D, Glynn SA, et al. Effect of iron supplementation on iron stores and total body iron after whole blood donation. Transfusion. 2016;56(8):2005–2012. doi:10.1111/trf.13659
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/trf.13659&link_type=DOI) 

12. 12.Russell WA, Scheinker D, Custer B. altonrus/iron_trajectories: Code repository for Individualized risk trajectories for iron-related adverse outcomes in repeat blood donors. Zenodo. August 2021. doi:10.5281/ZENODO.5247221
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.5247221&link_type=DOI) 

13. 13.Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. 2015;102:148–158. doi:10.1002/bjs.9736
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/bjs.9736&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25627261&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F11%2F2021.10.09.21264792.atom) 

14. 14.Spencer BR, Fox M, Wise L, Cable R. A composite measure of heme iron consumption predicts incident iron depletion in repeat blood donors. In: Abstract of 29th Regional Congress of the ISBT. Vol 114. Basel: Vox Sanguinis; 2019:5–240. doi:10.1111/vox.12792
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/vox.12792&link_type=DOI) 

15. 15.Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006;7(1):91. doi:10.1186/1471-2105-7-91
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2105-7-91&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16504092&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F11%2F2021.10.09.21264792.atom) 

16. 16.Hand DJ, Till RJ. A Simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning. 2001;45(2):171–186. doi:10.1023/A:1010920819831
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1010920819831&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000170596600003&link_type=ISI) 

17. 17.Large J, Lines J, Bagnall A. A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates. Data Mining and Knowledge Discovery. 2019;33(6):1674–1709. doi:10.1007/s10618-019-00638-y
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s10618-019-00638-y&link_type=DOI) 

18. 18.Breiman L. Random forests. Machine Learning 2001 45:1. 2001;45(1):5–32. doi:10.1023/A:1010933404324
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1010933404324&link_type=DOI) 

19. 19.Cable RG, Glynn SA, Kiss JE, et al. Iron deficiency in blood donors: Analysis of enrollment data from the REDS-II Donor Iron Status Evaluation (RISE) study. Transfusion. 2011;51(3):511–522. doi:10.1111/j.1537-2995.2010.02865.x
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1537-2995.2010.02865.x&link_type=DOI) 

20. 20.Toloşi L, Lengauer T. Classification with correlated features: Unreliability of feature ranking and solutions. Bioinformatics. 2011;27(14):1986–1994. doi:10.1093/bioinformatics/btr300
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btr300&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21576180&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F11%2F2021.10.09.21264792.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000292554900016&link_type=ISI)

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif
 [3]: /embed/inline-graphic-3.gif
 [4]: /embed/inline-graphic-4.gif
 [5]: /embed/inline-graphic-5.gif
 [6]: /embed/inline-graphic-6.gif