Abstract
Background We developed two unique machine learning (ML) models that predict risk of: 1) a major COVID-19 outbreak in the service county of a local HD population within following week, and 2) a hemodialysis (HD) patient having an undetected SARS-CoV-2 infection that is identified after following 3 or more days.
Methods We used county-level data from United States population (March 2020) and HD patient data from a network of clinics (February-May 2020) to develop two ML models. First was a county-level model that used data from general and HD populations (21 variables); outcome of a COVID-19 outbreak in a dialysis service area was defined as a clinic being located in one of the national counties with the highest growth in COVID-19 positive cases (number and people per million (ppm)) in general population during 22-28 Mar 2020. Second was a patient-level model that used HD patient data (82 variables) to predict an individual having an undetected SARS-CoV-2 infection that is identified in subsequent ≥3 days.
Results Among 1682 counties with dialysis clinics, 82 (4.9%) had a COVID-19 outbreak during 22-28 Mar 2020. Area under the receiver operating characteristic curve (AUROC) for the county-level model was 0.86 in testing dataset. Top predictor of a county experiencing an outbreak was the COVID-19 positive ppm in the general population in the prior week. In a select group (n=11,664) used to build the patient-level model, 28% of patients had COVID-19; prevalence was by design 10% in the testing dataset. AUROC for the patient-level model was 0.71 in the testing dataset. Top predictor of an HD patient having a SARS-CoV-2 infection was mean pre-HD body temperature in the prior week.
Conclusions Developed ML models appear suitable for predicting counties at risk of a COVID-19 outbreak and HD patients at risk of having an undetected SARS-CoV-2 infection.
Introduction
The 2019 coronavirus disease (COVID-19) pandemic is challenging the world’s healthcare systems, including bringing complexities to maintenance of dialysis in people with end stage kidney disease (ESKD) (1-5). In the United States, most ESKD patients are treated by outpatient hemodialysis (HD) where social distancing can be difficult and heightened infection control measures are required (e.g. temperature screenings, double-masking, isolation treatments/shifts/clinics) (1-5). ESKD patients are typically older and have multiple comorbidities, placing the population at higher risks for requiring intensive care and dying if affected by COVID-19 (6-12).
Early reports from the United States show an 11% COVID-19 mortality in ESKD (13), which is higher than the 3.2% COVID-19 mortality in the national population (14). This is not unexpected with reports from China and Europe suggesting a 16% and 23% COVID-19 mortality in ESKD (15, 16). Albeit the high mortality rate, an impaired immune response may render dialysis patients more frequently asymptomatic when infected by SARS-CoV-2 (15, 16). In both the general and EKSD populations, the most prevalent symptoms of COVID-19 at presentation are fever (11%-66% dialysis & 82% general population) and cough (57% dialysis & 62% general population) (15, 17, 18). The less frequent occurrence of signs and symptoms indicative of COVID-19 in dialysis patients could be making the outbreak even more challenging to manage.
Dialysis providers routinely capture patient/clinical data during care. The robust data collected during HD treatments (generally thrice weekly) provide unique opportunities to leverage artificial intelligence (AI) in predicting COVID-19 outcomes. AI modeling helped identify onset of the outbreak in China (19, 20) and is currently being used to help with early detection of areas/individuals in the general population at risk of being affected by COVID-19 (21-23).
As part of an healthcare operations effort in response to the COVID-19 outbreak, an integrated kidney disease healthcare company aimed to develop two unique machine learning (ML) prediction models that identify risk of: 1) a major COVID-19 outbreak in a dialysis service county, and 2) a HD patient having an undetected severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection. We analyzed the performance of these models to determine their possible utility for testing in the HD population.
Materials and Methods
General
An integrated kidney disease healthcare company (Fresenius Medical Care, Waltham, MA, United States) used COVID-19 data on the general United States population and HD patient data from a national network of dialysis clinics (Fresenius Kidney Care, Waltham, MA, United States) to develop two unique ML models that predict risk of: 1) a major COVID-19 outbreak in the county of dialysis clinic(s) servicing a local HD population in the following week (7 days), and 2) an adult HD patient having an undetected SARS-CoV-2 infection that is identified after the following ≥3 days.
This analysis was performed in adherence with the Declaration of Helsinki under a protocol reviewed by New England Independent Review Board (NEIRB). This retrospective analysis was determined to be exempt and did not require consent (Needham Heights, MA, United States; NEIRB#1-17-1302368-1).
Populations and Outcomes
County Population and COVID-19 Outbreak
For construction of a model to predict a major COVID-19 outbreak in the county for a local HD population, we obtained data on daily COVID-19 positive (COVID-19+) cases in each United States county during 14-28 Mar 2020 from The New York Times COVID-19 dataset (The New York Times Company, New York City, United States) (24). We also captured data on the total United States county population (25). We used retrospective county-level data from local HD populations at the national network of clinics from 08-28 Mar 2020.
We used data from all counties with ≥1 dialysis clinic and >5 HD patients. The outcome of a major COVID-19 outbreak in a dialysis service area (county) was defined was defined by a clinic being located in one of the top 300 national counties that had a combination of the highest (a) growth rate in COVID-19+ cases (number and people per million (ppm)) in the general population, and (b) highest number of COVID-19+ cases ppm in the last week of the observation period (22-28 Mar 2020).
Patient Population and SARS-CoV-2 Infection
For the construction of a model to predict individuals with an undetected SARS-CoV-2 infection, we considered data from adult (age ≥18 years) HD patients treated in the national network. Positive arm included data from patients who had ≥1 laboratory (rRT-PCR) confirmed COVID-19+ test as of the end of the observation period (27 Feb 2020 through 04 May 2020). Negative arm included data from patients who: 1) resided in counties with no reported COVID-19 cases as of 12 April 2020, and/or 2) had been laboratory tested and were found COVID-19 negative.
We defined the index date of a HD patient having a SARS-CoV-2 infection as the date of the first recorded suspicion that led to a COVID-19+ test, or the date of the COVID-19+ test in patients without an earlier recorded suspicion. In control patients with a negative COVID-19 test result, the test date was used as the index date. In controls without a test (patients living in counties with no reported cases), the index date was randomly sampled from the positive cases’ index dates occurring before 30 March 2020. This cutoff was chosen to minimize the possibility that control patients were infected but not tested (or reported) until after the date defining negative counties of 12 April 2020. We included data from patients with 1) ≥1 laboratory (of any kind) collected both 1-14 days and 31-60 days before the individual’s prediction date (3 days prior to index date, further defined below), and 2) ≥1 HD treatment both 1-7 days and 31-60 days preceding the prediction date. We excluded data from patients suspected to have COVID-19 and/or pending laboratory testing.
AI Model Development
Software
We used Python version 3.7.7 (Python Software Foundation, Delaware, United States) to build two ML models utilizing the XGBoost package (26).
County-Level COVID-19 Outbreak Prediction Model
In the county-level model, we considered county-wide data in the general population for 4 variables and county-wide data on each local HD population based on the county of the dialysis clinic(s) for 17 a priori selected variables detailed in Table 1 and Figure 1A. The ML model was trained, validated, and tested using a random 60:20:20% split of the dataset.
Patient-Level SARS-CoV-2 Infection Prediction Model
In the patient-level model, we used 82 a priori selected treatment/laboratory variables (Tables 3 and 4; Figure 1B) up to the individually defined prediction date (3 days prior to the index date defined above) to predict the risk of a SARS-CoV-2 infection being identified in the following ≥3 days. This is intended to yield individual predictions at least 3 days in advance of symptoms that warranted testing. We used a 60:20:20% randomized split of COVID-19+ samples for the training, validation, and testing datasets, and added the same number of COVID-19 negative patients to only the training and validation datasets. The testing dataset used to evaluate final model performance had a higher number of COVID-19 negative samples added to more closely match the prevalence observed in the overall national HD population.
Statistical Methods
Descriptive Statistics
Descriptive statistics were tabulated for demographics and variables before the predicted COVID-19 outbreak in a county or HD patient SARS-CoV-2 infection.
Analysis of ML Model Feature Importance
Shapley values (27, 28) were calculated using the SHAP python package to determine the influence of each variable on model predictions (29, 30). SHAP value is calculated for each variable and each observation, representing a measure of impact (positive or negative) of the observed value’s contribution to each individual prediction. Overall feature importances for each model were calculated using the mean absolute values for each variable across all measures.
Analysis of ML Model Performance
Performance of ML models was measured by the area under the receiver operating characteristic curve (AUROC) in the training, validation, and testing datasets, as well as, the recall, precision, and lift in the testing datasets (Refer to Supplementary Methods for further details). AUROC, recall, and precision metrics yield scores on a scale of 0 (lowest) to 1 (highest). Lift metric yields an estimate of how many times more/less effective the model is at predicting the outcome versus not using a model. Cutoff thresholds for classifying predictions were selected to optimize recall, precision, and lift.
Results
Prediction of Counties with a COVID-19 Outbreak
County Characteristics
Among all counties with dialysis clinic(s) servicing local HD populations, 4.9% (n=82) of counties had a major COVID-19 outbreak during the week of 22-28 Mar 2020 (Supplementary Table 1), as defined as being in the top 300 United States counties with the highest growth in COVID-19+ people (number and ppm) in the general population (Table 1). Counties with a COVID-19 outbreak had a 30% higher population density compared to counties without an outbreak.
On average, the proportion of patients residing at an assisted living facility did not remarkably differ for local HD populations among counties with a major COVID-19 outbreak or not (Table 1). The proportion of HD treatments with high body temperatures (>100 F) in the 14 days before a COVID-19 outbreak was <1 percentage point different between county groups. The percent change in the 14-day county-wide mean value for clinical, treatment, and laboratory parameters during the prior 28 days before a COVID outbreak was small in local HD populations and unremarkable in all cases.
County-Level Prediction Model Feature Importance
Assessment of variable feature importance with SHAP values showed the top predictors of counties experiencing an outbreak were the COVID-19+ ppm in the general population during prior 7 days, number of deaths per COVID-19+ case in the general population during prior 7 days, and the change in 14-day mean pre-HD body temperature among local HD patients in prior 28 days (Figure 2A).
SHAP value plot in Figure 2B further shows the degree of positive or negative impact of each individual measurement on the prediction. For example, individual counties with a higher number of COVID-19+ ppm in the general population during prior 7 days (warmer colors) were associated with a higher SHAP value, and this was typically vice versa for individual counties with a lower number of COVID-19+ ppm (cooler colors).
County-Level Prediction Model Performance
The ML model had good performance in prediction of the 7-day risk for counties experiencing a COVID-19 outbreak. The AUROC for the model was 0.988, 0.799, and 0.857 in the training, validation, and testing datasets respectively (Figure 3).
A threshold of 0.85 provided the best performance for the model within goals of limiting false positives. In the testing dataset, the recall was 0.35 showing the model correctly predicted true positives for a COVID-19 outbreak in 35% of positive counties. The precision was 0.40 showing 40% of counties predicted to have an outbreak physically experienced a COVID-19 outbreak. The lift was 7.9 in the testing dataset, suggesting model use is 7.9 times more effective in predicting a COVID-19 outbreak in a county compared to not using a model.
Prediction of HD Patients with a SARS-CoV-2 Infection
Patient Characteristics
We identified data from 11,664 HD patients meeting eligibility criteria (3,249 COVID-19+ cases and 8,415 unaffected (control) patients). The prevalence of COVID-19+ cases (about 28% COVID-19+) in the training and validation datasets was consistent within the cohort. For the testing dataset used to evaluate final model performance, there was a 10% prevalence of COVID-19+ cases based on the designed data split.
In the cohort, there was a higher proportion of HD patients with a SARS-CoV-2 infection of black race, Hispanic ethnicity, and with diabetes (Table 2). Mean values for treatment and laboratory variables before a SARS-CoV-2 infection being identified in the subsequent ≥3 days (or concurrent index date in controls) are shown in Tables 3 & 4.
HD patients who contracted COVID-19 had only subtle, clinically unremarkable distinctions in treatment/laboratory characteristics before being suspected to have a SARS-CoV-2 infection compared to unaffected patients. Mean pre-/post-HD body temperatures (Table 3) and inflammatory markers (white blood cell (WBC) count and differential) (Table 4) before a SARS-CoV-2 infection being identified did not did not show a clinically relevant difference differ between groups. HD patients who had a SARS-CoV-2 infection identified in the following 3 days did have higher ferritin levels compared to unaffected patients.
Patient-Level Prediction Model Feature Importance
Calculation of variable feature importance with SHAP values found the top three predictors of HD patients having a SARS-CoV-2 infection were the patient’s mean pre-HD body temperature, blood urea nitrogen (BUN), and albumin (Figure 4A).
The SHAP value plot is shown in (Figure 4B). For the top predictor of mean pre-HD body temperature before a SARS-CoV-2 infection, higher body temperatures (warmer colors) were mostly associated with a positive SHAP value, while lower body temperatures (cooler colors) were always associated with a negative SHAP value.
Patient-Level Prediction Model Performance
The HD patient-level model had adequate performance in prediction of the 3-day risk of a SARS-CoV-2 infection. The ML model had an AUROC of 0.88, 0.69, and 0.71 in the training, validation, and testing datasets respectively (Figure 5).
Setting the threshold for classifying observations as positive or negative at 0.85 to minimize false positives, precision for the ML model in the testing dataset was 0.42 showing 42% of patients predicted to have a SARS-CoV-2 infection actually had symptoms in the subsequent ≥3 days and were confirmed to have COVID-19. The lift was 4.2, suggesting model use is 4.2 times more effective in predicting a HD patient who contracts COVID-19, as compared to not having a model. However, given the high threshold, recall was 0.04 showing the model correctly predicted true positives for a SARS-CoV-2 infection in 4% of positive HD patients.
Discussion
We successfully developed two ML prediction models using retrospective data that appear to have suitable performance in identifying counties at risk of a major COVID-19 outbreak in the following week and HD patients at risk of having a SARS-CoV-2 infection in the following ≥3 days. The top predictor of a COVID-19 outbreak in a county was the COVID-19+ ppm in the general population during prior week. The top predictor of a SARS-CoV-2 infection was the individual patient’s mean weekly body temperature.
Albeit top predictors are not surprising, the observed distinctions were subtle. Without insights from the models considering an array of variables, it would not be clear where one should classify a higher/lower risk for a county or patient that is meaningful. For instance, an increase of 0.2°F (0.1°C) observed in weekly pre-HD body temperature alone may not be considered actionable. The average pre-HD body temperature was 97.5°F (36.4°C) (primarily oral measurements) in our analysis and has been previously reported as 98.2°F (36.7°C) (31). Given 98.6°F (37°C) is the expected average in healthy populations, the lower body temperature of HD patients is of importance with the rather low incidence of fever presenting in dialysis patients with COVID-19 (11%-to-66% with fever (15, 17)).
Prospective testing appears warranted and we anticipate combination of the county- and patient-level models may yield the greatest early insights on where and when providers can focus additional resource allocations to combat COVID-19, and what otherwise asymptomatic HD patients might be most appropriate for COVID-19 testing and triage to an isolation shift/clinic. These models have potential to provide a data-driven way for dialysis providers/clinicians to predict local transmission rates of COVID-19 and individuals with undetected infections. As more data is captured in the COVID-19 outbreak, further prediction models that can classify the risk of morbid/mortal outcomes in dialysis patients need to be developed.
The potential applications of AI for COVID-19 have been previously detailed (32); the first priority was suggested as “early detection and diagnosis of the infection”. In the United States, county-level surveillance models have been developed to estimate dynamics of COVID-19 on hospital resource capacities (33) and relative risk of an emerging COVID-19 outbreak (34). Our model expands upon surveillance efforts by using advanced ML techniques to identify the risks of a COVID-19 outbreak in counties considering predictor variables specific to the HD population. The robustness of data and an a priori selection of variables to be included in both the county- and patient-level ML models bring value through assessment of feature importance; this allows for interpretation of meaningfulness of predictors, albeit it does not determine causality.
A systematic review identified several models developed using data from China for early detection of COVID-19 in suspected individuals (23). None were designed for chronic disease populations. One is an externally validated ML model that predicts COVID-19 in suspected asymptomatic patients (AUROC validation=0.872) (35). Another effort used a prediction model (AUROC validation=0.966) to develop logic for an 8 variable COVID-19 risk chart (36). A further model with an AUROC of 0.938 was created to detect COVID-19 pneumonia in patients admitting to a fever clinic (37). Other models use genomic/computed tomography data to diagnose COVID-19 (23). Consistent variables used across models included age, body temperature, and flu-like illness symptoms (23). Although these models were all reported to have suitable performance, all were subject to bias due to non-generalizable sampling of controls without COVID-19 and possible overfitting. We cannot rule out that our ML models may have similar bias, although they included large samples and the testing dataset for the patient-level model had relatively generalizable sampling for the dialysis population with respect to positives/negatives (13). Our patient-level model is unique in its ability to identify the risk of SARS-CoV-2 infection in patients without any suspicion of being affected with the disease.
These developed models hold promise to help providers with the COVID-19 pandemic and any subsequent wave(s) of outbreak (38, 39). Nonetheless, AI modeling is never 100% accurate and model risk classifications need to be interpreted within the extent of the model’s performance.
Conclusions
The developed AI models showed suitable performance in prediction of dialysis service areas at risk of becoming a COVID-19 hotspot and individual HD patients at risk of having an undetected SARS-CoV-2 infection. Prospective testing is needed, and these models should provide key insights for consideration by healthcare providers.
Data Availability
The datasets with dialysis patient data used for this analysis are not publicly available.
Disclosures
CM, JWL, SC, HH, YJ, LAU, FWM are employees of Fresenius Medical Care in the Global Medical Office. KMB, EDW, IADS, KB, JLH, RJK are employees of Fresenius Medical Care North America. LN is an employee of Fresenius Medical Care Deutschland GmbH in the EMEA Medical Office. PK is an employee of Renal Research Institute, a wholly owned subsidiary of Fresenius Medical Care. IADS, KB, PK, JLH, RJK, LAU, FWM have share options/ownership in Fresenius Medical Care. PK receives honorarium from Up-To-Date and is on the Editorial Board of Blood Purification and Kidney and Blood Pressure Research. JLH has directorships in the Renal Physicians Association Board of Directors and Nephroceuticals LLC Scientific Advisory Board. FWM has directorships in the Fresenius Medical Care Management Board, Goldfinch Bio, and Vifor Fresenius Medical Care Renal Pharma. JPK has no conflicts of interest to disclose.
Authors’ contributions
Design was performed by: CM, IADS, KB, PK, JLH, RJK, LAU, FWM. Data collection and analysis was performed by: CM, JWL, SC, HH, YJ, LAU. The interpretation, drafting and revision of this manuscript was conducted by all authors. The decision to submit this manuscript for publication was jointly made by all authors and the manuscript was confirmed to be accurate and approved by all authors.
Other contributions
We would like to acknowledge Vladimir M Rigodon for assistance with the composition of the regulatory protocol for this analysis.
Funding
Project and manuscript composition were supported internally by Fresenius Medical Care.
Footnotes
↵* Co-first authors