PT - JOURNAL ARTICLE
AU - Liu, Xiaoli
AU - Shen, Max
AU - Lie, Margaret
AU - Zhang, Zhongheng
AU - Li, Deyu
AU - Liu, Chao
AU - Mark, Roger
AU - Zhang, Zhengbo
AU - Celi, Leo Anthony
TI - Evaluating Prognostic Bias of Critical Illness Severity Scores Based on Age, Gender, and Primary Language in the USA: A Retrospective Multicenter Study
AID - 10.1101/2022.08.01.22277736
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.08.01.22277736
4099 - http://medrxiv.org/content/early/2022/08/03/2022.08.01.22277736.short
4100 - http://medrxiv.org/content/early/2022/08/03/2022.08.01.22277736.full
AB - Background: Although severity scoring systems are used to support decision making and assess ICU performance, the likelihood of bias based on age, gender, and primary language has not been studied. We aimed to identify potential bias in scores such as the Sequential Organ Failure Assessment (SOFA) and the Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa) by evaluating hospital mortality across subgroups defined by age, gender, and primary language in two large intensive care unit (ICU) databases.
Methods: This multicenter, retrospective study was conducted using data from the Medical Information Mart for Intensive Care (MIMIC, 2001-2019) database and the electronic ICU Collaborative Research Database (eICU-CRD, 2014-2015). SOFA and APACHE IVa scores were obtained from the first 24 hours of ICU admission. Hospital mortality was the primary outcome. Patients were stratified by age (16-44, 45-64, 65-79, and 80 or older), gender (female and male), and primary language (English and non-English), and the scores were then assessed for discrimination and calibration in each subgroup. Discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Calibration was evaluated using the standardized mortality ratio (SMR) and calibration belt plots.
Findings: A total of 173,930 patient encounters (78,550 MIMIC and 95,380 eICU-CRD) were studied.
Discrimination was best in the youngest age group and worsened with increasing age (AUROC ranging from 0.812 to 0.673 for SOFA and 0.882 to 0.754 for APACHE IVa, p <0.001). There was a significant difference in discrimination between male and female patients, with worse discrimination in female patients. With MIMIC data, discrimination was worse for patients whose primary language was not English than for English-speaking patients (AUROC decreasing from 0.771 to 0.709 [p <0.001] for SOFA). For calibration, SOFA showed a statistically significant overestimation of mortality in the youngest patients (SMR 0.55-0.6) and underestimation of mortality in the oldest patients (SMR 1.54-1.57). With SOFA, mortality was overestimated for male patients (SMR 0.92-0.97) and underestimated for female patients (SMR 1.05-1.11), while mortality was overestimated for English-speaking patients (SMR 0.85) and greatly underestimated for non-English-speaking patients (SMR 1.4). In contrast, APACHE IVa underestimated mortality for all age groups and genders.
Interpretation: The differences in discrimination and calibration with increasing age, female gender, and non-English-speaking status suggest that illness severity scores are prone to bias in their mortality predictions. Caution must be taken when using these illness severity scores for quality benchmarking across ICUs and for decision-making in practice among diverse populations.
Funding: Z.B.Z was funded by the National Natural Science Foundation of China (62171471).
Evidence before this study: We searched PubMed, arXiv, and medRxiv from database inception to July 10, 2022, for articles published without language restrictions.
The search terms were (illness severity score OR SOFA OR APACHE-II OR APACHE-IV OR SAPS) AND (evaluation OR performance OR bias) AND ((age OR older OR elderly OR 65 years old OR 80 years old OR subgroup) OR (gender OR Female OR male) OR (language speaking OR English speaking)). Multiple studies have explored score performance in specific subgroups, such as patients over 80, older patients with sepsis, and surgical patients, but with limited numbers of patients and hospitals. Although a small number of studies have presented the performance of scores by age group, they have not systematically examined the differences and bias between younger and older patients in depth. Few articles analyzed the differences between men and women. No study has compared score performance between non-English and English speakers. We found no studies that comprehensively reported the potential bias of clinical scores across subgroups classified by age, gender, and English-speaking status.
Added value of this study: To the best of our knowledge, this is the first systematic bias analysis of the SOFA and APACHE IVa scores for assessing in-hospital outcomes across age (16-44, 45-64, 65-79, and 80 or older), gender (male and female), and English-speaking (yes and no) subgroups, using multicenter data from 189 U.S. hospitals and 173,930 patient episodes. The assessment covered discrimination (AUROC and AUPRC) and calibration (SMR and calibration belt plots). We found that the AUROCs of both scores decreased significantly with age. With the SOFA score, mortality was underestimated for the oldest patients and seriously overestimated for the youngest patients. Both scores demonstrated slightly better AUROCs for males. For non-English-speaking patients, SOFA showed a large reduction in AUROC and a marked underestimation of mortality compared with English speakers.
Furthermore, at the same SOFA score, observed mortality was higher for older patients, females, and non-English speakers than for their respective comparison subgroups.
Implications of all the available evidence: The ICU population is aging, with especially rapid growth in the number of patients over 80 years old. These patients have more comorbidities, greater frailty, worse prognosis, and a greater need for humanistic care, which poses a serious challenge for early clinical triage, diagnosis, and treatment. Females are more likely to underreport pain and less likely to be transferred to the ICU for treatment, so those who are admitted to the ICU may be more severely ill than males. SOFA and APACHE IVa scores are important standards for early ICU assessment of illness severity and for decision-making. Although these general phenomena have been noticed in clinical practice for the subgroups mentioned, clear and detailed quantitative analysis of the bias in the use of these scores is lacking; such analysis is needed to protect these vulnerable populations and prevent potential unintentional harm to them. The U.S. is a multicultural and racially integrated country, and the number of non-English speakers is rising every year, which reflects greater socioeconomic and ethnic disparities. Limited communication can also have an impact on patient assessment and treatment. However, the use of the SOFA score for the evaluation of this group of patients has not been reported to date.
In this study, we used multicenter data with a large sample size to identify potential bias in the SOFA and APACHE IVa scores for all of the patient subgroups mentioned.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Z.B.Z was funded by the National Natural Science Foundation of China (62171471).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The de-identification and anonymization were both strictly implemented in the MIMIC and eICU-CRD databases. Our retrospective study was exempted by the ethical review committee of the US.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All data produced in the present study are available upon reasonable request to the authors. https://mimic.mit.edu/ https://eicu-crd.mit.edu/
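
The two kinds of measure named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the record layout, and the toy numbers are all hypothetical, and it assumes that each patient already has a predicted mortality risk between 0 and 1 (a raw SOFA score would first need to be mapped to a risk, and the abstract does not say how that was done). The SMR is computed as observed deaths divided by expected deaths, so a value below 1 corresponds to overestimated mortality and a value above 1 to underestimated mortality, matching the way SMR values are read in the Findings. The AUROC is computed as the probability that a non-survivor received a higher predicted risk than a survivor.

```python
from collections import defaultdict

def smr(died, predicted_risk):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    expected = sum(predicted_risk)
    if expected == 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected

def auroc(died, score):
    """AUROC as the fraction of (non-survivor, survivor) pairs in which the
    non-survivor has the higher score; ties count as half a pair."""
    pos = [s for d, s in zip(died, score) if d == 1]
    neg = [s for d, s in zip(died, score) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def by_subgroup(records):
    """records: iterable of (subgroup, died, predicted_risk) tuples."""
    groups = defaultdict(list)
    for group, died, risk in records:
        groups[group].append((died, risk))
    summary = {}
    for group, rows in groups.items():
        died = [d for d, _ in rows]
        risk = [r for _, r in rows]
        summary[group] = {"smr": smr(died, risk), "auroc": auroc(died, risk)}
    return summary

# Toy data, invented for illustration only (not taken from MIMIC or eICU-CRD).
toy_records = [
    ("16-44", 0, 0.25), ("16-44", 0, 0.25), ("16-44", 0, 0.5), ("16-44", 1, 0.5),
    ("80+", 1, 0.25), ("80+", 0, 0.25), ("80+", 1, 0.5), ("80+", 0, 0.5),
]
results = by_subgroup(toy_records)
for group, metrics in results.items():
    print(group, round(metrics["smr"], 2), round(metrics["auroc"], 2))
```

In this toy data the younger group has an SMR below 1 (the predicted risks sum to more deaths than were observed) and the older group has an SMR above 1, which is the pattern the abstract describes for SOFA. The pairwise AUROC is quadratic in the number of patients and is chosen here only for readability; at the scale of the study a rank-based implementation from a statistics library would be the practical choice.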