PT - JOURNAL ARTICLE AU - Libin, Alexander AU - Treitler, Jonah T. AU - Vasaitis, Tadas AU - Shao, Yijun TI - Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes AID - 10.1101/2024.09.18.24313889 DP - 2024 Jan 01 TA - medRxiv PG - 2024.09.18.24313889 4099 - http://medrxiv.org/content/early/2024/09/19/2024.09.18.24313889.short 4100 - http://medrxiv.org/content/early/2024/09/19/2024.09.18.24313889.full AB - Artificial Intelligence (AI) fairness in healthcare settings has attracted significant attention due to the concerns to propagate existing health disparities. Despite ongoing research, the frequency and extent of subgroup fairness have not been sufficiently studied. In this study, we extracted a nationally representative pediatric dataset (ages 0-17, n=9,935) from the US National Health Interview Survey (NHIS) concerning COVID-19 test outcomes. For subgroup disparity assessment, we trained 50 models using five machine learning algorithms. We assessed the models’ area under the curve (AUC) on 12 small (<15% of the total n) subgroups defined using social economic factors versus the on the overall population. Our results show that subgroup disparities were prevalent (50.7%) in the models. Subgroup AUCs were generally lower, with a mean difference of 0.01, ranging from -0.29 to +0.41. Notably, the disparities were not always statistically significant, with four out of 12 subgroups having statistically significant disparities across models. Additionally, we explored the efficacy of synthetic data in mitigating identified disparities. The introduction of synthetic data enhanced subgroup disparity in 57.7% of the models. The mean AUC disparities for models with synthetic data decreased on average by 0.03 via resampling and 0.04 via generative adverbial network methods.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis project is funded by the pilot project 1OT2OD032581-02-388 under the NIH AIM-AHEAD initiative.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:The study used ONLY openly available human data that were originally located at: https://www.cdc.gov/nchs/nhis/data-questionnaires-documentation.htmI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesThe datasets generated during and/or analyzed during the current study are publicly available. https://www.cdc.gov/nchs/nhis/data-questionnaires-documentation.htm