PT - JOURNAL ARTICLE
AU - Mayhew, Michael B.
AU - Midic, Uros
AU - Choi, Kirindi
AU - Khatri, Purvesh
AU - Buturovic, Ljubomir
AU - Sweeney, Timothy E.
TI - Towards Equitable Patient Subgroup Performance by Gene-Expression-Based Diagnostic Classifiers of Acute Infection
AID - 10.1101/2022.04.24.22274125
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.04.24.22274125
4099 - http://medrxiv.org/content/early/2022/04/27/2022.04.24.22274125.short
4100 - http://medrxiv.org/content/early/2022/04/27/2022.04.24.22274125.full
AB - Host-response gene expression measurements may carry confounding associations with patient demographic characteristics that can induce bias in downstream classifiers. Assessment of deployed machine learning systems in other domains has revealed the presence of such biases and exposed the potential of these systems to cause harm. Such an assessment of a gene-expression-based classifier has not been carried out and collation of requisite patient subgroup data has not been undertaken. Here, we present data resources and an auditing framework for patient subgroup analysis of diagnostic classifiers of acute infection. Our dataset comprises demographic characteristics of nearly 6500 patients across 49 studies. We leverage these data to detect differences across patient subgroups in terms of gene-expression-based host response and performance with both our candidate pre-market diagnostic classifier and a standard-of-care biomarker of acute infection. We find evidence of variable representation with respect to patient covariates in our multi-cohort datasets as well as differences in host-response marker expression across patient subgroups. We also detect differences in performance of multiple host-response-based diagnostics for acute infection.
This analysis marks an important first step in our ongoing efforts to characterize and mitigate potential bias in machine learning-based host-response diagnostics, highlighting the importance of accounting for such bias in developing diagnostic tests that generalize well across diverse patient populations.

Competing Interest Statement
All authors are full-time employees and shareholders in Inflammatix, Inc. Inflammatix is developing the IMX-BVN classifiers into a commercial test that is not yet for sale, and has several patents linked to the test.

Funding Statement
This study was fully funded by Inflammatix, Inc.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
INF-IIS-01: IRB of Stanford University gave ethical approval for this work;
INF-IIS-03: IRBs of Attikon University Hospital and the Hellenic Sepsis Study Group gave ethical approval for this work;
INF-IIS-04: IRBs of Attikon University Hospital and the Hellenic Sepsis Study Group gave ethical approval for this work;
INF-IIS-10: IRB of Robert Wood Johnson Barnabas Health gave ethical approval for this work;
INS-IIS-11: IRB of Charite Universitatsmedizin Berlin gave ethical approval for this work;
INF-IIS-19: IRB of Stanford University gave ethical approval for this work;
INF-IIS-21: Ethics committee of Instituto de Investigacion Biomedica de Salamanca (IBSAL) gave ethical approval for this work;
INF-02: Central IRB (WIRB IRB Study Number: 12614040 / WIRB IRB Protocol Number: 20191145) gave ethical approval for this work;
INF-03: Ethics committee of Jehangir Clinical Development Centre Pvt. Ltd gave ethical approval for this work

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

All demographic data from publicly available studies used in the present study are available upon reasonable request to the authors. These data will be made freely available to all as part of supplementary material upon peer-reviewed publication. Demographic data from one public study (GSE40165) was provided by the Oxford University Clinical Research Unit for analysis purposes only and cannot be redistributed.

AUROC: area under the receiver operating characteristic curve
mAUC: multi-class AUC
xAUC: across-group (binary) AUC
xmAUC: across-group multi-class AUC
BVN: bacterial-viral-noninfected
CI: credible interval
CP: complete pooling
LOO-PSIS: leave-one-out cross-validation approximation by Pareto-smoothed importance sampling
ML: machine learning
PCT: procalcitonin
PP: partial pooling
ROC: receiver operating characteristic