PT - JOURNAL ARTICLE
AU - Rao, Gowtham A.
AU - Shoaibi, Azza
AU - Makadia, Rupa
AU - Hardin, Jill
AU - Swerdel, Joel
AU - Weaver, James
AU - Voss, Erica A
AU - Conover, Mitchell M.
AU - Fortin, Stephen
AU - Sena, Anthony G.
AU - Knoll, Chris
AU - Hughes, Nigel
AU - Gilbert, James P.
AU - Blacketer, Clair
AU - Andryc, Alan
AU - DeFalco, Frank
AU - Molinaro, Anthony
AU - Reps, Jenna
AU - Schuemie, Martijn J
AU - Ryan, Patrick B
TI - CohortDiagnostics: phenotype evaluation across a network of observational data sources using population-level characterization
AID - 10.1101/2023.06.28.23291982
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.06.28.23291982
4099 - http://medrxiv.org/content/early/2023/06/30/2023.06.28.23291982.short
4100 - http://medrxiv.org/content/early/2023/06/30/2023.06.28.23291982.full
AB - Objective: This paper introduces a novel framework for evaluating phenotype algorithms (PAs) using the open-source tool CohortDiagnostics.
Materials and Methods: The method applies several diagnostic criteria to evaluate the patient cohort returned by a PA. Diagnostics include estimates of incidence rate, a breakdown of the codes used for cohort entry on the index date, and the prevalence of all observed clinical events before, on, and after the index date. We test our framework by evaluating one PA for systemic lupus erythematosus (SLE) and two PAs for Alzheimer’s disease (AD) across 10 observational data sources.
Results: Using CohortDiagnostics, we found that the population-level characteristics of individuals in the SLE cohort closely matched the disease’s expected clinical profile. Specifically, the incidence rate of SLE was consistently higher among females. Expected clinical events, such as laboratory tests, treatments, and repeated diagnoses, were also observed.
For AD, although one PA identified considerably fewer patients, the absence of notable differences in clinical characteristics between the two cohorts suggested similar specificity.
Discussion: We provide a practical, data-driven approach to evaluating PAs across a network of OMOP data sources, using two clinical diseases as examples. CohortDiagnostics can help ensure that the subjects identified by a specific PA align with those intended for inclusion in a research study.
Conclusion: Diagnostics based on large-scale population-level characterization can offer insights into the misclassification errors of PAs.
Competing Interest Statement: All authors are employees of Janssen Research & Development, LLC, and shareholders of Johnson & Johnson (J&J) stock. This study was sponsored by Janssen Research & Development, LLC.
Funding Statement: This study was sponsored by Janssen Research & Development, LLC.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. [Yes]
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. [Yes]
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). [Yes]
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. [Yes]
Data Availability: The data that support the findings of this study are available to license from Merative, Optum, IQVIA and JMDC.