ABSTRACT
Importance Randomized clinical trials (RCTs) are the standard for defining an evidence-based approach to managing disease, but their generalizability to real-world patients remains challenging to quantify.
Objective To develop a multidimensional patient variable mapping algorithm to quantify the similarity and representation of electronic health record (EHR) patients corresponding to an RCT and estimate the putative treatment effects in real-world settings based on individual treatment effects observed in an RCT.
Design A retrospective analysis of the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist Trial (TOPCAT; 2006-2012) and a multi-hospital patient cohort from the EHR in the Yale New Haven Hospital System (YNHHS; 2015-2023).
Setting A multicenter international RCT (TOPCAT) and multi-hospital patient cohort (YNHHS).
Participants All TOPCAT participants and patients with heart failure with preserved ejection fraction (HFpEF) and ≥1 hospitalization within YNHHS.
Exposures 63 pre-randomization characteristics measured across the TOPCAT and YNHHS cohorts.
Main Outcomes and Measures Real-world generalizability of the RCT TOPCAT using a multidimensional phenotypic distance metric between TOPCAT and YNHHS cohorts. Estimation of the individualized treatment effect of spironolactone use on all-cause mortality within the YNHHS cohort based on phenotypic distance from the TOPCAT cohort.
Results There were 3,445 patients in TOPCAT and 11,712 HFpEF patients across 5 hospital sites. Across the 63 TOPCAT variables mapped by clinicians to the EHR, there were larger differences between TOPCAT and each of the 5 EHR sites (median SMD 0.200, IQR 0.037-0.410) than between the 5 EHR sites (median SMD 0.062, IQR 0.010-0.130). The synthesis of these differences across covariates using our multidimensional similarity score also suggested substantial phenotypic dissimilarity between the TOPCAT and EHR cohorts. By phenotypic distance, a majority (55%) of TOPCAT participants were closer to each other than to any individual EHR patient. Using a TOPCAT-derived model of individualized treatment benefit from spironolactone, those in the EHR cohorts who were predicted to derive benefit and received spironolactone had substantially better outcomes than those who were predicted to derive benefit but did not receive the medication (HR 0.74, 95% CI 0.62-0.89).
Conclusions and Relevance We propose a novel approach to evaluating the real-world representativeness of RCT participants against corresponding patients in the EHR across the full multidimensional spectrum of the represented phenotypes. This enables the evaluation of the implications of RCTs for real-world patients.
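The covariate-level comparison reported in the Results can be illustrated with a minimal sketch. It assumes continuous covariates and the common pooled-standard-deviation form of the standardized mean difference (SMD); the abstract does not state which SMD variant was used or how categorical variables were handled, and the function names and toy values below are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(a, b):
    """Absolute SMD between two samples of one continuous covariate.

    Assumption: pooled SD = sqrt((var_a + var_b) / 2), a common convention;
    the study's exact formula is not given in the abstract.
    """
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    if pooled_sd == 0:
        # No spread in either sample: identical means give 0, otherwise undefined.
        return 0.0 if mean_a == mean_b else float("inf")
    return abs(mean_a - mean_b) / pooled_sd


def median_smd(cohort_a, cohort_b):
    """Median SMD across covariates shared by two cohorts.

    Each cohort is a dict mapping a covariate name to a list of values.
    """
    shared = sorted(set(cohort_a) & set(cohort_b))
    return statistics.median(
        standardized_mean_difference(cohort_a[name], cohort_b[name])
        for name in shared
    )


# Toy, made-up values purely for illustration (not study data).
trial = {"age": [1.0, 2.0, 3.0], "sbp": [10.0, 12.0, 14.0]}
ehr = {"age": [2.0, 3.0, 4.0], "sbp": [10.0, 12.0, 14.0]}
toy_median = median_smd(trial, ehr)
```

Summarizing with the median and IQR across covariates, as the abstract does, keeps a single strongly imbalanced variable from dominating the comparison.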
Question How can we examine the multidimensional generalizability of randomized clinical trials (RCTs) to real-world patient populations?
Findings We demonstrate a novel phenotypic distance metric comparing an RCT to real-world populations in a large multicenter RCT of heart failure patients and the corresponding patients in multisite electronic health records (EHRs). Across 63 pre-randomization characteristics, the RCT and EHR cohorts were more discordant from each other than the EHR cohorts were among themselves (median standardized mean difference 0.200 [IQR 0.037-0.410] vs 0.062 [IQR 0.010-0.130]), with a majority (55%) of RCT participants closer to each other than to any individual EHR patient. The approach also enabled the quantification of expected real-world outcomes based on effects observed in the RCT.
Meaning A multidimensional phenotypic distance metric quantifies the generalizability of RCTs to a given population while also offering an avenue to examine expected real-world patient outcomes based on treatment effects observed in the RCT.
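One way to read the "closer to each other than to any individual EHR patient" finding is as a nearest-neighbor comparison, sketched below. This is an illustration only: it assumes Euclidean distance on already-scaled numeric covariates, whereas the study's actual phenotypic distance metric is not described in this section, and the function names and toy points are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length covariate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def fraction_closer_to_trial(trial, ehr):
    """Share of trial participants whose nearest other trial participant
    is strictly closer than their nearest EHR patient.

    Assumption: Euclidean distance stands in for the study's phenotypic
    distance, which the abstract does not define.
    """
    closer = 0
    for i, participant in enumerate(trial):
        nearest_trial = min(
            euclidean(participant, other)
            for j, other in enumerate(trial)
            if j != i
        )
        nearest_ehr = min(euclidean(participant, patient) for patient in ehr)
        if nearest_trial < nearest_ehr:
            closer += 1
    return closer / len(trial)


# Toy, made-up points purely for illustration (not study data).
trial_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
ehr_points = [(5.0, 6.0), (10.0, 10.0)]
toy_fraction = fraction_closer_to_trial(trial_points, ehr_points)
```

A fraction near 1 would suggest the trial occupies a region of covariate space with few real-world patients, while a fraction near 0 would suggest the trial participants are well interspersed with the EHR cohort.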
Competing Interest Statement
Dr. Thangaraj, Dr. Oikonomou, and Dr. Khera are coinventors of a provisional patent not related to the current work (63/606,203). Dr. Khera is an Associate Editor of JAMA and receives research support, through Yale, from the Blavatnik Foundation, Bristol-Myers Squibb, Novo Nordisk, and BridgeBio. He is a coinventor of U.S. Provisional Patent Applications 63/177,117, 63/428,569, 63/346,610, 63/484,426, 63/508,315, 63/580,137, 63/562,335, and a co-founder of Ensight-AI, Inc and Evidence2Health, LLC. Dr. Oikonomou is an academic co-founder of Evidence2Health LLC, and has been a consultant for Caristo Diagnostics, Ltd and Ensight-AI, Inc. He is a co-inventor in patent applications (US17/720,068, 63/619,241, 63/177,117, 63/580,137, 63/606,203, 63/562,335, WO2018078395A1, WO2020058713A1) and has received royalty fees from technology licensed through the University of Oxford. Dr. Suchard receives grants and contracts from the US Food & Drug Administration, the US Department of Veterans Affairs and Johnson & Johnson, all outside the scope of this work.
Funding Statement
The study is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (R01HL167858). Dr. Thangaraj and Dr. Oikonomou are also supported by grants from the National Institutes of Health (5T32HL155000-03 and 1F32HL170592-01, respectively).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Yale Institutional Review Board reviewed this study, and a waiver of consent was granted because it was a retrospective study of medical records.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The TOPCAT cohort is publicly available through the National Heart, Lung, and Blood Institute Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) at https://biolincc.nhlbi.nih.gov/studies/topcat/. The Yale electronic health record cohorts are not available due to the use of patient data.