Abstract
Objective: We formulate population representativeness of randomized clinical trials (RCTs) as a machine learning (ML) fairness problem, derive new representation metrics, and deploy them in visualization tools that help users identify subpopulations that are underrepresented in RCT cohorts with respect to national, community-based, or health system target populations.
Materials and Methods: We represent RCT cohort enrollment as random binary classification fairness problems, and then show how ML fairness metrics based on enrollment fraction can be calculated efficiently from easily computed rates of subpopulations in RCT cohorts and target populations. We propose standardized versions of these metrics and deploy them in an interactive tool to analyze three RCTs with respect to type-2 diabetes and hypertension target populations in the National Health and Nutrition Examination Survey (NHANES).
Results: We demonstrate how the proposed metrics and associated statistics enable users to rapidly examine the representativeness of all subpopulations in the RCT defined by a set of categorical traits (e.g., sex, race, ethnicity, smoker status, and blood pressure) with respect to target populations.
Discussion: The normalized metrics provide an intuitive standardized scale for evaluating representation across subgroups, which may have vastly different enrollment fractions and rates in RCT study cohorts. The metrics are beneficial complements to other approaches (e.g., enrollment fractions and GIST) used to assess the generalizability and health equity of RCTs.
Conclusion: By quantifying the gaps between RCT and target populations, the proposed methods can support generalizability evaluation of existing RCT cohorts, enrollment target decisions for new RCTs, and monitoring of RCT recruitment, ultimately contributing to more equitable public health outcomes.
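The Materials and Methods paragraph states that metrics based on enrollment fraction can be computed from subgroup rates in the RCT cohort and in the target population. One way to see why this is plausible: by Bayes' rule, the ratio of a subgroup's enrollment fraction to the overall enrollment fraction equals the ratio of the subgroup's rate in the cohort to its rate in the target population, so the overall enrollment probability is not needed. The Python sketch below illustrates this idea with a log odds ratio of the two rates. It is an illustrative assumption, not necessarily the metric defined in the paper; the function names and the counts are hypothetical.

```python
import math


def subgroup_rate(subgroup_count: float, total_count: float) -> float:
    """Share of a population that belongs to the subgroup."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return subgroup_count / total_count


def log_odds_disparity(trial_rate: float, target_rate: float) -> float:
    """Log of the ratio between the subgroup's odds in the trial cohort
    and its odds in the target population.

    Hypothetical illustrative metric: 0 means the subgroup's share matches,
    negative values mean it is underrepresented in the trial,
    positive values mean it is overrepresented.
    """
    for rate in (trial_rate, target_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    trial_odds = trial_rate / (1.0 - trial_rate)
    target_odds = target_rate / (1.0 - target_rate)
    return math.log(trial_odds / target_odds)


# Hypothetical counts, not taken from any trial or from NHANES.
trial_rate = subgroup_rate(150, 1000)    # subgroup share in the RCT cohort
target_rate = subgroup_rate(300, 1000)   # subgroup share in the target population
disparity = log_odds_disparity(trial_rate, target_rate)
print(f"log odds disparity: {disparity:.3f}")
```

A value below zero here would flag the hypothetical subgroup as underrepresented. With survey data such as NHANES, the target rate would normally be estimated with survey weights rather than raw counts, which this sketch omits.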
Competing Interest Statement
This work was primarily funded by IBM Research AI Horizons Network. All authors were supported by IBM. Dr. Bennett, Ms. Qi, Dr. Gruen and Mr. Cahan were supported by Rensselaer Institute for Data Exploration and Applications. Dr. Bennett and Mr. Cahan were also supported by United Health Foundations.
Funding Statement
Dr. Bennett, Ms. Qi, Dr. Gruen and Mr. Cahan were supported by Rensselaer Institute for Data Exploration and Applications. Dr. Bennett and Mr. Cahan were also supported by United Health Foundations.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The procedures were approved by the Rensselaer Institutional Review Board (IRB). IRB 1863: Equity in Clinical Trials has been recorded as IRB Review Not Required because it does not meet the regulatory definition of human subjects research. All methods were carried out following the NHLBI-approved research plan, Equity in Clinical Trials, and all procedures were carried out in accordance with the applicable guidelines and regulations of the NHLBI Research Materials Distribution Agreement.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The example ideal national patient data are calculated from the National Health and Nutrition Examination Survey (NHANES) 2015-2016 conducted by the National Center for Health Statistics (NCHS). The clinical trial data that support the findings of this study are available from the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available with permission of BioLINCC. The data generated and analyzed during the current study are available in the GitHub repository, https://github.com/TheRensselaerIDEA/ClinicalTrialEquity.
Abbreviations
- RCT: Randomized Clinical Trial
- ML: Machine Learning
- NHANES: National Health and Nutrition Examination Survey
- ACCORD: Action to Control Cardiovascular Risk in Diabetes
- ALLHAT: Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial
- SPRINT: Systolic Blood Pressure Intervention Trial