Abstract
Objectives Trials often do not represent their target populations, threatening external validity. The aim was to assess whether age, sex, comorbidity count and/or race/ethnicity are associated with likelihood of screen failure (i.e., failure to be randomised to the trial for any reason) among potential trial participants.
Design Bayesian meta-analysis of individual participant-level data (IPD).
Setting Industry-funded phase 3/4 trials in chronic medical conditions. Participants were identified as “randomised” or “screen failure” using trial IPD.
Participants Data were available for 52 trials involving 72,178 screened individuals of whom 24,733 (34%) failed screening.
Main outcome measures For each trial, logistic regression models were constructed to assess likelihood of screen failure, regressed on age (per 10-year increment), sex (male versus female), comorbidity count (per one additional comorbidity) and race/ethnicity. Trial-level analyses were combined in Bayesian hierarchical models with pooling across condition.
Results In age- and sex-adjusted models, neither age nor sex was associated with increased odds of screen failure, though weak associations were detected after additionally adjusting for comorbidity (age, per 10-year increment: odds ratio [OR] 1.02; 95% credibility interval [CI] 1.01 to 1.04 and male sex: OR 0.95; 95% CI 0.91 to 1.00). Comorbidity count was weakly associated with screen failure, but in an unexpected direction (OR 0.97 per additional comorbidity, 95% CI 0.94 to 1.00, adjusted for age and sex). Those who self-reported as Black were slightly more likely to fail screening (OR 1.04; 95% CI 0.99 to 1.09); an effect which persisted after adjustment for age, sex and comorbidity count (OR 1.05; 95% CI 0.98 to 1.12).
Conclusions Age, sex, comorbidity count and Black race/ethnicity were not strongly associated with increased likelihood of screen failure. Proportionate increases in screening these underserved populations may improve representation in trials.
Trial registration Relevant trials in chronic medical conditions were identified according to pre-specified criteria (PROSPERO CRD42018048202).
Background
Women, older people, people with multi-morbidity and those from ethnic minorities (“underserved populations”) are inadequately represented in trials1–6. This systematic under-representation undermines the generalisability of trial findings and confidence in the selection of optimal treatment strategies for these groups7–11. Furthermore, this under-representation poses ethical issues which can undermine broader public confidence in health research. Despite commitments from funders, journal editors, trialists and policymakers to improve the recruitment and retention of people from underserved populations2, there has not been any evident change over the last decade1,4,5,12–14.
To become a trial participant, individuals undergo two rounds of selection – an invitation to screening phase and a screening phase proper (Figure 1). In the invitation phase, individuals receive and accept an invitation to attend screening. Such invitees are identified from clinical encounters (or occasionally from patient registers), usually by members of healthcare staff rather than the research trial team. In the screening phase, trial staff apply formal inclusion and exclusion criteria (based on demographic and clinical characteristics) to individuals, and eligible people are invited to participate. The number or characteristics of individuals invited and screened is not a reporting requirement of clinical trial registries such as ClinicalTrials.gov, nor are these items in the influential Consolidated Standards of Reporting Trials (CONSORT) checklist for trial publications15. As such, the contribution of invitation and/or screening related factors to underrepresentation is not well described. This is an important gap as it is remains unclear whether changing trial eligibility criteria – in line with recommendations from organisations such as the United States Food and Drug Administration (FDA) – are likely to improve representation.
We have previously studied age, sex and comorbidity in 116 phase 3/4 industry-funded trials for which we had access to individual participant-level data (IPD). We found that trial participants were younger and had lower comorbidity counts than members of the community with the same index condition6. For a subset of these trials, we have access to data on individuals screened for participation. Therefore, we now examine whether age, sex, comorbidity counts, and self-reported race/ethnicity, predict failure to progress to randomisation among individuals who have undergone trial screening.
Methods
Study design
This was a Bayesian meta-analysis of IPD from industry-funded phase 3 or 4 trials in chronic medical conditions. We explore whether individual demographic and clinical characteristics are associated with failing trial screening for any reason.
Data sources and participants
Available IPD were obtained from Phase 3 or 4 trials contained within the Vivli trial repository (https://vivli.org). Appropriate trials for inclusion were identified according to pre-specified criteria (PROSPERO CRD42018048202)6. In brief, we included trials conducted in chronic medical conditions that are managed pharmacologically (but excluding trials in cancer, infections, psychiatric and developmental disorders)6.
Participants were identified as “randomised” or “screen failure” from within the trial IPD (Figure 1). Age, sex, comorbidity count and race/ethnicity were extracted where available for randomised participants and screen failures.
As previously described6,16,17, comorbidities were defined using concomitant medications and pre-specified medical history-based definition (MedDRA codes) for cardiovascular disease, chronic pain, arthritis, affective disorders, acid-related disorders, asthma/chronic obstructive pulmonary disease, diabetes mellitus, osteoporosis, thyroid disease, thromboembolic disease, inflammatory conditions, benign prostatic hyperplasia, gout, glaucoma, urinary incontinence, erectile dysfunction, psychotic disorders, epilepsy, migraine, parkinsonism and dementia6. Individuals were considered to have a comorbidity if they had evidence of this comorbidity from either concomitant medications or medical history-based definition (or both). A comorbidity count was calculated as the sum of the number of comorbidities at baseline (excluding the index condition).
Outcome
The outcome of interest was “screen failure”, defined as a failure to be randomised to a treatment group for any reason after entering the screening process.
Statistical analysis
Participant characteristics for randomised participants and screen failures were calculated for each available IPD trial. These included: age (mean and standard deviation; SD); sex (number and %); comorbidity count (mean and SD); number with 0, 1 and 2 or more comorbidities; race/ethnicity (number and %). Race/ethnicity categories used in this analysis were largely driven by those recorded in the trial IPD, which included White, Black or African descent (“Black”), Asian, American Indian or Alaska Native and Native American or Other Pacific Islander (“Indigenous”) and multiple or other “Other”. Of these the first four were as per the FDA recommendations18; the fifth “Other” was formed by collapsing all other categories because of small numbers.
Full details of the modelling have been published previously16. In brief, for each trial, logistic regression models were constructed to assess likelihood of screen failure, regressed on age (per 10-year increment), sex (male versus female), comorbidity count (per 1 additional comorbidity) and race/ethnicity.
Coefficients, standard errors and variance/covariance matrices were exported for each model from the Vivli secure environment. The estimates from each trial were then meta-analysed in Bayesian hierarchical models. Each model had a multivariate normal likelihood, where for each trial the exported coefficients supplied a vector of means, and the exported variance-covariance matrices for these coefficients was the covariance matrix of the multivariate normal. In the primary analysis, models were fitted with trial nested within index condition. In secondary analyses, we then explored different structures for the model hierarchy, where trial was nested within both index condition and treatment comparison. For all models, trial was treated as a random effect.
We fitted models with five main sets of covariates: i) age and sex; ii) comorbidity count; iii) race/ethnicity; iv) age, sex and comorbidity count, and; v) age, sex, comorbidity count and race/ethnicity. We fitted additional models with interaction terms for selected two-way and three-way interactions. For sex, female was the reference category, while for race/ethnicity, White was the reference category (as this was the largest group and present in all trials) with dummy (indicator) variables for the remaining levels. In sensitivity analyses, we included only participants with one or more, or two or more comorbidities (other than the index condition).
For each model we report point estimate and 95% credible interval odds ratios for the association between each characteristic (age, sex, comorbidity count and race/ethnicity) and screen failure. These were obtained by exponentiating the posterior distributions and obtaining the mean, 2.5th and 97.5th percentiles. We additionally report between trial, between index condition and between treatment comparison variation for each parameter as the standard deviations. Finally, for model v) we present condition-specific odds ratios.
Results
Baseline data
There were 52 trials involving 72,178 screened individuals, of whom 24,733 (34%) failed screening (Table 1). The number of trials included in the sequential models reflected the data availability in the trials. Age and sex data were available for all 52 trials. Comorbidity count data (including for individuals who did not pass screening) were available for 31 trials and race/ethnicity data were available for 45 trials. Both race/ethnicity and comorbidity count data were available for 27 trials.
Characteristics of randomised participants (Randomised = “Yes”) and screen failures (Randomised = “No”) for included studies. Race/ethnicity categories included White (“White”), Black or African descent (“Black”), Asian (“Asian”), American Indian or Alaska Native and Native American or Other Pacific Islander (“Indigenous”) and Multiple or Other (“Other”).
Trial-level factors and screen failure
On visual inspection, there were no identifiable associations between year of trial conduct, trial size or trial phase and proportion of individuals who did not pass screening (Figure 2). Trial-level factors were not explored further in meta-analysis models.
Scatter plot of trial-level factors against percentage of participants who failed screening for any reason.
Individual participant-level factors associated with screen failure: primary analysis with trial nested within index condition
On modelling age and sex (n=52 trials), neither was associated with increased odds of screen failure (odds ratio [OR] 1.01; 95% CI 0.99 to 1.03 for age per 10-year increment; OR 0.97; 95% CI 0.93 to 1.01 for male versus female sex; Table 2). After additional adjustment for comorbidity count (n=31 trials), there was a weak association between reduced odds of screen failure for older age and for male sex, though the latter just crossed the null (Table 2).
Model 1: adjusted for age and sex; Model 2: comorbidity count only; Model 3: race/ethnicity only; Model 4: adjusted for age, sex and comorbidity count; Model 5: adjusted for age, sex, comorbidity count and race/ethnicity. For all models, trial was nested within index condition.
Comorbidity count was weakly associated with screen failure, but in an unexpected direction (n=31 trials). In a model including solely comorbidity count the OR was 0.93 per additional comorbidity (95% CI 0.87 to 1.00). On additionally adjusting for age and sex (n=31 trials), and age, sex and race/ethnicity (n=27 trials) the odds ratios were similar (Table 2). In sensitivity analyses (restricting analyses to participants with one or more, or two or more comorbidities), the association between comorbidity count and screen failure was considerably weaker, the credible intervals included the null and overall were consistent with no association (OR 0.99; 95% CI 0.97 to 1.02 for both sensitivity analyses).
On modelling race/ethnicity in a univariate analysis (n=45 trials), all the credible intervals included the null (Table 2); however, self-reported Black race/ethnicity appeared to be weakly associated with higher likelihood of failing screening (OR 1.04; 95% CI 0.99 to 1.09). After adjustment for age, sex and comorbidity count (n=27 trials), the point estimate and credible interval was similar (OR 1.05; 95% CI 0.98 to 1.12).
There was no evidence of an interaction between age and sex, sex and comorbidity count, or age, sex and comorbidity count (Supplementary Table S1). On modelling an interaction between male sex and Black race/ethnicity, there was a slightly stronger association in women (OR 1.09; 95% CI 1.00 to 1.19) than men (OR 0.97; 0.75 to 1.19). However, the credible interval for the odds ratio for the interaction included the null (OR 0.90; 0.70 to 1.08) and this comparison should be interpreted circumspectly.
Variation across trials, index conditions and treatment comparisons: secondary analyses
The associations between individual participant-level factors and screen failure were similar in models where index condition and treatment comparison were ignored, and where trial was nested within both index condition and treatment comparison (Table 3 and Supplementary Table S2). There was no evidence of substantial variation across index conditions or treatment comparisons for associations between age, race/ethnicity and screen failure (Table 3 and Figure 3). There was a clear association between male sex and reduced odds of screen failure in trials in hypertension and chronic obstructive pulmonary disease (Figures 3 and 4). For comorbidity count, although the point estimates were in the same direction (below one) for all the index conditions, the associations were more markedly negative for asthma and for rhinitis, and to a lesser extent, for osteoporosis and diabetes (Figures 3 and 4).
Models, adjusted for age, sex, comorbidity count and race/ethnicity, were conducted at three levels: trial (where condition and treatment were ignored); trial nested within condition; and trial nested within condition and treatment.
Results are displayed for age (per 10-year increase), sex (male versus female), comorbidity count (per 1 additional comorbidity) and self-reported /ethnicity. The black dotted line is the reference line (no effect OR = 1).
Results are displayed for age (per 10-year increase), sex (male versus female), comorbidity count (per 1 additional comorbidity) and self-reported race/ethnicity. The black dotted line is the reference line (no effect OR = 1).
Model diagnostics
The analysis code, model outputs from the trial-level logistic regression models fit within the trial safe havens, and the model outputs from the Bayesian hierarchical models, are available in the project GitHub repository (https://github.com/ChronicDiseaseEpi/screenfail_public). For the latter we also provide model diagnostics in terms of the number of divergent transitions, the Rhat and the bulk and tail effective sample sizes. There were no divergent transitions for any of the models. Rhat (a convergence diagnostic which compares the between-chain and within-chain estimates) was always <= 1.02 and for most models and terms was < 1.01, indicating satisfactory convergence. For some of the models where index condition was ignored, and those where trial was nested within index condition and treatment comparison the effective sample size was less than 400. However, for all the models presented in the main manuscript the effective sample sizes (bulk and tail) were >400.
Discussion
Aggregate data analyses of randomised trial participants show underrepresentation of women, those with multi-morbidity and non-white ethnic groups1–3. However, it is not clear whether this under-representation arises at the invitation or screening phase. In this IPD analysis of 52 trials in chronic medical conditions, there was a weak association between higher age and increased likelihood of screen failure, with a lower likelihood of screen failure in individuals of male sex, although the credible interval for the latter included the null. Considering the detected associations between participant characteristics and screen failure were small, under-representation may be more driven by selection at the invitation to screening phase, rather than by application of trial eligibility criteria by the trial team during screening.
Strengths and limitations
The strengths of this study are in the use of IPD across diverse trials conducted in chronic medical conditions, while limited previous comparisons have used aggregate trial data, questionnaires or been limited to particular index conditions. However, we acknowledge some limitations. Firstly, data describing screened populations are not routinely reported either in clinical trial repositories (such as ClinicalTrials.gov) nor in published trials: we cannot be certain of the accuracy of data collection within potential trial participants who have not consented to complete data collection19. Nevertheless, this limitation also demonstrates the scarcity and value of our data. Secondly, there were inadequate data available within the IPD to which we had access to allow us to examine the reasons for failing screening. While underlying associations for screen failure were weak overall, specific reasons for failing screening may be for other reasons, such as frailty, which we have not investigated. Thirdly, reflecting our initial trial selection, trials conducted in cancer, infections, psychiatric and developmental disorders were excluded. Similarly, we were only able to obtain IPD for trials contained within the Vivli trial data-sharing repository, and for sponsors who share data using this repository. It is possible that these findings are not representative of trials. Due to incomplete reporting of data on screen failures in trial registries such as ClinicalTrials.gov, we cannot measure the representativeness of these included trials for assessment of screen failure across other disease groups, sponsors or in non-industry funded trials; however, we previously illustrated that IPD trials are broadly representative of trials registered on ClinicalTrials.gov for assessment of trial attrition16. Finally, our measure of comorbidity was crude – an overall count, and it is plausible that either specific comorbidities, interactions between comorbidities, and/or interactions between comorbidities and the index condition are predictive of screen failure.
Comparison with the previous literature
This is the first exploration examining participant characteristics associated with failing screening using trial IPD, across a wide variety of phase 3 and 4 industry-funded trials conducted for chronic medical conditions. However, a few studies have examined trial selection using other methodologies. A nationally representative survey found that women and men were equally likely to be invited to participate in trials3. We demonstrate that women are slightly more likely to fail trial screening than men across most index conditions, and clearly more likely to fail screening in trials of hypertension and chronic obstructive pulmonary disease. In a previous analysis of trial IPD, we showed that attrition after randomisation is not more likely in women16. Together, these studies suggest that enhancing the proportion of women invited to screening may improve female representation in trials.
We identified a weak and inconsistent association between Black race/ethnicity and increased likelihood of failing screening. Our findings are in keeping with the literature, which shows that people from racial and ethnic minorities are not substantially more likely to decline trial participation if offered20,21, but remain systematically under-represented in trials 22–24. This has prompted the development of guidelines to recruit and retain participants from ethnic minority groups (Trial Forge Guidance 3)2. The guidelines point to unintended exclusions of ethnic minorities because of restrictive eligibility criteria and recruitment pathways (certain comorbidities are more common among ethnic minorities25,26); provision of trial materials and information in poorly accessible forms (e.g., failure to consider language support, differences in literacy or cultural differences in the nature of communication); lack of cultural competence among trial staff; and an absence of trusting relationships between trialists and ethnic minority people. Ethnic minority groups may also have different motivations for trial participation, particularly in countries where universal healthcare is not provided3, and may stem from historical events (e.g., Tuskegee syphilis study), as well as discrimination that persists27. Since we found at most a weak association between Black race/ethnicity and screen failure among screened participants, this suggests that under-representation is more likely to have arisen at the invitation rather than the screening phase. Furthermore, our findings show important heterogeneity in patterns across groups, highlighting the importance of studying specific racial and ethnic groups. Consequently, approaches to improve representation may also be more effective if targeted at the invitation phase.
We identified a paradoxical association such that lower comorbidity count was associated with increased likelihood of failing screening; however, in sensitivity analyses – excluding people with very low levels of comorbidity – there was no apparent association between comorbidity count and screen failure. The most likely explanation for this observation is reporting or recording bias: potential participants may be more likely to recall medications and conditions when they have decided to participate in a trial, or investigators may make greater efforts to record such information in individuals they think are unlikely to fail screening.
Implications for practice and policy
Ours is the first of which we are aware to meta-analyse associations between individual-level characteristics and failing to pass trial screening, and it was only possible due to our access to trial individual-level participant data. To better understand and improve trial representativeness, reporting guideline groups (such as CONSORT), representatives of journals (such as the International Committee of Medical Journal Editors), and trial registries which mandate results reporting (such as ClincialTrials.gov) may wish to consider requiring reporting of invited and screened participants as part of trial dissemination.
In lieu of more widespread reporting, our own findings – while limited to a relatively small and selected set of phase 3 and 4 industry-funded trials for which IPD were available – suggest that processes during the invitation to screening phase may be more important as regards trial representativeness.
Conclusion
Age, sex, comorbidity count and race/ethnicity were not strongly associated with increased likelihood of screen failure. Proportionate increases in screening these underserved populations may improve representation in trials.
Contributorship statement
D.A.M. conceived the idea for the article. P.H. and E.B. identified suitable trials for inclusion. D.A.M., S.H.W., F.S.M., B.G., N.W. and S.D. critically advised on statistical analysis and presentation. J.S.L., J.C. and D.A.M. carried out the analysis. J.S.L. and D.A.M. created tables and figures. J.S.L. wrote the first draft of the manuscript. K.G. and S.V.K. critically advised on presentation and interpretation. J.S.L. and D.A.M. are guarantors for the overall content. All authors reviewed and approved the final submitted manuscript.
Transparency declaration
The lead author affirms that the manuscript is an honest, accurate and transparent account of the study being reported.
Public and Patient Involvement statement
Patients and the public were involved directly in the trial data used for this analysis, but were not directly involved in the design or planning of the current analysis. This paper has important messages for patients and the public, and support from Patient and Public Involvement and Engagement (PPIE) groups at the University of Glasgow will be sought to assist with dissemination of the research message.
Ethical approval
Ethical approval was not required for this study.
Funding
David McAllister is funded via an Intermediate Clinical Fellowship and Beit Fellowship from the Wellcome Trust, who also supported other costs related to this project such as data access costs and database licences (“Treatment effectiveness in multimorbidity: Combining efficacy estimates from clinical trials with the natural history obtained from large routine healthcare databases to determine net overall treatment Benefits.” 201492/Z/16/Z). P.H. is funded through a Clinical Research Training Fellowship from the Medical Research Council (Grant reference: MR/S021949/1)
Declaration
This manuscript is based in part on research using data from data contributors, Boehringer Ingelheim, Eli Lilly, Roche, Takeda and UCB, that has been made available through Vivli, Inc. Vivli has not contributed to or approved, and Vivli, Boehringer Ingelheim, Eli Lilly, Roche, Takeda and UCB, are not in any way responsible for, the contents of this manuscript. SVK acknowledges funding from the Medical Research Council (MC_UU_00022/2) and the Scottish Government Chief Scientist Office (SPHSU17). Outside the submitted work, JSL acknowledges personal lectureship honoraria from Astra Zeneca, Pfizer and Bristol Myers Squibb.
Data sharing statement
Individual patient-level data are available from the Vivli Centre for Global Clinical Research Data platform (https://vivli.org). Trial-level results, model outputs and analysis code are provided on the project GitHub repository:
Supplementary Data file
Model A: adjusted for age, sex and age:sex interaction; Model B: adjusted for age, sex, comorbidity count and age:sex:comorbidity count interaction; Model C: adjusted for age, sex, comorbidity count, race/ethnicity and sex:race/ethnicity interaction. For all models, trial was nested within condition, then condition and treatment.
Model 1: adjusted for age and sex; Model 2: comorbidity count only; Model 3: race/ethnicity only; Model 4: adjusted for age, sex and comorbidity count; Model 5: adjusted for age, sex, comorbidity count and race/ethnicity. Models were conducted at three levels: trial (where condition and treatment were ignored); trial nested within condition; and trial nested within condition and treatment.
Footnotes
Author email addresses: Jennifer S Lees – jennifer.lees{at}glasgow.ac.uk
Jamie Crowther – jamie.crowther{at}glasgow.ac.uk
Peter Hanlon – peter.hanlon{at}glasgow.ac.uk
Elaine Butterly – elaine.butterly{at}glasgow.ac.uk
Sarah H Wild - sarah.wild{at}ed.ac.uk
Frances Mair - frances.mair{at}glasgow.ac.uk
Bruce Guthrie – bruce.guthrie{at}ed.ac.uk
Katie Gillies – k.gillies{at}abdn.ac.uk
Sofia Dias – sofia.dias{at}york.ac.uk
Nicky Welton – nicky.welton{at}bristol.ac.uk
Srinivasa Vittal Katikireddi – vittal.katikireddi{at}glasgow.ac.uk
David A McAllister – david.mcallister{at}glasgow.ac.uk