ABSTRACT
Objectives The active comparator new user (ACNU) cohort design has emerged as a best practice for the estimation of drug effects from observational data. However, despite its advantages, this design requires the selection and evaluation of comparators for appropriateness, a process which can be challenging, especially in the context of many drugs. In this paper, we introduce an empirical approach to rank candidate comparators in terms of their similarity to a target drug in high-dimensional covariate space.
Methods We generated new user cohorts for each RxNorm ingredient in five administrative claims databases, then extracted aggregated pre-treatment covariate data for each cohort across five clinically oriented domains. We formed all pairs of cohorts with ≥ 1,000 patients and computed a scalar similarity score defined as the average of cosine similarities computed within each domain for each pair. Ranked lists of candidate comparators were then generated for each cohort.
Results Across up to 1,350 cohorts forming 922,761 comparisons, drugs that were more similar in the ATC hierarchy tended to have higher cohort similarity scores, and the most similar candidate comparators for each of six example drugs consistently corresponded to alternative treatments for the target drug’s indication(s) that could be identified in the literature or publicly registered studies. 80%- 95% of cohorts had at least one comparator with a cohort similarity score ≥ 0.95.
Conclusion Empirical comparator recommendations may serve as a useful aid to investigators and could ultimately enable the automated generation of ACNU-derived evidence, a process that has previously been limited to self-controlled designs.
BACKGROUND
Pharmacoepidemiologic studies routinely employ cohort designs to estimate the effect of a drug treatment on an outcome using observational data. And while many variations on the cohort design exist, the active comparator new user (ACNU) design[1] has emerged over the past several decades as a best practice. This design, which is intended to emulate a head-to-head randomized trial, involves identifying cohorts of patients at the time of initiation of one of two drugs: a target drug and a comparator. Patients are then followed for the occurrence of the study outcome and the incidence is compared between groups, usually with adjustment for confounding factors. The primary advantage to this approach is the mitigation of biases known to impact comparisons against an inactive – i.e., unexposed or untreated – group. In particular, confounding by indication, healthy user bias, and healthy adherer bias may all be reduced by the adoption of an ACNU design.[1]
However, while the ACNU design has many advantages, it requires the study designer to select an appropriate comparator, and the process of identifying and determining comparator adequacy can be challenging. The ideal comparator group represents a counterfactual determined by the research question of interest. This question may concern whether the incidence of an outcome in the presence of the target exposure is different than what would have occurred if some other exposure were substituted for the target, or alternatively whether that incidence differs from what would have occurred if the target exposure were simply removed. In the former case, which is termed comparative effectiveness or safety research, investigators have historically selected comparators on the basis of perceived clinical equipoise and empirical feasibility. In the latter case, which is common in the drug safety surveillance context, the use of an active comparator otherwise thought to be wholly unrelated to the study outcome has been recommended[2–5], and a theoretical framework for its choice developed[6]. While the counterfactual represented by the comparator thus varies according to the research context, a common goal is to identify a comparator with a similar patient composition in terms of demographics and relevant medical characteristics to the target. And while several sets of authoritative recommendations[7] have emerged to guide investigators in the comparator selection process, it remains at least partially subjective and difficult to scale in a context with many target drugs. With thousands of drugs available on the market, each of which may be a viable comparator for another, the selection of comparators for all drugs is thus a large task, encompassing nearly one million target-comparator possibilities.
In this paper, we present an empirical approach for ranking potential comparator cohorts for pharmacoepidemiologic studies based on their similarity in high-dimensional covariate space. We propose a method to compute a scalar similarity score, the cohort similarity score, for all pairwise combinations of all active drug ingredients and rank candidate comparators for any given target drug based on this composite score. The cohort similarity score is based on the average of cosine similarity scores[8] computed within five separate covariate domains intended to emulate clinical thinking surrounding a treatment decision. These domains are demographics, medical history, prior medications, presentation, and visit context. We evaluate rankings based on the cohort similarity score in the context of existing drug classification schemes and assess whether they produce pairs of cohorts comprised of similar patient populations, based on having high proportions of covariates already in balance.
Throughout, we focus on six example drugs from three therapeutic areas: anticoagulation (warfarin and apixaban), rheumatoid arthritis (methotrexate and adalimumab), and multiple sclerosis (glatiramer and ocrelizumab). We demonstrate that this method produces ranked lists of comparators which broadly align with subject-matter knowledge, and which are already posited as good comparators in the literature or registered studies. Further, we propose that this approach may be of use both as a tool for researchers designing pharmacoepidemiologic studies and as a part of an eventual analytic pipeline to automate the evidence generation process. We provide the full suite of results, including some from data sources not originally included in this analysis, in an interactive web application (data.ohdsi.org/ComparatorSelectionExplorer), which can be used to identify and diagnose potential comparators for a given target cohort, and the code necessary for others to implement our approach in their own data (https://github.com/OHDSI/ComparatorSelectionExplorer).
METHODS
Data sources
We utilized data from five sources of insurance claims: Merative™ MarketScan® Commercial Claims and Encounters (CCAE), Merative™ MarketScan® Multi-State Medicaid (MDCD), Merative™ MarketScan® Medicare Supplemental (MDCR), Japan Medical Data Center (JMDC), and Optum© De- Identified Clinformatics© Data Mart. All data were mapped to the Observational Medical Outcomes Partnership Common Data Model[9] (OMOP CDM). Episodes of continuous exposure to individual RxNorm drug ingredients were inferred from pharmacy claims[10]. ICD diagnosis codes were mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT).
Cohort definitions
Prior to computing similarity scores, it was necessary to create cohorts for each active drug ingredient to be evaluated as either a target or comparator. As such, in each data source, we defined cohorts of new users for each RxNorm ingredient. While there are over 10,000 ingredient concepts in the RxNorm vocabulary, only a minority are used in medical practice and the number varies according to local usage patterns in each data source. To enter a given ingredient-specific cohort, patients were required to have 365 days of continuous observation in the database prior to the first observed exposure to that ingredient, which served as their index date. Patients were allowed to enter a given cohort only once, but there was no restriction on the number of distinct cohorts they could enter. In order to ensure robust estimation of covariate prevalence, we restricted analysis to cohorts with at least 1,000 patients.
Extraction of covariate data
Within each cohort, covariate data was extracted in five domains intended to capture the clinical thinking surrounding a treatment decision. These domains were:
Demographics: one covariate for female sex, one for male sex, and one covariate for each decade of age
Medical history: one covariate for each SNOMED condition observed in the period from the start of patient observation to 31 days before the index date
Presentation: one covariate for each SNOMED condition observed in the period from 30 days before the index date to the index date
Prior medications: one covariate for each RxNorm ingredient with exposure start at least 30 days prior to the index date and exposure end on or before the index date
Visit context: one covariate for the presence of an inpatient visit in the 30 days prior to the index date, and one for the presence of an emergency department visit in the same period
Information on the prevalence of each covariate in each cohort was stored in a separate table in the database from which it was extracted and used for the computation of similarity metrics. All covariates with prevalence < 1% were excluded to avoid distortion of similarity scores by extremely low prevalence estimates.
Computation of similarity metrics
Within each database, we formed all possible pairs of cohorts and computed similarity scores for each pair. We calculated the cosine similarity[8], the dot product of two vectors containing the target and comparator cohorts’ covariate prevalences divided by the product of their lengths, between the two cohorts, separately for each covariate domain, and then averaged across the five domains to get the final similarity score, termed the cohort similarity score. Informally, a cosine similarity score of 1 will occur if all covariates are the same between the two cohorts, while a score of 0 indicates that the cohorts have minimal agreement across the covariates. We then used the cohort similarity scores to generate ranked lists of comparators for each cohort, with scores closer to 1 indicating stronger comparator alignment. Additional information on cosine similarity can be found in the Appendix.
Under the assumption that cohorts with known relationships should have higher similarity than those without such relationships, we used the Anatomical Therapeutic Chemical (ATC) classification system[11] to compare the distribution of cohort similarity scores for cohort pairs related at different levels of the hierarchy. Cohort pairs were divided according to the most granular ATC level at which they co-occurred, if any. Pairs that co-occurred in a class at ATC level 5 were excluded from this analysis as this level is commonly used for individual drug products or combinations.
For a random subset of 100 cohorts within each data source, we also attempted to validate our ranked lists with a more familiar pharmacoepidemiological metric of covariate similarity, the standardized mean difference (SMD)[12]. For each of the randomly selected drugs, we identified their top comparator based on the cohort similarity score, calculated SMDs for all covariates present within the cohort pair and reported the proportion of covariates in various degrees of imbalance based on recommended cutoffs for the absolute SMD at 0.1, 0.2, 0.5, and 0.8 [13,14]. We then repeated this process for the comparator ranked 500th.
All code for cohort generation, covariate extraction, similarity calculation, and for our web application is publicly available at https://github.com/OHDSI/ComparatorSelectionExplorer.
RESULTS
Between 1,729 and 2,409 RxNorm ingredient-level new-user cohorts were generated per data source. Of these, between 988 and 1,359 cohorts had at least 1,000 patients (Table 1). The mean number of covariates with prevalence ≥1% extracted per cohort was 934.5 - 1,479.9 overall, 65.4- 120.0 in the “Presentation” domain, 336.9- 1,016.4 in the “Medical History” domain, and 293.0 - 486.2 in the “Prior Medications” domain. The number of demographic and visit context covariates was roughly constant by design.
Summary of cohorts generated, and covariates extracted, by data source and covariate domain
Between 487,578 and 922,761 pairs of cohorts with at least 1,000 patients were formed per data source. Among these, similarity was lowest in the “Presentation” domain (mean cosine similarity 0.213 to 0.490, depending on data source) and highest in the “Visit context” domain (mean cosine similarity 0.869 to 1.000, depending on data source). Between 90% and 95% of target cohorts with at least 1,000 patients had a comparator with an average cohort similarity score of ≥ 0.95. Among the six example drugs, the distribution of cohort similarity scores was consistently left-skewed, with most comparators lying between 0.7 and 0.9 (Figure 1). Similarity was lowest among pairs of cohorts that did not occur together in any ATC classes and tended to increase among pairs of cohorts that co-occurred within higher-level ATC classes (Figure 2).
Distribution of average cosine similarity scores computed across all available candidate comparators for six example drugs, by data source. Note that glatiramer and ocrelizumab did not meet the 1,000 patient sample size cutoff in MDCR or JMDC and thus results are not present for these data sources. The number of available comparators ranged from 988-1,359, depending on data source.
Distribution of cohort similarity scores (panel A) and number of comparisons (panel B), by relatedness in ATC hierarchy and data source. All target-comparator pairs were stratified by the distance between the two drugs in the ATC hierarchy (“4”, indicating that both drugs are in same low-level class, through “0”, indicating that the drugs do not share any common drug class ancestry). Cohort pairs in which the target and comparator are both members of an ATC level 5 class are omitted as these classes are frequently used for individual drugs or combination products. Whiskers are truncated at 10th and 90th percentiles. White circles indicate mean values.
The top five comparators for each of the six example drugs, by average rank across data sources (Table 2) and within data sources (Table 3), consistently corresponded to alternative treatments for each drug’s indication(s). For example, the top five comparators for glatiramer (interferon beta-1a, interferon beta-1b, dimethyl fumarate, fingolimod, and natalizumab) are all other disease-modifying therapies for multiple sclerosis[15], and those ranked sixth through tenth (teriflunomide, dalfampridine, ocrelizumab, corticotropin, and modafinil) are either disease modifying therapies or are used to treat symptoms of MS or MS relapse. Ranked comparator lists for these example drugs varied according to data source (Table 3, Figure 3 panel A), with the highest degree of correlation between CCAE, MDCD, and Optum. An example of between-data source heterogeneity is shown in Figure 3 panel B where, among 627 candidate comparators for warfarin observed in both CCAE and JMDC, rivaroxaban is identified as highly similar to warfarin in both data sources, whereas xylitol and vitamin A appear much more similar to warfarin in JMDC than in CCAE, while the reverse is true for hydralazine.
Top five comparators for six example drugs, according to average rank across data sources. Restricted to comparators occurring in at least two data sources. Entries are given as comparator name (average rank across data sources, mean cohort similarity score, number of data sources with information).
Top five comparators for six example drugs according to cohort similarity score, by data source. Entries are given as comparator name (cohort similarity score). Note that glatiramer and ocrelizumab did not meet the 1,000 patient sample size cutoff in MDCR or JMDC and thus results are not present for these data sources.
Pairwise rank correlations in cohort similarity scores between data sources for six example drugs (A) and cosine similarity scores among 627 candidate comparators for warfarin observd in both CCAE and JMDC (B).
Among the 100 target drugs randomly selected within each data source and their top comparators, most covariates were well balanced. When the comparator ranked 500th was chosen instead, the proportion of imbalanced covariates was higher (Figure 4). The average proportion of covariates with |SMD| ≤ 0.1 was between 72% and 84% for the top comparator versus between 28% and 45% for the comparator ranked 500th.
Proportion of covariates at various levels of imbalance according to standardized mean difference (SMD) among 100 randomly-selected target cohorts and their comparators ranked 1st and 500th, by data source.
DISCUSSION
Over the past several decades, the ACNU cohort design has emerged as a best practice for the estimation of drug effects from observational data. However, the implementation of this design may be mired by the difficulty of selecting an appropriate comparator for the target drug of interest, either because there are many potential alternatives to the target drug, or because the true target of inference is in fact a contrast between the target drug and its absence. The development of empirical comparator recommendations can aid investigator decision-making, simplifying the process of comparator selection and ultimately improving the quality of pharmacoepidemiological research. Furthermore, these recommendations may ultimately enable the automation of evidence generation using the ACNU design, something that previously relied on self-controlled designs[16,17] and temporal scans[18,19],.
In this paper, we demonstrated a method for ranking potential comparators based on similarity in high-dimensional covariate space. We showed that most target cohorts of reasonable size possessed at least one potential comparator with a high degree of similarity and that cohorts that were more related in the ATC hierarchy also had higher cohort similarity scores. For a random sample of 100 cohorts within each data source, we also showed that selecting the top-most similar comparator would lead to balance on the majority of covariates before any adjustment took place.
We additionally demonstrated that, for six target cohorts from a diverse set of therapeutic areas, the top-ranked comparators aligned well with subject-matter knowledge. That is, all were used in the treatment of at least one of the target cohorts’ indications. Indeed, many of these top-ranked comparators are found to be in use in head-to-head safety or effectiveness comparisons with the target cohort in the literature or in public registries of clinical and observational studies. For example, head-to- head studies exist comparing apixaban and its top comparator rivaroxaban[20,21], adalimumab and its top comparator certolizumab[22,23], and ocrelizumab and its top comparator teriflunomide[24]. While this is reassuring, it should also be noted that the correlation in comparator rankings across data sources was not consistently high (Figure 3). This may indicate that a given comparator is not uniquely fit-for- purpose across all data sources, suggesting that investigators take care to select a comparator using information from a source that at least resembles their intended analytic context.
This study complements a large body of work on drug similarity from within the bioinformatics community, including attempts to define similarity based on chemical structure, gene expression[25], genome-wide associations[26], co-occurrence of drugs with condition terms in the literature[27], individual patient data[28], or a combination of these[28]. However, our work is unique in its focus on the use of similarity indices for comparator selection rather than for identification of novel associations, and for its clinically oriented approach to feature extraction.
The empirical comparator recommendations provided here have several advantages. First, they are agnostic to investigator opinion and bias, both in terms of which drugs may serve as candidate comparators for a given target drug and which covariates should be used to determine empirical feasibility. While these decisions should be guided by subject-matter knowledge[29], no investigator is omniscient and thus empirical recommendations may lessen investigator degrees of freedom and potentially enable less biased comparisons. Additionally, empirical recommendations may reveal previously unknown comparators that are not necessarily alternative treatments for any of the target drug’s indications. They thus hold promise for the identification of appropriate negative control exposures, ultimately allowing for cohort studies that emulate user versus non-user comparisons within the less-biased ACNU framework. Additionally, these recommendations have been assessed across multiple large data sources comprised of diverse populations within and outside the United States. They have been observed, consistently across these data sources, to label drugs that are more closely related in the ATC hierarchy as more similar, to produce balance on a majority of covariates when the most similar comparator is chosen, and to identify top comparators that align with subject-matter knowledge.
A limitation of the approach introduced here is that the cohort similarity score is based on the marginal distributions of covariates in the two cohorts and as such is only a proxy for full covariate similarity. In contrast, propensity scores methods have the advantage of providing metrics (e.g., the c- statistic or the preference score[30]) that reflect the similarity in joint covariate distributions across cohorts, and such methods can furthermore be used to adjust for observed dissimilarity. However, unlike many propensity score models, which only incorporate a small number of covariates, the cohort similarity score dynamically defines a large set of covariates based on all available data elements. It should be noted that this data-adaptive approach has been incorporated into some propensity score methods like the high-dimensional propensity score (hdPS) [31] and, more recently, the large-scale propensity score (LSPS)[32]. However, the cohort similarity score remains the only approach that is scalable to millions of candidate target-comparator combinations, whereas PS models—particularly those using high-dimensional covariate data—can be computationally expensive, making them infeasible for screening large numbers of combinations. The cohort similarity score may therefore be best suited as a tool to narrow the comparator search space before techniques like hdPS/LSPS are employed. Finally, the approach presented here provides a standardized framework for comparator ranking which can be readily implemented by the public. This standardization improves reproducibility and scalability and ultimately lowers the burden on domain expertise for the investigator.
There are several factors that may limit the usefulness of the comparator recommendations provided here. Because this analysis was limited to cohorts of new users of individual RxNorm ingredients, comparator recommendations for ingredients that are used differently depending on dose, form, or route of administration may be less coherent than recommendations for ingredients which are used more consistently regardless of these factors. The same limitation may apply to ingredients that are used in combination preparations or which possess multiple, possibly unrelated, indications. However, because the method presented here is extensible to cohorts defined in an arbitrary manner, these limitations do not preclude the future development of recommendations for such products. Additionally the assessment of these recommendations was limited by the lack of an objective metric for defining comparator “appropriateness”. While we sought to benchmark appropriateness based on comparator location in the ATC hierarchy and appearance in published literature or registered studies, future work might attempt to go further by demonstrating that the use of empirical comparator recommendations can lead to less biased treatment effect estimates than traditional methods of comparator selection.
Finally, it should be clearly stated that empirical comparator recommendations are not a replacement for critical thinking in study design and analysis. Investigators may wish to use them as a “first pass” to identify and evaluate candidate comparators, after which more careful consideration and detailed analysis can determine true feasibility. Crucially, covariate adjustment is almost invariably required regardless of the quality of comparator chosen. Similarity between a target drug and comparator is no guarantee that any remaining covariate imbalance can be successfully adjusted via established methods like restriction or propensity score adjustment. And, conversely, low similarity relative to other recommendations is not a guarantee that successful adjustment is impossible.
Our work leaves open many exciting opportunities for further exploration of empirical comparator recommendations. Among these are additional approaches to validation, including a comparison against more computationally expensive techniques like assessing propensity score overlap. Future work might also explore the utility of adding sources of metadata on known or suspected drug- outcome relationships, such as the OHDSI Common Evidence Model[33,34], in order to improve the ease of identifying negative control exposures.
CONCLUSION
We have developed and evaluated a method for recommending comparators that yields promising results and which may be of use to investigators wishing to conduct ACNU studies of drug safety and effectiveness. An open-source tool set is available for those wishing to identify candidate comparators using their own data, and the full suite of results, including the comparator rankings for up to 1,359 drugs across the five data sources included here, is available to the public. We hope these tools and the results presented here will accelerate and improve the quality of future pharmacoepidemiologic research.
COMPETING INTERESTS
The authors declare no competing interests. All authors are employees or contractors of Janssen Pharmaceuticals and may hold stock or stock options in Johnson & Johnson
FUNDING
This work was funded by Janssen Research and Development, a subsidiary of Johnson & Johnson. All authors are employees of Janssen Research and Development. All authors are employees or contractors of Janssen Pharmaceuticals and may hold stock or stock options in Johnson & Johnson
AUTHOR CONTRIBUTIONS
APPENDIX
Definition of cosine similarity
Cosine similarity is defined as
where Ai and Bi are the ith components of vectors A and B. In the context of this paper, A and B represent vectors of covariate prevalence estimates from the target and comparator cohorts, respectively. Because these prevalence values are always non-negative, cosine similarities based on them will be bounded in the interval [0,1]. Two cohorts with identical vectors of covariate prevalence will have a cosine similarity of 1, as would two cohorts will exactly proportional vectors, whereas two cohorts with orthogonal vectors of covariate prevalence will have a cosine similarity of 0.
Worked example
Below is a demonstration of the computation of the demographic cosine similarity between an example target cohort (ocrelizumab) and one of it’s top-ranked comparator cohorts (teriflunomide) using data from the Merative™ MarksetScan Commerical Claims and Encounters Database. These data can be found and the result of the computation verified at https://data.ohdsi.org/ComparatorSelectionExplorer1.
This process is repeated for the other four covariate domains (which are not demonstrated here) and the domain-specific cosine similarities are averaged to get the final similarity score as below.
Footnotes
↵1 Note that these numbers may change as data sources are updated