Abstract
Background Analytic approaches to clinical validation of results from preclinical models are important in assessment of their relevance to human disease. This systematic review examined consistency in reporting of glioblastoma cohorts from The Cancer Genome Atlas (TCGA) and assessed whether studies included patient characteristics in their survival analyses.
Methods We searched Embase and Medline on 2 February 2021 for studies published after January 2008 that used preclinical models of glioblastoma and data from TCGA to validate the association between at least one molecular marker and overall survival in adult patients with glioblastoma. Main data items included cohort characteristics, statistical significance of the survival analysis, and model covariates.
Results There were 58 eligible studies from 1,751 non-duplicate records investigating 126 individual molecular markers. In 14 studies published between 2017 and 2020 using TCGA RNA microarray data, which should share the same cohort, the median number of patients was 464.5 (interquartile range 220.5-525). Of the 15 molecular markers that underwent more than one univariable or multivariable survival analysis, five had discrepancies between studies. Covariates used in the 17 studies that used multivariable survival analyses were age (76.5%), pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).
Conclusions Preclinical glioblastoma studies that used TCGA for validation did not provide sufficient information about their cohort selection and produced some inconsistent results. Transparency in reporting and the use of analytic approaches that adjust for clinical variables can improve reproducibility between studies.
Introduction
Glioblastoma, the most common primary brain tumour [1,2], is lethal and therapeutic options have only a modest and temporary impact on survival [3,4]. Discovery science has advanced our understanding of cancer cell biology and is a step towards developing novel therapies [5]. These discoveries are usually based on preclinical models, from which the relevance to human disease must be established. Demonstrating relevance requires quality clinical and biological data. The Cancer Genome Atlas (TCGA) [6] and the Chinese Glioma Genome Atlas (CGGA) [7] are two open-access resources from which laboratory scientists can interrogate human data to verify their findings in glioblastoma research. These resources are valuable for the molecular characterisation of glioblastoma and can also be used to examine the associations between molecular markers of interest and survival. An association with survival might implicate a molecular marker as a potential drug target.
Isolated analyses of genomic data are unlikely to provide an adequate assessment of the role of molecular features in patient outcomes. Univariable survival analyses that consider only one molecular marker do not account for other markers or clinical features [8]. The resulting associations are subject to confounding, which may render them unreliable. Multivariable analyses are preferable and should be facilitated by open access policies that permit researchers to use the same set of data for different analyses [9]. This is crucial for replicability and comparison of analyses, and to ensure that the science that progresses to clinical trials is well founded.
Clinical validation of results from preclinical glioblastoma studies using TCGA or CGGA data is a common experimental step to substantiate research findings. This systematic review examined these studies for their consistency in reporting of cohorts from TCGA and CGGA and whether they included patient characteristics in their survival analyses.
Methods
Eligibility criteria
This review included studies that used data from TCGA or CGGA to examine the association between at least one molecular marker and overall survival in adult patients aged ≥18 years diagnosed with non-recurrent histopathologically confirmed glioblastoma. Studies using both TCGA and CGGA were eligible if results were stratified by the data resources. We only included studies that used cell or animal models to first identify molecular markers associated with tumour biology, then examined the association between these markers and overall survival in humans using TCGA or CGGA data. We excluded case reports, reviews, editorials and conference abstracts.
Study selection
We searched Embase and Medline on 02 February 2021 for potentially eligible studies published after January 2008 using search terms relating to “glioma”, “survival”, “TCGA” and “CGGA” (Supplementary Materials). The lower limit of the search period was set because data from TCGA first became available in 2008. After removing duplicate studies, two independent reviewers (B.F. and G.L.) performed screening using titles and abstracts followed by full-text eligibility assessment. Any disagreements at each stage were resolved through discussion with a third reviewer (M.T.C.P.).
Data extraction and data items
Two reviewers (B.F. and G.L.) independently collected data from each study using the online systematic review management software Covidence. Disagreements were resolved by discussion between the two reviewers or by involving a third reviewer (M.T.C.P.). Data items included study characteristics, TCGA cohort characteristics, CGGA cohort characteristics, genomic data used, molecular markers, and details of survival analysis. Molecular markers included expression, variants, or methylation of genes, RNAs and microRNAs. A set of molecular markers was defined as more than one molecular marker grouped and analysed together. We categorised survival analysis into univariable and multivariable analysis, and we collected the covariates entered into the multivariable analysis. To describe the association between molecular markers and survival, we considered a reported p value of <0.05 to indicate statistical significance. If a study reported results from both TCGA and CGGA cohorts, we extracted the statistical significance of these results separately. Data on effect sizes and their corresponding 95% confidence intervals (CI) were not collected because studies using log-rank (Mantel-Cox) tests to compare survival between study-specific groups do not provide these data and there was no plan for meta-analysis.
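To illustrate why a log-rank test yields a test statistic and p value but no effect size, the following is a minimal sketch of a two-group log-rank test using only the Python standard library. It is not code from any included study; the function name `logrank_test` and the example survival times are hypothetical and chosen only for illustration.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    times_*  : follow-up times for each patient
    events_* : 1 if death was observed, 0 if the patient was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [row for row in data if row[0] >= t]
        n = len(at_risk)                                  # all patients at risk
        n_a = sum(1 for row in at_risk if row[2] == 0)    # group A at risk
        d = sum(1 for row in at_risk if row[0] == t and row[1])
        d_a = sum(1 for row in at_risk
                  if row[0] == t and row[1] and row[2] == 0)
        # Observed minus expected deaths in group A at this event time
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical survival times in months for "high" and "low" marker groups
high_times, high_events = [3, 5, 7, 9, 12], [1, 1, 1, 1, 0]
low_times, low_events = [8, 14, 16, 20, 24], [1, 1, 0, 1, 1]
chi2, p = logrank_test(high_times, high_events, low_times, low_events)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The output conveys only whether the survival curves differ; estimating a hazard ratio with a confidence interval requires a regression model such as Cox proportional hazards regression.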
Quality assessment
There was no risk of bias assessment tool directly relevant to studies in this review. However, we assessed components of the study design relating to risk of bias. These measures of quality included types and size of cohorts used for survival analysis, types of genomic data used from TCGA or CGGA, and the criteria used to select patients for survival analysis.
Summary statistics
We presented study characteristics, results and quality measures using descriptive statistics with stratification by type of survival analysis, univariable and multivariable, where available. The availability of data in TCGA increased over time, and the number of patients with available data differs between data types. To assess the reproducibility of cohort selection from TCGA, we summarised the number of patients in studies published between 2017 and 2020 using TCGA RNA microarray data, because these specifications identified studies that should have used the same cohort of patients. There was no meta-analysis of any association between molecular markers and overall survival.
Results
Study characteristics
This review included 58 eligible studies from 1,751 non-duplicate records retrieved from our systematic search. These studies investigated 126 individual molecular markers and 32 sets of molecular markers. Most (62.1%) studies were published in 2017-2020 and were from research teams based in the United States (34.5%), China (27.6%) and Europe (24.1%). The pre-clinical glioblastoma models used were cell lines and orthotopic mouse models in 51.7% and 48.3% of studies, respectively. All studies used a form of data from TCGA in various combinations with other data sources, and two studies used data from CGGA (Table 1). RNA microarray data were the most common data type, used in 45 (77.6%) studies. When investigating the association between their markers of interest from pre-clinical models and survival using genomic data, more studies used univariable survival analyses only (70.7%) than used multivariable analyses (29.3%). All univariable analyses used the non-parametric log-rank (Mantel-Cox) method and all multivariable analyses used Cox proportional hazards regression. There were 16 (27.6%) studies that described additional criteria for patient inclusion within the selected TCGA cohort.
Reproducibility and survival analysis
The date of the query and the data type requested from TCGA can change the number of patients available for survival analysis. To assess reproducibility of cohort selection from TCGA in the included studies, we summarised the numbers of patients in studies with similar data specifications. In 14 studies published between 2017 and 2020 using TCGA RNA microarray data without additional patient inclusion criteria, the median number of patients included was 464.5 (interquartile range [IQR] 220.5-525). Of these, 12 studies did not perform a multivariable survival analysis and should therefore all have included the same number of patients; the median number of patients included in their univariable survival analyses was 467 (IQR 196.75-528.75).
Among the 126 distinct molecular markers investigated in the included studies, 15 markers underwent more than one univariable or multivariable survival analysis (Table 2). The association of these markers with outcomes was consistent between different analyses most of the time. However, there were discrepancies between results for C-X-C Motif Chemokine Ligand 14 (CXCL14), epidermal growth factor receptor (EGFR), netrin 4 (NTN4), SRY-Box transcription factor 2 (SOX2), serglycin (SRGN) and the microRNA miRNA-17-5p (Table 2). These discrepancies appear to relate to the type of survival analysis used (CXCL14, SOX2, SRGN) or the data type (EGFR, NTN4).
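The summary statistics reported here can be computed as in the following minimal sketch. The patient counts are hypothetical placeholders, not the counts extracted in this review, and the choice of the "inclusive" quartile method is an assumption; different quartile conventions give slightly different IQR bounds.

```python
from statistics import median, quantiles

def summarise_cohort_sizes(sizes):
    """Return (median, first quartile, third quartile) of per-study patient counts."""
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    return median(sizes), q1, q3

# Hypothetical patient counts from five studies that nominally used the same cohort
example_sizes = [100, 200, 300, 400, 500]
med, q1, q3 = summarise_cohort_sizes(example_sizes)
print(f"median {med} (IQR {q1}-{q3})")
```

If every study had queried the same cohort in the same way, all counts would be identical and the IQR would collapse to a single value.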
There were 17 studies that investigated the association between their molecular markers of interest and overall survival using a multivariable survival analysis. All these studies used TCGA data, which have clinical data available. The most frequently included clinical variable in the multivariable model was age (76.5%) (Figure 1). Other variables included pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).
Discussion
The glioblastoma studies in this review used data from publicly available genomic repositories to correlate pre-clinical experimental findings with survival in humans. These studies often included different numbers of patients despite using the same data source and data type. Survival analyses often did not include other critical clinical variables associated with survival, such as extent of resection [10], chemotherapy and radiotherapy [3,11]. Even among studies that performed a multivariable survival analysis, most did not include clinical variables such as extent of resection and oncological treatment. This yielded some inconsistent results between studies. Other results were subject to confounding by clinical variables that were not accounted for.
Reproducibility
Development of novel cancer therapies relies on reproducible results from preclinical research. The need for improving reproducibility is not new [12]. In cancer research, there is a heavy reliance on the preclinical literature for drug development [13]. However, issues with reporting bias, suboptimal reporting quality, varying reproducibility and preclinical model representation of disease impede success in finding new therapies [14]. The availability of survival data in publicly available data from cancer genomics programmes presents an opportunity for researchers to assess the association between molecular markers and patient survival in a reproducible manner. These open access data sources provide data on the same cohort of patients, which encourages reproducibility between studies. However, our findings demonstrate that patient selection was not adequately described, resulting in different numbers of patients between studies that supposedly used the same dataset. There are reproducible ways of querying TCGA data, for example, using the ‘TCGAbiolinks’ R/Bioconductor package [15], where code-based commands can be shared as supplementary materials. Adopting relevant aspects of reporting guidelines such as Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) [16], Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) [17] and REporting recommendations for tumour MARKer prognostic studies (REMARK) [18] can further improve transparency in reporting.
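One way to make cohort selection transparent, whichever tool is used to query the data, is to apply each inclusion criterion in code and report the number of patients remaining after each step. The sketch below illustrates the idea in Python; the function `select_cohort`, the record fields and the criteria are hypothetical and do not reflect the actual TCGA schema or the method of any included study.

```python
def select_cohort(patients, criteria):
    """Apply named inclusion criteria in order and log how many patients remain."""
    log = [("all records", len(patients))]
    remaining = list(patients)
    for name, keep in criteria:
        remaining = [p for p in remaining if keep(p)]
        log.append((name, len(remaining)))
    return remaining, log

# Hypothetical patient records; field names are illustrative only
patients = [
    {"id": "P1", "age": 61, "has_expression": True, "survival_days": 420},
    {"id": "P2", "age": 17, "has_expression": True, "survival_days": 300},
    {"id": "P3", "age": 55, "has_expression": False, "survival_days": 510},
    {"id": "P4", "age": 70, "has_expression": True, "survival_days": None},
]

# Each criterion is a (description, predicate) pair, applied in this order
criteria = [
    ("age >= 18 years", lambda p: p["age"] >= 18),
    ("expression data available", lambda p: p["has_expression"]),
    ("survival time recorded", lambda p: p["survival_days"] is not None),
]

cohort, log = select_cohort(patients, criteria)
for step, n in log:
    print(f"{step}: n = {n}")
```

Sharing such a script together with the query date and data type would let readers reproduce the exact cohort, in the spirit of a participant flow diagram.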
Confounding effects of clinical variables
Most studies did not consider clinical variables as potential confounders of the association between the molecular marker of interest and survival. There are nevertheless examples of associations that were no longer statistically significant after adjustment for clinical variables in a multivariable analysis (Table 2). Therefore, it is important to explore and consider confounders when assessing the effect of molecular markers on survival [19]. This is not a simple task because of data missingness, the relatively small numbers of patients available, and correlations between clinical variables. A choice of covariates that is both data driven and clinically informed would be a reasonable approach [20].
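For reference, adjustment in a Cox proportional hazards model can be written as follows. The notation is introduced here for illustration only: $x_m$ denotes the molecular marker of interest and $z_1, \dots, z_K$ denote clinical covariates such as age or extent of resection.

```latex
h(t \mid x_m, z_1, \dots, z_K)
  = h_0(t)\,\exp\!\Big(\beta_m x_m + \sum_{k=1}^{K} \gamma_k z_k\Big),
\qquad
\mathrm{HR}_m = \exp(\beta_m)
```

The adjusted hazard ratio $\mathrm{HR}_m$ compares patients who differ by one unit in the marker but share the same covariate values. A univariable analysis omits the $z_k$ terms, so any association between the marker and an omitted prognostic covariate can be absorbed into the marker's estimate.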
Strengths and limitations
This systematic review assessed all pre-clinical studies that used data from TCGA or CGGA to validate findings from their laboratory experiments. Our data collection allowed comparison of findings between and within studies, which enabled us to evaluate replicability.
Clinical studies that examined associations of previously investigated molecular markers with survival were not included in this review. These studies may provide more detailed descriptions of cohort selection and may be more likely to consider confounding effects from clinical variables. This would mean an overestimation of inconsistencies and suboptimal analytic approaches in our review. However, any omission of consideration about patients being more than their tumours should be highlighted to re-orientate research focus to patient benefits. Collecting data on p values only to denote statistical significance was a pragmatic approach to describing associations reported in the included studies, since most studies did not report any effect sizes. This does not represent our views on the appropriate statistical approach and reporting of findings. We advocate reporting of effect sizes with their corresponding precision, adjusting for confounders. P values should not be used as a cut-off for the significance of an association [21]. There are other aspects of survival analyses that we did not assess, such as whether included studies tested the proportional hazards assumption when using Cox regression [22]. While these analytic procedures are important, reporting of these would not affect our findings.
Conclusions
Translational studies in glioblastoma research should increase their transparency to facilitate replicability. The validation of laboratory experimental findings using human data is important to demonstrate translational value, but this should be done with consideration of patient characteristics. Integration of expertise in pre-clinical, genomic and clinical studies may help to address the challenge of producing replicable and meaningful research through collaboration between scientists in different fields.
Data Availability
All data are publicly available through the cited papers and our supplementary materials.
Footnotes
* Co-first authors listed in alphabetical order
Declarations
Funding Michael TC Poon is supported by Cancer Research UK Brain Tumour Centre of Excellence Award (C157/A27589).
Conflicts of interest Authors declare no conflict of interest.
Data availability All data are available in the original publications of the studies included in this systematic review.
Ethical approval This systematic review did not require ethical approval.