Analytic approaches to clinical validation of results from preclinical models of glioblastoma: a systematic review
==================================================================================================================

* Beth Fitt
* Grace Loy
* Edward Christopher
* Paul M. Brennan
* Michael T.C. Poon

## Abstract

**Introduction** Analytic approaches to clinical validation of results from preclinical models are important in assessing their relevance to human disease. This systematic review examined consistency in reporting of glioblastoma cohorts from The Cancer Genome Atlas (TCGA) or Chinese Glioma Genome Atlas (CGGA) and assessed whether studies included patient characteristics in their survival analyses.

**Methods** We searched Embase and Medline on 02 February 2021 for studies using preclinical models of glioblastoma published after January 2008 that used data from TCGA or CGGA to validate the association between at least one molecular marker and overall survival in adult patients with glioblastoma. Main data items included cohort characteristics, statistical significance of the survival analysis, and model covariates.

**Results** There were 58 eligible studies from 1,751 non-duplicate records investigating 126 individual molecular markers. In 14 studies published between 2017 and 2020 using TCGA RNA microarray data that should have the same cohort, the median number of patients was 464.5 (interquartile range 220.5-525). Of the 15 molecular markers that underwent more than one univariable or multivariable survival analysis, five had discrepancies between studies. Covariates used in the 17 studies that used multivariable survival analyses were age (76.5%), pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).

**Conclusion** Preclinical glioblastoma studies that used TCGA for validation did not provide sufficient information about their cohort selection and there were inconsistent results. Transparency in reporting and the use of analytic approaches that adjust for clinical variables can improve the reproducibility between studies.

Keywords

* GBM
* animal models
* cell lines
* laboratory
* survival analysis

## Introduction

Glioblastoma, the most common primary brain cancer, is a fatal disease with a median patient survival of 6-8 months [1,2]. Novel therapies from translational research are desperately needed because current therapeutic options have only a modest and temporary impact on survival [3,4]. Discovery science has advanced our understanding of cancer cell biology and is a step towards developing novel therapies [5]. These discoveries are usually based on preclinical models, whose relevance to human disease must be established. Demonstrating relevance requires high-quality clinical and biological data.

The Cancer Genome Atlas (TCGA) [6] and the Chinese Glioma Genome Atlas (CGGA) [7] are two open-access resources from which laboratory scientists can interrogate human data to verify their findings in preclinical glioblastoma research. These resources are valuable for the molecular characterisation of glioblastoma and can also be used to examine the associations between molecular markers of interest and survival. An association with survival might implicate a molecular marker as a potential drug target.

Survival analyses using only genomic data are unlikely to have adequate clinical relevance because clinical factors also affect survival. An imbalance of clinical characteristics between comparison groups can confound the association between the molecular marker and survival. Univariable survival analyses that consider only one molecular marker do not account for other markers or clinical characteristics [8].
The resulting associations from such analyses are subject to confounding effects, which may render them unreliable. Confounding is a fundamental issue that affects observational health-related research, and it should be controlled for when possible [9]. Multivariable analyses are methods to control for confounders and are, therefore, preferable.

Open access policies for data and code sharing should facilitate the re-use of data and reproducibility of results [10]. Transparent and detailed reporting of the analytic approach is crucial for replicability and comparison of analyses. These methodological aspects can ensure the science that progresses to clinical trials is well-founded.

Clinical validation of results from preclinical glioblastoma studies using TCGA or CGGA data represents a common experimental step to substantiate research findings. This systematic review examined these studies for their consistency in reporting of cohorts from TCGA and CGGA and whether they included patient characteristics in their survival analyses.

## Methods

### Eligibility criteria

This review included studies that used data from TCGA or CGGA to examine the association between at least one molecular marker and overall survival in adult patients aged ≥18 years diagnosed with non-recurrent histopathologically confirmed glioblastoma. Studies using any molecular data type from TCGA or CGGA were eligible. Studies using both TCGA and CGGA were eligible if they had separately reported results for TCGA and CGGA. We only included studies that used cell or animal models to first identify molecular markers associated with tumour biology, then examined the association between these markers and overall survival in humans using TCGA or CGGA data. We excluded case reports, reviews, editorials and conference abstracts (S1 Supporting Information).

### Study selection

We searched Embase and Medline on 02 February 2021 for potentially eligible studies published after January 2008 using search terms relating to “glioma”, “survival”, “TCGA” and “CGGA” (S2 Supporting Information). The lower limit of the search period was set because data from TCGA first became available in 2008. After removing duplicate studies, two independent reviewers (B.F. and G.L.) performed screening using titles and abstracts followed by full-text eligibility assessment. Any disagreements at each stage were resolved through discussion with a third reviewer (M.T.C.P.).

### Data extraction and data items

Two reviewers (B.F. and G.L.) independently collected data from each study using the online systematic review management software Covidence (Veritas Health Innovation, Melbourne, Australia. Available at [www.covidence.org](http://www.covidence.org)). Disagreements were resolved by discussion between the two reviewers or by involving a third reviewer (M.T.C.P.).

Data items included study characteristics, TCGA cohort characteristics, CGGA cohort characteristics, genomic data used, molecular markers, and details of survival analysis. Molecular markers included expression, variants, or methylation of genes, RNAs and microRNAs. A set of molecular markers was defined as more than one molecular marker analysed together. Each study could report results from multiple survival analyses using the overall cohort or specific subgroups (S1 Figure). We collected information on all survival analyses performed in the studies. We categorised survival analyses into univariable and multivariable analyses, and we collected the covariates entered into each multivariable analysis. To describe the association between molecular markers and survival, we considered a reported p value of <0.05 as statistically significant. If a study reported results from both TCGA and CGGA cohorts, we extracted the statistical significance of these results separately.
Data on effect sizes and their corresponding 95% confidence intervals (CI) were not collected because studies using log-rank (Mantel-Cox) tests to compare survival between study-specific groups do not provide these data and there was no plan for meta-analysis.

### Quality assessment

There was no risk of bias assessment tool directly relevant to studies in this review. However, we assessed components of the study design relating to risk of bias. These measures of quality included types and size of cohorts used for survival analysis, types of genomic data used from TCGA or CGGA, and the criteria used to select patients for survival analysis. We did not quantify the quality of studies based on risk of bias items because this review aimed to assess the reporting and approach to analyses rather than to summarise effect sizes.

### Summary statistics

We presented study characteristics, results and quality measures using descriptive statistics with stratification by type of survival analysis, univariable and multivariable, where available. The availability of data in TCGA increased over time and the number of patients with available data differs between data types. To assess the reproducibility of cohort selection from TCGA, we summarised the number of patients in studies published between 2017 and 2020 using TCGA RNA microarray data. These studies should have the same number of patients because they all used the same RNA microarray dataset from TCGA at a time when there was no further accrual of patients.

There were occasions when two or more survival analyses within or between studies investigated the association between a molecular marker and survival. We presented findings on the molecular markers that underwent two or more analyses to demonstrate the consistency of results. There was no meta-analysis of any association between molecular markers and overall survival.
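The cohort-size summary described above can be sketched in a few lines of Python. This is an illustration only: the function name `summarise_cohort_sizes` and the example sizes are hypothetical and are not data from the included studies, and the quartile convention (`method="inclusive"`) is an assumption, since the review does not state which convention was used.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def summarise_cohort_sizes(sizes: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, lower quartile, upper quartile) of reported cohort sizes.

    Assumption: quartiles use the 'inclusive' convention; other conventions
    can give slightly different interquartile ranges for small samples.
    """
    if len(sizes) < 2:
        raise ValueError("at least two cohort sizes are needed for quartiles")
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    return median(sizes), q1, q3


# Hypothetical patient counts reported by five studies that queried
# the same dataset (NOT values extracted in this review).
hypothetical_sizes = [100, 200, 300, 400, 500]
med, q1, q3 = summarise_cohort_sizes(hypothetical_sizes)
print(f"median {med} (IQR {q1}-{q3})")
```

If every study had drawn the same cohort from the same dataset, the lower and upper quartiles would coincide with the median; a wide interquartile range therefore signals unreported differences in patient selection.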

## Results

### Study characteristics

This review included 58 eligible studies from 1,751 non-duplicate records retrieved from our systematic search (Fig 1 and S1 References). Individual study characteristics are presented in S1 Table. These studies investigated 126 individual molecular markers and 32 sets of molecular markers. Most (62.1%) studies were published in 2017-2020 and were from research teams based in the United States (34.5%), China (27.6%) and Europe (24.1%). The pre-clinical glioblastoma models used were cell lines and orthotopic mouse models in 51.7% and 48.3% of studies, respectively. All studies used a form of data from TCGA in various combinations with other data sources, and two studies used data from CGGA (Table 1). RNA microarray data were the most common data type, used in 45 (77.6%) studies. Three (5.2%) studies did not specify the data type used. Six studies (five using TCGA data and one using both TCGA and CGGA data) did not provide the number of patients included.

[Fig 1.](http://medrxiv.org/content/early/2022/01/22/2021.09.04.21263119/F1) PRISMA flowchart of study selection.

[Table 1.](http://medrxiv.org/content/early/2022/01/22/2021.09.04.21263119/T1) Characteristics of 58 included studies that used TCGA or CGGA data to validate findings from experiments using pre-clinical models of glioblastoma.

When investigating the association between their markers of interest from pre-clinical models and survival using genomic data, more studies used univariable survival analyses only (70.7%) than used multivariable analyses (29.3%). All univariable analyses used the non-parametric log-rank (Mantel-Cox) method and all multivariable analyses used Cox proportional hazards regression.
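To make the univariable approach concrete, the following is a minimal sketch of a two-group log-rank (Mantel-Cox) test using only the Python standard library. It is not the code used by any included study: the function name `logrank_test` and the toy follow-up times are hypothetical, and the p value relies on the usual chi-square approximation with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; events are 1 (death) or 0 (censored).

    Returns (chi-square statistic, p value on 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0  # summed over event times, for group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined when n == 1
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Toy data (hypothetical): survival in months for "low" and "high" marker groups.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The test compares groups defined by a single marker and takes no other input, which is why it cannot adjust for age, treatment or any other clinical variable.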
There were 16 (27.6%) studies that described additional criteria for patient inclusion within the selected TCGA cohort.

#### Reproducibility and survival analysis

The date and requested data type of a query in TCGA can result in a different number of patients available for survival analysis. To assess reproducibility of cohort selection from TCGA in the included studies, we summarised the numbers of patients in studies with similar data specifications. In 14 studies published between 2017 and 2020 using TCGA RNA microarray data without additional patient inclusion criteria, the median number of patients included was 464.5 (interquartile range [IQR] 220.5-525). Of these studies, 12 did not perform a multivariable survival analysis and therefore should all have included the same number of patients; the median number of patients included in the univariable survival analysis was 467 (IQR 196.75-528.75).

Among the 126 distinct molecular markers investigated in the included studies, 15 markers underwent more than one univariable or multivariable survival analysis (Table 2). The association of these markers with outcomes was consistent between different analyses most of the time. However, there were discrepancies between results for C-X-C Motif Chemokine Ligand 14 (CXCL14), epidermal growth factor receptor (EGFR), netrin 4 (NTN4), SRY-Box transcription factor 2 (SOX2), serglycin (SRGN) and miRNA-17-5p microRNA (Table 2). These discrepancies appear to relate to the type of survival analysis used (CXCL14, SOX2, SRGN) or the data type (EGFR, NTN4).

[Table 2.](http://medrxiv.org/content/early/2022/01/22/2021.09.04.21263119/T2) Results of molecular markers that were reported in two or more separate survival analyses.

There were 17 studies that investigated the association between their molecular markers of interest and overall survival using a multivariable survival analysis. All these studies used TCGA data, which have clinical data available.
The most frequently included clinical variable in the multivariable model was age (76.5%) (Fig 2). Other variables included pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).

[Fig 2.](http://medrxiv.org/content/early/2022/01/22/2021.09.04.21263119/F2) Clinical variables entered into analyses in 17 studies that used a multivariable survival model. Rows represent studies that used a multivariable model for survival analysis (S1 References). Columns are clinical variables relevant to survival in patients with glioblastoma.

## Discussion

There were studies in glioblastoma research that used data from publicly available genomic repositories to correlate pre-clinical experimental findings with clinical survival benefit in humans. These studies often had different numbers of patients included despite using the same data source and data type. Survival analyses often did not include other critical clinical variables associated with survival such as extent of resection [11], chemotherapy and radiotherapy [3,12]. In studies that performed a multivariable survival analysis, most clinical variables such as extent of resection and oncological treatment were not included. This yielded some inconsistent results between studies. Other results were subject to confounding effects by clinical variables that were not accounted for.

### Reproducibility

Research reproducibility encompasses several aspects: consistent results based on the same data and analysis, consistent results based on the same data but different analyses, consistent results from new data based on the study design of a previous study, and consistent results from another study with a similar study design [13,14].
Our review addressed the first two of these aspects. Development of novel cancer therapies relies on reproducible results from preclinical research. The need for improving reproducibility is not new [15]. In cancer research, there is a heavy reliance on the preclinical literature for drug development [16]. However, issues with reporting bias, suboptimal reporting quality, varying reproducibility and preclinical model representation of disease impede the success in finding new therapies [17].

The availability of survival data in publicly available data from cancer genomics programmes presents an opportunity for researchers to assess the association between molecular markers and patient survival in a reproducible manner. These open access data sources provide data on the same cohort of patients, which encourages reproducibility between studies. However, our findings demonstrate that patient selection was not adequately described, resulting in different numbers of patients between studies that supposedly used the same dataset. There are reproducible ways of querying TCGA data, for example, using the ‘TCGAbiolinks’ R/Bioconductor package [18], where code-based commands can be shared as supplementary materials. Adopting relevant aspects of reporting guidelines such as Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) [19], Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) [20] and REporting recommendations for tumour MARKer prognostic studies (REMARK) [21] can further improve transparency in reporting.

### Confounding effects of clinical variables

Confounding is an important consideration in analysing observational data. A confounder can diminish or exaggerate the association between the exposure and the outcome, leading to spurious results [22]. Confounding effects may be controlled by design or by analysis; the latter is most relevant in this review.
Control by analysis refers to adopting an analysis method that adjusts for confounders. There are many ways to achieve this, such as stratification and various regression models [9]. The most commonly used multivariable survival analysis is Cox regression [8]. Most studies in this review did not consider clinical variables as potential confounders of the association between the molecular marker of interest and survival. There are nevertheless examples of associations that no longer exhibit statistical significance after adjustment for clinical variables in a multivariable analysis (Table 2). Therefore, it is important to explore and consider confounders when assessing the effect of molecular markers on survival [23]. This is not a simple task because of data missingness, relatively small numbers of patients available, as well as correlations between clinical variables. A combination of data-driven and clinically informed choice of covariates would be a reasonable approach [24].

## Strengths and limitations

This systematic review assessed all pre-clinical studies that used data from TCGA or CGGA to validate findings from their laboratory experiments. Our data collection allowed comparison of findings between and within studies, which allowed our evaluation of replicability. Clinical studies that examined associations of previously investigated molecular markers with survival were not included in this review. These studies may provide more detailed descriptions of cohort selection and may be more likely to consider confounding effects from clinical variables. This would mean an overestimation of inconsistencies and suboptimal analytic approaches in our review. However, any omission of consideration about patients being more than their tumours should be highlighted to re-orientate research focus to patient benefits.

Collecting data on p values only to denote statistical significance was a pragmatic approach to describing associations reported in the included studies, since most studies did not report any effect sizes. This does not represent our views on the appropriate statistical approach and reporting of findings. We advocate reporting of effect sizes with their corresponding precision, adjusting for confounders. P values should not be used as a cut-off for the significance of an association [25]. There are other aspects of survival analyses that we did not assess, such as whether included studies tested the proportional hazards assumption when using a Cox regression [26]. While these analytic procedures are important, reporting of these would not affect our findings. We were unable to perform meta-analyses of the associations between molecular markers and survival because studies were not comparable and there were few effect sizes reported. This limitation prevented us from quantifying the consistency based on heterogeneity and variance measures.

## Conclusions

Translational studies in glioblastoma research should increase their transparency to facilitate replicability. The validation of laboratory experimental findings using human data is important to demonstrate translational value, but this should be done with consideration of patient characteristics. Integration of expertise in pre-clinical, genomic and clinical studies may help to address the challenge of producing replicable and meaningful research through collaboration between scientists in different fields.

## Data Availability

All data are publicly available through the cited papers and our supplementary materials.

## Supporting information

**S1 Supporting information. List of eligibility criteria.**

**S2 Supporting information. Search strategy in Medline and Embase.**

**S1 Fig. Common analytic strategy used by included studies.**

**S1 Table. Characteristics of 58 included studies.** Data type: A = RNA microarray only, B = RNA microarray and miRNA microarray, C = RNA sequencing only, D = RNA microarray and RNA sequencing, E = miRNA microarray only, F = RNA sequencing, RNA microarray and miRNA microarray, G = RNA sequencing and miRNA microarray, H = RNA microarray and DNA methylation, I = RNA sequencing, RNA microarray and DNA methylation, J = Unspecified. If a study used a data source but did not specify the number of patients, the column for data source would be “Yes [NS]”, indicating that the number of patients was not specified.

**S2 Table. References to specific analyses extracted for comparison of results on molecular markers.** Molecular markers are ordered alphabetically and accompanied by the location of the analysis in the original manuscript. U = univariable survival analysis; M = multivariable survival analysis; ▴ = positive association i.e. higher levels of the molecular marker associated with better survival and p<0.05; ▾ = negative association i.e. higher levels of the molecular marker associated with worse survival and p<0.05; □ = statistical significance not demonstrated (p≥0.05).

**S1 References. Full references of included studies.**

## Footnotes

* #a Centre for Medical Informatics, Building Nine, Edinburgh BioQuarter, Edinburgh, EH16 4UX, United Kingdom
* Major revision of text in the introduction, methods and discussion. No change to the results. Amendments mostly to clarify review procedures.
* Received September 4, 2021.
* Revision received January 22, 2022.
* Accepted January 22, 2022.
* © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Ostrom QT, Patil N, Cioffi G, Waite K, Kruchko C, Barnholtz-Sloan JS. CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2013-2017. Neuro Oncol. 2020;22: iv1–iv96. doi:10.1093/neuonc/noaa200
2. Poon MTC, Brennan PM, Jin K, Sudlow CLM, Figueroa JD. Might changes in diagnostic practice explain increasing incidence of brain and central nervous system tumors? A population-based study in Wales (United Kingdom) and the United States. Neuro Oncol. 2021;23: 979–989. doi:10.1093/neuonc/noaa282
3. Stupp R, Mason WP, van den Bent MJ, Weller M, Fisher B, Taphoorn MJB, et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N Engl J Med. 2005;352: 987–996. doi:10.1056/NEJMoa043330
4. Poon MTC, Sudlow CLM, Figueroa JD, Brennan PM. Longer-term survival in patients with glioblastoma in population-based studies pre- and post-2005: a systematic review and meta-analysis. Sci Rep. 2020;10: 11622. doi:10.1038/s41598-020-68011-4
5. O’Duibhir E, Carragher NO, Pollard SM. Accelerating glioblastoma drug discovery: Convergence of patient-derived models, genome editing and phenotypic screening. Molecular and Cellular Neuroscience. 2017;80: 198–207. doi:10.1016/j.mcn.2016.11.001
6. Brennan CW, Verhaak RG, McKenna A, Campos B, Noushmehr H, Salama SR, et al. The somatic genomic landscape of glioblastoma. Cell. 2013;155: 462–77. doi:10.1016/j.cell.2013.09.034
7. Zhao Z, Zhang K-N, Wang Q, Li G, Zeng F, Zhang Y, et al. Chinese Glioma Genome Atlas (CGGA): A Comprehensive Resource with Functional Genomic Data from Chinese Glioma Patients. Genomics, Proteomics & Bioinformatics. 2021; S1672022921000450. doi:10.1016/j.gpb.2020.10.005
8. Bradburn MJ, Clark TG, Love SB, Altman DG. Survival Analysis Part II: Multivariate data analysis – an introduction to concepts and methods. Br J Cancer. 2003;89: 431–436. doi:10.1038/sj.bjc.6601119
9. Greenland S, Morgenstern H. Confounding in Health Research. Annu Rev Public Health. 2001;22: 189–212. doi:10.1146/annurev.publhealth.22.1.189
10. Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3: 160018. doi:10.1038/sdata.2016.18
11. Molinaro AM, Hervey-Jumper S, Morshed RA, Young J, Han SJ, Chunduru P, et al. Association of Maximal Extent of Resection of Contrast-Enhanced and Non–Contrast-Enhanced Tumor With Survival Within Molecular Subgroups of Patients With Newly Diagnosed Glioblastoma. JAMA Oncol. 2020;6: 495. doi:10.1001/jamaoncol.2019.6143
12. Perry JR, Laperriere N, O’Callaghan CJ, Brandes AA, Menten J, Phillips C, et al. Short-Course Radiation plus Temozolomide in Elderly Patients with Glioblastoma. New England Journal of Medicine. 2017. doi:10.1056/NEJMoa1611977
13. Samsa G, Samsa L. A Guide to Reproducibility in Preclinical Research. Academic Medicine. 2019;94: 47–52. doi:10.1097/ACM.0000000000002351
14. Amaral OB, Neves K. Reproducibility: expect less of the scientific paper. Nature. 2021;597: 329–331. doi:10.1038/d41586-021-02486-7
15. Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533: 452–454. doi:10.1038/533452a
16. Rubin EH, Gilliland DG. Drug development and clinical trials—the path to an approved cancer drug. Nat Rev Clin Oncol. 2012;9: 215–222. doi:10.1038/nrclinonc.2012.22
17. Begley CG, Ellis LM. Raise standards for preclinical cancer research. Nature. 2012;483: 531–533. doi:10.1038/483531a
18. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research. 2016;44: e71–e71. doi:10.1093/nar/gkv1507
19. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. The Lancet. 2007;370: 1453–1457. doi:10.1016/S0140-6736(07)61602-X
20. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350: g7594–g7594. doi:10.1136/bmj.g7594
21. Sauerbrei W, Taube SE, McShane LM, Cavenagh MM, Altman DG. Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK): An Abridged Explanation and Elaboration. JNCI: Journal of the National Cancer Institute. 2018;110: 803–811. doi:10.1093/jnci/djy088
22. Davey Smith G. Data dredging, bias, or confounding. BMJ. 2002;325: 1437–1438. doi:10.1136/bmj.325.7378.1437
23. Wang X, Lin Y, Song C, Sibille E, Tseng GC. Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: With application to major depressive disorder. BMC Bioinformatics. 2012;13: 52. doi:10.1186/1471-2105-13-52
24. Bradburn MJ, Clark TG, Love SB, Altman DG. Survival Analysis Part III: Multivariate data analysis – choosing a model and assessing its adequacy and fit. Br J Cancer. 2003;89: 605–611. doi:10.1038/sj.bjc.6601120
25. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31: 337–350. doi:10.1007/s10654-016-0149-3
26. Austin PC. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model. Journal of Statistical Computation and Simulation. 2018;88: 533–552. doi:10.1080/00949655.2017.1397151