Validation of the PREDICT Breast Version 3.0 Prognostic Tool in US Breast Cancer Patients
=========================================================================================

* Yi-Wen Hsiao
* Gordon C. Wishart
* Paul D.P. Pharoah
* Pei-Chen Peng

## ABSTRACT

**Background** PREDICT Breast v3 is the most recent version of the prognostication tool, developed using breast cancer registry data from approximately 35,000 women diagnosed between 2000 and 2018 in the United Kingdom. However, its performance in the United States (US) population is unknown. This study aims to validate PREDICT Breast v3 using newly released Surveillance, Epidemiology, and End Results (SEER) outcome data for US breast cancer patients and to address potential health disparities.

**Methods** Over 860,000 female patients diagnosed between 2000 and 2018 with primary breast cancer and followed for at least 10 years were selected from the SEER database. Predicted and observed 10- and 15-year breast cancer-specific survival outcomes were compared for the overall cohort, stratified by estrogen receptor (ER) status, and for predefined subgroups. Discriminatory accuracy was determined through the area under the receiver operating characteristic curve (AUC).

**Results** PREDICT Breast v3 demonstrated good calibration and discrimination for long-term breast cancer-specific mortality. It provided accurate estimates (within a ±10% error range) across the entire US population for 10-year (−8% in ER-positive and 4% in ER-negative patients) and 15-year (−3% in ER-positive and 5% in ER-negative patients) all-cause mortality. The model also showed good discrimination for 10- and 15-year all-cause mortality across the US population, with AUCs of 0.769 and 0.793 for ER-positive breast cancer and 0.738 and 0.746 for ER-negative breast cancer. However, recalibration is needed for specific groups, such as non-Hispanic Asian and non-Hispanic Black patients with ER-negative status.

**Conclusions** PREDICT Breast v3 accurately predicts 10- and 15-year overall survival in contemporary US breast cancer patients. Future work should focus on promoting equitable care by addressing the disparities observed in predictive tools.

Keywords

* women’s cancer
* breast cancer
* survival
* prognosis

## 1 Background

Breast cancer is the most common type of cancer diagnosed among women worldwide, with around 2.3 million new cases diagnosed in 2022 1. It also has the highest incidence rate and the second-highest mortality rate among women in the US, regardless of race or ethnicity 2: in the United States, a total of 310,720 new female breast cancer cases and 42,250 breast cancer-related deaths among women were estimated for 2024 2.

A key decision for women with a new diagnosis of breast cancer, made in discussion with their clinicians, is whether to undergo a course of systemic treatment. Adjuvant systemic treatment after surgery for early-stage breast cancer is aimed at reducing the risk of recurrence and mortality 3. Accurate estimates of survival and of the benefit of such treatment in early-stage breast cancer ensure that potentially harmful treatment is targeted to those most likely to benefit. These estimates can help oncologists support optimal clinical decision making, reducing side effects and maintaining quality of life for breast cancer patients 4.

Prediction models such as PREDICT Breast 5, Adjuvant! Online 6, and CancerMath 7 were developed to help decide which adjuvant systemic therapy is most suitable for a patient, depending on patient and tumor characteristics including tumor size, node status, hormone receptor status, and other factors 8. Adjuvant! Online is no longer available and CancerMath has not been updated since it was first released. PREDICT Breast has been regularly modified and updated since it was released in 2011; PREDICT Breast v3 (breast.v3.predict.cam) 9, released in May 2024, is the most recent version.
PREDICT Breast v1 and v2 have been validated in breast cancer cases from multiple countries, including the UK 10,11, Canada 12, Malaysia 13, the Netherlands 14,15, Japan 16, India 17, Spain 18, New Zealand 19 and the USA 20. However, PREDICT Breast v3 has only been validated in breast cancer cases from the UK, the population used to develop the model. This version should therefore be validated in other populations, including the USA, and, given the diversity of the US population, its performance should be evaluated in the different racial groups within the USA. Thus, this study aims to address this gap by conducting an external validation of PREDICT Breast v3 using the latest release of the Surveillance, Epidemiology, and End Results (SEER) data. This validation assesses the model’s accuracy in predicting patient outcomes across diverse populations of breast cancer patients in the United States. We also compared the performance of PREDICT Breast v3 with that of PREDICT Breast v2.2 and CancerMath.

## 2 Methods

### 2.1 Study population

The study population was drawn from the SEER Research Plus data (2000-2018; November 2023 Submission) 21. SEER is a comprehensive cancer registry program in the US that collects information about cancer patients from multiple cancer registries across the country. In total, 1,291,324 breast cancer cases were recorded in this latest release. The breast cancer registry captures information on patient demographics, tumor site, time since initial cancer diagnosis, tumor histology and tumor behavior. In this study, women aged 25 to 84 diagnosed with primary breast cancer from 2000 through 2018 were included. Patients with distant metastasis at the time of diagnosis, tumor size exceeding 500 mm or more than 50 positive lymph nodes were excluded, as were those with missing information necessary for PREDICT Breast prognostic score calculations.
We also excluded patients with missing data on survival time or cause of death. The final cohort comprised 628,753 female breast cancer patients: 62,402 Hispanic (All Races), 51,136 Non-Hispanic Asian or Pacific Islander, 57,039 Non-Hispanic Black and 453,297 Non-Hispanic White.

### 2.2 SEER prognostic variables used in the PREDICT Breast model

The minimum set of input variables for PREDICT Breast v3 comprises patient demographics (diagnosis year and age at diagnosis), tumor characteristics (tumor size, histologic grade, number of positive lymph nodes, estrogen receptor status), and treatment types (radiation therapy, adjuvant hormone therapy, adjuvant chemotherapy, trastuzumab and bisphosphonates). Adjuvant chemotherapy is classified as either standard anthracycline based or high-dose anthracycline/taxane based. Optional variables are mode of detection (clinical or screen detected), tumor HER2 status, progesterone receptor status and KI67 status. KI67 data are not available in SEER, so KI67 status was set to unknown for all cases. Mode of detection is also unavailable and was assumed to be screening for 20% of cases aged 45-69 and clinical for all others. SEER also provides information on race and ethnicity, encoded as non-Hispanic White, non-Hispanic Black, non-Hispanic American Indian or Alaska Native, non-Hispanic Asian or Pacific Islander, and Hispanic. Each variable is either quantitative or categorical, as detailed in Appendix 1.

Treatment data in SEER are limited to indicator variables for radiotherapy and adjuvant chemotherapy, with no data on adjuvant hormone therapy, trastuzumab or bisphosphonate therapy.
We assumed that: (1) all cases diagnosed under age 65 who received chemotherapy had a high-dose anthracycline/taxane-based regimen and all those aged 65 and over had a standard anthracycline-based regimen; (2) all ER-positive patients received hormone therapy; (3) those diagnosed after 2000 with HER2-positive cancer were prescribed trastuzumab; and (4) no patients received bisphosphonate treatment.

### 2.3 Calculating the PREDICT Breast v3 predicted survival probabilities

Predicted all-cause mortality for PREDICT Breast v3 was calculated using a custom script based on the model described in Grootes et al 9. PREDICT takes the form of a competing risk Cox survival model, with fractional polynomial baseline cumulative hazards. The competing risks are breast cancer mortality and other mortality. The estimated survival from breast cancer at *t* years after surgery for each patient is given by

![Formula][1]

PI is the breast cancer prognostic index (log hazard ratio) given by

![Formula][2]

for prognostic factors 1…*i*, where *βi* and *f(i)* are the log relative hazards and the function of *i* respectively. Rx is the effect of treatment (log hazard ratio) given by

![Formula][3]

for treatments 1…*j*, where *βj* is the log relative hazard for treatment *j*. HC(t) is the baseline hazard for breast cancer mortality. The estimated survival from other causes is given by

![Formula][4]

MI is the mortality index

![Formula][5]

for other mortality prognostic factors 1…*k*, where *βk* and *f(k)* are the coefficients and the function of *k*. The all-cause survival function at time *t* assumes independent, competing risks from breast and other mortality and is given by

![Formula][6]

Thus, predicted all-cause mortality at time *t* is given by

![Formula][7]

The baseline hazard functions, and the coefficients and functions for all breast cancer and non-breast cancer risk factors, were taken from Grootes et al 9.
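
The formulas in this section are image placeholders, so the following Python sketch illustrates only the structure the text describes. It assumes the usual proportional-hazards form, in which cause-specific survival is exp(−H(t)·exp(linear predictor)) and H(t) is read as a cumulative baseline hazard, and it combines the two independent risks as a product of survival functions. All function and argument names are hypothetical, the inputs are placeholders rather than published coefficients, and this is not the authors' script.

```python
import math


def breast_cancer_survival(cum_hazard_bc, pi, rx):
    """Assumed form: S_bc(t) = exp(-H_C(t) * exp(PI + Rx))."""
    return math.exp(-cum_hazard_bc * math.exp(pi + rx))


def other_cause_survival(cum_hazard_other, mi):
    """Assumed form: S_other(t) = exp(-H_O(t) * exp(MI))."""
    return math.exp(-cum_hazard_other * math.exp(mi))


def all_cause_mortality(cum_hazard_bc, cum_hazard_other, pi, rx, mi):
    """Independent competing risks: all-cause survival is the product of the
    two cause-specific survival functions; mortality is its complement."""
    s_bc = breast_cancer_survival(cum_hazard_bc, pi, rx)
    s_other = other_cause_survival(cum_hazard_other, mi)
    return 1.0 - s_bc * s_other


if __name__ == "__main__":
    # Placeholder values only: a negative Rx (treatment benefit) should
    # lower the predicted all-cause mortality for the same patient.
    untreated = all_cause_mortality(0.10, 0.20, pi=0.5, rx=0.0, mi=0.1)
    treated = all_cause_mortality(0.10, 0.20, pi=0.5, rx=-0.3, mi=0.1)
    print(f"untreated: {untreated:.3f}, treated: {treated:.3f}")
```

The difference between the untreated and treated predictions is the kind of absolute benefit estimate that the tool is designed to report.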

Predicted all-cause mortality for PREDICT Breast v2.2 was calculated using the *nhs.predict* R package 15. The predicted all-cause mortality for CancerMath was calculated using a custom R script derived from the JavaScript extracted from the online tool. The output of the R script was verified by comparing it with the output of the online tool for a small set of cases.

### 2.4 Predictive model performance

Model performance was evaluated using calibration, goodness-of-fit and discrimination. Model calibration is given by the ratio of the observed number of events to the number of events predicted by the model. Goodness-of-fit was assessed graphically by plotting the observed number of deaths against the predicted number of deaths within quintiles of risk. Model discrimination was evaluated by calculating the area under the receiver operating characteristic curve (AUC), which measures the probability that the predicted mortality score for a randomly selected patient who died will be higher than that for a randomly selected patient who survived. AUC values range from 0.5 to 1, with a higher AUC indicating that the model is better at identifying patients with worse survival. AUC statistics were calculated separately by ER status and by population. All analyses were conducted using R version 4.3.3 22 in RStudio 23 with the packages *survival* 24, *pROC* 25 and *tidyverse* 26.

## 3 RESULTS

### 3.1 Demographic and clinical characteristics of breast cancer patients in SEER

The study population included 628,753 women diagnosed with breast cancer during 2000 to 2018 in the SEER cancer registry, with 712,233 having ER-positive status and 148,751 having ER-negative status. Of these, 413,280 had a minimum of 10 years follow up (330,196 ER-positive and 83,084 ER-negative) and 220,022 had a minimum of 15 years follow up (172,307 ER-positive and 47,715 ER-negative).
Table 1 summarizes patient demographics, tumor characteristics, and treatment types, stratified by ER status.

[Table 1.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T1) Breast cancer patients’ demographics, tumor characteristics, and treatment types in SEER, stratified by estrogen receptor status. Characteristics were summarized using proportions for categorical variables and mean (standard deviation, SD) for continuous variables.

### 3.2 Calibration

Overall, PREDICT Breast v3 was well calibrated. The predicted number of deaths at 10 years was within 10% of the observed number in patients with ER-positive cancer (68,114 predicted/74,326 observed) and in patients with ER-negative cancer (26,244 predicted/25,190 observed). At 15 years the predictions were within about 5% (ER-positive patients 61,770 predicted/63,718 observed; ER-negative patients 20,191 predicted/19,217 observed).

The observed and predicted numbers of deaths from all causes at 10 and 15 years, stratified by patient demographics, tumor characteristics, and treatment, are shown in Table 2 for ER-positive patients and Table 3 for ER-negative patients. In most subgroups the observed and predicted numbers of deaths were within 10% of each other. However, calibration was poor in non-Hispanic Asians with ER-negative cancer, with PREDICT Breast v3 over-predicting the number of deaths at 10 and 15 years by more than 30%. Similarly, calibration in non-Hispanic Black women with ER-positive breast cancer was poor, with PREDICT Breast v3 under-predicting the number of deaths by 20% or more at 10 and 15 years. The observed and predicted numbers of deaths from breast cancer are shown in Appendix 2 for ER-positive breast cancer and in Appendix 3 for ER-negative breast cancer, with deaths from other causes shown in Appendix 4 for ER-positive breast cancer and in Appendix 5 for ER-negative breast cancer.
In general, breast cancer-specific mortality tended to be over-estimated, whereas mortality from other causes tended to be under-estimated.

[Table 2.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T2) Cumulative observed and predicted all-cause mortality at 10 and 15 years follow up for ER-positive patients.

[Table 3.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T3) Cumulative observed and predicted all-cause mortality at 10 and 15 years follow up for ER-negative patients.

Results of calibration at 5 years follow-up for all-cause, breast cancer-specific and other-cause mortality are shown in Appendix 6 for ER-positive breast cancer and in Appendix 7 for ER-negative breast cancer. PREDICT Breast v3 tended to under-estimate both breast cancer and other deaths at five years, with more substantial mis-calibration for ER-positive patients.

### 3.3 Goodness-of-fit

The comparison between predicted and observed all-cause mortality by quintiles of predicted risk is shown in Figure 1. Overall, PREDICT Breast demonstrated good calibration across most quintiles. Mis-calibration was greatest in patients at highest risk. Goodness-of-fit plots for breast cancer-specific and other causes of mortality at 10 and 15 years are shown in Appendix 8.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2024/11/16/2024.10.29.24316401/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/F1) Calibration plot for all-cause mortality by follow up time and tumor ER-status. (A) 10-year, ER-positive, (B) 15-year, ER-positive, (C) 10-year, ER-negative, (D) 15-year, ER-negative. Goodness-of-fit plots for 5-year all-cause mortality, breast cancer-specific mortality and other causes of mortality are shown in Appendix 9.
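
The observed-versus-predicted comparison by risk quintile can be sketched in a few lines of Python. This is a minimal illustration, not the study's R code: it assumes each patient has a predicted probability of death and a binary outcome at a fixed horizon, it ignores censoring, and the function names are hypothetical. The `relative_error` helper uses the sign convention (predicted − observed) / observed, which is an assumption that reproduces the percentages quoted in the abstract from the death counts in Section 3.2.

```python
def calibration_by_quantile(predicted, died, n_groups=5):
    """Sort patients by predicted risk, split them into n_groups bins of
    near-equal size, and return (observed, expected) deaths for each bin."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        observed = sum(died[i] for i in idx)        # deaths that occurred
        expected = sum(predicted[i] for i in idx)   # sum of predicted risks
        rows.append((observed, expected))
    return rows


def relative_error(observed, predicted):
    """Signed calibration error as a fraction of the observed count."""
    return (predicted - observed) / observed


if __name__ == "__main__":
    # Overall 10-year all-cause death counts reported in Section 3.2.
    print(f"ER-positive: {100 * relative_error(74326, 68114):.1f}%")
    print(f"ER-negative: {100 * relative_error(25190, 26244):.1f}%")
```

Plotting the observed count against the expected count for each bin gives a plot of the kind shown in Figure 1; points below the diagonal indicate over-prediction.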

### 3.4 Discrimination

Overall model discrimination was very good in women with ER-positive breast cancer (AUCs of 0.769 for 10-year follow-up and 0.793 for 15-year follow-up) and in women with ER-negative breast cancer (AUCs of 0.738 for 10-year follow-up and 0.746 for 15-year follow-up) (Table 4). There was little difference in discrimination by race. Discrimination was slightly better for other-cause mortality than for breast cancer-specific mortality (Appendix 10). AUCs for 5-year mortality were similar (Appendix 11).

[Table 4.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T4) Model discrimination (area under receiver operator characteristic curve) for 10-year and 15-year all-cause mortality by race and tumor ER-status.

### 3.5 Comparison of performance of PREDICT Breast v3 with v2.2 and CancerMath

Calibration and discrimination for all three models for all-cause mortality at 10 and 15 years are shown in Table 5. Both PREDICT Breast v2.2 and CancerMath substantially over-predicted the number of deaths, with calibration being particularly poor for CancerMath. Discrimination was good for all three models, with PREDICT Breast v3 slightly outperforming the other two models.

[Table 5.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T5) Performance comparison of other breast cancer prognostication tools with PREDICT Breast v3 in the US population, in terms of 10-year and 15-year all-cause mortality, stratified by ER status.

## 4 DISCUSSION

This study is the first validation of PREDICT Breast v3 in a non-UK population. Overall, the model performed well, with good calibration and discrimination at 10 and 15 years for ER-positive and ER-negative patients. The overall performance was similar to that in a large series of patients in the United Kingdom 9.
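
The AUC values discussed here have the pairwise interpretation given in Section 2.4: the probability that a randomly chosen patient who died was assigned a higher predicted mortality than a randomly chosen survivor. The sketch below computes that probability directly in Python. It is an illustration under simplifying assumptions (binary outcome at a fixed horizon, no censoring, ties counted as one half) with a hypothetical function name; the study itself used the *pROC* package, and this quadratic-time loop is only suitable for small examples.

```python
def pairwise_auc(scores, died):
    """Fraction of (died, survived) pairs in which the patient who died has
    the higher predicted mortality; ties contribute one half."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for sd in dead:
        for sa in alive:
            if sd > sa:
                wins += 1.0
            elif sd == sa:
                wins += 0.5
    return wins / (len(dead) * len(alive))


if __name__ == "__main__":
    # Toy data: three of the four (died, survived) pairs are ordered correctly.
    print(pairwise_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

Because this quantity depends only on how patients are ranked, a model can discriminate well within a population and still be poorly calibrated in it.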

Discrimination was generally very good in all populations for both ER-negative and ER-positive patients, but calibration was poorer in specific populations. In particular, the model over-estimated mortality in non-Hispanic Asian patients with ER-negative disease and under-estimated mortality in non-Hispanic Black patients with ER-positive disease. The latter finding is consistent with findings from a validation of PREDICT Breast v2 in the US population 20.

The primary purpose of PREDICT Breast is to provide estimates of the absolute survival benefit associated with adjuvant therapies to aid shared decision making between patients and their oncologists. Model performance indicates that PREDICT Breast v3 is sufficiently accurate in the US non-Hispanic White population for it to be incorporated into the routine practice of oncologists. However, the model is likely to over-estimate the benefits of adjuvant therapy in non-Hispanic Asian patients with ER-negative disease and under-estimate the benefits of adjuvant therapy in non-Hispanic Black patients with ER-positive disease. The under/over-estimates are by about one third at 10 years, and this should be taken into account when using the model for decision making in these populations.

There are two main components to the PREDICT Breast model. The first is the baseline hazard and the second is the set of coefficients (log hazard ratios) for each prognostic factor in the model. Poor calibration arises primarily from misspecification of the baseline hazard, whereas discrimination depends on the set of coefficients. Given that discrimination was good across all population groups, completely refitting the model to generate different sets of population-specific coefficients is unlikely to improve the fit substantially. However, improvements in calibration could easily be achieved by simply modifying the baseline hazard to be population specific.
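
One simple way to make a baseline hazard population specific is to multiply the cumulative baseline hazard by a single scale factor chosen so that the predicted number of deaths in a subgroup matches the observed number, leaving the coefficients (and hence the ranking of patients) unchanged. The Python sketch below illustrates that idea for a single cause of death. It is an assumption-laden example, not a method proposed by the authors: the function names, search bounds, and toy inputs are all hypothetical.

```python
import math


def predicted_deaths(scale, cum_base_hazard, linear_predictors):
    """Expected deaths when the cumulative baseline hazard is multiplied by
    `scale`, under a proportional-hazards model for one cause of death."""
    return sum(
        1.0 - math.exp(-scale * cum_base_hazard * math.exp(lp))
        for lp in linear_predictors
    )


def fit_hazard_scale(observed_deaths, cum_base_hazard, linear_predictors,
                     lo=0.01, hi=100.0, iterations=200):
    """Find the scale at which predicted deaths equal observed deaths.
    Predicted deaths increase monotonically with the scale, so bisection
    (on a log scale) converges to the unique solution within [lo, hi]."""
    low_pred = predicted_deaths(lo, cum_base_hazard, linear_predictors)
    high_pred = predicted_deaths(hi, cum_base_hazard, linear_predictors)
    if not low_pred <= observed_deaths <= high_pred:
        raise ValueError("observed deaths outside the searchable range")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if predicted_deaths(mid, cum_base_hazard, linear_predictors) < observed_deaths:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Toy subgroup in which the model under-predicts deaths.
    lps = [-1.0, -0.5, 0.0, 0.5, 1.0]
    before = predicted_deaths(1.0, 0.1, lps)
    scale = fit_hazard_scale(observed_deaths=0.8, cum_base_hazard=0.1,
                             linear_predictors=lps)
    print(f"predicted before: {before:.3f}, fitted scale: {scale:.3f}")
```

A fitted scale above 1 corresponds to a subgroup in which the original model under-predicts deaths, and a scale below 1 to one in which it over-predicts.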

One limitation lies in the PREDICT Breast v3 model itself. Many markers have been shown to be prognostic in addition to the variables included in the model. Of particular note are tumor gene expression profiles or genomic risk scores (GRS) such as EndoPredict 27, Mammaprint 28, and OncotypeDx 29. While there are many published ‘validation’ studies of GRS, only one study has evaluated the benefit of adding GRS to standard clinical variables as measured by change in discrimination or reclassification 30; in this study, adding GRS to PREDICT Breast v2 had a small effect on the discrimination of the model and reclassification was limited. It seems unlikely that adding GRS to PREDICT Breast v3 would make much difference to its performance.

We have shown that PREDICT Breast v3 works well for the majority of breast cancer patients in the USA. Future work will involve evaluating the benefit of adding GRS to the model and modifying the model to ensure that performance is good in all ancestries, reflecting the diverse ancestries of the population.

## DATA AND CODE AVAILABILITY

The data used in this study are available from the National Cancer Institute SEER program, at [https://seer.cancer.gov/](https://seer.cancer.gov/). The R script used for data analysis can be accessed on GitHub at [https://github.com/pengpclab/PREDICTv3](https://github.com/pengpclab/PREDICTv3).

## ACKNOWLEDGEMENT

This research was supported by the Cedars-Sinai Cancer Center through the 2024 Cancer Prevention and Control Program Research Developmental Funds Award.

## Appendix

[Appendix 1.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T6)
Summary of standard input variables used in the PREDICT Breast model.

[Appendix 2.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T7) Cumulative observed and predicted breast cancer-specific mortality at 10 and 15 years follow up for ER-positive patients.

[Appendix 3.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T8) Cumulative observed and predicted breast cancer-specific mortality at 10 and 15 years follow up for ER-negative patients.

[Appendix 4.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T9) Cumulative observed and predicted other causes of mortality at 10 and 15 years follow up for ER-positive patients.

[Appendix 5.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T10) Cumulative observed and predicted other causes of mortality at 10 and 15 years follow up for ER-negative patients.

[Appendix 6.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T11) Observed and predicted all-cause, breast cancer-specific and other causes of mortality at 5 years follow up for ER-positive patients.

[Appendix 7.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T12) Cumulative observed and predicted all-cause, breast cancer-specific and other causes of mortality at 5 years follow up for ER-negative patients.

![Appendix 8.](https://www.medrxiv.org/content/medrxiv/early/2024/11/16/2024.10.29.24316401/F2.medium.gif)

[Appendix 8.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/F2)
Calibration plots for (A) 10-year and (B) 15-year breast cancer-specific mortality and (C) 10-year and (D) 15-year other mortality for ER-positive breast cancer, as well as (E) 10-year and (F) 15-year breast cancer-specific mortality and (G) 10-year and (H) 15-year other mortality for ER-negative breast cancer.

![Appendix 9.](https://www.medrxiv.org/content/medrxiv/early/2024/11/16/2024.10.29.24316401/F3.medium.gif)

[Appendix 9.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/F3) Calibration plots for 5-year mortality: all-cause mortality for (A) ER-positive breast cancer and (B) ER-negative breast cancer, breast cancer-specific mortality for (C) ER-positive breast cancer and (D) ER-negative breast cancer, and mortality from other causes for (E) ER-positive breast cancer and (F) ER-negative breast cancer.

[Appendix 10.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T13) Model discrimination (area under receiver operator characteristic curve) for 10-year and 15-year breast cancer-specific and other cause mortality by race and tumor ER-status.

[Appendix 11.](http://medrxiv.org/content/early/2024/11/16/2024.10.29.24316401/T14) Model discrimination for 5-year all-cause, breast cancer-specific and other causes of mortality by race and tumor ER-status.

## Footnotes

* A typo has been identified and corrected in the author list.
* Received October 29, 2024.
* Revision received November 15, 2024.
* Accepted November 16, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## REFERENCES

1. Bray F, Laversanne M, Sung H, et al.
Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi:10.3322/caac.21834
2. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin. 2024;74(1):12–49. doi:10.3322/caac.21820
3. Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. The Lancet. 2005;365(9472):1687–1717. doi:10.1016/S0140-6736(05)66544-0
4. Katz SJ, Morrow M. Addressing overtreatment in breast cancer: The doctors’ dilemma. Cancer. 2013;119(20):3584–3588. doi:10.1002/cncr.28260
5. Wishart GC, Azzato EM, Greenberg DC, et al. PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer. Breast Cancer Res. Published online 2010.
6. Campbell HE, Taylor MA, Harris AL, Gray AM.
An investigation into the performance of the Adjuvant! Online prognostic programme in early breast cancer for a cohort of patients in the United Kingdom. Br J Cancer. 2009;101(7):1074–1084. doi:10.1038/sj.bjc.6605283
7. Chen LL, Nolan ME, Silverstein MJ, et al. The impact of primary tumor size, lymph node status, and other prognostic factors on the risk of cancer death. Cancer. 2009;115(21):5071–5083. doi:10.1002/cncr.24565
8. Liao GS, Chou YC, Hsu HM, Dai MS, Yu JC. The prognostic value of lymph node status among breast cancer subtypes. Am J Surg. 2015;209(4):717–724. doi:10.1016/j.amjsurg.2014.05.029
9. Grootes I, Wishart GC, Pharoah PDP. An updated PREDICT breast cancer prognostic model including the benefits and harms of radiotherapy. Npj Breast Cancer. 2024;10(1):6. doi:10.1038/s41523-024-00612-y
10. POSH Steering Group, Maishman T, Copson E, et al.
An evaluation of the prognostic model PREDICT using the POSH cohort of women aged ⩽40 years at breast cancer diagnosis. Br J Cancer. 2015;112(6):983–991. doi:10.1038/bjc.2015.57
11. the SATURNE Advisory Group, Gray E, Marti J, Brewster DH, Wyatt JC, Hall PS. Independent validation of the PREDICT breast cancer prognosis prediction tool in 45,789 patients using Scottish Cancer Registry data. Br J Cancer. 2018;119(7):808–814. doi:10.1038/s41416-018-0256-x
12. Wishart GC, Bajdik CD, Azzato EM, et al. A population-based validation of the prognostic model PREDICT for early breast cancer. Eur J Surg Oncol EJSO. 2011;37(5):411–417. doi:10.1016/j.ejso.2011.02.001
13. Wong HS, Subramaniam S, Alias Z, et al. The Predictive Accuracy of PREDICT: A Personalized Decision-Making Tool for Southeast Asian Women With Breast Cancer. Medicine (Baltimore). 2015;94(8):e593. doi:10.1097/MD.0000000000000593
14. Van Maaren MC, Van Steenbeek CD, Pharoah PDP, et al.
Validation of the online prediction tool PREDICT v. 2.0 in the Dutch breast cancer population. Eur J Cancer. 2017;86:364–372. doi:10.1016/j.ejca.2017.09.031
15. De Glas NA, Bastiaannet E, Engels CC, et al. Validity of the online PREDICT tool in older patients with breast cancer: a population-based study. Br J Cancer. 2016;114(4):395–400. doi:10.1038/bjc.2015.466
16. Zaguirre K, Kai M, Kubo M, et al. Validity of the prognostication tool PREDICT version 2.2 in Japanese breast cancer patients. Cancer Med. 2021;10(5):1605–1613. doi:10.1002/cam4.3713
17. Nair NS, Kothari B, Gupta S, et al. Validation of PREDICT Version 2.2 in a Retrospective Cohort of Indian Women With Operable Breast Cancer. JCO Glob Oncol. 2023;(9):e2300114. doi:10.1200/GO.23.00114
18. Aguirre U, García-Gutiérrez S, Romero A, et al. External validation of the PREDICT tool in Spanish women with breast cancer participating in population-based screening programmes. J Eval Clin Pract. 2019;25(5):873–880.
doi:10.1111/jep.13084
19. Grootes I, Keeman R, Blows FM, et al. Incorporating progesterone receptor expression into the PREDICT breast prognostic model. Eur J Cancer. 2022;173:178–193. doi:10.1016/j.ejca.2022.06.011
20. Deng Z, Jones MR, Wolff AC, Visvanathan K. Evaluation of Predict, a prognostic risk tool, after diagnosis of a second breast cancer. JNCI Cancer Spectr. 2023;7(6):pkad081. doi:10.1093/jncics/pkad081
21. Murphy PK, Sellers ME, Bonds SH, Scott S. The SEER Program’s longstanding commitment to making cancer resources available. JNCI Monogr. 2024;2024(65):118–122. doi:10.1093/jncimonographs/lgae028
22. Chan BKC. Data Analysis Using R Programming. In: Chan BKC, ed. Biostatistics for Human Genetic Epidemiology. Springer International Publishing; 2018:47–122. doi:10.1007/978-3-319-93791-5_2
23. Kronthaler F, Zöllner S.
Data Analysis with RStudio: An Easygoing Introduction. Springer; 2021. doi:10.1007/978-3-662-62518-7
24. Therneau T. A package for survival analysis in R.
25. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77. doi:10.1186/1471-2105-12-77
26. Kabacoff RI. R in Action, Third Edition: Data Analysis and Graphics with R and Tidyverse. Simon and Schuster; 2022.
27. Filipits M, Rudas M, Jakesz R, et al. A New Molecular Predictor of Distant Recurrence in ER-Positive, HER2-Negative Breast Cancer Adds Independent Information to Conventional Clinical Risk Factors. Clin Cancer Res. 2011;17(18):6012–6020. doi:10.1158/1078-0432.CCR-11-0926
28. Van De Vijver MJ, He YD, Van ’T Veer LJ, et al. A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. N Engl J Med. 2002;347(25):1999–2009.
doi:10.1056/NEJMoa021967
29. Paik S, Kim C, Baehner FL, Park T, Wickerham DL, Wolmark N. A Multigene Assay to Predict Recurrence of Tamoxifen-Treated, Node-Negative Breast Cancer. N Engl J Med. Published online 2004.
30. Chowdhury A, Pharoah PD, Rueda OM. Evaluation and comparison of different breast cancer prognosis scores based on gene expression data. Breast Cancer Res. 2023;25(1):17. doi:10.1186/s13058-023-01612-9

[1]: /embed/graphic-1.gif
[2]: /embed/graphic-2.gif
[3]: /embed/graphic-3.gif
[4]: /embed/graphic-4.gif
[5]: /embed/graphic-5.gif
[6]: /embed/graphic-6.gif
[7]: /embed/graphic-7.gif