Unveiling Breast Cancer Risk Profiles: A Comprehensive Survival Clustering Analysis Empowered by an Online Web Application for Personalized Medicine

Yuan Gu; Mingyue Wang; Yishu Gong; Song Jiang; Chen Li; Dan Zhang

doi:10.1101/2023.05.18.23290062

Abstract

Online tools, such as web-based applications, aid medical doctors in recommending treatments and conducting thorough patient profile investigations. Prior studies have created web-based survival analysis tools for cancer survival. However, these often offer basic features and simplistic models, providing shallow data insights. Our research involves an in-depth risk profile analysis using survival clustering on real-world data. We’ve developed a user-friendly Shiny application to simplify the use of our findings. By utilizing survival clustering, we uncover distinct subgroups and unique risk profiles among breast cancer patients. Our online app empowers researchers and clinicians to explore and gain insights into breast cancer risk profiles, enhancing personalized medicine and clinical decision-making.

1. Introduction

Understanding the risk profile and survival outcomes of breast cancer remains a complex and intricate area of research. Despite significant advancements in the field, there are still numerous factors that contribute to the variability in breast cancer progression and patient outcomes. The interplay between genetic predisposition, lifestyle factors, tumor characteristics, and treatment response adds to the challenge of unraveling the intricacies of breast cancer risk profiles. Additionally, the heterogeneity within breast cancer itself further complicates the identification of clear-cut survival patterns. Consequently, there is a pressing need to delve deeper into the complexities of breast cancer risk profiles and survival to uncover new insights and potential avenues for improved diagnosis, treatment, and personalized care for patients.

In this study, we undertake a comprehensive analysis of risk profiles using survival clustering techniques applied to the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset [1]. An unsupervised learning approach was used to cluster patients based on their survival difference and other relevant clinical variables, by combining k-means clustering and Kaplan-Meier (K-M) curves, leveraging the significant risk factors identified through a Cox regression model. Through a deeper exploration of survival clustering, we can discover heterogeneous subpopulations with varying survival characteristics including age, tumor stage and molecular subtypes etc., providing insights into the underlying heterogeneity of breast cancer and reveal potential biomarkers for risk stratification. Additionally, we have developed an accessible online application using Shiny to facilitate easier utilization of our findings.

Currently, there is a scarcity of web-based survival analysis tools available for medical researchers conducting survival studies and all previous web applications have primarily focused on Cox regression [2] or genetic analysis and other data structures [3–6]. Also, a lot of previous relevant studies used semi-supervised or self-supervised learning for medical multi classification problems but lack of survival information [7–8]. In contrast, our study fills an existing void by introducing a unique combination of unsupervised learning techniques and risk factor analysis on the clinician side, demonstrating the potential of survival clustering as a valuable tool in uncovering hidden structures based on distinct risk profiles. Overall, our online tool provides a user-friendly interface for researchers and clinicians to explore the results and derive valuable insights on breast cancer. The link for this app is: https://baran-shad.shinyapps.io/breastcancer

2. Literature review

Breast cancer is a complex disease with diverse clinical outcomes, making accurate prediction of survival crucial for personalized treatment and care. Over the years, various survival models have been developed to aid in understanding and predicting cancer survival rates, from various perspectives, such as increasing the prediction accuracy from the model aspect by using metrics like AUC and AUC-PR [9]. The most widely used classical approach is the Cox regression model, which assumes proportional hazard rate. Numerous studies have applied this model to identifying significant prognostic factors such as age, tumor stage, hormone receptor status, and lymph node involvement [10–11]. However, the Cox model relies on the proportional hazard assumption, which may not always hold true, leading to potential bias in the estimation of survival probabilities.

To address the limitations, researchers have explored alternative techniques, including machine learning algorithms, such as random forests, support vector machines, and neural networks, which have demonstrated promising results [12–15]. These models can handle complex interactions between variables and capture non-linear relationships, thus providing improved accuracy in survival prediction. However, their black-box nature often limits interpretability and understanding of the underlying biological mechanisms.

Another noteworthy approach is the use of gene expression data. Gene expression-based models, such as using multi-omics neural networks to make the survival prediction, have shown the ability to provide deeper insight into which types of data are most relevant to improve prognosis [16–17]. These models provide valuable insights into the underlying biology of breast cancer and offer potential for personalized treatment strategies. However, their reliance on gene expression data may limit their application in clinical settings where gene expression profiling is not routinely performed. Other challenges and limitations exist in the data and visualization, our previous sickle cell disease studies also combined machine learning and survival model but without visualization interactive online tools [18, 19]. Issues like missing data, limited sample sizes, and the lack of accessible visualization tools for physicians and medical professionals with limited modeling expertise hinder the widespread utilization of the models.

In our study, we address all above challenges by developing a user-friendly interface to uncover hidden structures within breast cancer data and identify unique risk profiles. This intuitive tool enables researchers and clinicians to easily explore and interpret the results of our analysis and to gain valuable insights into the complex landscape of breast cancer risk profiles, bridging the gap between sophisticated modeling techniques and practical clinical applications.

3. Methods

3.1 Data

The METABRIC dataset is a valuable and publicly accessible resource for researchers. It encompasses a total of 2,506 subjects and 34 variables, providing a comprehensive foundation for studying breast cancer. To ensure the reliability of our analysis, we diligently handled any missing data and performed thorough feature checks. A total of 528 subjects who lacked survival status information were excluded from the analysis. After removing the 528 subjects, other certain variables with limited variability, such as Sex that comprised all females, were also omitted from the dataset. Following the above meticulous data preprocessing procedures, our final analysis dataset consists of 1,269 subjects and 23 variables without missing values. The 23 variables are : Age at diagnosis, Subtype cohort (termed integrative clusters )[20], neoplasm histologic grade, number of lymph nodes examined to be positive, mutation count, Nottingham prognostic index (NPI), overall survival time in months, relapse free status in months, tumor size, tumor stage, cancer type, ER status, HER2 status, hormone therapy status, survival status, prior radiology status, menopausal status, integrative subgroup, chemotherapy, cellularity, Prediction Analysis of Microarray 50 (PAM50), relapse status. All analysis is built by R (v 4.3.0). The descriptive statistics, including means and 95% confidence intervals for continuous variables and count and proportions for categorical variables, for the 23 variables in each survival status group (living or decreased) are provided in the following Table 1. A t-test was used to compare the Age at diagnosis between the survival status groups (living versus decreased), and a Chi-Square test was used to examine the categorical variables between the survival status groups (living versus deceased). The variables with p-value <0.05 suggest a statistically significant difference between the groups.

View this table:

Table 1: Basic descriptive statistics.

* p value denotes significant level when p<0.05

** continuous variables are summarized by mean (min-max)

3.2 Model

The analysis of the breast cancer data encompassed three distinct phases. In the first phase, Cox regression with stepwise AIC selection was used to identify statistically significant risk factors. This approach allowed us to determine the variables that most significantly influenced breast cancer outcomes among the 22 variables. Following this, K-means clustering was conducted based on the selected risk factors from the Cox model in the first phase. By grouping similar individuals together, this clustering analysis provided insights into distinct subgroups within the dataset. In the final phase, a Kaplan-Meier model was constructed using the predicted clusters, enabling a deeper exploration of the risk profiles associated with each cluster. This comprehensive approach allowed for a thorough examination of the relationship between risk factors, clustering patterns, and breast cancer outcomes, ultimately enhancing our understanding of the disease.

The first phase is the Cox regression with stepwise AIC selection. Cox regression, assumes proportional hazards, is specifically designed for survival analysis, allows us to assess the impact of various variables on the time until death occurs. By using stepwise AIC selection, the model identifies the subset of variables that provide the best fit for the data, while controlling for the risk of overfitting. This approach considers the trade-off between model complexity and goodness of fit, selecting a parsimonious model that optimizes the AIC criterion.

Once significant risk factors were identified, the next phase is the k-means clustering to group individuals based on those factors. To determine the optimal number of clusters, various methods such as the elbow method, Silhouette coefficient, and gap statistics were considered as the standard methods in previous research. However, because K-means clustering is unsupervised learning, without label information, the optimal number of clusters determination is often challenging and subjective. Choosing an inappropriate value for k can lead to very poor clustering results, for example, if a dataset contains some outliers which can significantly affect the position and size of the clusters, the above widely used methods may provide incorrect clustering assignments. In that case, preprocessing steps or outlier detection techniques may be needed to mitigate this issue, however, in survival data, one event outlier might be very informative compared to other normal data, indicating a person who has higher risk to get the event. Given that our ultimate objective was to examine the survival risk profile, in the final phase, we incorporated the Kaplan-Meier (KM) model, the log-rank test, to identify differences between survival curves among clusters. By leveraging the insights provided by the log-rank test, we are able to select a different number of clusters and visualize the results. The visualized clusters were also presented in the web-app, allowing for interactive exploration.

For each number of clusters, we conducted an analysis of the basic characteristics associated with the predicted clusters. By assessing these characteristics, we gained insights into the distribution and patterns within each cluster. For continuous risk variables, the mean values provided an indication of the average risk level within that cluster. Meanwhile, for categorical risk factors, the frequency analysis allowed us to identify the prevalence of specific risk factors within each cluster. This comprehensive examination of basic characteristics facilitated a deeper understanding of the distinct profiles.

4. Results

The Cox regression analysis with AIC selection identified several significant risk factors associated with breast cancer outcomes; the result is presented in Table 2.

View this table:

Table 2: Cox regression results

A total of 6 variables are statistically significant risk factors. Age at diagnosis (p <0.01) showed a significant positive relationship with death, suggesting that older age is associated with increased risk. A higher Nottingham Prognostic (p = 0.01) Index, relapse-free periods (p-value < 0.001), ER-negative status (p < 0.001) and experiencing Hormone and Chemo therapies are also associated with increased hazard rates.

The model has a concordance of 0.897, demonstrating a very strong discriminatory power to distinguish between individuals with different survival outcomes with a high degree of accuracy. In the next step, all the above significant risk factors are standardized by simply converting into a z-score and then used as the input variables into the k-means clustering, to ensure all input variables are on the same scale, preventing variables with larger magnitudes from dominating the clustering process. In this phase, K-means will be used to partition the dataset into K clusters. The algorithm assigns each data point to the cluster with the nearest mean value.

Although traditional k-means is unsupervised learning without predefined labels or target variables, in our case, we improved it by incorporating a pseudo-supervised way, such as the K- means survival difference. We used the survival difference between different clusters to visually guide the selection of the appropriate number of clusters. This approach adds an extra layer of information beyond the traditional K-means, enabling the model to assess the quality of clustering based on survival differences.

Suppose we assign K number of clusters to all observations and the k clusters are denoted □_□ □□□□□□ = 1… □, the log rank statistic is approximately distributed as a chi-square test statistic, and the following is the test formula (1): where ∑O_kt represents the sum of the observed number of events in the □^□□ cluster over time, and ∑E_kt represents the sum of the expected number of events in the □^□□ cluster over time. and the K-means cluster algorithm is by solving the following optimization problem (2): where □(□) = □ stands for assigning the ith observation to the □^□□ cluster C, □_□ is the mean value of □^□□ cluster. □_□ are the observations in the □^□□ cluster. The objective of the algorithm is to minimize the total dissimilarity by assigning N observations to K clusters in a manner that minimizes the average dissimilarity of the observations from their respective cluster means. The dissimilarity is calculated by the 2-norm stand for ‖·‖². Our analysis and the web application are based on incorporating both (1) and (2) for different K assignments. The following Picture 1 and 2 presents the clustering points and the survival curve difference by selecting k = 4 and 5, more other k choices are available in the web application.

Picture 1

4 clusters and the K-M curves

Picture 2

5 clusters and the K-M curves

In Picture 1, four distinct clusters are discernible, each highlighted with a unique color (upper). The accompanying graph below represents the survival plot (below). The colors corresponding to clusters 1 to 4 are red, green, blue and purple. The sample sizes of these clusters vary, with 315, 314, 217 and 423 from cluster 1 to 4. Significant survival differences can be observed when comparing clusters (1 and 4) to clusters (2 and 3). Additionally, within these groupings, minor survival differences are also evident between clusters 1 and 4, as well as between clusters 2 and 3. However, in Picture 2, when we select k = 5, per reconfiguration, we now observe the colors for clusters 1 to 5 are red, yellow, green blue and purple, also clusters 1 and 5, very close to the former clusters 2 and 3 when k was set to 4. Notably, there is a significant survival difference between these two clusters (1 and 5) compared to the remaining three clusters (clusters 2, 3, and 4). In terms of sample size, clusters 1 through 5 consist of 237, 307, 153, 210, and 362 samples, respectively. The p-values obtained from the log-rank test are both (k=4 and k=5) less than 0.0001, as indicated in the accompanying survival plots. The median survival time, represented by the horizontal dashed line in the plots, is also added in the survival plots.

Our final phase involves a comprehensive examination of the risk profiles across the different clusters. After assigning clusters, we can delve deeper into those significant risk factors. Table 3 shows the details for continuous risk factors for 4 clusters when K=4. Picture 3 is the box plot for continuous risk factors by 4 clusters and this illustration provides more specific details in the web application. Furthermore, we also depicted the frequency distribution of each subgroup within categorical variables (e.g., recurrence vs. no recurrence for the ’Relapse Status’) stratified by clusters in the following illustration (Picture 4).

Picture 3

Box plot for Tumor size group by 4 clusters

Picture 4

Categorical subgroup frequency group by 4 clusters

View this table:

Table 3: Summary Statistics for continuous risk factor by 4 clusters

In the provided table and pictures, we observe that cluster 3 exhibits the shortest median survival time (Table 3) and the largest tumor size (Picture 3). Additionally, in the frequency plot of tumor stage, cluster 3 comprises the highest number of patients with stage 3 and above. Furthermore, the Neoplasm Histologic Grade in cluster 3 is also the highest compared to the other cluster groups. Cluster 2 also demonstrates a relatively lower survival rate compared to the other clusters. From Picture 4, we can discern a pattern where cluster 2 has higher Neoplasm Histologic Grade and tumor stages compared to clusters 1 and 4.

Upon setting K = 5, it can be observed that cluster 1 and cluster 5 exhibit lower survival rates in the following Table 4 and Picture 5. Moreover, these two clusters are characterized by relatively higher Neoplasm Histologic Grade compared to the other clusters. In terms of relapse status, cluster 1 comprises the majority of patients who have not experienced recurrence, whereas cluster 5 has the highest number of patients with recurrence. These findings indicate the need for further investigation from a clinical perspective to gain deeper insights.

Picture 5

Categorical subgroup frequency group by 5 clusters

View this table:

Table 4: Summary Statistics for continuous risk factor by 5 clusters

5. Conclusion and Future Work

In conclusion, our comprehensive analysis consisting of three phases, which involved incorporating survival information into the unsupervised machine learning K-means clustering model and developing a web app, has showcased significant advantages, creativity, and contributions in the analysis of breast cancer data. Through a comprehensive three-phase approach, we have provided valuable insights into the risk factors, clustering patterns, and outcomes associated with breast cancer. Our advantage lies in using Cox regression with stepwise AIC selection as the first phase of analysis. The cox model identifies statistically significant risk factors for breast cancer with a very promising concordance value of 0.937. The second phase involves k-means clustering, which groups individuals based on the selected risk factors from the Cox model. By identifying similar individuals within the dataset, this clustering analysis reveals distinct subgroups and provides a deeper understanding of the data. We consider the optimal number of clusters by log rank test according to the KM model to explore the risk profiles associated with each cluster in the last phase. This in-depth examination enhances our knowledge of the distinct profiles and risk factors associated with each predicted cluster, ultimately contributing to our understanding of breast cancer. In summary, our web app empowers healthcare professionals and researchers to make informed decisions and advance their knowledge in the fight against breast cancer.

However, additional research is necessary to achieve a broader application. The current app is designed specifically for this dataset and lacks a comprehensive investigation of generalizability. To enhance usability for medical professionals, it would be beneficial to develop a more convenient pipeline that allows users to import and download their own datasets and select their own potential risk factors. Another limitation is the relatively small number of input variables in this dataset. Therefore, the development of additional models or tools capable of handling larger datasets should be considered for future endeavors.

Footnotes

model revised, Figures are revised.

References

[1].↵
Christina Curtis, Sohrab P Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M Rueda, Mark J Dunning, Doug Speed, Andy G Lynch, Shamith Samarajiwa, Yinyin Yuan, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346–352, 2012.
OpenUrl CrossRef PubMed Web of Science Google Scholar
[2].↵
Lánczky A, Győrffy B. Web-based survival analysis tool tailored for medical research (KMplot): development and implementation. Journal of medical Internet research. 2021 Jul 26;23(7):e27633.
OpenUrl PubMed Google Scholar
[3].↵
Dwivedi B, Mumme H, Satpathy S, Bhasin SS, Bhasin M. Survival Genie, a web platform for survival analysis across pediatric and adult cancers. Scientific Reports. 2022 Feb 23;12(1):3069.
OpenUrl Google Scholar
[4].
Yang J, Zhao S, Wang J, Sheng Q, Liu Q, Shyr Y. Immu-Mela: An open resource for exploring immunotherapy-related multidimensional genomic profiles in melanoma. Journal of Genetics and Genomics. 2021 May 20;48(5):361–8.
OpenUrl Google Scholar
[5].
Zhao T, Wang Z. GraphBio: a shiny web app to easily perform popular visualization analysis for omics data. Frontiers in Genetics. 2022:2265.
Google Scholar
[6].↵
Gu Y, Gong Y, Wang M, Jiang S, Li Z, Yuan Z. Enhancing Kidney Failure Analysis: Web Application Development for Longitudinal Trajectory Clustering. medRxiv. 2023:2023–05.
Google Scholar
[7].↵
Zhang D, Zhou F, Jiang Y, Fu Z. MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 4188–4197).
Google Scholar
[8].↵
Wang Z, Li T, Zheng JQ, Huang B. When cnn meet with vit: Towards semi-supervised learning for multi-class medical image semantic segmentation. InComputer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII 2023 Feb 12 (pp. 424–441). Cham: Springer Nature Switzerland.
Google Scholar
[9].↵
Wu J, Ye X, Mou C, Dai W. Fineehr: Refine clinical note representations to improve mortality prediction. In2023 11th International Symposium on Digital Forensics and Security (ISDFS) 2023 May 11 (pp. 1–6). IEEE.
Google Scholar
[10].↵
Hajihosseini M, Faradmal J, Sadighi-Pashaki A. Survival analysis of breast cancer patients after surgery with an intermediate event: application of illness-death model. Iranian Journal of Public Health. 2015 Dec;44(12):1677.
OpenUrl Google Scholar
[11].↵
Vahdaninia M, Montazeri A. Breast cancer in Iran: a survival analysis. Asian pacific journal of cancer prevention. 2004 Apr 1;5(2):223–5.
OpenUrl Google Scholar
[12].↵
Tong L, Mitchel J, Chatlin K, Wang MD. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC medical informatics and decision making. 2020 Dec;20:1–2.
OpenUrl Google Scholar
[13].
Cure-Cure CA, Cure P, Gu Y, Tian X, Patel T, Wu CO, Sviglin H, Sopko G, Csako G, Cody S, Dandi G. Predictors of all cause mortality and their gender differences in a hispanic population from barranquilla-colombia using machine learning with random survival forests. Circulation. 2018 Nov 6;138(Suppl_1):A16252-.
OpenUrl Google Scholar
[14].
Jiang S, Gu Y, Kumar E. Magnetic Resonance Imaging (MRI) Brain Tumor Image Classification Based on Five Machine Learning Algorithms. Cloud Computing and Data Science. 2023 May 11:122–33.
Google Scholar
[15].↵
Chi CL, Street WN, Wolberg WH. Application of artificial neural network-based survival analysis on two breast cancer datasets. InAMIA annual symposium proceedings 2007 (Vol. 2007, p. 130). American Medical Informatics Association.
Google Scholar
[16].↵
Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, Zhang J, Salama P, Rizkalla M, Han Z, Huang K. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Frontiers in genetics. 2019 Mar 8;10:166.
OpenUrl Google Scholar
[17].↵
Yang J, Zhao S, Wang J, Sheng Q, Liu Q, Shyr Y. A pan-cancer immunogenomic atlas for immune checkpoint blockade immunotherapy.
Google Scholar
[18].↵
Sachdev V, Tian X, Gu Y, Nichols J, Sidenko S, Li W, Beri A, Layne WA, Allen D, Wu CO, Thein SL. A phenotypic risk score for predicting mortality in sickle cell disease. British journal of haematology. 2021 Mar;192(5):932–41.
OpenUrl Google Scholar
[19].↵
Sachdev V, Gu Y, Nichols J, Li W, Sidenko S, Allen D, Wu C, Thein SL. A Machine Learning Algorithm to Improve Risk Assessment for Patients with Sickle Cell Disease. Blood. 2019 Nov 13;134:893
OpenUrl Google Scholar
[20].↵
Mukherjee A, Russell R, Chin SF, Liu B, Rueda OM, Ali HR, Turashvili G, Mahler-Araujo B, Ellis IO, Aparicio S, Caldas C. Associations between genomic stratification of breast cancer and centrally reviewed tumour pathology in the METABRIC cohort. NPJ breast cancer. 2018 Mar 7;4(1):5.
OpenUrl Google Scholar

Posted September 14, 2023.

Download PDF

Author Declarations

Data/Code

Revision Summary

Citation Tools

Get QR code

Tweet Widget

Subject Area

Oncology

Reviews and Context

Comment

TRIP Peer Reviews

Community Reviews

Automated Services

Blogs/Media

Author Videos

Subject Areas

All Articles

Addiction Medicine (414)
Allergy and Immunology (735)
Anesthesia (216)
Cardiovascular Medicine (3152)
Dentistry and Oral Medicine (352)
Dermatology (268)
Emergency Medicine (468)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1118)
Epidemiology (13117)
Forensic Medicine (15)
Gastroenterology (875)
Genetic and Genomic Medicine (4939)
Geriatric Medicine (455)
Health Economics (758)
Health Informatics (3109)
Health Policy (1110)
Health Systems and Quality Improvement (1149)
Hematology (414)
HIV/AIDS (981)
Infectious Diseases (except HIV/AIDS) (14408)
Intensive Care and Critical Care Medicine (893)
Medical Education (460)
Medical Ethics (121)
Nephrology (508)
Neurology (4693)
Nursing (249)
Nutrition (695)
Obstetrics and Gynecology (852)
Occupational and Environmental Health (771)
Oncology (2421)
Ophthalmology (689)
Orthopedics (270)
Otolaryngology (334)
Pain Medicine (313)
Palliative Medicine (88)
Pathology (523)
Pediatrics (1258)
Pharmacology and Therapeutics (531)
Primary Care Research (530)
Psychiatry and Clinical Psychology (4033)
Public and Global Health (7259)
Radiology and Imaging (1623)
Rehabilitation Medicine and Physical Therapy (968)
Respiratory Medicine (950)
Rheumatology (463)
Sexual and Reproductive Health (482)
Sports Medicine (408)
Surgery (525)
Toxicology (65)
Transplantation (223)
Urology (193)

Comments

medRxiv aims to provide a venue for anyone to comment on a medRxiv preprint. Comments are moderated for offensive or irrelevant content (this can take ~24 h). Please avoid duplicate submissions and read our Comment Policy before commenting. The content of a comment is not endorsed by medRxiv.

medRxiv aims to inform readers about online discussion of this preprint occurring elsewhere. The content at the links below is not endorsed by either medRxiv or the preprint's authors.

Community reviews for this article:

There are no community reviews for this paper.

Automated Evaluations

Certain services provide automated analysis of preprints. Analyses invited by the authors are displayed at the top of this tab. Those done independently of authors are shown underneath . None of these analyses is endorsed by medRxiv.

Automated Evaluations:

There are no automated evaluations for this paper.

[1] [1].↵
Christina Curtis, Sohrab P Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M Rueda, Mark J Dunning, Doug Speed, Andy G Lynch, Shamith Samarajiwa, Yinyin Yuan, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346–352, 2012.
OpenUrl CrossRef PubMed Web of Science Google Scholar

[2] [2].↵
Lánczky A, Győrffy B. Web-based survival analysis tool tailored for medical research (KMplot): development and implementation. Journal of medical Internet research. 2021 Jul 26;23(7):e27633.
OpenUrl PubMed Google Scholar

[3] [3].↵
Dwivedi B, Mumme H, Satpathy S, Bhasin SS, Bhasin M. Survival Genie, a web platform for survival analysis across pediatric and adult cancers. Scientific Reports. 2022 Feb 23;12(1):3069.
OpenUrl Google Scholar

[4] [4].
Yang J, Zhao S, Wang J, Sheng Q, Liu Q, Shyr Y. Immu-Mela: An open resource for exploring immunotherapy-related multidimensional genomic profiles in melanoma. Journal of Genetics and Genomics. 2021 May 20;48(5):361–8.
OpenUrl Google Scholar

[5] [5].
Zhao T, Wang Z. GraphBio: a shiny web app to easily perform popular visualization analysis for omics data. Frontiers in Genetics. 2022:2265.
Google Scholar

[6] [6].↵
Gu Y, Gong Y, Wang M, Jiang S, Li Z, Yuan Z. Enhancing Kidney Failure Analysis: Web Application Development for Longitudinal Trajectory Clustering. medRxiv. 2023:2023–05.
Google Scholar

[7] [7].↵
Zhang D, Zhou F, Jiang Y, Fu Z. MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 4188–4197).
Google Scholar

[8] [8].↵
Wang Z, Li T, Zheng JQ, Huang B. When cnn meet with vit: Towards semi-supervised learning for multi-class medical image semantic segmentation. InComputer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII 2023 Feb 12 (pp. 424–441). Cham: Springer Nature Switzerland.
Google Scholar

[9] [9].↵
Wu J, Ye X, Mou C, Dai W. Fineehr: Refine clinical note representations to improve mortality prediction. In2023 11th International Symposium on Digital Forensics and Security (ISDFS) 2023 May 11 (pp. 1–6). IEEE.
Google Scholar

[10] [10].↵
Hajihosseini M, Faradmal J, Sadighi-Pashaki A. Survival analysis of breast cancer patients after surgery with an intermediate event: application of illness-death model. Iranian Journal of Public Health. 2015 Dec;44(12):1677.
OpenUrl Google Scholar

[11] [11].↵
Vahdaninia M, Montazeri A. Breast cancer in Iran: a survival analysis. Asian pacific journal of cancer prevention. 2004 Apr 1;5(2):223–5.
OpenUrl Google Scholar

[12] [12].↵
Tong L, Mitchel J, Chatlin K, Wang MD. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC medical informatics and decision making. 2020 Dec;20:1–2.
OpenUrl Google Scholar

[13] [13].
Cure-Cure CA, Cure P, Gu Y, Tian X, Patel T, Wu CO, Sviglin H, Sopko G, Csako G, Cody S, Dandi G. Predictors of all cause mortality and their gender differences in a hispanic population from barranquilla-colombia using machine learning with random survival forests. Circulation. 2018 Nov 6;138(Suppl_1):A16252-.
OpenUrl Google Scholar

[14] [14].
Jiang S, Gu Y, Kumar E. Magnetic Resonance Imaging (MRI) Brain Tumor Image Classification Based on Five Machine Learning Algorithms. Cloud Computing and Data Science. 2023 May 11:122–33.
Google Scholar

[15] [15].↵
Chi CL, Street WN, Wolberg WH. Application of artificial neural network-based survival analysis on two breast cancer datasets. InAMIA annual symposium proceedings 2007 (Vol. 2007, p. 130). American Medical Informatics Association.
Google Scholar

[16] [16].↵
Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, Zhang J, Salama P, Rizkalla M, Han Z, Huang K. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Frontiers in genetics. 2019 Mar 8;10:166.
OpenUrl Google Scholar

[17] [17].↵
Yang J, Zhao S, Wang J, Sheng Q, Liu Q, Shyr Y. A pan-cancer immunogenomic atlas for immune checkpoint blockade immunotherapy.
Google Scholar

[18] [18].↵
Sachdev V, Tian X, Gu Y, Nichols J, Sidenko S, Li W, Beri A, Layne WA, Allen D, Wu CO, Thein SL. A phenotypic risk score for predicting mortality in sickle cell disease. British journal of haematology. 2021 Mar;192(5):932–41.
OpenUrl Google Scholar

[19] [19].↵
Sachdev V, Gu Y, Nichols J, Li W, Sidenko S, Allen D, Wu C, Thein SL. A Machine Learning Algorithm to Improve Risk Assessment for Patients with Sickle Cell Disease. Blood. 2019 Nov 13;134:893
OpenUrl Google Scholar

[20] [20].↵
Mukherjee A, Russell R, Chin SF, Liu B, Rueda OM, Ali HR, Turashvili G, Mahler-Araujo B, Ellis IO, Aparicio S, Caldas C. Associations between genomic stratification of breast cancer and centrally reviewed tumour pathology in the METABRIC cohort. NPJ breast cancer. 2018 Mar 7;4(1):5.
OpenUrl Google Scholar

Unveiling Breast Cancer Risk Profiles: A Comprehensive Survival Clustering Analysis Empowered by an Online Web Application for Personalized Medicine

Abstract

1. Introduction

2. Literature review

3. Methods

3.1 Data

3.2 Model

4. Results

5. Conclusion and Future Work

Data Availability

Footnotes

References

Subject Area

Citation Manager Formats

Unveiling Breast Cancer Risk Profiles: A Comprehensive Survival Clustering Analysis Empowered by an Online Web Application for Personalized Medicine

Abstract

1. Introduction

2. Literature review

3. Methods

3.1 Data

3.2 Model

4. Results

5. Conclusion and Future Work

Data Availability

Footnotes

References

Subject Area

Follow this preprint