Abstract
Objectives To develop and validate a deep learning (DL) based primary tumor biopsy signature for predicting axillary lymph node (ALN) metastasis preoperatively in early breast cancer (EBC) patients with clinically negative ALN.
Methods A total of 1058 EBC patients with pathologically confirmed ALN status were enrolled from May 2010 to August 2020. A deep learning core-needle biopsy (DL-CNB) model was built on the attention based multiple instance learning (AMIL) framework to predict ALN status utilizing the deep learning features, which were extracted from the cancer areas of digitized whole-slide images (WSIs) of breast CNB specimens annotated by two pathologists. Accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and areas under the receiver operating characteristic curve (AUCs) were analyzed to evaluate our model.
Results The best performing DL-CNB model with VGG16_BN as the feature extractor achieved an AUC of 0.816 (95% confidence interval (CI): 0.758, 0.865) in predicting positive ALN metastasis in the independent test cohort. Furthermore, our model incorporating the clinical data, which was called DL-CNB+C, yielded the best AUC of 0.831 (95% CI: 0.775, 0.878), especially for patients younger than 50 years (AUC: 0.918, 95% CI: 0.825, 0.971). The interpretation of the DL-CNB model showed that the top signatures most predictive of ALN metastasis were characterized by nuclei features including density (p=0.015), circumference (p=0.009), circularity (p=0.010), and orientation (p=0.012).
Conclusion Our study provides a novel deep learning-based biomarker on primary tumor CNB slides to predict the metastatic status of ALN preoperatively for patients with early breast cancer.
Introduction
Breast cancer (BC) has become the greatest threat to women’s health worldwide (1). Clinically, identification of axillary lymph node (ALN) metastasis is important for evaluating the prognosis and guiding the treatment of BC patients (2). Sentinel lymph node biopsy (SLNB) has gradually replaced ALN dissection (ALND) for identifying ALN status, especially in early breast cancer (EBC) patients with clinically negative lymph nodes. Although SLNB is less invasive than ALND, it still causes complications such as lymphedema, axillary seroma, paraesthesia, and impaired shoulder function (3, 4). Moreover, SLNB remains a controversial procedure because it depends on the availability of radionuclide tracers and on the surgeon’s experience (5, 6). SLNB could therefore be avoided if ALN status could be reliably predicted preoperatively in EBC patients.
Several studies have attempted to predict ALN status from clinicopathological data and genetic testing scores (7, 8). However, these methods are limited by relatively poor predictive value and high genetic testing costs. Recently, deep learning (DL) has made it possible to perform high-throughput feature extraction on medical images and to analyze the correlation between primary tumor features and ALN metastasis. In a previous study, deep features extracted from conventional ultrasound and shear wave elastography (SWE) were used to predict ALN metastasis, yielding an area under the curve (AUC) of 0.796 in the test set (9). Nevertheless, SWE has not been integrated into routine clinical breast examinations in many hospitals. Another recent study showed that a DL model based on a diffusion-weighted magnetic resonance imaging (DWI-MRI) database of 172 patients achieved an AUC of 0.852 for preoperative prediction of ALN metastasis (10), but such a small sample may not be representative.
DL has also enabled rapid advances in computational pathology (11, 12). For example, DL methods have been applied to segment and classify glomeruli with different staining and various pathologic changes, enabling automatic analysis of renal biopsies (13, 14); DL-based automatic segmentation and classification of colonoscopy tissue has shown promise for colorectal cancer detection (15, 16); and the analysis of gastric carcinoma and precancerous status can also benefit from DL schemes (17, 18). More recently, DL algorithms applied to digital lymph node pathology images were reported to detect ALN metastasis with better diagnostic efficiency than pathologists (19, 20). In particular, algorithm assistance significantly increased the sensitivity of detection of ALN micro-metastases (21). Beyond diagnosis, several previous studies indicated that deep features based on whole-slide images (WSIs) of postoperative tumor samples could improve the prediction of lymph node metastasis in a variety of cancers (20, 22). To our knowledge, however, no study has yet predicted ALN metastasis preoperatively from WSIs of primary breast cancer samples. In this study, we investigated a clinical data set of EBC patients who underwent preoperative core-needle biopsy (CNB) to determine whether DL models based on primary tumor biopsy slides could help to refine the prediction of ALN metastasis.
Patients and Methods
Patients
On approval by the Institutional Ethical Committees of Beijing Chaoyang Hospital affiliated to Capital Medical University, we retrospectively analyzed data from EBC patients with clinically negative ALN from May 2010 to August 2020. Written consent was obtained from all patients and their families.
The detailed inclusion criteria were as follows: (1) patients with CNB pathologically confirmed primary invasive breast cancer; (2) patients who underwent breast surgery with SLN biopsy or ALND; (3) baseline clinicopathological data including age, tumor size, tumor type, ER/PR/HER-2 status and the number of ALN metastasis were comprehensive; (4) complete concordance of molecular status was found between CNB and excision specimens; (5) no history of preoperative radiotherapy and chemotherapy; and (6) adequate volume of biopsy materials with three or more cores for each patient.
The exclusion criteria were as follows: (1) patients with physically or imaging positive ALN; (2) missing postoperative pathology information; (3) missing wax blocks and hematoxylin-eosin (HE) slices; (4) low-quality HE slices or WSIs. The patient recruitment workflow is shown in Fig. 1.
Deep learning model development
To reduce inter-observer variability, all available tumor regions in each CNB slide were examined and annotated by two independent and experienced pathologists blinded to all patient-related information. Each WSI was classified as positive (N(+)) or negative (N0) using the proposed deep learning CNB (DL-CNB) model. Our DL-CNB model was constructed with the attention-based multiple instance learning (MIL) approach (23). In MIL, each training sample is called a bag and consists of multiple instances (24-26); here, each instance corresponds to an image patch of 256×256 pixels. Unlike the general fully supervised setting, in which each sample has a label, only the bag labels are available in MIL, and the goal is to predict the bag label by considering all of its instances jointly. The whole algorithm pipeline comprised the following steps:
Training data preparation (Fig. 2a). For each raw WSI, a large number of non-overlapping square patches were first cropped from the selected tumor regions. Each WSI was then represented as a bag of N randomly selected patches. To increase the number of training samples, M bags were built for each WSI. All M bags were labeled as positive if the slide came from a patient with ALN metastasis and as negative otherwise. The clinical information of the slide could also be added to all M constructed bags to provide additional information for prediction; in this case, the developed model was called DL-CNB+C.
Feature extraction (left part of Fig. 2b). N feature vectors were extracted for the N image instances in each bag using a convolutional neural network (CNN) model. The performances of AlexNet (27), VGG16 (28) with batch normalization (VGG16_BN), ResNet50 (29), DenseNet121 (30), and Inception-v3 (31) were compared to find the best feature extractor. At this stage, the clinical data were also preprocessed for feature extraction. Specifically, the numerical attributes were standardized by removing the mean and scaling to unit variance, thus eliminating the effect of data range and scale. Because the values of the categorical attributes have no natural ordinal relationship, the categorical attributes were encoded as one-hot vectors, which treat all values equally.
MIL (right part of Fig. 2b). The N extracted feature vectors of the image instances were first processed by max-pooling (32-34) and reshaping, and were then passed through two fully connected (FC) layers. This yielded N weight factors for the instances in the bag, which were multiplied by the original feature vectors (23) to adaptively adjust the effect of the instance features. Finally, the weighted image feature vectors and the clinical features were fused by concatenation. Because of the large difference in dimension between image features and clinical features, the clinical features were copied 10 times for expansion. The fused features were then fed into the classifier, and the outputs and the ground truth labels were used to calculate the cross-entropy loss.
Model training and testing. We randomly divided the WSIs into a training cohort and an independent test cohort at a ratio of 4:1, and randomly selected 25% of the training cohort as the validation cohort. We used the Adam optimizer with a learning rate of 1e-4 to update the model parameters and a weight decay of 1e-3 for regularization. In the training phase, we used the cosine annealing warm restarts strategy to adjust the learning rate (35). In the testing phase, the ALN status was predicted by aggregating the model outputs of all bags from the same slide (Fig. 2c).
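The data path through the steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the released implementation. The helper names (`build_bags`, `attention_pool`, `fuse`, `slide_probability`) are hypothetical, the softmax normalization and weighted sum follow the attention pooling of reference (23) as an assumption, the matrices `v` and `w` merely stand in for the two FC layers, and averaging the bag outputs is one assumed way of aggregating them. The max-pooling and reshaping step is omitted.

```python
import math
import random


def build_bags(patches, n_per_bag, m_bags, seed=0):
    """Build m_bags bags, each holding n_per_bag randomly chosen patches."""
    rng = random.Random(seed)
    return [rng.sample(patches, n_per_bag) for _ in range(m_bags)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, v, w):
    """Score each instance with two linear maps (tanh in between),
    normalize the scores, and return the weighted sum of the features."""
    scores = []
    for h in features:
        hidden = [math.tanh(sum(a * b for a, b in zip(row, h))) for row in v]
        scores.append(sum(a * b for a, b in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(wt * h[d] for wt, h in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def fuse(pooled, clinical, copies=10):
    """Concatenate image features with the clinical vector repeated `copies` times."""
    return pooled + clinical * copies


def slide_probability(bag_probs):
    """Assumed aggregation: mean of the per-bag positive probabilities."""
    return sum(bag_probs) / len(bag_probs)


# Toy example with made-up numbers (not study data).
patches = [f"patch_{i}" for i in range(50)]
bags = build_bags(patches, n_per_bag=10, m_bags=5)
features = [[0.2, 0.1, 0.7], [0.9, 0.4, 0.3], [0.5, 0.5, 0.5]]  # N = 3 instances
v = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]  # hypothetical first-layer weights
w = [0.5, -0.5]                           # hypothetical second-layer weights
pooled, weights = attention_pool(features, v, w)
fused = fuse(pooled, clinical=[1.0, 0.0], copies=10)
```

In this toy case the fused vector has 3 image dimensions plus 2 clinical dimensions repeated 10 times, which shows how the repetition rebalances the two feature sources before classification.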
The deep learning models are available at: https://github.com/bupt-ai-cz/BALNMP.
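Two supporting details of the pipeline, the clinical-feature preprocessing and the cosine annealing warm restarts schedule, can also be sketched in plain Python. Only the base learning rate of 1e-4 comes from the description above. The category list, the first cycle length `t_0`, the multiplier `t_mult`, the floor `eta_min`, and the use of the population standard deviation are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import statistics


def standardize(values):
    """Remove the mean and scale to unit variance (population std assumed)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector over `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def warm_restart_lr(epoch, base_lr=1e-4, eta_min=0.0, t_0=10, t_mult=1):
    """Cosine annealing with warm restarts: the rate decays from base_lr
    toward eta_min within a cycle, then jumps back to base_lr."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:  # locate the current cycle and the position inside it
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Toy values (not study data).
ages = standardize([35.0, 50.0, 65.0])
tumor_type = one_hot("IDC", ["IDC", "ILC"])  # hypothetical category names
schedule = [warm_restart_lr(e) for e in range(20)]
```

With `t_0=10` and `t_mult=1`, the schedule restarts at epoch 10, so `schedule[0]` and `schedule[10]` both equal the base rate.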
Visualization of salient regions from DL-CNB model
We visualized the important regions which were more associated with metastatic status. After the processing of attention-based MIL pooling, the weights of different patches can be obtained and the corresponding feature maps were then weighted together in the following FC layers to conduct ALN status prediction. With the attention weights, we created a heat map to visualize the important salient regions in each WSI.
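As a rough illustration of how attention weights can be turned into a heat map, the sketch below places each patch weight at its grid position and rescales the weights to the range 0 to 1. The grid coordinates, the min-max scaling, and the function name `attention_heatmap` are assumptions made for this example; the text does not specify how the weights were normalized or rendered.

```python
def attention_heatmap(coords, weights, n_rows, n_cols):
    """Place min-max scaled attention weights on a patch grid.

    coords: (row, col) grid position of each patch (hypothetical layout).
    weights: attention weight of each patch, in the same order as coords.
    Cells without a sampled patch stay at 0.0.
    """
    low, high = min(weights), max(weights)
    span = (high - low) or 1.0  # avoid division by zero when all weights match
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), weight in zip(coords, weights):
        grid[row][col] = (weight - low) / span
    return grid


# Toy example: three patches on a 2 x 2 grid (made-up weights).
coords = [(0, 0), (0, 1), (1, 1)]
weights = [0.1, 0.3, 0.6]
heatmap = attention_heatmap(coords, weights, n_rows=2, n_cols=2)
```

The resulting grid can be passed to any plotting routine with a color map in which high values appear red.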
Interpretability of DL-CNB model with nuclei features
An interpretability analysis of the DL-CNB model with nuclei features was performed to study the contribution of different nuclei morphological characteristics to the prediction of lymph node metastasis (36, 37). Multiple specially designed nuclei features were first extracted for each WSI, and these features together formed a training bag. The proposed DL-CNB model was then re-trained on the constructed feature bags. The weights of the different features (instances) were obtained from the attention-based MIL pooling, which yielded the contribution of each feature. The process is shown in Fig. 3.
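The text does not give formulas for the nuclei features, so the following sketch uses common image-analysis definitions as assumptions: circumference as the contour perimeter, circularity as 4πA/P², rectangularity as the area divided by the area of the axis-aligned bounding box, and density as the number of nuclei per unit region area. The function names and the polygon representation of a nucleus contour are hypothetical.

```python
import math


def polygon_area(points):
    """Shoelace formula for a simple polygon given as (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of the closed contour."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )


def circularity(points):
    """4*pi*A / P^2: equals 1 for a circle and is smaller for irregular shapes."""
    return 4.0 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2


def rectangularity(points):
    """Contour area divided by the area of its axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return polygon_area(points) / ((max(xs) - min(xs)) * (max(ys) - min(ys)))


def nuclear_density(n_nuclei, region_area):
    """Number of nuclei per unit area of the analyzed region."""
    return n_nuclei / region_area


# Toy contour: a 2 x 2 square (not a real nucleus).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

For the square, the circularity is π/4 and the rectangularity is 1, which matches the intuition that a square fills its bounding box but is less round than a circle.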
Statistical analysis
Logistic regression was used to predict ALN status in the clinical-data-only model. Clinical differences between N0 and N(+) were compared using the Mann-Whitney U test and the chi-square test. The AUCs of different methods were compared using the method of DeLong et al. (38). Other measurements, including accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), and negative predictive value (NPV), were also used to evaluate model performance. All statistical tests were two-sided, and a p-value less than 0.05 was considered statistically significant. All statistical analyses were performed with MedCalc software (V 19.6.1; 2020 MedCalc Software bvba, Mariakerke, Belgium), Python 3.7, and SPSS 24.0 (IBM, Armonk, NY).
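For reference, the performance measures named above can be computed from labels and predicted scores as in the sketch below. This is a generic illustration rather than the analysis code used in the study: the function names are hypothetical, the 0.5 threshold is an assumed default, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation, in which ties count as one half.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(labels, scores, threshold=0.5):
    """ACC, SENS, SPEC, PPV and NPV at a given decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "SENS": _ratio(tp, tp + fn),
        "SPEC": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


def auc_rank(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not study data): 1 = N(+), 0 = N0.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
metrics = binary_metrics(labels, scores)
auc = auc_rank(labels, scores)
```

In the toy data, three of the four positive-negative pairs are ranked correctly, while the 0.5 threshold misclassifies one case in each class.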
Results
Clinical characteristics
A total of 1058 patients with early breast cancer were enrolled for analysis. Among them, 957 (90.5%) patients had invasive ductal carcinomas, and 101 (9.5%) patients had invasive lobular carcinomas. After all WSIs were randomly divided, with N0 as the negative reference standard and all other cases as positive, there were 840 patients in the training cohort and 218 patients in the independent test cohort. The average patient age was 57.6 years (range, 26-90 years) for the training and validation sets and 56.7 years (range, 22-87 years) for the test set. The mean ultrasound tumor size was 2.23 cm (range, 0.5-4.5 cm). A total of 556 patients (52.6%) had T1 tumors, while 502 patients (47.4%) had T2 tumors. According to the results of SLNB or ALND, positive lymph nodes were found in 403 patients. Among them, 210 patients (52.1%) had one or two positive lymph nodes (N+(1-2)) and 193 patients (47.9%) had three or more positive lymph nodes (N+(≥3)). As shown in Table 1, there were no significant differences in the detailed characteristics between the training and independent test cohorts (all p > 0.05).
CNN model selection
The detailed results are summarized in supplemental Table 1. Overall, the VGG16_BN model pre-trained on ImageNet (39) provided the best performance in the validation cohort and the independent test cohort (AUC: 0.808, 0.816), compared with AlexNet (AUC: 0.764, 0.780), ResNet50 (AUC: 0.644, 0.607), DenseNet121 (AUC: 0.714, 0.739), and Inception-v3 (AUC: 0.753, 0.762). Considering the other metrics, VGG16_BN also achieved the best ACC, SPEC, and PPV in the independent test cohort. The basic block of VGG16_BN consists of a convolution layer, a batch normalization layer, and a ReLU activation, which provides non-linearity. Max-pooling layers are inserted between basic blocks for down-sampling, and an adaptive average pooling layer at the end of VGG16_BN produces features of a fixed size. The details of VGG16_BN are described in supplemental Table 2.
Predictive value of DL-CNB+C model between N0 and N(+)
In the training cohort, DL-CNB+C achieved an AUC of 0.878, while DL-CNB and the clinical-data-only model achieved AUCs of 0.901 and 0.661, respectively. In the validation cohort, the DL-CNB+C model achieved an AUC of 0.823, which was higher than the AUC of 0.808 obtained by DL-CNB alone and the AUC of 0.709 obtained by the clinical-data-only model.
In the independent test cohort, the DL-CNB+C model again achieved the highest AUC of 0.831, which was higher than that of DL-CNB alone (AUC: 0.816, p = 0.453), although this difference was not statistically significant, and significantly higher than that of the clinical-data-only model (AUC: 0.613, p < 0.0001). The ACC, SENS, and NPV of DL-CNB+C were also better than those of the other methods. The detailed statistical results are summarized in Table 2 and the corresponding ROC curves are shown in Fig. 4.
We further divided N(+) into low metastatic potential (N+(1-2)) and high metastatic potential (N+(≥3)) according to the number of metastatic ALNs. Adopting N0 as the negative reference standard, the combined model showed better discriminating ability both between N0 and N+(1-2) (AUC: 0.878) and between N0 and N+(≥3) (AUC: 0.838).
The detailed statistical results were summarized in supplemental Table 3 and Table 4, and the corresponding ROCs were shown in supplemental Fig. 1 and Fig. 2.
Predictive value of DL-CNB+C model among N0, N+(1-2) and N+(≥3)
The overall AUC of multi-class classification in the independent test cohort based on the DL-CNB+C model was 0.791. Precision and recall were highest for N0 (0.747 and 0.947, respectively), compared with 0.556 and 0.400 for N+(1-2) and 0.375 and 0.162 for N+(≥3). The confusion matrix at a classification threshold of 0.5 is shown in Fig. 5. According to these results, the model performed well in identifying the N0 group but showed poor diagnostic efficacy in the other two groups.
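Per-class precision and recall of this kind are read directly off a confusion matrix, as the short sketch below illustrates. The matrix values are invented for the example and are not the counts of Fig. 5; the row and column convention (rows as true classes, columns as predicted classes) and the function name are likewise assumptions.

```python
def per_class_precision_recall(matrix):
    """Precision and recall per class from a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(matrix)
    results = []
    for k in range(n_classes):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(matrix[k])                                  # row sum
        precision = true_positive / predicted_k if predicted_k else float("nan")
        recall = true_positive / actual_k if actual_k else float("nan")
        results.append((precision, recall))
    return results


# Hypothetical 3-class matrix in the order N0, N+(1-2), N+(>=3).
toy_matrix = [
    [8, 1, 1],
    [2, 5, 3],
    [1, 2, 7],
]
scores_by_class = per_class_precision_recall(toy_matrix)
```

A class with high recall but moderate precision, as reported for N0, corresponds to a row dominated by its diagonal entry and a column that also collects errors from the other classes.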
Subgroup analysis of DL-CNB+C model
Furthermore, we analyzed the performance of the DL-CNB+C model in predicting ALN status between N0 and N(+) across different subgroups of the independent test cohort. The detailed statistical results are summarized in supplemental Table 5. The model performed better in the subgroup of age ≤ 50, with an AUC of 0.918 (95% CI: 0.825, 0.971), than in the subgroup of age > 50, with an AUC of 0.794 (95% CI: 0.720, 0.855) (p = 0.015). There were no significant differences between the other subgroups: ER(+) vs. ER(-) (p = 0.125), PR(+) vs. PR(-) (p = 0.659), HER-2(+) vs. HER-2(-) (p = 0.524), and T1 vs. T2 stage (p = 0.743).
Interpretability of DL-CNB model
To investigate the interpretability of DL-CNB, we conducted two studies to identify the factors associated with ALN status prediction. In the first study, we used the attention-based MIL pooling to find the important regions that contributed to the prediction. The heat map in Fig. 6a highlights the important regions as red patches. Although these regions provide some clues to how the DL-CNB model reaches its prediction, they do not reveal which features of the tumor area the model bases its decisions on.
In the second study, we specially designed and extracted multiple nuclei features for each WSI. The weights of different features were then obtained based on the same attention-based MIL pooling in our DL-CNB. The weights highlighted the nuclei features that were most relevant to the ALN status prediction of each WSI. We found that WSIs of the N(+) group had higher nuclear density (p=0.015) and orientation (p=0.012) but lower circumference (p=0.009), circularity (p=0.010) and area (p=0.024) compared with the N0 group (Fig. 6b and 6c). There were no significant differences between N0 and N(+) in the other nuclei features, including major axis (p=0.083), minor axis (p=0.065), and rectangularity (p=0.149).
Discussion
In most previous studies, DL signatures of ALN metastases were based on medical images such as ultrasound, CT, and MRI images (10, 40, 41). However, many patients had already undergone CNB at the time of the imaging examination, and reactive changes in the tumor, such as the needle path, could reduce the predictive accuracy of the imaging information. This study focused on preoperative CNB WSIs; CNB plays an important role in breast cancer management and is increasingly performed in clinical practice. Preoperative CNB can provide not only the histopathological diagnosis of breast cancer but also the molecular status, including ER/PR/HER-2 status, which is associated with ALN metastasis (42). In addition, the morphological features of tumor cells can be visualized on CNB WSIs. Therefore, the primary tumor biopsy WSI has potential as a complementary imaging tool for ALN metastasis prediction. To the best of our knowledge, this is the first study to apply DL-based histopathological features extracted from primary tumor WSIs to ALN prediction.
Here, the best-performing DL-CNB model yielded satisfactory predictions with an AUC of 0.816, a SENS of 81.0%, and a SPEC of 70.9% on the test set, showing superior predictive capability compared with clinical data alone. Furthermore, unlike other combined models incorporating clinical data (7, 9), the DL-CNB+C model only slightly improved the AUC to 0.831, which showed that our results were mainly derived from the contribution of the DL-CNB model. In addition, in the subgroup analysis stratified by patient age, our DL-CNB+C model achieved an AUC of 0.918 for patients younger than 50 years, suggesting that age is a critical factor in predicting ALN status. Regarding the number of metastatic ALNs, the DL-CNB+C model showed better discriminating ability both between N0 and N+(1-2) and between N0 and N+(≥3). However, its ability to discriminate between N+(1-2) and N+(≥3) was unfavorable. This is consistent with the study of Zheng et al. (9), who also reported poor discrimination between N+(1-2) and N+(≥3) using a DL radiomics model. Further exploration of ALN staging prediction is needed.
Indeed, computer-assisted histopathological analysis can provide a more practical and objective output (43). For example, different molecular subtypes (44) and the Oncotype DX risk score (45) in breast cancer could be directly predicted from HE slides. On the one hand, our DL model can provide significant information for risk stratification and axillary staging, thereby helping to avoid axillary surgery and reducing complications and hospitalization costs. On the other hand, our results also highlight the development of algorithms based on artificial intelligence, which will reduce the workload of pathologists. Similar approaches may be applied to the pathology of other organs.
To our knowledge, our study is the first to quantitatively assess the role of nuclear disorder in predicting ALN metastasis in breast cancer. Our finding is consistent with several recent studies that demonstrate the powerful predictive effect of nuclear disorder on patient survival (46, 47). Interestingly, the top predictive signatures that distinguished N0 from N(+) were characterized by nuclei features including density, circumference, circularity, and orientation. We found that WSIs of the N(+) group had higher nuclear density and polarity but lower circularity. This is understandable because, in tumors with ALN metastasis, tumor cells become poorly differentiated as a result of rapid cell growth, encouraging the nuclei in these structures to form highly clustered and consistently metastatic patterns. Our results showed that nuanced patterns of nuclei density and orientation of tumor cells are important determinants of ALN metastasis.
There are some limitations in our study. First, the selection of regions of interest within each CNB slide required pathologist guidance. Future studies will explore more advanced methods for automatic segmentation of tumor regions. Second, this is a retrospective study; prospective validation of our model in a large multicenter cohort of early breast cancer patients is necessary to assess the clinical applicability of the biomarker. Third, recent evidence indicated that a set of features related to tumor-infiltrating lymphocytes (TILs) was associated with positive LNs in bladder cancer (22). However, because there are few TILs on breast CNB slides, we selected only regions with sufficient tumor cells, rather than whole slides, for the identification of salient regions. Finally, we used only HE-stained images of CNB samples. The clinical utility of immunohistochemically stained images remains to be established.
Conclusion
In brief, we demonstrated that a novel deep learning-based biomarker on primary tumor CNB slides predicted ALN metastasis preoperatively for EBC patients with clinically negative ALN, especially for younger patients. Our methods could help to avoid unnecessary axillary surgery based on the widely collected HE stained histopathology slides, thereby contributing to precision oncology treatment.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Conflict of Interest
The authors declare that they have no competing interests.
Ethics approval and consent to participate
This study was approved by the institutional review board of Beijing Chao-Yang Hospital, Capital Medical University. All patients provided written informed consent.
Authors’ contributions
FX, CZ, JL, YW, and MJ designed the study. CZ, WT, YZ, JL trained the model. FX, YW, ZS, JL, and HJ collected the data. FX, WT, YZ, CZ, YW, MJ and JL analyzed and interpreted the data. FX, CZ, WT, YZ, and MJ prepared the manuscript. All authors read and approved the final manuscript.
Funding
The work was supported by National Natural Science Foundation of China [No. 8197101438].
Data Availability
Excel files containing raw data included in the main figures and tables can be found in the Source Data File in the article. All other data are available in the Article and Supplementary Information. All other data including the imaging data can be provided upon reasonable request to the corresponding author.