Abstract
Mammography is used as secondary prevention for breast cancer. Computer-aided detection and image-based short-term risk estimation were developed to improve the accuracy of mammography. However, most approaches inherently lack the ability to connect observations at the mammography level to observations of cancer onset and progression seen at a smaller scale, which can occur years before imageable cancer and could inform primary prevention. The Hurst exponent (H) can be used to segment mammographic dense tissue into regions undergoing active restructuring and regions that remain passive; the amounts of active and passive dense tissue differ between cancer cases and controls at diagnosis. A longitudinal retrospective case-control study was conducted to test the hypothesis that these differences can be detected before diagnosis and that changes could signal developing cancer. Mammograms and reports were collected in 2015 from 50 patients at Maine Medical Center with at least a 5-year screening history. A primary dataset was created by age-matching patients within 2 years, and a secondary dataset, age-matched within 5 years, was created to test for sensitivity. The amounts of passive (H ≥ 0.55) and active (0.45 < H < 0.55) dense tissue were calculated for each breast and modeled with linear mixed-effects models. Cancer status was a predictor for passive (p = 0.036) and active (p = 0.025) dense tissue using the primary dataset. However, when the power was increased using the secondary dataset, cancer status was a predictor for active dense tissue (p = 0.013), while breast status (p = 0.004), time (p = 0.009), and their interaction (p = 0.038) were predictors for passive dense tissue. This suggests that active dense tissue is a risk factor for cancer and that passive dense tissue is an indication of developing cancer.
Required Key Messages
Mammographic dense breast tissue can be separated into active and passive regions.
There is more active dense breast tissue in pathology-confirmed cancer cases than controls.
Increases in passive dense tissue in a breast could indicate a developing tumor.
INTRODUCTION
Breast cancer is the most commonly diagnosed cancer globally and the second leading cause of cancer-related death among women [1]. Addressing the high incidence of this disease through preventive measures can improve patient outcomes and alleviate the burden of breast cancer on both public health and the economy. Cancer prevention is achieved through interventions categorized as primary, secondary, and tertiary [2]. Considerable research has been devoted to secondary prevention strategies for breast cancer, which use screening to advance the early detection, diagnosis, and removal of cancer and pre-cancerous conditions before they progress beyond their initial site, when treatment is most likely to be successful [3]. Screening guidelines are provided by entities such as the US Preventive Services Task Force [4], the American Cancer Society [5], the American College of Obstetricians and Gynecologists [6], and the American College of Radiology [5, 6]. These guidelines recommend mammography, the only modality shown to decrease mortality [7], as the imaging modality for most women. Nevertheless, the mortality reduction attributed to mammography screening ranges from 19% to 40%, contingent on age and breast density [7], with sensitivity varying from 86% to 89% in women with minimal dense breast tissue to 62% to 68% in those with highly dense breasts [8].
Recently, potentially modifiable risk factors have been causally linked to a wide range of cancers [9], and approximately 40% of cancers can be prevented by reducing risk factors and implementing primary prevention strategies [10]. Taken with the continued increase in incidence rates and with breast cancer becoming more common among younger women [11, 12], there is a growing emphasis on the primary prevention of breast cancer to hinder the start of the carcinogenic process. Risk models and genetic testing can help identify individuals at an increased risk of developing breast cancer [13]. However, known genetic predisposition or heredity plays a limited role in cancer, accounting for only 5% to 10% of all cancer cases [10]. Traditional risk models, such as the Tyrer-Cuzick, Gail, and Breast Cancer Surveillance Consortium (BCSC) models, are based on varying familial and personal health histories and some models are not calibrated for all populations [14].
Breast density has recently been recognized as one of the strongest independent risk factors for breast cancer, with women with dense breasts having a higher risk of developing breast cancer than women with non-dense breasts [15]. Incorporating breast density measurements has marginally improved some models’ predictive performance to ∼70% [16]. However, the association between breast density and its link to cancer remains unclear. In addition, the World Health Organization estimates that 50% of breast cancer cases do not have known identifiable risk factors [17], which creates a missed opportunity to provide enhanced surveillance or risk reduction methods to women at elevated risk to reduce both the societal and economic impact of breast cancer.
New efforts involve applying artificial intelligence (AI) to screening mammography to overcome the limitations of traditional approaches to breast cancer risk assessment. Several models that estimate breast cancer risk scores have been developed, including Mirai, Globally-Aware Multiple Instance Classifier, MammoScreen, ProFound AI, and Mia. These models have better predictive performance at 0 to 5 years than the BCSC risk model, which includes traditional risk factors (BCSC area under the receiver operating characteristic curve (AUC) = 0.61; AI algorithms' AUCs = 0.63-0.67) [18]. Furthermore, advancements in radiomics have allowed for improved quantification and inclusion of parenchymal textural complexity and patterns into models to improve risk estimation beyond breast density [19]. However, AI approaches are not always generalizable to new settings and populations, such as races, ethnicities, and mammography equipment outside of the training set [20]. Their generalizability has yet to be robustly demonstrated, with one study showing that recall rates increased 3-fold following mammography equipment software upgrades [20].
In addition, AI’s inherent lack of explainability and inability to link to known cancer dynamics contribute to the hesitancy to adopt it in a clinical setting. The biophysical processes of tumor onset have been studied extensively at the cellular level [21-24]. Still, limited research has explored which, if any, of these processes can lead to large-scale features that could be captured on screening mammograms. The development and progression of malignant tumors are intricately influenced by the cancer cells and the surrounding tissues and cells, collectively known as the tumor microenvironment [7]. Comprising stromal cells, immune cells, extracellular matrix, and blood vessels, the tumor microenvironment interacts with cancer cells and is crucial in promoting or inhibiting tumor growth and invasion. In breast cancer, the tumor microenvironment assumes particular significance, as events during breast development and exposure to various risk factors can reshape the breast microenvironment, establishing a permissive setting for cancer initiation and progression. It has been established that tumor onset and progression lead to tissue disorganization and begin approximately 8 years before a tumor is imageable [25]. Changes in breast tissue seen in mammography, including increased mammographic breast density, may be associated with elevated collagen levels and the structural organization of stroma, which influences tumor invasion dynamics [11, 14]. Therefore, a metric that can distinguish dense breast tissue undergoing active restructuring from passive dense breast tissue could provide further insights into developing abnormalities and the associated risk for breast cancer.
The 2D Wavelet-Transform Modulus Maxima (WTMM) method has been used in several fields to analyze complex signals to extract features and quantify spatial structure to gain insights into the underlying mechanisms of complex organizations [26-31]. In previous studies, the 2D WTMM method was employed to capture the structural organization of mammographic tissue, via the Hurst exponent (H), and the calculated organization was inferred to be linked to the structure of the tumor microenvironment at the time of diagnosis [32, 33]. The method allows for segmenting dense breast tissue into regions of active dense tissue, i.e., regions that show structural reorganization occurring and are inferred to be linked to the dynamics of cancer onset and progression, and regions of passive dense tissue. This research aims to computationally quantify mammographic breast tissue composition by detecting active and passive dense tissue regions and assess if longitudinal changes in the tissue differ between cancer cases and controls.
METHODS
This study received IRB Approval with Waiver of Informed Consent/Authorization (IRB #4664) from Maine Medical Center (Portland, ME) on September 6, 2015, and was compliant with the Health Insurance Portability and Accountability Act (HIPAA).
Cohort Description
“FOR PRESENTATION” mammographic images of the standard bilateral mammographic views, i.e., right and left mediolateral oblique (MLO) and craniocaudal (CC), from full-field digital mammography were retrospectively collected from Maine Medical Center (Portland, ME, USA) in 2015 from women with at least a 5-year screening exam history. Screen-detected breast cancer cases were confirmed to be malignant by biopsy within 12 months of the last screening exam. Controls had no history of cancer or benign breast disease. The tumorous breast, i.e., the breast that contained the pathology-confirmed malignancy, and the contralateral breast were identified in the accompanying pathology reports for the malignant cases. Breast density scores of A: almost entirely fatty, B: scattered areas of fibroglandular density, C: heterogeneously dense, or D: extremely dense, were assigned to mammogram exams by two expert breast radiologists (AH and CC) following the BI-RADS 5th edition [34].
The primary dataset was created by age-matching patients using their age at the time of the last screening before diagnosis for malignant cases and at the time of the last visit for controls. Using nearest-neighbor propensity score matching based on logistic regression, eligible matches were restricted to be within 2 years of each other. Up to two controls were matched to each malignant case using the MatchIt package in R [35]. To test sensitivity and explore the outcomes associated with increasing the power, a second dataset was created with eligible matches restricted to be within 5 years of each other.
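The matching step can be illustrated with a minimal Python sketch. This is not the MatchIt procedure used in the study: it is a simplified greedy nearest-neighbor match directly on age (rather than on an estimated propensity score), without replacement, and the function name, identifiers, and example ages are hypothetical.

```python
from typing import Dict, List


def match_controls(case_ages: Dict[str, float],
                   control_ages: Dict[str, float],
                   caliper_years: float = 2.0,
                   ratio: int = 2) -> Dict[str, List[str]]:
    """Greedy nearest-neighbor age matching without replacement (illustrative only)."""
    available = dict(control_ages)
    matches: Dict[str, List[str]] = {case_id: [] for case_id in case_ages}
    # One pass per round, so every case gets a first control before any gets a second.
    for _ in range(ratio):
        for case_id, age in case_ages.items():
            # Controls still unmatched and within the age caliper of this case.
            eligible = {c: abs(a - age) for c, a in available.items()
                        if abs(a - age) <= caliper_years}
            if not eligible:
                continue
            # Closest age wins; ties are broken by control id for reproducibility.
            best = min(eligible, key=lambda c: (eligible[c], c))
            matches[case_id].append(best)
            del available[best]
    return matches


# Hypothetical ages, not study data.
cases = {"case1": 60.0, "case2": 71.0}
controls = {"ctrlA": 61.0, "ctrlB": 58.5, "ctrlC": 74.0, "ctrlD": 66.0}
primary = match_controls(cases, controls, caliper_years=2.0)
secondary = match_controls(cases, controls, caliper_years=5.0)
```

In this toy example the 2-year caliper leaves "case2" without any eligible control, while the 5-year caliper matches it to two controls, which mirrors why the wider caliper retains more participants.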
Analysis of Mammographic Images (Figure 1)
The analysis used the four standard bilateral mammographic views: right MLO, left MLO, right CC, and left CC. As a preprocessing step, black and white binary masks were generated through visual inspection. The breast tissue was contoured manually using the polygon feature in Fiji [36] to eliminate the image background, label, and pectoral muscle, producing a mask that segmented the breast tissue, which was then used for subsequent analysis (Fig. 1A-C). A 360×360 pixel sliding window was positioned at the top left of the segmented breast tissue. The sliding window shifted from left to right and top to bottom with a step size of 32 pixels between subregions. If the central 256×256 pixels of a subregion were entirely contained inside the mask, the subregion was accepted for further analysis (Fig. 1D-H). Each subimage was wavelet transformed across 50 different size scales. The corresponding maxima chains and their maxima, maxima lines, partition functions, h(q, a), and D(q, a) were generated following the methods described by Marin et al. [15] and Gerasimova-Chechkina et al. [16]. Following these calculations, only the central 256×256 pixels of each subimage were kept to mitigate edge effects (Fig. 1I-K).
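The subregion acceptance rule can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the function name is hypothetical, the mask is a boolean array with True inside the breast tissue, and the scan starts at the image origin rather than at the top left of the segmented tissue.

```python
import numpy as np


def accepted_windows(mask, window=360, core=256, step=32):
    """Return top-left (row, col) of windows whose central core lies fully inside the mask."""
    margin = (window - core) // 2  # 52 pixels discarded on each side of the core
    n_rows, n_cols = mask.shape
    kept = []
    for r in range(0, n_rows - window + 1, step):
        for c in range(0, n_cols - window + 1, step):
            centre = mask[r + margin:r + margin + core,
                          c + margin:c + margin + core]
            # Accept only if every pixel of the central 256x256 block is tissue.
            if centre.all():
                kept.append((r, c))
    return kept


# Toy mask: a 288x288 square of "tissue" inside a 424x424 image.
toy_mask = np.zeros((424, 424), dtype=bool)
toy_mask[52:340, 52:340] = True
kept = accepted_windows(toy_mask)
```

Here `kept` holds four window positions; a window starting at row or column 64 is rejected because its central block would extend past the tissue boundary.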
To objectively determine the optimal scale range for fitting power-law curves in the D(q, a) vs. log2(a) and h(q, a) vs. log2(a) plots, a window was varied along log2(a). The window was defined by a lower bound (a_min) and an upper bound (a_max) of a, with log2(a_min) = 0, 0.1, …, 2.1 and log2(a_max) = 2.0, 2.1, …, 4.9, in σ_w units, where σ_w = 7 pixels. All possible combinations of a_min and a_max with a window width of at least log2(a_max) − log2(a_min) = 1.0 were considered. For each such (a_min, a_max) window, h(q) and D(q) were calculated, along with the goodness of fit R² of h(q = 0), denoted R²_h(q=0). Additionally, the weighted standard deviation of h across all q values, denoted sd_w, and the weighted average of R² of h(q) over all values of q, denoted <R²_w>, were calculated according to the weights in Marin et al. [15]. Further consideration of an (a_min, a_max) window was subject to several conditions. First, the support dimension, D(q = 0), had to fall within the range of 1.7 to 2.5, considering the potential impact of finite-size effects on the multiplication of maxima lines as the scale parameter a approached 0. Second, a window was only considered if R²_h(q=0) exceeded 0.90, ensuring that the h(q = 0) curve was linear enough to provide a dependable exponent. Third, a low weighted standard deviation for h, specifically sd_w < 0.06, was required to exclude subregions demonstrating multifractal scaling. Finally, the condition <R²_w> > 0.90 was imposed to guarantee that all h(q, a) curves were sufficiently linear, with greater weight allocated to those closer to q = 0.
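As a rough illustration of the window search, the sketch below enumerates the candidate scale windows on the stated 0.1 grid and applies the four acceptance conditions to precomputed fit statistics. The function and argument names are hypothetical, the 1.7 to 2.5 range is assumed to be inclusive, and the sketch does not compute the WTMM quantities themselves or choose among the windows that pass.

```python
from itertools import product


def candidate_windows(min_width=1.0):
    """All (log2 a_min, log2 a_max) pairs on the 0.1 grid with width >= min_width."""
    lows = range(0, 22)    # tenths: log2(a_min) = 0.0, 0.1, ..., 2.1
    highs = range(20, 50)  # tenths: log2(a_max) = 2.0, 2.1, ..., 4.9
    width_tenths = round(min_width * 10)
    # Integer tenths avoid floating-point surprises in the width comparison.
    return [(lo / 10, hi / 10)
            for lo, hi in product(lows, highs)
            if hi - lo >= width_tenths]


def passes_criteria(d0, r2_h0, sd_w, mean_r2_w):
    """Apply the four conditions described in the text to one window's statistics."""
    return (1.7 <= d0 <= 2.5       # support dimension D(q = 0)
            and r2_h0 > 0.90       # linearity of the h(q = 0) fit
            and sd_w < 0.06        # monofractal: low weighted spread of h
            and mean_r2_w > 0.90)  # weighted mean R^2 over all q


windows = candidate_windows()
n_windows = len(windows)
```

Under these grid assumptions, 594 windows are examined per subregion before the acceptance conditions are applied.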
Based on the resulting H, each subregion was classified into one of three groups: fatty tissue (H ≤ 0.45, Fig. 1I5), active dense tissue (0.45 < H < 0.55, Fig. 1J5), or passive dense tissue (H ≥ 0.55, Fig. 1K5). The area (cm²) of each tissue type was estimated for all four mammographic views (Fig. 1L, 1M). To obtain a score for each breast, the maximum areas of passive and of active dense tissue across that breast's MLO and CC views were used.
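The classification thresholds and the per-breast score can be written compactly. This is a minimal sketch with hypothetical function names and made-up areas; it assumes the per-breast score is the larger of the MLO and CC areas for each tissue type.

```python
def classify_tissue(h):
    """Map a subregion's Hurst exponent to a tissue class using the stated thresholds."""
    if h <= 0.45:
        return "fatty"
    if h < 0.55:
        return "active"
    return "passive"


def breast_score(area_by_view):
    """Take the maximum area (cm^2) across views for each dense tissue type."""
    return {tissue: max(view[tissue] for view in area_by_view.values())
            for tissue in ("active", "passive")}


# Hypothetical areas for one breast, not study data.
left_breast = {
    "MLO": {"active": 12.0, "passive": 30.5},
    "CC": {"active": 15.2, "passive": 28.0},
}
score = breast_score(left_breast)
```

In this example the active score comes from the CC view and the passive score from the MLO view, so the two scores for a breast need not come from the same image.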
Statistical Methods
The amounts of passive and active dense tissue were modeled using linear mixed-effects (LME) models. The fixed effects included time (to diagnosis for cancer cases and to the last visit for controls), cancer status, and breast status. An interaction term between time and breast status was fitted to test the hypothesis that changes occur in dense breast tissue in tumorous breasts vs. non-tumorous breasts (Table 1). Random effects included breast (left or right) nested within participant, nested within the case-control strata obtained from age-matching. The correlation between repeated visits was modeled using a first-order autoregressive structure. All statistical analyses were performed in R [37].
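One way to write the model described above is given below. The notation is an assumption introduced here for illustration and does not appear in the original analysis: y is the amount of passive (or active) dense tissue, s indexes the matched stratum, i the participant, b the breast, and t the visit; C is cancer status and B is breast status.

```latex
y_{sibt} = \beta_0 + \beta_1\,\mathrm{time}_{sit} + \beta_2\,C_{si} + \beta_3\,B_{sib}
         + \beta_4\,(\mathrm{time}_{sit} \times B_{sib})
         + u_s + v_{i(s)} + w_{b(i)} + \varepsilon_{sibt},
\qquad
\mathrm{Corr}(\varepsilon_{sibt}, \varepsilon_{sibt'}) = \phi^{|t - t'|}
```

In this form, β4 is the term that tests whether the change over time differs between tumorous and non-tumorous breasts, and the nested random intercepts u, v, and w account for stratum, participant, and breast, respectively.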
RESULTS
Mammogram data and accompanying pathology reports were collected from 50 patients (27 controls and 23 malignant cases), with mammograms obtained using Hologic’s Lorad Selenia (Marlborough, MA). The patients’ age at the last screening for malignant cases and at the time of the last visit for controls ranged from 40 to 85, with a mean age of 65.39 ± 10.03 for malignant cases and 59.56 ± 10.84 for controls. Age-matching within 2 years resulted in keeping 83.7% (24/27 controls and 17/23 malignant cases) of the data, and age-matching within 5 years resulted in keeping 26/27 controls and 20/23 malignant cases (Table 2). Fatty, active, and passive dense tissue were identified in the four bilateral standardized mammographic views for both controls and malignant cases at each time point (Fig. 2).
Using the primary dataset, which includes controls and cancer cases age-matched within 2 years, the amount of passive and active dense tissue differed between cancer cases and controls (Table 3A). Cancer cases showed 10.30 cm² (CI = 0.75 − 19.85) more passive dense tissue than the average 20.46 cm² found in controls (p = 0.036), and 6.34 cm² (CI = 0.88 − 11.79) more active dense tissue than the 18.55 cm² found in controls (p = 0.025). The amount of passive dense tissue was also affected by breast status (i.e., tumor or no tumor). Breasts that contained a tumor had 4.16 cm² (CI = 0.61 − 7.70) more passive dense tissue than breasts without tumors (p = 0.023). No time effect was detected for the amounts of passive or active dense tissue in the primary dataset.
The model constructed using the secondary dataset, which included patients age-matched within 5 years and increased the power, showed similar estimates for the main effects and interaction terms (Table 3B). As with the primary dataset, the model showed that cancer cases had more active dense tissue than controls (p = 0.013), with cancer cases having 8.15 cm² (CI = 1.84 − 14.44) more than controls, and that breasts that contained a tumor had more passive dense tissue than breasts that did not contain a tumor (p = 0.004). However, the amount of passive dense tissue was only suggestive of differing between cancer cases and controls (p = 0.074). In addition, a time effect was detected for the amount of passive dense tissue, which decreased by 0.40 cm² (CI = −0.69 to −0.10) per year for cancer cases and controls (p = 0.009). Furthermore, the amount of passive dense tissue in breasts that contained a tumor increased 0.67 cm² (CI = 0.04 − 1.30) per year more than in breasts that did not contain a tumor (p = 0.038).
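As a worked illustration of how the time and interaction estimates combine, the implied yearly change in passive dense tissue for a tumorous breast can be obtained by adding the two point estimates. This assumes the usual additive reading of a main effect plus an interaction term, uses the point estimates only, and does not carry a confidence interval.

```latex
\underbrace{-0.40}_{\text{time}} + \underbrace{0.67}_{\text{time} \times \text{breast status}}
  = +0.27\ \mathrm{cm^2/year} \quad \text{(tumorous breasts)},
\qquad
-0.40\ \mathrm{cm^2/year} \quad \text{(non-tumorous breasts)}
```

Read this way, passive dense tissue declines slowly in breasts without a tumor but is estimated to increase slightly in breasts that go on to contain one.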
CONCLUSION
Computer-aided detection and technologies that estimate short-term risk have improved the secondary prevention of breast cancer. However, there has been limited research to determine how the changes in breast tissue structure on mammograms connect to the smaller-scale biophysical processes of cancer onset and progression. In prior work, mammographic breast tissue was classified into subtypes (i.e., fatty, passive dense, and active dense) using H obtained from the 2D WTMM method, and cancer cases were shown to have more passive and active dense tissue at the time of diagnosis. Using an LME model with time, cancer status, breast status, and the interaction between time and breast status to predict the amount of passive and active dense tissue has provided insights into a possible new risk factor for breast cancer. Cancer status was a predictor for passive (p = 0.036) and active (p = 0.025) dense tissue using the primary dataset. However, when the power was increased using the secondary dataset, cancer status was a predictor for active dense tissue (p = 0.013), while breast status (p = 0.004), time (p = 0.009), and their interaction (p = 0.038) were predictors for passive dense tissue. This suggests that active dense tissue is a risk factor for cancer and that passive dense tissue is an indication of developing cancer.
One limitation of this study was the small sample size, with no information on race or ethnicity. Furthermore, the mammograms used were “FOR PRESENTATION” images from a single vendor. Since the algorithms that produce “FOR PRESENTATION” images from “FOR PROCESSING” images vary by vendor, these results must be validated on images obtained from different vendors. In addition to validating our results on a larger and more diverse dataset to overcome these limitations, next steps would be to incorporate passive and active dense tissue measurements into a risk model and to explore using tomosynthesis images and “FOR PROCESSING” images.
Data Availability
The authors will make the data available upon reasonable request.
AUTHOR CONTRIBUTIONS
Experimental design: KB, AK. WTMM calculations: AK, BW. Radiological breast density assessments: AH, CC. Design and implementation of statistical experiments: CL, KB. Figure and table preparation: KB, AK. Manuscript writing: KB, AK. All authors have read and approved the manuscript.
FUNDING
Research reported in this manuscript was partially supported by the National Cancer Institute of the National Institutes of Health under award number R15CA246335. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. KB acknowledges partial financial support from the University of Maine through a Janet Waldron Doctoral Research Fellowship.
ACKNOWLEDGEMENTS
We are grateful to Drs. Anne Breggia, Ivette Emery, and Joe Schulte from MaineHealth for the mammography database, and to CompuMAINE Lab members Arihant Tallapureddy, Melissa Ham, and Sarah Glatter for the manual delineations. We also thank Drs. Karissa Tilbury, Zheng Wei, and David Bradley for technical discussions.