Longitudinal Case-Control Study of Active and Passive Dense Mammographic Breast Tissue
======================================================================================

* Kendra Batchelder
* Basel White
* Christina Cinelli
* Amy Harrow
* Christine Lary
* Andre Khalil

## Abstract

Mammography is used as secondary prevention for breast cancer. Computer-aided detection and image-based short-term risk estimation were developed to improve the accuracy of mammography. However, most approaches inherently lack the ability to connect observations at the mammography level to observations of cancer onset and progression seen at a smaller scale, which can occur years before imageable cancer and lead to primary prevention. The Hurst exponent (*H*) can quantify mammographic tissue into regions of dense tissue undergoing active restructuring and regions that remain passive, with amounts of active and passive dense tissue that differ between cancer and controls at diagnosis. A longitudinal retrospective case-control study was conducted to test the hypothesis that differences can be detected before diagnosis and changes could signal developing cancer. Mammograms and reports were collected from 50 patients from Maine Medical Center in 2015 with at least a 5-year screening history. Age-matching patients within 2 years created a primary dataset, and within 5 years, a secondary dataset was created to test for sensitivity. The amount of passive (*H* ≥ 0.55) and active dense tissue (0.45 < *H* < 0.55) was calculated for each breast and was predicted by creating a linear mixed-effects model. Cancer status was a predictor for passive (*p* = 0.036) and active (*p* = 0.025) dense tissue using the primary dataset. However, when increasing the power, cancer status was a predictor for active dense tissue (*p* = 0.013), while breast status (*p* = 0.004), time (*p* = 0.009), and interaction (*p* = 0.038) were predictors for passive dense tissue.
This suggests active dense tissue is a risk factor for cancer and passive dense tissue is an indication of developing cancer.

**Required Key Messages**

* Mammographic dense breast tissue can be separated into active and passive regions.
* There is more active dense breast tissue in pathology-confirmed cancer cases than in controls.
* Increases in passive dense tissue in a breast could indicate a developing tumor.

## INTRODUCTION

Breast cancer stands as the most diagnosed cancer globally and is the second leading cause of cancer-related fatalities among women [1]. Addressing the high incidence rates of this disease through preventive measures can enhance patient outcomes and alleviate the burden of breast cancer on both public health and the economy. Cancer prevention is achieved through interventions, categorized as primary, secondary, and tertiary [2]. Considerable research has been devoted to secondary prevention strategies for breast cancer, which use screening to advance early detection, diagnosis, and removal of cancer and pre-cancerous conditions before they progress beyond their initial site, when treatment is most likely to succeed [3]. Screening guidelines are provided by entities such as the US Preventive Services Task Force [4], the American Cancer Society [5], the American College of Obstetricians and Gynecologists [6], and the American College of Radiology [5, 6]. These guidelines recommend mammography, the only modality shown to decrease mortality [7], as the imaging modality for most women. Nevertheless, the extent of mortality reduction attributed to mammography screening ranges from 19% to 40%, contingent on age and breast density [7], with sensitivity varying from 86% to 89% in women with minimal dense breast tissue to 62% to 68% in those with highly dense breasts [8].
Recently, potentially modifiable risk factors have been causally linked to a wide range of cancers [9], and approximately 40% of cancers can be prevented by reducing risk factors and implementing primary prevention strategies [10]. Taken with the continued increase in incidence rates and with breast cancer becoming more common among younger women [11, 12], there is a growing emphasis on the primary prevention of breast cancer to hinder the start of the carcinogenic process. Risk models and genetic testing can help identify individuals at an increased risk of developing breast cancer [13]. However, known genetic predisposition or heredity plays a limited role in cancer, accounting for only 5% to 10% of all cancer cases [10]. Traditional risk models, such as the Tyrer-Cuzick, Gail, and Breast Cancer Surveillance Consortium (BCSC) models, are based on varying familial and personal health histories and some models are not calibrated for all populations [14]. Breast density has recently been recognized as one of the strongest independent risk factors for breast cancer, with women with dense breasts having a higher risk of developing breast cancer than women with non-dense breasts [15]. Incorporating breast density measurements has marginally improved some models’ predictive performance to ∼70% [16]. However, the association between breast density and its link to cancer remains unclear. In addition, the World Health Organization estimates that 50% of breast cancer cases do not have known identifiable risk factors [17], which creates a missed opportunity to provide enhanced surveillance or risk reduction methods to women at elevated risk to reduce both the societal and economic impact of breast cancer. New efforts involve applying artificial intelligence to screening mammography to overcome the limitations of traditional approaches to breast cancer risk assessments. 
Several models that estimate breast cancer risk scores have been developed, including *Mirai, Globally-Aware Multiple Instance Classifier, MammoScreen, ProFound AI*, and *Mia*, and these models have better predictive performance at 0 to 5 years than the BCSC risk model that includes traditional risk factors (BCSC area under the receiver operating characteristic curve (AUC) = 0.61, AI algorithms’ AUCs = 0.63-0.67) [18]. Furthermore, advancements in radiomics have allowed for improved quantification and inclusion of parenchymal textural complexity and patterns into models to improve risk estimation beyond breast density [19]. However, AI approaches are not always generalizable to new settings and populations, such as races, ethnicities, and mammography equipment outside of the training set [20]. Their generalizability has yet to be robustly demonstrated, with one study showing that recall rates increased 3-fold following mammography equipment software upgrades [20]. In addition, AI’s inherent lack of explainability and inability to link to known cancer dynamics contribute to the hesitancy to adopt it in a clinical setting.

The biophysical processes of tumor onset have been studied extensively at the cellular level [21-24]. Still, limited research has been done to explore which, if any, of these processes can lead to large-scale features that could be captured on screening mammograms. The development and progression of malignant tumors are intricately influenced by the cancer cells and the surrounding tissues and cells collectively known as the tumor microenvironment [7]. Comprising stromal cells, immune cells, extracellular matrix, and blood vessels, the tumor microenvironment interacts with cancer cells and is crucial in promoting or inhibiting tumor growth and invasion.
In breast cancer, the tumor microenvironment assumes particular significance, as events during breast development and exposure to various risk factors can reshape the breast microenvironment, establishing a permissive setting for cancer initiation and progression. It has been established that tumor onset and progression lead to disorganization and begin approximately 8 years before an imageable tumor [25]. Changes in breast tissue seen in mammography, including increased mammographic breast density, may be associated with elevated collagen levels and the structural organization of stroma, which influences tumor invasion dynamics [11, 14]. Therefore, a metric that can quantify subtle signs of dense breast tissue that is undergoing active restructuring vs passive dense breast tissue could provide further insights into developing abnormalities and the associated risk for breast cancer. The 2D Wavelet-Transform Modulus Maxima (WTMM) method has been used in several fields to analyze complex signals to extract features and quantify spatial structure to gain insights into the underlying mechanisms of complex organizations [26-31]. In previous studies, the 2D WTMM method was employed to capture the structural organization of mammographic tissue, via the Hurst exponent (*H*), and the calculated organization was inferred to be linked to the structure of the tumor microenvironment at the time of diagnosis [32, 33]. The method allows for segmenting dense breast tissue into regions of active dense tissue, i.e., regions that show structural reorganization occurring and are inferred to be linked to the dynamics of cancer onset and progression, and regions of passive dense tissue. This research aims to computationally quantify mammographic breast tissue composition by detecting active and passive dense tissue regions and assess if longitudinal changes in the tissue differ between cancer cases and controls. 
## METHODS

This study received IRB Approval with Waiver of Informed Consent/Authorization (IRB #4664) from Maine Medical Center (Portland, ME) on September 6, 2015, and was compliant with the Health Insurance Portability and Accountability Act (HIPAA).

### Cohort Description

“FOR PRESENTATION” mammographic images of the standard bilateral mammographic views, i.e., right and left mediolateral oblique (MLO) and craniocaudal (CC), from full-field digital mammography were retrospectively collected from Maine Medical Center (Portland, ME, USA) in 2015 from women with at least a 5-year screening exam history. Screen-detected breast cancer cases were confirmed to be malignant by biopsy within 12 months of the last screening exam. Controls had no history of cancer or benign breast disease. The tumorous breast, i.e., the breast that contained the pathology-confirmed malignancy, and the contralateral breast were identified in the accompanying pathology reports for the malignant cases. Breast density scores of A: almost entirely fatty, B: scattered areas of fibroglandular density, C: heterogeneously dense, or D: extremely dense, were assigned to mammogram exams by two expert breast radiologists (AH and CC) following the BI-RADS 5th edition [34].

The primary dataset was created by age-matching patients using their age at the time of the last screening before diagnosis for malignant cases and the time of the last visit for controls. Using nearest neighbor logistic regression propensity score matching, eligible matches were restricted to be within 2 years of each other. Up to two controls were matched to each malignant case using the *MatchIt* package in R [35]. To test sensitivity and explore the outcomes associated with increasing the power, a second dataset was created with eligible matches restricted to within 5 years of each other.
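The effect of the 2-year and 5-year age restrictions can be illustrated with a minimal Python sketch. It is not the study's procedure: the study matched on propensity scores from logistic regression using *MatchIt* in R, whereas this sketch swaps in a simpler greedy nearest-neighbor match on the raw age difference. The `Patient` class, the `match_controls` function, and all parameter names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: float      # age at last screening (cases) or last visit (controls)
    is_case: bool   # True for malignant cases, False for controls


def match_controls(patients, max_age_gap=2.0, controls_per_case=2):
    """Greedy nearest-neighbor age matching without replacement.

    Simplified stand-in for propensity score matching: the distance is
    the absolute age difference, and max_age_gap acts as the caliper.
    """
    cases = [p for p in patients if p.is_case]
    available = [p for p in patients if not p.is_case]
    matches = {case.patient_id: [] for case in cases}

    # One pass per ratio slot, so every case is offered a first control
    # before any case is offered a second one.
    for _ in range(controls_per_case):
        for case in cases:
            eligible = [c for c in available
                        if abs(c.age - case.age) <= max_age_gap]
            if not eligible:
                continue  # no control within the caliper for this case
            best = min(eligible, key=lambda c: abs(c.age - case.age))
            matches[case.patient_id].append(best.patient_id)
            available.remove(best)  # each control is used at most once
    return matches
```

Calling `match_controls` once with `max_age_gap=2.0` and once with `max_age_gap=5.0` mirrors the primary and secondary datasets: the wider restriction admits more controls, at the cost of looser age agreement within each matched set.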
### Analysis of Mammographic Images (Figure 1)

The analysis used the four standard bilateral mammographic views: right MLO, left MLO, right CC, and left CC. As a preprocessing step, black and white binary masks were generated through visual inspection. The breast tissue was contoured manually using the polygon feature in Fiji [36] to eliminate the image background, label, and pectoral muscle, and a mask that segmented the breast tissue was produced, which was then utilized for subsequent analysis (**Fig. 1A-C**). A 360×360 pixel sliding window was positioned at the top left of the segmented breast tissue. The sliding window shifted from left to right and top to bottom with a step size of 32 pixels between subregions. If the central 256×256 pixels of a subregion were entirely contained inside the mask, the subregion was accepted for further analysis (**Fig. 1D-H**). Each subimage was wavelet transformed across 50 different size scales. The corresponding maxima chains and their maxima, maxima lines, partition functions, *h*(*q, a*), and *D*(*q, a*) were generated following the methods described by Marin et al. [15] and Gerasimova-Chechkina [16]. Following these calculations, only the central 256×256 pixels of each subimage were kept to mitigate edge effects (**Fig. 1I-K**).

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2024/02/18/2024.02.17.24302978/F1.medium.gif)

Figure 1: **Overview of the 2D WTMM multifractal sliding window approach** (Marin et al. [15] and Gerasimova-Chechkina [16]). Sliding window approach to divide mammographic images into subregions. A mammographic image (A) is used to create a mask (B). The mask is used to segment the breast tissue (C). A 360 × 360 pixel box (D) is placed in the upper left corner of the image. The box is then moved by a step size of 32 pixels horizontally and vertically.
If the box contains breast tissue and no background, the 360 × 360 pixel subimage is kept for the analysis (E-H). To identify sub-types of mammographic breast tissue, subimages containing fatty tissue (I1), active dense tissue (J1), and passive dense tissue (K1) were wavelet transformed at 50 different size scales, with scale *a* = 10 (I2, J2, K2) and scale *a* = 30 (I3, J3, K3) shown with the corresponding WTMM and WTMMM. The WTMMM were used to construct the WT skeletons (I4, J4, K4). The subimages were then color coded (I5, J5, K5). To visualize mammographic tissue structure, a small RGB image was created where each pixel represents the 360 × 360 pixel subimage that was analyzed (L). A semi-transparent overlay was also constructed to highlight mammographic tissue subtypes on the mammogram (M).

To objectively determine the optimal scale range for fitting power-law curves in $D(q, a)$ vs. $\log_2(a)$ and $h(q, a)$ vs. $\log_2(a)$ plots, a window was varied along $\log_2(a)$. The window was defined by a lower bound ($a_{\min}$) and an upper bound ($a_{\max}$) of $a$, varying from $\log_2 a_{\min} = 0, 0.1, \ldots, 2.1$ and from $\log_2 a_{\max} = 2.0, 2.1, \ldots, 4.9$, respectively, in $\sigma_w$ units, where $\sigma_w = 7$ pixels. All possible combinations of $a_{\min}$ and $a_{\max}$ with a window width of at least $\log_2 a_{\max} - \log_2 a_{\min} = 1.0$ were considered. For each such $(a_{\min}, a_{\max})$ window, $h(q)$ and $D(q)$ were calculated, along with the goodness of fit $R^2$ of $h(q = 0)$, denoted $R^2_{h(q=0)}$. Additionally, the weighted standard deviation of $h$ across all $q$ values, denoted $sd_w$, and the weighted average of $R^2$ of $h(q)$ over all values of $q$, denoted $\langle R^2_w \rangle$, were also calculated, according to the weights in Marin et al. [15]. The further consideration of $(a_{\min}, a_{\max})$ windows was subject to the fulfillment of several conditions.
The first requirement was that the support dimension, represented by $D(q = 0)$, fell within the range of 1.7 to 2.5, considering the potential impact of finite size effects on the multiplication of maxima lines as the scale parameter $a$ approached 0. A window was only considered if it had an $R^2_{h(q=0)}$ value exceeding 0.90, ensuring that the $h(q = 0)$ curve was linear enough to provide a dependable exponent. A low weighted standard deviation for $h$, specifically $sd_w < 0.06$, was also essential to exclude subregions demonstrating multifractal scaling. Finally, the condition $\langle R^2_w \rangle > 0.90$ was imposed to guarantee that all $h(q, a)$ curves were sufficiently linear, with greater weight allocated to those closer to $q = 0$. Based on the resulting *H*, each subregion was classified into one of three groups: fatty tissue (*H* ≤ 0.45, **Fig. 1I5**), active dense tissue (*0.45