Development and implementation of optimized endogenous contrast sequences for delineation in adaptive radiotherapy on a 1.5T MR-Linear-accelerator (MR-Linac): A prospective R-IDEAL Stage 0-2a quantitative/qualitative evaluation of in vivo site-specific quality-assurance using a 3D T2 fat-suppressed platform for head and neck cancer

Travis C. Salzillo; M. Alex Dresner; Ashley Way; Kareem A. Wahid; Brigid A. McDonald; Sam Mulder; Mohamed A. Naser; Renjie He; Yao Ding; Alison Yoder; Sara Ahmed; Kelsey L. Corrigan; Gohar S. Manzar; Lauren Andring; Chelsea Pinnix; R. Jason Stafford; Abdallah S.R. Mohamed; John Christodouleas; Jihong Wang; Clifton David Fuller

doi:10.1101/2022.06.24.22276839

Abstract

Purpose In order to improve segmentation accuracy in head and neck cancer (HNC) radiotherapy treatment planning for the 1.5T MR-Linac, 3D fat-suppressed T2-weighted MRI sequences were developed and optimized.

Methods After initial testing of fat suppression techniques, SPectral Attenuated Inversion Recovery (SPAIR) was chosen as the fat suppression technique. Five candidate SPAIR sequences and a non-suppressed T2-weighted sequence were acquired on five HNC patients on the Unity MR-Linac. The primary tumor, metastatic lymph nodes, parotid glands, and pterygoid muscles were delineated by five segmentors. A robust image quality analysis platform was developed to objectively score the SPAIR sequences based on a combination of qualitative and quantitative metrics.

Results Sequences were analyzed for signal-to-noise (SNR), contrast-to-noise (CNR) compared to fat and muscle, conspicuity, pairwise distance metrics, segmentor assessment, and MR physicist assessment. From this analysis, the non-suppressed sequence was inferior to each of the SPAIR sequences for the primary tumor, lymph nodes, and parotid glands, but was superior for the pterygoid muscles. Two SPAIR sequences consistently received the highest scores among the analysis categories and are recommended to Unity MR-Linac users for HNC radiotherapy treatment planning.

Conclusions Two deliverables resulted from this study. First, an optimized 3D fat-suppressed T2-weighted sequence was developed that can be disseminated to Unity MR-Linac users. Second, a robust image quality analysis process pathway, used to objectively score the various SPAIR sequences, was developed and can be customized and generalized to any image quality optimization. Improved segmentation accuracy with the proposed SPAIR sequence can potentially lead to improved treatment outcomes and reduced toxicity by maximizing target coverage and minimizing organ-at-risk exposure.

INTRODUCTION

Radiotherapy treatment planning using magnetic resonance imaging (MRI) exclusively, or at least in combination with computed tomography (CT), has become increasingly common over the past couple of decades [1]–[4]. The superior soft tissue contrast of MRI compared to CT makes it an attractive imaging modality for target structure and organ-at-risk (OAR) segmentation [5]–[7]. Furthermore, recent advances in deformable image registration and in electron density assignment using synthetic CT generation or alternative atlas-based approaches have helped address the primary pitfalls of combined MR/CT-based treatment planning, namely geometric distortion and direct dose estimation [8]–[16].

MR-Linac stakeholders are a major beneficiary of these advances in MR-based treatment planning [17], [18]. Hybrid MRI-linear accelerator devices can acquire a variety of imaging data during each fraction of radiotherapy and incorporate them into MR-compatible treatment planning systems [19], [20]. Moreover, these daily images can be used in on-line or off-line adaptive planning workflows when major changes in anatomy or tumor function are detected [21], [22]. Thus, a major area of research is in sequence development for these devices to better visualize relevant structures and discover and acquire useful imaging biomarkers for treatment response and resistance [23], [24].

Among the variety of tumor sites treated on MR-Linac devices, head and neck cancer (HNC), especially HPV-associated HNC, has demonstrated significant success on this device [25]–[27]. These tumors are relatively radiosensitive, which warrants the adaptive re-planning utility [28]–[30]. Furthermore, the complex anatomy of the head and neck region is difficult to delineate on CT, where most of the structures have uniform signal and little contrast; conversely, T2-weighted (T2w) MRI provides ample signal-to-noise as well as contrast with surrounding structures, which allows for clearer and more precise segmentation [31], [32].

However, when structures are adjacent to fat, which appears hyperintense on T2w MRI, the boundaries of various structures can become obscured [33], [34]. To attenuate the fat signal, while keeping the water signal within tissue intact, several fat-suppression methods have been established, which use pre-pulse inversion recovery and/or bandwidth strategies during image acquisition or post-processing techniques during image reconstruction. These fat suppression methods have been described at length in the literature [35]–[40]. In the head and neck region, fat-adjacent structures are common, so a high-quality, fat-suppressed T2w MRI sequence is needed for the MR-Linac; no such sequence yet exists in the published literature.

Thus, the purpose of this study was to develop and optimize a 3D fat-suppressed T2w sequence that could be used directly on a 1.5 T Unity MR-Linac for treatment planning purposes. Cast in the R-IDEAL (Radiotherapy-predicate studies, Idea, Development, Exploration, Assessment, Long-term study) framework, as charged by the MR-Linac Consortium, this study was designed to represent Stage 0 (Radiotherapy Predicate Studies) to Stage 2a (Development) [41]. This provides a methodological and rigorous foundation for the implementation of this technical development for MR-Linac clinical workflows as well as the starting point for future studies along the R-IDEAL pipeline, which is an assessment methodology for evidence-based clinical evaluation of innovations in radiation oncology. The image quality analyses of the fat-suppressed sequence are presented here, along with the exam card of the optimized sequence itself, so that it may be disseminated to additional users of the Unity MR-Linac. As a secondary goal of this study, a comprehensive and robust image quality analysis platform was developed to objectively score and rank candidate fat-suppressed sequences. This analysis platform is easily customizable and can be generalized for the optimization of any anatomic-based imaging sequence.

METHODS

Data Availability

All patient images and segmentations were anonymized and uploaded to FigShare (DOI: 10.6084/m9.figshare.20140184).

Sequence Development and Optimization

A standard 3D T2-weighted (T2w) turbo spin echo (TSE) sequence was utilized as an initial template for the fat-suppressed sequence. A Philips MR console emulation software (Philips Healthcare, Best, Netherlands) was used to modify sequence parameters and simulate relative image properties such as signal-to-noise (SNR). The first parameter that was iterated was the fat suppression method. Because there is no 3D mDixon sequence clinically available for the Unity device, only SPectral Attenuated Inversion Recovery (SPAIR) and Short Tau Inversion Recovery (STIR) techniques were investigated due to their relative resistance to B0 and B1 inhomogeneities that are known to occur in the neck region in MRI. Initial image acquisitions demonstrated that overall image quality was superior for the SPAIR fat suppression method compared to STIR, so subsequent sequence optimization was limited to SPAIR sequences. Several parameters were logically iterated (in contrast to spanning every possible combination in parameter space) to produce candidate SPAIR iterations which satisfied the following constraints: 5-6 minute acquisition time, ∼1 mm isotropic reconstructed resolution, TE and TR values for T2 weighting. A preliminary round of acquisition and qualitative analysis eliminated sequences that produced severe artifacts or insufficient image quality. Five SPAIR sequences were chosen as the final candidate sequences for further analysis. The parameters of these sequences can be found in Table 1.

Table 1:

Relevant pulse sequence parameters for the non-suppressed T2-weighted sequence and candidate SPAIR sequences.

Image Acquisition

Image data for the preliminary and main analysis were acquired on consenting head and neck cancer (HNC) patients who were enrolled in the MOMENTUM clinical trial at our institution (NCT04075305) and MD Anderson Institutional Review Board protocols PA15-0418 and PA18-0341. The images were acquired during both their MR Simulation and daily treatment fractions on the 1.5 T Unity MR-Linac device (Elekta AB, Stockholm, Sweden). The scanner is equipped with four-channel radiolucent RF coils positioned anteriorly and posteriorly to the patient, which is standard for Unity devices. A non-suppressed T2w and five SPAIR T2w sequences were each acquired on five HNC patients.

Image Segmentation

Five post-graduate physicians in radiation oncology were asked to segment the following structures in each of the images using RayStation software (RaySearch Laboratories AB, Stockholm, Sweden): GTV (gross primary tumor volume), suspicious lymph nodes, left/right parotid glands, and left/right pterygoid muscles, which are relevant structures during radiotherapy treatment planning. The physicians (referred to hereafter as segmentors) were restricted from looking at each other’s segmentations but were allowed to refer to a radiologist’s report for structure identification (which is a common clinical occurrence). Furthermore, the segmentors were asked to contour each structure from scratch on each image, rather than propagating the segmentations onto each image and modifying them. A segmented structure on a particular sequence is referred to hereafter as a sequence-structure pair. A non-resident researcher also segmented an air-filled cavity within the trachea in ten slices for each image. These were used for noise calculations since areas surrounding the patient were automatically masked in postprocessing before image export. Additionally, the non-resident researcher segmented three areas of cheek and neck fat on one slice for fat-CNR measurements.

Quantitative Image Quality Analysis

Signal-to-noise and contrast-to-noise measurements

Signal-to-noise (SNR) of each structure was calculated as the mean signal of the sequence-structure pair divided by the standard deviation of the noise segmentation (ten slices of the air-filled cavity in the trachea). SNR was also calculated for the fat segmentations for each sequence as a measure of fat suppression.

Contrast-to-noise measurements between each structure and both fat and muscle were also determined by calculating the SNR difference between the structure and fat or muscle. The stack of SNR and CNR values for each sequence-structure pair was then combined for each segmentor and patient and used in the statistical analysis.
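For illustration only, the SNR and CNR definitions above can be sketched in Python as follows. The function names, the toy image, and the masks are hypothetical, and the use of the population standard deviation for the noise region is an assumption; this is not the analysis code used in the study.

```python
import numpy as np

def snr(image, roi_mask, noise_mask):
    """Mean ROI signal divided by the standard deviation of the noise region."""
    # Assumption: population standard deviation (NumPy default, ddof=0).
    return float(image[roi_mask].mean() / image[noise_mask].std())

def cnr(image, roi_mask, ref_mask, noise_mask):
    """SNR difference between a structure and a reference tissue (fat or muscle)."""
    return snr(image, roi_mask, noise_mask) - snr(image, ref_mask, noise_mask)

# Toy 3x4 "image": row 0 = structure, row 1 = fat, row 2 = air cavity (noise).
image = np.array([[100.0, 100.0, 100.0, 100.0],
                  [20.0, 20.0, 20.0, 20.0],
                  [1.0, 3.0, 1.0, 3.0]])
structure = np.zeros(image.shape, dtype=bool)
structure[0] = True
fat = np.zeros(image.shape, dtype=bool)
fat[1] = True
noise = np.zeros(image.shape, dtype=bool)
noise[2] = True

structure_snr = snr(image, structure, noise)   # 100 / 1
fat_snr = snr(image, fat, noise)               # 20 / 1
fat_cnr = cnr(image, structure, fat, noise)    # 100 - 20
```

In this toy case the noise standard deviation is 1, so the structure SNR is 100, the fat SNR is 20, and the fat-CNR is 80.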

Conspicuity measurements

Conspicuity is a measurement of the ratio between ROI contrast and surrounding signal complexity. It is thought to be a more robust descriptor of structure visibility than SNR and CNR. A script to calculate conspicuity was developed according to the equations first described in [42] and is available at https://github.com/tcsalzillo/ConspicuityAnalysis. The original structure segmentations were isotropically expanded and contracted by 1 mm and 2 mm using Velocity AI software (Varian Medical Systems, Inc., Palo Alto, CA, USA). Because conspicuity was first formulated for 2D images, the conspicuity for each slice occupied by the structure for a particular sequence (sequence-structure pair) was recorded. The stack of conspicuity values for each sequence-structure pair was then combined for each segmentor and patient and inputted into the statistical analysis.

Pairwise Distance Metrics

Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD) metrics for each sequence-structure pair between each segmentor were calculated as previously described [43]. These are amongst the most ubiquitous volumetric and surface distance metrics reported in literature [44]. The stacks of DSC and HD metrics for each sequence-structure pair were then combined for each segmentor and patient and inputted into the statistical analysis.
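As a minimal sketch of the pairwise comparison between segmentors, the DSC for every pair of binary masks can be computed as below. The segmentor labels and toy masks are hypothetical, the 95% HD is omitted for brevity, and the implementation described in [43] may differ in detail.

```python
from itertools import combinations
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)

def pairwise_dice(masks_by_segmentor):
    """DSC for every unordered pair of segmentors on one sequence-structure pair."""
    return {(s1, s2): dice(masks_by_segmentor[s1], masks_by_segmentor[s2])
            for s1, s2 in combinations(sorted(masks_by_segmentor), 2)}

# Toy 4x4 masks from three hypothetical segmentors.
m_a = np.zeros((4, 4), dtype=bool)
m_a[0:2, 0:2] = True
m_b = np.zeros((4, 4), dtype=bool)
m_b[0:2, 1:3] = True          # shifted by one column relative to m_a
m_c = m_a.copy()              # identical to m_a

scores = pairwise_dice({"A": m_a, "B": m_b, "C": m_c})
```

Here segmentors A and C agree exactly (DSC of 1.0), while B overlaps each of them by half (DSC of 0.5).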

Qualitative Image Quality Analysis

Segmentor Grading and Comments

As each segmentor worked to delineate the structures, they were asked to complete a rubric (Appendix 1) which asked the segmentor to qualitatively rank each sequence-structure pair, according to his/her preference, within each patient. Additionally, the segmentor was asked to provide specific comments about the appearance or visibility of a structure. These comments were classified into positive (e.g. Structure X looked great), neutral (e.g. Structure X looked acceptable), or negative (e.g. Could not see Structure X) categories. A metric was created to compare the relative amount of positive and negative comments for a specific sequence-structure pair, which was calculated with the following equation.

MR Physicist Assessment

Two MR physicists were asked to analyze the five SPAIR sequences according to a rubric (Appendix 2) which asked the physicist to qualitatively rank each sequence, according to his/her preference, within each patient. Additionally, the physicists were asked to identify any artifacts that were present in the image. One physicist quantified the number of slices that were affected by burnout (depletion of signal due to improper fat suppression) anteriorly and posterolaterally.

Statistical Analysis

Each metric that was classified as quantitative (those in the SNR/CNR, Conspicuity, and Pairwise Distance categories) was subjected to further statistical analysis, which was performed using GraphPad Prism 8 (GraphPad Software, La Jolla, CA, USA). First, the distribution normality was assessed using the Kolmogorov-Smirnov test. If the distributions were found to be normal, the mean value and standard deviation of the metric were calculated. Statistical significance of the difference of means between each sequence-structure pair was then determined using the parametric one-way ANOVA test with follow-up Tukey multiple comparison corrections. If the distributions were found to be non-normal, the median value and interquartile range of the metric were calculated. Statistical significance of the difference of medians between each sequence-structure pair was then determined using the nonparametric Kruskal-Wallis test with follow-up Dunn’s multiple comparison corrections. In either case, statistical significance was attributed to comparisons that produced p<0.05.
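The branching logic of this analysis can be sketched with SciPy as shown below. This is an illustration under stated assumptions rather than a reproduction of the GraphPad Prism workflow: the function name and synthetic data are hypothetical, normality is checked per group against a normal distribution fitted to that group (so p-values may differ from those reported by Prism), and the follow-up Tukey and Dunn's comparisons are not included.

```python
import numpy as np
from scipy import stats

def compare_groups(groups, alpha=0.05):
    """Choose one-way ANOVA or Kruskal-Wallis from a per-group KS normality check."""
    all_normal = True
    for g in groups:
        g = np.asarray(g, dtype=float)
        # Assumption: KS test against a normal with the group's own mean and SD.
        p_norm = stats.kstest(g, "norm", args=(g.mean(), g.std(ddof=1))).pvalue
        if p_norm < alpha:
            all_normal = False
    if all_normal:
        return "anova", float(stats.f_oneway(*groups).pvalue)
    return "kruskal", float(stats.kruskal(*groups).pvalue)

# Deterministic synthetic data built from distribution quantiles (no random draws).
q = (np.arange(1, 201) - 0.5) / 200
normal_base = stats.norm.ppf(q)      # close to an ideal normal sample
skewed_base = -np.log(1.0 - q)       # close to an ideal exponential sample

method_normal, p_normal = compare_groups(
    [normal_base + 10.0, normal_base + 10.5, normal_base + 13.0])
method_skewed, p_skewed = compare_groups(
    [skewed_base, 2.0 * skewed_base, 5.0 * skewed_base])
```

With these inputs the normal-looking groups are routed to the one-way ANOVA and the skewed groups to the Kruskal-Wallis test.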

Rubric for Overall Sequence Scoring

For each metric that was analyzed, a score was determined for each sequence-structure pair (or just for each sequence if the metric was structure-agnostic such as in the MR Physicist Assessment). Each sequence-structure pair received a score between 1 and 6, where 6 corresponded to the pair with the best performance. For the metrics classified as quantitative, sequence-structure pairs could only receive a higher score than the other pairs if the difference was statistically significant (p<0.05) compared to all other sequence-structure pairs with a lower score. For example, if the SNR of Sequence A was statistically higher than B and C, then Sequence A would be scored higher than B and C. However, if Sequence A was statistically higher than B but not C, and Sequence C was not statistically higher than B, then all 3 sequences would receive the same score. For metrics classified as qualitative, the statistical significance requirement was omitted, and sequence-structure pairs were scored purely according to their rank. Sequence-structure pairs that received the same score were rescaled to the average rank between them. For example, if a set of scores was 6,5,5,5,5,1, it was rescaled to 6,3.5,3.5,3.5,3.5,1, where 3.5=(5+4+3+2)/4. For clarity, these will be regarded as Normalized Metric Scores.
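The tie-rescaling step can be illustrated with a short Python sketch. The function name is hypothetical, and only the averaging of tied scores is shown, not the significance-based assignment of the initial scores.

```python
def rescale_ties(scores):
    """Replace tied scores with the average of the ranks they jointly occupy.

    The best score is assigned rank len(scores) and the worst rank 1.
    """
    n = len(scores)
    # Indices ordered from best (highest score) to worst.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rescaled = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < n and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions pos..end occupy ranks n-pos down to n-end.
        avg_rank = sum(n - k for k in range(pos, end + 1)) / (end - pos + 1)
        for k in range(pos, end + 1):
            rescaled[order[k]] = avg_rank
        pos = end + 1
    return rescaled

example = rescale_ties([6, 5, 5, 5, 5, 1])
# example == [6.0, 3.5, 3.5, 3.5, 3.5, 1.0]
```

This reproduces the worked example in the text, where the four tied sequences share the average of ranks 5, 4, 3, and 2.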

For each analysis category (SNR and CNR, Conspicuity, etc.), the Normalized Metric Scores within that category were summed for each sequence-structure pair. For metrics that are structure-agnostic, the score was added to each structure within the sequence. The summed score for a particular structure within a sequence was then renormalized between 1 and 6 (6 corresponding to highest summed score), according to its rank relative to the same structure among the other sequences. For clarity, these will be regarded as Normalized Category Scores. This normalization was performed so that a category with more metrics (such as SNR and CNR Measurements) would be weighed the same in the overall analysis as a category with fewer metrics (such as Conspicuity).
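A minimal sketch of this sum-and-renormalize step for one structure within one analysis category is given below. The function name, metric names, and score values are hypothetical; the optional weights and the use of average ranks for tied sums are assumptions made for illustration.

```python
def normalized_category_scores(metric_scores, weights=None):
    """Sum Normalized Metric Scores per sequence, then rank the sums (1 = worst).

    metric_scores maps a metric name to a list with one score per sequence,
    all for the same structure.
    """
    names = list(metric_scores)
    n_seq = len(metric_scores[names[0]])
    if weights is None:
        weights = {m: 1.0 for m in names}  # unweighted by default
    summed = [sum(weights[m] * metric_scores[m][i] for m in names)
              for i in range(n_seq)]
    ranks = []
    for s in summed:
        lower = sum(1 for t in summed if t < s)
        equal = sum(1 for t in summed if t == s)
        # Assumption: tied sums share the average of the ranks they occupy.
        ranks.append(lower + (equal + 1) / 2)
    return summed, ranks

# Hypothetical Normalized Metric Scores for six sequences and two metrics.
summed, category = normalized_category_scores({
    "SNR": [6, 3.5, 3.5, 3.5, 3.5, 1],
    "CNR_fat": [1, 6, 4, 4, 4, 2],
})
# summed   == [7.0, 9.5, 7.5, 7.5, 7.5, 3.0]
# category == [2.0, 6.0, 4.0, 4.0, 4.0, 1.0]
```

Because every category is reduced to the same 1 to 6 scale, a category with many metrics contributes no more to later sums than a category with a single metric.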

The Normalized Category Scores for each sequence-structure pair were then summed and normalized to determine the Total Score and Normalized Score. Initially, only the SNR and CNR, Conspicuity, Pairwise Distance, and Segmentor Analysis categories were summed in order to compare the sequence-structure pairs between the non-suppressed sequence and SPAIR sequences. Then the MR Physicist Assessment Normalized Category Scores (only analyzed on the SPAIR sequences) were added to the Total Scores and renormalized to determine the Updated Total Scores and Updated Normalized Scores. These scores were used to compare the overall image quality for each structure among the SPAIR sequences. Additionally, the Updated Total Scores for the structures within each sequence were summed and normalized to determine the Combined Total Score and Combined Normalized Score. These scores were used to compare the overall image quality across structures among the SPAIR sequences. Refer to Figure 1 for a graphical depiction of the scoring.

Figure 1:

Flowchart of the scoring rubric for the analysis platform. Metric Scores for each sequence-structure pair from each analysis are summed and normalized into their respective Normalized Category Scores. These Normalized Category Scores are further summed and normalized to calculate the Total Score for each sequence-structure pair. Lastly, the Total Scores for the structures within each sequence are summed and normalized to calculate the final Combined Total Score. During each “summed and normalized” step, weights can be applied if the user wishes to weigh an individual metric, individual analysis category, or individual structure higher for their specific application.

RESULTS

Image Acquisition and Segmentation

The non-suppressed T2w and each SPAIR sequence were successfully acquired on each of the five patients. These images are shown for a representative patient in Figure S1. A representative pair of non-suppressed and SPAIR images (SPAIR 4) in two regions of the head and neck is illustrated in Figure 2 (with visible segmentations) and Figure S2 (without visible segmentations). The borders of the primary tumor and metastatic lymph nodes are clearer on the SPAIR image than on the non-suppressed image. As a result, the segmentations, initially drawn on the non-suppressed image, clearly overestimate or underestimate the extent of the primary tumor and metastatic lymph nodes when viewed on the SPAIR image.

Figure 2:

Representative cases of non-suppressed T2w (A and C) and SPAIR T2w (SPAIR 4) (B and D) in an HNC patient. The top and bottom rows are two different slices in the image and illustrate the differences in primary tumor and metastatic lymph node clarity. The visible segmentations were initially drawn on the non-suppressed T2w image. In the top row, the original segmentation underestimated the extent of the primary tumor, which is clearly visible on the SPAIR image (red arrow in B). The difference in appearance of the inferior portion of the submandibular glands (bilateral structures anterior to the metastatic lymph node) can also be appreciated between the images; in the non-suppressed image, these glands are hypointense with little contrast to surrounding fat, but in the SPAIR image, these glands are hyperintense with exquisite contrast to the suppressed fat signal. In the bottom row, the original segmentations overestimated the extent of the primary tumor and lymph node, whose boundaries are better visualized on the SPAIR image (red arrows in D). Refer to Figure S2 to see the images without the segmentations visible.

Quantitative Analyses

Signal-to-noise and contrast-to-noise measurements

Mean signal-to-noise (SNR) of segmented fat was significantly higher in the non-suppressed sequence than in all SPAIR sequences. Thus, each SPAIR sequence effectively suppressed the fat signal. Among the SPAIR sequences, SPAIR 1 had the largest mean fat SNR of 3.1, which is still lower than the SNR of target and OAR structures. The SNR values for each structure were generally higher in the non-suppressed sequence, as expected, though the increase was only significant for the parotid glands compared to the SPAIR sequences.

Contrast-to-noise (CNR) between fat and the GTV/lymph nodes was significantly increased in the SPAIR sequences compared to the non-suppressed sequence. However, CNR between fat and the pterygoid muscles was significantly increased in the non-suppressed sequence compared to all SPAIR sequences. There were no significant differences in CNR between fat and the parotid glands among all sequences.

Furthermore, CNR between muscle and lymph nodes was significantly increased in the non-suppressed sequence and SPAIR 4 compared to all other SPAIR sequences, and CNR between muscle and parotid glands was significantly increased in the non-suppressed sequence compared to all SPAIR sequences. There were no significant differences in CNR between muscle and the GTV among all sequences. Refer to Table 2 for all SNR and CNR measurements.

Table 2:

Signal-to-noise and contrast-to-noise measurements of the GTV, lymph nodes, parotid glands, and pterygoid muscles in the non-suppressed and SPAIR sequences. For each sequence, the signal-to-noise ratio (SNR) of fat was calculated, which quantifies the degree of fat suppression. For each structure within the sequences, the SNR was also calculated, along with the contrast-to-noise ratio (CNR) relative to the fat and muscle signals. Values are presented as mean ± standard deviation. Values for a specific sequence-structure pair that are denoted with * are significantly greater (p<0.05) than all values for the structure in the column that are not annotated with *.