Auto-Detection and Segmentation of Involved Lymph Nodes in HPV-Associated Oropharyngeal Cancer Using a Convolutional Deep Learning Neural Network
=================================================================================================================================================

* Nicolette Taku
* Kareem A. Wahid
* Lisanne V. van Dijk
* Jaakko Sahlsten
* Joel Jaskari
* Kimmo Kaski
* C. David Fuller
* Mohamed A. Naser

## Abstract

**Purpose** Segmentation of involved lymph nodes on head and neck computed tomography (HN-CT) scans is necessary for the radiotherapy treatment planning of human papilloma virus (HPV) associated oropharynx cancers (OPC). We aimed to train a deep learning convolutional neural network (DL-CNN) to identify and segment involved lymph nodes on contrast-enhanced HN-CT scans.

**Methods** 90 patients who underwent levels II-IV neck dissection for newly diagnosed, clinically node-positive HPV-OPC were identified. Ground-truth segmentation of all radiographically and pathologically involved nodes was manually performed on pre-surgical HN-CT scans, which were randomly divided into a training/validation dataset (n=70) and a testing dataset (n=20). Five-fold cross-validation was used to train 5 separate DL-CNN sub-models based on a residual U-net architecture. Validation and testing segmentation masks were compared to ground-truth segmentation masks using overlap-based, volume-based, and distance-based metrics. A lymph node auto-detection model was developed by thresholding segmentation model outputs, and 20 node-negative HN-CT scans were added to the test set to further evaluate auto-detection capabilities. Model discrimination of lymph node “positive” and “negative” HN-CT scans was evaluated using the area under the receiver operating characteristic curve (AUC).

**Results** In the DL-CNN validation phase, all sub-models yielded segmentation masks with median Dice similarity coefficient (DSC) ≥ 0.90 and median volume similarity score ≥ 0.95. In the testing phase, the DL-CNN produced consensus segmentation masks with median DSC of 0.92 (interquartile range [IQR], 0.89-0.95), median volume similarity of 0.97 (IQR, 0.94-0.99), and median Hausdorff distance of 4.52 mm (IQR, 1.22-8.38). The detection model achieved an AUC of 0.98.

**Conclusion** The results from this single-institution study demonstrate the successful automation of lymph node segmentation for patients with HPV-OPC using a DL-CNN. Future studies, including external validation using a larger dataset, are necessary to clarify the role of the DL-CNN in the routine radiation oncology treatment planning workflow.

## Introduction

Approximately 66,000 cases of head and neck cancer will be diagnosed in the United States in 2021, of which 30% will be human papilloma virus (HPV)-associated oropharynx cancers (OPC) 1,2. Accurate assessment of the extent of lymph node involvement and lymph node characteristics on staging studies is necessary for appropriate treatment disposition. Some patients with early-stage HPV-associated OPC, including limited lymph node involvement and no radiographic evidence of extranodal extension, can be managed with transoral robotic resection of the primary site of disease and ipsilateral neck dissection. However, the majority of patients diagnosed with locoregionally advanced disease will receive radiotherapy treatment with definitive intent, thereby necessitating imaging-based segmentation of the primary tumor and involved lymph nodes to ensure adequate radiotherapy dose delivery to all sites of disease 3.

The acquisition of head and neck computed tomography (HN-CT) scans for HPV-associated OPC is an integral component of primary tumor and nodal staging as well as radiotherapy treatment planning. Several studies have demonstrated unique imaging characteristics for HPV-associated OPC 4,5. In a blinded, matched-pair analysis of HN-CT scans for patients with HPV-positive and HPV-negative OPC, Cantrell et al. found that HPV-positive OPC scans were less likely to demonstrate muscle invasion of the primary tumor but more likely to demonstrate cystic morphology of involved lymph nodes 6. Similarly, Chan et al. observed that HPV-positive OPC was more likely to demonstrate multiple lymph node involvement and cystic nodal appearance 7. These unique radiographic features correspond to histopathology findings observed on the surgical specimens of HPV-associated OPC tumors 8.

Deep learning is a subset of machine learning that uses deep neural networks to learn and classify data 9. Within the context of OPC, deep learning algorithms have been used to predict HPV status based on pre-treatment imaging 10,11. Although clinical assessment of involved lymph nodes is necessary for therapy disposition and radiotherapy treatment planning, no deep learning algorithms have focused on the identification and segmentation of involved lymph nodes for HPV-associated OPC. The purpose of this study was to develop a deep learning convolutional neural network (DL-CNN) capable of identifying and segmenting radiographically and pathologically involved lymph nodes for HPV-associated OPC on contrast-enhanced HN-CT scans. Furthermore, we aimed to use the DL-CNN to discriminate between node-negative and node-positive HN-CT scans.

## Methods

After obtaining Institutional Review Board approval, 90 patients who underwent selective, levels II-IV neck dissection for newly diagnosed, clinically node-positive OPC at our institution were identified from the Steifel Oropharynx Database, a prospective database of clinical and patient-reported outcomes for patients treated at The University of Texas MD Anderson Cancer Center. In addition, 20 randomly selected patients who underwent selective, levels II-IV neck dissection and were found to have clinically and pathologically node-negative OPC were included in the dataset. The inclusion criteria were at least 18 years of age at the time of diagnosis and pathology findings consistent with HPV-associated OPC, while the exclusion criteria were a history of radiotherapy treatment to the head and neck region or a history of prior neck dissection.

### Data Preparation and Preprocessing

Pre-surgical, contrast-enhanced, HN-CT scans were identified for all patients. Expert, “ground-truth segmentation” of all radiographically involved lymph nodes was manually performed on node-positive HN-CT scans using RayStation Research (RaySearch Laboratories, Stockholm, Sweden) 12. Histopathology findings from selective neck dissection were correlated with neuroradiology annotations to ensure that 1) all segmented lymph nodes corresponded to pathologically involved lymph nodes and 2) no radiographically occult lymph nodes were present on surgical pathology. The ground-truth segmentations for each patient were then combined into a solitary “ground-truth mask”.

Pre-processing was performed on HN-CT scans to reduce variability in image size and resolution. The images and structure files were converted from Digital Imaging and Communications in Medicine (DICOM) format to Neuroimaging Informatics Technology Initiative (NIfTI) format using the Advanced Medical Imaging Registration Engine (ADMIRE, Elekta AB, Stockholm, Sweden). The images were cropped to a specific sub-volume, with the auto-segmented cephalad border of the mandible, the manually-segmented cephalad border of the sternum, and the auto-segmented external patient contour serving as the superior, inferior, and circumferential boundaries, respectively (**Figure 1**). Image intensities were then truncated to the range of [−100, 300] Hounsfield units and rescaled to the range of [−1, 1] to increase soft tissue contrast 13. The images and their respective ground-truth masks were resampled to 1.0 mm isotropic resolution using a trilinear interpolator in ADMIRE.
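The intensity step described above can be illustrated with a minimal Python sketch. It assumes the rescaling is linear after clipping, which the text does not spell out; the function name `normalize_hu` and the list-based interface are illustrative only and are not taken from the authors' pipeline.

```python
def normalize_hu(values, lower=-100.0, upper=300.0):
    """Clip Hounsfield units to [lower, upper], then map linearly to [-1, 1].

    Assumption: the rescaling is linear; the paper states only the two ranges.
    """
    span = upper - lower
    normalized = []
    for hu in values:
        # Truncate to the soft-tissue window before rescaling.
        clipped = min(max(hu, lower), upper)
        normalized.append(2.0 * (clipped - lower) / span - 1.0)
    return normalized


# Air (-1000 HU) and dense bone (2000 HU) saturate at the window edges.
print(normalize_hu([-1000, -100, 100, 300, 2000]))
```

Under this assumption the window midpoint (100 HU) maps to 0, so soft tissue occupies most of the rescaled range.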

Figure 1: 
Schematic representation of the pre-processing workflow. Head and neck computed tomography scans were cropped using the mandible, sternum, and external contours as boundaries (A & B). Scans were divided into 4 patches of 96 × 96 × 96 voxels in dimension (C).

### Model Development

A DL-CNN was developed based on a 3-dimensional (3D) residual U-Net architecture included in the Medical Open Network for Artificial Intelligence (MONAI) software package 14. This architecture has been utilized successfully in previous OPC tumor auto-segmentation studies 15. The network consisted of 4 convolution blocks in the encoding and decoding branches with a bottleneck convolution block separating these two branches (**Figure 2**). In the encoding branch, all convolutional layers used a kernel size of 3, with each block consisting of a two-strided convolution layer; the residual connections contained a two-strided and one-strided convolution layer. In the decoding branch, all convolutional layers used a kernel size of 3, with each block consisting of a two-strided transpose convolution layer, a one-strided convolution layer, and a residual connection. In the bottleneck, all convolutional layers used a kernel size of 1 and the residual connection consisted of a two-strided convolution layer. Throughout the architecture, we utilized batch normalization and Parametric Rectified Linear Unit (PReLU) activation functions.
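To make the effect of the stride-2 layers concrete, the sketch below traces the spatial size of a 96-voxel patch edge through four downsampling and four upsampling blocks using the usual convolution size formulas. The kernel size (3) and stride (2) come from the text; `padding=1` and `output_padding=1` are assumptions chosen so that each block halves or doubles the size, and the bottleneck is not modeled. All function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def transpose_conv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a strided transpose convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def encoder_sizes(patch=96, blocks=4):
    """Patch edge length after each encoder block (assumed padding of 1)."""
    sizes = [patch]
    for _ in range(blocks):
        sizes.append(conv_out(sizes[-1]))
    return sizes


def decoder_sizes(bottom, blocks=4):
    """Patch edge length after each decoder block (assumed output padding of 1)."""
    sizes = [bottom]
    for _ in range(blocks):
        sizes.append(transpose_conv_out(sizes[-1]))
    return sizes


down = encoder_sizes()
up = decoder_sizes(down[-1])
print(down, up)
```

With these assumed paddings the decoder restores the original 96-voxel edge, which is what allows a voxel-wise segmentation mask to be produced for each input patch.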

Figure 2: 
Schematic representation of the U-Net architecture implemented for the deep learning convolutional neural network with annotations pertaining to the number of channels, batch normalization (BN) layers, and Parametric Rectified Linear Unit (PReLU) layers.

### Model Training & Validation

The 90 node-positive HN-CT scans and their respective ground-truth masks served as the data with which the DL-CNN was developed. The data were randomly divided into 2 datasets: a training/validation dataset (n=70) and a testing dataset (n=20). Each HN-CT scan was split into four random regions (i.e., patches) of 96 × 96 × 96 voxels in dimension. The input tensor consisted of a batch size of 2, a single-channel input, and 4 patches per image, yielding an input of shape (8, 1, 96, 96, 96). Each patch was sampled so that its center was foreground (i.e., involved lymph node present) or background (i.e., involved lymph node absent), with a 50% probability for either condition. Several data augmentation processes were implemented to minimize overfitting. Random spatial cropping was performed to patch the images and ground-truth masks. Random horizontal flips with 50% probability and random affine transformations with an axial rotation range of 12 degrees and a scale range of 10% were also performed.
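The patch sampling and input shape described above can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, patch centers are drawn from explicit lists of foreground and background voxel coordinates, and the patch is shifted to stay inside the image. MONAI provides a transform, `RandCropByPosNegLabeld`, that behaves along similar lines, but the text does not state which transform was used.

```python
import random

PATCH = (96, 96, 96)


def input_shape(batch_size=2, patches_per_image=4, channels=1, patch=PATCH):
    """Shape of one training input: patches from all scans are stacked."""
    return (batch_size * patches_per_image, channels) + tuple(patch)


def sample_patch_start(foreground, background, image_shape, rng,
                       p_foreground=0.5, patch=PATCH):
    """Pick a patch whose center is foreground or background with equal odds.

    foreground / background: non-empty lists of (z, y, x) voxel coordinates.
    Returns the corner index of the patch, clamped to lie inside the image.
    Assumes every image dimension is at least as large as the patch.
    """
    pool = foreground if rng.random() < p_foreground else background
    center = rng.choice(pool)
    start = []
    for c, size, dim in zip(center, patch, image_shape):
        start.append(min(max(c - size // 2, 0), dim - size))
    return tuple(start)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
node_voxels = [(100, 120, 130)]      # hypothetical involved-node voxel
other_voxels = [(5, 5, 5)]           # hypothetical background voxel
starts = [sample_patch_start(node_voxels, other_voxels, (200, 256, 256), rng)
          for _ in range(4)]
print(input_shape(), starts)
```

Balancing foreground-centered and background-centered patches is a common way to keep small targets such as lymph nodes from being swamped by background during training.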

We implemented a 5-fold cross-validation approach to train 5 separate sub-models for the DL-CNN. For each of the 5 sub-models, 80% of the HN-CT scans in the training/validation dataset and their respective ground-truth masks acted as model inputs for training purposes. The remaining 20% of HN-CT scans served for internal validation. One “validation segmentation mask” was generated per HN-CT scan, for a total of 70 validation segmentation masks. Validation segmentation masks were compared to ground-truth masks using overlap-based (Dice similarity coefficient [DSC]) and volume-based (volume similarity) metrics. The DL-CNN was trained for 700 epochs, with a learning rate of 2×10⁻⁴ for the first 550 epochs and 1×10⁻⁴ for the remaining 150 epochs.
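As a rough illustration of the split and schedule described above, the following sketch partitions 70 scan indices into 5 folds (56 for training and 14 for validation per sub-model) and returns the stepwise learning rate for a given epoch. The shuffling seed, the zero-based epoch indexing, and the function names are assumptions made for the example.

```python
import random


def five_fold_splits(n_scans=70, n_folds=5, seed=0):
    """Return a list of (train_indices, validation_indices), one per sub-model."""
    indices = list(range(n_scans))
    random.Random(seed).shuffle(indices)  # assumed: one random shuffle up front
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, validation))
    return splits


def learning_rate(epoch):
    """Stepwise schedule from the text; epochs are assumed to be zero-indexed."""
    return 2e-4 if epoch < 550 else 1e-4


for k, (train, validation) in enumerate(five_fold_splits(), start=1):
    print(f"sub-model {k}: {len(train)} training scans, {len(validation)} validation scans")
```

Because every scan falls in exactly one validation fold, the 5 sub-models together yield one validation mask per scan, which matches the 70 validation segmentation masks reported.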

### Model Testing

The performance of the DL-CNN in detecting and segmenting involved lymph nodes was evaluated using an independent test dataset of 20 positive HN-CT scans. Additionally, 20 randomly selected HN-CT scans pertaining to patients with no involved lymph nodes were included in the testing dataset to evaluate the ability of the model to discriminate between “positive” (i.e., involved lymph node present) and “negative” (i.e., involved lymph node absent) HN-CT scans. In total, 5 “testing segmentation masks” were generated per HN-CT scan (1 testing segmentation mask per sub-model). For the 20 node-positive scans, the 5 testing segmentation masks for each HN-CT scan were combined to create a “consensus segmentation mask” using the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm (**Figure 3**) 16. The testing segmentation masks and consensus segmentation masks were compared to their respective ground-truth masks using overlap-based (DSC), volume-based (volume similarity), spatial distance-based (Hausdorff distance [HD]), and probabilistic-based (Cohen Kappa Coefficient [CKC]) metrics 17.
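The four metrics named above have standard definitions, sketched below for masks represented as sets of voxel coordinates. The text does not say which implementation was used, so this is only an illustration: the Hausdorff distance here is the maximum (not a percentile) distance, computed by brute force, and isotropic voxel spacing is assumed.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def volume_similarity(a, b):
    """1 - | |A| - |B| | / (|A| + |B|); ignores where the voxels are."""
    if not a and not b:
        return 1.0
    return 1 - abs(len(a) - len(b)) / (len(a) + len(b))


def hausdorff(a, b, spacing_mm=1.0):
    """Maximum symmetric Hausdorff distance; brute force, small masks only."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return spacing_mm * max(directed(a, b), directed(b, a))


def cohen_kappa(a, b, n_voxels):
    """Chance-corrected voxel agreement over an image of n_voxels voxels."""
    tp, fp, fn = len(a & b), len(b - a), len(a - b)
    tn = n_voxels - tp - fp - fn
    observed = (tp + tn) / n_voxels
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n_voxels ** 2
    return (observed - expected) / (1 - expected)


truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}
print(dice(truth, pred), volume_similarity(truth, pred), hausdorff(truth, pred))
print(cohen_kappa(truth, pred, n_voxels=10 ** 6))
```

In this toy case the two masks have equal volume but are shifted by one voxel, so volume similarity is perfect while the DSC is only 0.5, which shows why the metrics are reported together. When background voxels vastly outnumber foreground voxels, Cohen's kappa approaches the DSC, which may explain why the two can take nearly identical values for small structures.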

Figure 3: 
Five sub-model segmentation masks and one consensus segmentation mask were generated for each head and neck computed tomography scan. The red contour corresponds to the ground-truth masks, the blue contours correspond to the predicted sub-model segmentation masks, and the yellow contour corresponds to the consensus segmentation mask generated by combining the 5 sub-model segmentation masks using the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm.

For the model discrimination, each voxel in the 5 testing segmentation masks generated by the sub-models for each of the 40 HN-CT scans in the testing dataset was scored as either “1” to indicate that a lymph node contour was generated or “0” to indicate that no lymph node contour was generated. The scores for each voxel were averaged across the 5 sub-models to yield an “average score” ranging from 0 (i.e., no sub-model included the voxel in its testing segmentation mask) to 1 (i.e., all 5 sub-models included the voxel in their testing segmentation masks). An HN-CT scan was considered “positive” if any voxel average score was equal to 1, and “negative” if all voxel average scores were ≤ 0.8. This score threshold was chosen empirically from test results to maximize the accuracy, sensitivity, and positive predictive value of the DL-CNN. The model discrimination was evaluated by determining the area under the receiver operating characteristic curve (AUC). Three image resampling resolutions, high (1.0 mm), medium (1.5 mm), and low (2.0 mm), were used to evaluate the impact of image resolution on the discriminatory ability of the DL-CNN.
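The voting rule described above can be sketched as follows, again with masks represented as sets of voxel coordinates. Using the highest voxel average as a per-scan score for the AUC is an assumption made for this example, since the text does not state how scan-level scores were formed; the AUC itself is computed with the standard rank-based (Mann-Whitney) formulation. All names are hypothetical.

```python
from collections import Counter


def scan_score(sub_model_masks):
    """Highest voxel-wise average vote across sub-models, between 0 and 1."""
    votes = Counter()
    for mask in sub_model_masks:
        votes.update(mask)  # each sub-model contributes one vote per voxel
    if not votes:
        return 0.0
    return max(votes.values()) / len(sub_model_masks)


def is_positive(sub_model_masks):
    """Positive only if at least one voxel was segmented by every sub-model."""
    return scan_score(sub_model_masks) == 1.0


def auc(positive_scores, negative_scores):
    """Probability that a positive scan outscores a negative one (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


unanimous = [{(1, 1, 1), (1, 1, 2)}, {(1, 1, 1)}, {(1, 1, 1), (2, 2, 2)},
             {(1, 1, 1)}, {(1, 1, 1)}]
four_of_five = [{(3, 3, 3)}, {(3, 3, 3)}, {(3, 3, 3)}, {(3, 3, 3)}, set()]
print(scan_score(unanimous), is_positive(unanimous))
print(scan_score(four_of_five), is_positive(four_of_five))
```

With 5 sub-models the voxel average can only take the values 0, 0.2, 0.4, 0.6, 0.8, or 1, so requiring a score of 1 is the same as requiring unanimous agreement on at least one voxel.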

## Results

### Patient and Tumor Characteristics

Patient and tumor characteristics are presented in **Table 1**. The median age at diagnosis was 60 years and there was a male sex predominance (n=101, 92%). The majority of the patients had no history of cigarette smoking (n=72, 66%) and had cT1 disease (n=63, 57%). Among cN1 patients, there was a median of 1 involved lymph node (range, 1-4) in the training/validation dataset and 1 involved lymph node (range, 1-3) in the testing dataset.

Table 1: 
Patient and tumor clinical characteristics for all patients (N=110), patients in the training/validation dataset (n=70), and patients in the testing dataset (n=40).

Abbreviations: IQR, interquartile range; y, years

### DL-CNN Validation Performance

Segmentation mask metrics for model validation are presented in **Table 2**. When compared to ground-truth masks, sub-model #4 achieved the highest median DSC, with a score of 0.92 (interquartile range [IQR], 0.90-0.94) for the validation segmentation masks. All 5 sub-models generated validation segmentation masks with a median DSC of at least 0.90. Similarly, all 5 sub-models generated validation segmentation masks with a median volume similarity score of at least 0.95, with sub-model #1 achieving the highest median volume similarity score and narrowest volume similarity IQR.

Table 2: 
Minimum, maximum, median, and interquartile range values for the overlap-based (Dice similarity coefficient) and volume-based (volume similarity) metrics for the sub-model validation segmentation masks when compared to the ground-truth masks.

Abbreviations: DSC, Dice similarity coefficient; IQR, interquartile range; Max., maximum; Min., minimum; VS, volume similarity

### DL-CNN Testing Performance

Segmentation mask metrics for model testing are presented in **Table 3**. When compared to ground-truth masks, the median DSC for testing segmentation masks was greater than 0.90 for all sub-models. The median DSC for consensus segmentation masks was 0.92 (IQR, 0.89-0.95). Comparisons between the testing segmentation masks and ground-truth masks for a subset of cases based on DSC are depicted in **Figure 4**. A maximum volume similarity score of 1.0 was achieved by all sub-models for testing segmentation masks, with sub-model #4 achieving the highest minimum volume similarity score and median volume similarity score of 0.97. The median volume similarity score for consensus segmentation masks was 0.97 (IQR, 0.94-0.99). All sub-models achieved a median HD less than 6 mm, with a median HD for consensus segmentation masks of 4.52 mm (IQR, 1.22-8.38). The median CKC for testing segmentation masks was nearly identical across the sub-models, and the median CKC for consensus segmentation masks was 0.92 (IQR, 0.89-0.95).

Table 3: 
Minimum, maximum, median, and interquartile range values for the overlap-based (Dice similarity coefficient), volume-based (volume similarity), spatial distance-based (Hausdorff distance), and probabilistic-based (Cohen Kappa Coefficient) metrics for the sub-model testing segmentation masks and consensus segmentation masks when compared to ground-truth masks.

Abbreviations: CKC, Cohen Kappa Coefficient; DSC, Dice similarity coefficient; HD, Hausdorff distance (in mm); IQR, interquartile range; Max., maximum; Min., minimum; STAPLE, Simultaneous Truth and Performance Level Estimation; VS, volume similarity

Figure 4: 
Comparison of consensus segmentations (yellow) to ground-truth segmentations (red) for a subset of test set patients with Dice similarity coefficients greater than or equal to (A, B, C; 1 involved lymph node, 3 involved lymph nodes, and 2 involved lymph nodes, respectively), slightly lower than (D, E; 2 involved lymph nodes and 1 involved lymph node, respectively), and much lower than (F; 1 involved lymph node) the median value of 0.92.

### DL-CNN Discrimination Performance

Confusion matrices and receiver operating characteristic curves for the three imaging resolutions are presented in **Figure 5**. The medium resampled resolution model achieved the best discrimination of positive and negative HN-CT scans (AUC = 0.98), with 20 of 20 HN-CT scans with involved lymph nodes correctly identified as positive and 19 of 20 of the remaining HN-CT scans correctly identified as negative. In contrast, the low resampled resolution model had the worst classification of HN-CT scans (AUC = 0.81), with 2 of 20 HN-CT scans with involved lymph nodes incorrectly identified as negative and 6 of 20 HN-CT scans with no involved lymph nodes incorrectly identified as positive. Illustrative examples of the detection process and individual test case predictions using the best-performing model (medium resolution) are shown in **Figure S1**.
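The scan counts quoted above are enough to derive the usual summary statistics. The sketch below does this arithmetic for the medium and low resolution models; the counts for the high resolution model are not given in the text and are therefore omitted, and the function name is illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Summary statistics from the four cells of a 2 x 2 confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Medium resolution: 20/20 positives and 19/20 negatives classified correctly.
medium = confusion_metrics(tp=20, fn=0, tn=19, fp=1)
# Low resolution: 2 positives missed and 6 negatives flagged as positive.
low = confusion_metrics(tp=18, fn=2, tn=14, fp=6)

for name, metrics in (("medium", medium), ("low", low)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

From these counts, the medium resolution model classifies 39 of 40 scans correctly (97.5% accuracy), whereas the low resolution model classifies 32 of 40 correctly (80% accuracy).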

Figure 5: 
Receiver operating characteristic curves for positive versus negative HN-CT scan discrimination comparing three resampled image resolutions (High – 1.0 mm, Medium – 1.5 mm, and Low – 2.0 mm) and their corresponding confusion matrices.

## Discussion

The incidence of HPV-associated OPC has risen in recent decades and is projected to continue to increase during the next 30 years 18. Compared to HPV-negative OPC, HPV-associated OPC has been found to have higher rates of clinical and pathological lymph node involvement 19. Additionally, lymph node metastases in HPV-associated OPC are characterized by several distinct features on clinical imaging, including cystic composition and matted conglomeration 20. The acquisition of planning HN-CT scans is germane to the radiotherapy treatment workflow. Intravenous iodinated contrast may be administered during the radiotherapy simulation to enhance vascular visibility and soft tissue contrast, thereby facilitating lymph node delineation and manual target volume segmentation 21,22.

Patient anatomical and tumor characteristics on medical imaging can be harnessed to automate the process of target volume segmentation for radiotherapy planning. More specifically, DL-CNNs can be used to model complex non-linear relationships in radiation oncology training datasets and make segmentation predictions on unseen HN-CT scans acquired during radiotherapy simulation 23. Cardenas et al. used HN-CT scans and their respective, physician-approved contours from 71 patients with head and neck cancers to train, validate, and test a DL-CNN in lymph node clinical target volume (CTV) auto-segmentation. The DL-CNN achieved a DSC of 0.89 for auto-segmented CTVs of neck levels II-V. Additionally, physician review of an independent dataset of 32 HN-CT scans found that over 99% of the DL-CNN auto-segmented lymph node CTVs were either sufficient for clinical use or required minor revisions 24.

We designed a DL-CNN using a residual U-Net, a recognized neural network architecture for medical image segmentation 13,15,23. Using supervised learning and contrast-enhanced HN-CT scans with corresponding ground-truth masks as inputs, we implemented a patch-based approach to train the DL-CNN to auto-segment involved lymph nodes for patients with HPV-associated OPC. As radiographically occult lymph nodes can be identified on surgical specimens for upward of 50% of patients with head and neck cancers following neck dissection, we confirmed that all radiographically abnormal lymph nodes corresponded to pathologically involved lymph nodes and that no additional pathologically involved lymph nodes were present on surgical histopathology 25,26.

The role of DL-CNNs in the auto-segmentation of head and neck primary tumors on medical imaging has been widely explored 15,27. However, studies on auto-segmentation of involved lymph nodes of the head and neck are limited. Bielak et al. investigated the impact of various magnetic resonance imaging sequences on auto-segmentation of lymph nodes and found a maximum DSC of 0.58 28. In another study, Wang et al. integrated the extraction of various imaging features into a DL-CNN and achieved a mean DSC of 0.94 for the highest performing model 29. As computed tomography scans are acquired during the radiotherapy planning process, we chose to use contrast-enhanced, diagnostic HN-CT scans for the training of our DL-CNN. In order to evaluate the generalization capacity of the DL-CNN auto-segmentation model on unseen data, we split the 90 node-positive scans into a training/validation dataset (n=70) and a testing dataset (n=20), an approximately 80%/20% split. In the validation phase, we found that the DL-CNN achieved median DSC and volume similarity scores of at least 0.90 and 0.95, respectively. When tested on unseen data, the DL-CNN achieved median consensus segmentation mask scores of 0.92 for DSC and 0.97 for volume similarity. Moreover, the DL-CNN was able to successfully identify node-positive HN-CT scans, with an AUC of 0.98. These results suggest that our DL-CNN may be used to perform auto-detection and auto-segmentation of involved lymph nodes as part of the radiation oncology treatment planning workflow with a high degree of fidelity and without the need for additional imaging studies.

There are several limitations to our study. We included patients with HPV-associated OPC who had undergone surgical resection of the primary tumor and lymph node dissection. As this cohort reflects a patient population with early-stage disease, it is possible that our results may not be fully generalizable to patients with more locoregionally advanced disease, including more than 3 radiographically involved lymph nodes and/or radiographic evidence of extranodal extension. Furthermore, our DL-CNN was trained, validated, and tested only on contrast-enhanced HN-CT scans. Our findings represent the results of a small cohort of HN-CT scans obtained at a single institution. Therefore, additional studies are needed for external validation of the model in a larger dataset of HN-CT scans performed at other institutions, with and without the presence of intravenous contrast.

## Conclusion

Patients diagnosed with HPV-associated OPC are often found to have clinical evidence of lymph node involvement at the time of diagnosis. Manual segmentation of radiographically involved lymph nodes is an integral part of treatment planning for those patients dispositioned to definitive radiotherapy. Here we have presented a DL-CNN that can be used to automate the process of lymph node detection and segmentation for these patients with a high degree of fidelity. Future studies on the validation of the DL-CNN on larger external datasets of HN-CT scans, on HN-CT scans acquired without contrast, and on HN-CT scans pertaining to patients with surgically unresectable disease are necessary to further clarify the role of the DL-CNN in the routine radiation oncology workflow.

## Supporting information

Supplementary Materials [supplements/269566_file02.docx]

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

*   Received January 19, 2022.
*   Revision received January 19, 2022.
*   Accepted January 21, 2022.


*   © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1.  1.Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer Statistics, 2021. CA Cancer J Clin. 2021;71(1):7–33.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3322/caac.21654&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F21%2F2022.01.19.22269566.atom) 

2.  2.Prevention CfDCa. Cancers Associated with Human Papillomavirus, United States— 2013–2017. 18 ed. Atlanta, GA: Centers for Disease Control and Prevention, US Department of Health and Human Services; 2020.
    
    
3.  3.Taku N, Wang L, Garden AS, et al. Proton Therapy for HPV-Associated Oropharyngeal Cancers of the Head and Neck: a De-Intensification Strategy. Curr Treat Options Oncol. 2021;22(6):54.
    
    
4.  4.Morani AC, Eisbruch A, Carey TE, Hauff SJ, Walline HM, Mukherji SK. Intranodal cystic changes: a potential radiologic signature/biomarker to assess the human papillomavirus status of cases with oropharyngeal malignancies. J Comput Assist Tomogr. 2013;37(3):343–345.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/RCT.0b013e318282d7c3&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23674003&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F21%2F2022.01.19.22269566.atom) 

5.  5.Goldenberg D, Begum S, Westra WH, et al. Cystic lymph node metastasis in patients with head and neck cancer: An HPV-associated phenomenon. Head Neck. 2008;30(7):898–903.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/hed.20796&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18383529&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F21%2F2022.01.19.22269566.atom) 

6.  6.Cantrell SC, Peck BW, Li G, Wei Q, Sturgis EM, Ginsberg LE. Differences in imaging characteristics of HPV-positive and HPV-Negative oropharyngeal cancers: a blinded matched-pair analysis. AJNR Am J Neuroradiol. 2013;34(10):2005–2009.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiYWpuciI7czo1OiJyZXNpZCI7czoxMDoiMzQvMTAvMjAwNSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAxLzIxLzIwMjIuMDEuMTkuMjIyNjk1NjYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

7.  7.Chan MW, Yu E, Bartlett E, et al. Morphologic and topographic radiologic features of human papillomavirus-related and -unrelated oropharyngeal carcinoma. Head Neck. 2017;39(8):1524–1534.
    
    
8.  8.Westra WH. The pathology of HPV-related head and neck cancer: implications for the diagnostic pathologist. Semin Diagn Pathol. 2015;32(1):42–53.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.semdp.2015.02.023&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25804343&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F21%2F2022.01.19.22269566.atom) 

9.  9.Shrestha A, Mahmood A. Review of Deep Learning Algorithms and Architectures. IEEE Access. 2019;7:53040–53065.
    
    
10. 10.Lang DM, Peeken JC, Combs SE, Wilkens JJ, Bartzsch S. Deep Learning Based HPV Status Prediction for Oropharyngeal Cancer Patients. Cancers (Basel). 2021;13(4).
    
    
11. 11.Cheng N-M, Yao J, Cai J, et al. Deep Learning for Fully Automated Prediction of Overall Survival in Patients with Oropharyngeal Cancer Using FDG-PET Imaging. Clinical Cancer Research. 2021;27(14):3948–3959.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImNsaW5jYW5yZXMiO3M6NToicmVzaWQiO3M6MTA6IjI3LzE0LzM5NDgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMS8yMS8yMDIyLjAxLjE5LjIyMjY5NTY2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

12. 12.Bodensteiner D. RayStation: External beam treatment planning system. Medical Dosimetry. 2018;43(2):168–176.
    
    
13. 13.Naser MA, Wahid KA, Grossberg AJ, et al. Deep Learning Auto-Segmentation of Cervical Neck Skeletal Muscle for Sarcopenia Analysis Using Pre-Therapy CT in Patients with Head and Neck Cancer. medRxiv. 2021:2021.2012.2019.21268063.
    
    
14. 14.Consortium M. MONAI: Medical Open Network for AI. In:2020.
    
    
15. 15.Wahid KA, Ahmed S, He R, et al. Evaluation of deep learning-based multiparametric MRI oropharyngeal primary tumor auto-segmentation and investigation of input channel effects: Results from a prospective imaging registry. Clin Transl Radiat Oncol. 2022;32:6–14.
    
    
16. 16.Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging. 2004;23(7):903–921.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TMI.2004.828354&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15250643&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F21%2F2022.01.19.22269566.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000222428100013&link_type=ISI) 

17. Aydin OU, Taha AA, Hilbert A, et al. On the usage of average Hausdorff distance for segmentation performance assessment: hidden error when used for ranking. Eur Radiol Exp. 2021;5(1):4.
    
    
18. Xu L, Dahlstrom KR, Lairson DR, Sturgis EM. Projected oropharyngeal carcinoma incidence among middle-aged US men. Head Neck. 2019;41(9):3226–3234.
    
    
19. Bauwens L, Baltres A, Fiani DJ, et al. Prevalence and distribution of cervical lymph node metastases in HPV-positive and HPV-negative oropharyngeal squamous cell carcinoma. Radiother Oncol. 2021;157:122–129.
    

20. Joo L, Bae YJ, Choi YJ, et al. Prediction model for cervical lymph node metastasis in human papillomavirus-related oropharyngeal squamous cell carcinomas. Eur Radiol. 2021.
    
    
21. Biau J, Lapeyre M, Troussier I, et al. Selection of lymph node target volumes for definitive head and neck radiation therapy: a 2019 Update. Radiother Oncol. 2019;134:1–9.
    

22. Merlotti A, Alterio D, Vigna-Taglianti R, et al. Technical guidelines for head and neck cancer IMRT on behalf of the Italian association of radiation oncology - head and neck working group. Radiat Oncol. 2014;9:264.
    

23. Cardenas CE, Yang J, Anderson BM, Court LE, Brock KB. Advances in Auto-Segmentation. Semin Radiat Oncol. 2019;29(3):185–197.
    
    
24. Cardenas CE, Beadle BM, Garden AS, et al. Generating High-Quality Lymph Node Clinical Target Volumes for Head and Neck Cancer Radiation Therapy Using a Fully Automated Deep Learning-Based Approach. Int J Radiat Oncol Biol Phys. 2021;109(3):801–812.
    
    
25. Krabbe CA, Dijkstra PU, Pruim J, et al. FDG PET in oral and oropharyngeal cancer. Value for confirmation of N0 neck and detection of occult metastases. Oral Oncol. 2008;44(1):31–36.
    

26. Koyfman SA, Ismaila N, Crook D, et al. Management of the Neck in Squamous Cell Carcinoma of the Oral Cavity and Oropharynx: ASCO Clinical Practice Guideline. J Clin Oncol. 2019;37(20):1753–1774.
    

27. Andrearczyk V, Oreiller V, Depeursinge A. Head and Neck Tumor Segmentation: First Challenge, HECKTOR 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Proceedings. 2021.
    
    
28. Bielak L, Wiedenmann N, Berlin A, et al. Convolutional neural networks for head and neck tumor segmentation on 7-channel multiparametric MRI: a leave-one-out analysis. Radiat Oncol. 2020;15(1):181.
    
    
29. Wang Y, Zamiela C, Thomas TV, et al. 3D Texture Feature-Based Lymph Node Automated Detection in Head and Neck Cancer Analysis. Paper presented at: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 16-19 Dec. 2020.