Abstract
Introduction The common approach for organ segmentation in hybrid imaging relies on co-registered CT (CTAC) images. This method, however, presents several limitations in real clinical workflows where mismatch between PET and CT images are very common. Moreover, low-dose CTAC images have poor quality, thus challenging the segmentation task. Recent advances in CT-less PET imaging further highlight the necessity for an effective PET organ segmentation pipeline that does not rely on CT images. Therefore, the goal of this study was to develop a CT-less multi-tracer PET segmentation framework.
Methods We collected 2062 PET/CT images from multiple scanners. The patients were injected with either 18F-FDG (1487) or 68Ga-PSMA (575). PET/CT images with any kind of mismatch between PET and CT images were detected through visual assessment and excluded from our study. Multiple organs were delineated on CT components using previously trained in-house developed nnU-Net models. The segmentation masks were resampled to co-registered PET images and used to train four different deep-learning models using different images as input, including non-corrected PET (PET-NC) and attenuation and scatter-corrected PET (PET-ASC) for 18F-FDG (tasks #1 and #2, respectively using 22 organs) and PET-NC and PET-ASC for 68Ga tracers (tasks #3 and #4, respectively, using 15 organs). The models’ performance was evaluated in terms of Dice coefficient, Jaccard index, and segment volume difference.
Results The average Dice coefficient over all organs was 0.81±0.15, 0.82±0.14, 0.77±0.17, and 0.79±0.16 for tasks #1, #2, #3, and #4, respectively. PET-ASC models outperformed PET-NC models (P-value < 0.05). The highest Dice values were achieved for the brain (0.93 to 0.96 in all four tasks), whereas the lowest values were achieved for small organs, such as the adrenal glands. The trained models showed robust performance on dynamic noisy images as well.
Conclusion Deep learning models allow high performance multi-organ segmentation for two popular PET tracers without the use of CT information. These models may tackle the limitations of using CT segmentation in PET/CT image quantification, kinetic modeling, radiomics analysis, dosimetry, or any other tasks that require organ segmentation masks.
Introduction
PET/CT hybrid imaging provides valuable information by combining structural, molecular, and physiological information with a wide range of indications and radiopharmaceuticals [1]. Since the emergence of hybrid PET/CT imaging, the development and use of different radiotracers has expanded increased. 18F-Fluorodeoxyglucose (18F-FDG), with a wide range of indications for brain, cardiac, and oncological imaging, is the most common used radiotracer in clinical practice [2–4]. Other common radiotracers are prostate-specific membrane antigen radiolabeled ligands, such as 68Ga-PSMA and radiolabeled somatostatin analogues, such as 68Ga-DOTATATE. 68Ga-PSMA has high diagnostic accuracy in the initial staging and biochemical recurrence evaluation in patients diagnosed with prostate cancer [5; 6]. It is also used in the context of PSMA radioligand theragnostics, for lesions’ evaluation and patient selection with a good predictive value [7].
Medical image segmentation in general and in nuclear medicine in particular, is a crucial step towards modern personalized medicine. Segmentation can play an important role in PET image quantification in each organ and volume of interest (VOI) [8; 9]. Quantitative PET provides detailed information for accurate diagnosis and therapy by precise measurement of tracer uptakes and kinetics within each VOI [10; 11]. Nowadays, with the growing interest in personalized dosimetry for radiopharmaceutical therapy (RPT), aiming at delivering a tumoricidal radiation dose to the target volume while sparing organs at risk (OARs) [12; 13], the importance of image segmentation is becoming more pronounced. Moreover, the assessment of treatment response and prognostication, which requires tumor and OAR masks, have recently received significant attention [14–18]. In addition, studies highlighted the importance of non-tumoral organs in prognostication and outcome prediction [19; 20].
In clinical practice, segmentation is performed manually on CT or MRI images after visual inspection of PET images [21–23]. Manual contouring is subjective, time-consuming, labor intensive, and prone to errors and inter-/intra-operator variability because of different levels of expertise and the use of different windowing settings [24; 25]. The available methods for automated organ segmentation in hybrid molecular imaging, predominantly using deep learning (DL), focus on using co-registered CT images [26]. Reliable CT segmentation tools capable of automated segmentation [27; 28] can be used for PET/CT images. However, this approach faces three main limitations.
First, mismatch between emission (PET/SPECT) and transmission (CT) images is highly prevalent in clinical setting [29; 30]. This issue becomes more challenging in dynamic imaging protocols, where CT images are acquired within seconds at the beginning of the exam, while the dynamic PET scan is usually acquired during much longer time, including inevitably averaging multiple respiratory and cardiac cycles. In addition, involuntary changes in the position and size of the organs, such as the bladder getting filled and bowel movements, limit CT segmentation reliability [31]. Additionally, patient bulk motion during prolonged dynamic acquisitions further complicate alignment. Second, the low-dose and ultra-low-dose attenuation correction CT (CTAC) images acquired using lower tube currents and special beam filtering [32], often used in PET/CT, suffer from reduced image quality, thus affecting the accuracy of segmentation. CTAC. Last, the potential advent of CT-less clinical scanners, such as PET-only and PET/MRI scanners [30; 33], which utilize DL-based or Maximum Likelihood estimation of Activity and Attenuation (MLAA)-based attenuation correction methods, pose a significant challenge to CT-based segmentation approaches. DL-guided MRI multi-organ segmentation models were recently introduced to overcome PET/MRI organ segmentation [34]. These limitations highlight the necessity for developing segmentation tools based on emission data only rather than relying on co-registered CT images.
Utilizing the emission data to improve the performance of DL-based organ segmentation has been previously reported [35–38]. Klyuzhin et al. [35] used both PET and CT images for improved organ segmentation in PET/CT. Yazdani et al. [36] developed DL segmentation models to segment both healthy organs and malignant lesions from 68Ga-PSMA PET/CT images and compared them to using only PET, only CT, and both images as input to their DL models. Wang et al. [37] segmented bladder and heart on 18F-FDG PET/CT images to overcome the issue of absence or availability of unreliable CT images. Clement et al. [38] developed a model to perform CT-less organ segmentation on 18F-FDG PET images for dynamic imaging.
This study aimed to develop a reliable CT-less multi-organ segmentation pipeline on two common radiotracers (18F-FDG and 68Ga-PSMA) using a multi-centric dataset to address the limitations of CT-based segmentation approaches in hybrid PET/CT imaging.
Materials and Methods
Common processing and steps for both tracers for reference segmentation generation
Three types of images, including CTAC, Non-corrected PET (NC), and attenuation and scatter-corrected (CT-ASC) PET images were collected in a fully anonymized setup. All images were visualized using the open-source ITK-SNAP software [39]. Images presenting with a mismatch between PET and CT were excluded from training whereas PET/CT images without mismatch were included in the next steps. Figure 1 illustrates an example of a mismatch visualized with segmentation generated based on the co-registered CT scan. Using previously developed DL-based segmentation models in our group [28] based on nnU-Net architecture [40], a total number of 22 organs were delineated on the CTAC component of PET/CT images. All nnU-Net five folds were assembled on images to ensure the highest segmentation accuracy. The previously trained models were separate models, each dedicated to one specific organ. The CT-generated segmentation masks were dilated by 2 mm and resampled with the co-registered PET image voxel spacing. During down sampling from CT voxel spacing (1 to 1.5 mm) to PET spacing (1.6 to 4 mm) without dilation, certain organ shapes, such as ribs and thin parts of pelvic hip bones, may be lost. This occurs because nearest neighborhood interpolation, which is necessary for maintaining a binary segment with values of 0 and 1, has limitations that can result in the removal of fine details in the segmentation. The segmentations were integrated into a unified multi-value segmentation mask using the Simple ITK 2.2 Python library, prioritizing organs with higher Dice values from CT training task. For instance, if a single voxel was segmented as both liver and stomach by separate CT segmentation models, the voxel was classified as liver tissue to avoid overlap between the two organs. The segmentation masks and PET images were used to train an nnU-Net model using a five-fold cross-validation data split. Four different tasks were defined: two utilizing 18F-FDG PET images and two utilizing 68Ga-PET images. Each task involved using either ASC or NC as inputs, namely, 18F-FDG-NC (task #1), 18F-FDG-ASC (task #2), 68Ga-NC (task #3), and 68Ga-ASC (task #4).
Datasets
This study included two separate sets of images acquired from patients injected with 18F-FDG (dataset #1) and 68Ga-PSMA tracers (dataset#2). The study was approved by the local ethics committee and consent was waived owing to the retrospective nature of the study protocol.
Dataset #1 (18F-FDG). This dataset included patients injected with 18F-FDG for oncological indications, with whole-body PET/CT images acquired on Biograph mCT and Biograph Vision scanners (Siemens Healthineers, TN, USA). Initially, 1487 images were included. After excluding PET/CT image pairs with mismatches, that is 947 image pairs (∼64%), 540 cases remained for five-fold training. The semi-diagnostic CTAC images acquired with an average tube current of ∼110 mAs, and PET-NC and PET-ASC images were reconstructed using iterative reconstruction methods. Detailed information about dataset #1 can be found in Table 1. A total number of 22 organs were selected for tasks #1 and #2 on dataset #1, including the adrenal gland (AG), Aorta, Colon, Esophagus, Eyeballs, Femoral Heads (FH), Gall bladder (GB), Heart, Hip bones (including Ilium, Ischium, and Pubis as a single mask), Kidneys, Liver, Lungs, Pancreas, Erectus Spinae, Rib Cage, Sacrum, Spleen, Stomach, Urinary bladder (UB), Vertebrae, Brain, and Clavicle.
From the 575 cases initially collected, 390 were excluded, leaving only 185 clean cases for the remainder of our study. Table 2 summarizes the demographic information for all 575 cases initially included in our study. Some information was missing due to the anonymization process. A total number of 15 organs were selected for tasks #3 and #4 on dataset #2, including AG aorta, Brain, Eyeballs, Hip bones, Kidneys, Liver, Lungs, Pancreas, Rib cage, Sacrum, Spleen, UB, Vertebrae, Heart. Seven organs were excluded from dataset #2 for tasks #3 and #4 as there is less anatomical information in 68Ga images compared to 18F-FDG images. We aimed to include only organs with distinguishable uptake.
Model training parameters
Four separated nnU-Net [40] models were trained for the four defined tasks. For each task, the combined segmentation masks and the corresponding PET images were fed into an nnU-Net version 2 (nnunetv2) pipeline using default parameters except the training length which was increased from the default value of 1000 epochs to 2000 epochs to enhance accuracy. We utilized nnU-Net 3D-fullres training configuration, which uses 3D patches for training. The initial learning rate was set to 1e-2 and decreased every epoch. The decay of 3e-5 and the Dice cross-entropy loss function were used. Five-fold cross-validation data splits were used, with 80% of images used for training and 20% for testing in each fold. The training process was conducted on a PC equipped with an RTX4090 GPU with 24 GB of dedicated memory and a Core i9-13900KF CPU with 32 GB of RAM.
Evaluation strategy
Common segmentation evaluation metrics, including Dice coefficient, Jaccard index, precision, sensitivity, specificity, accuracy, mean surface distance and segment volume difference were used to compare the predicted segmentation with the reference ones. The Mann-Whitney U test was employed to compare the models’ performance on NC and ASC images. In other words, we compared performance between tasks#1 and #2 as well as between tasks #3 and #4, seeking statistically significant differences using a two-tailed P-value of 0.05 as the threshold.
Results
Tasks #1 and #2
For tasks #1 and #2, an average Dice value over all organs of 0.81 ± 0.15 and 0.82 ± 0.14 was achieved, respectively. As shown in Figure 3, in terms of Dice scores, task #1 demonstrated superior performance across most organs compared to task #2.
Table 3 and Table 4 summarize the details of five-fold cross-validation results for tasks #1 and #2, respectively. The highest Dice coefficients were achieved for the Brain and Lungs, while the lowest values were found for smaller organs, such as AGs. The detailed results separated by every fold may be found in supplementary Table 1.
Table 5 compares P-Values between tasks #1 and #2, indicating significant differences across most organs. There are significant differences in Dice values for most organs between tasks #1 and #2, while volume differences show significance in fewer organs. An example of our model output for task #2 tested on a noisy dynamic acquisition is shown in supplementary figure 2.
Figure 4 demonstrates an example of segmented organs for tasks #1 and #2 on a case with a good match between PET and CT images from a cross-validation strategy, depicting the excellent performance of our models.
Figure 5 shows an example of PET/CT image with unreliable CT segmentation due to respiratory mismatch affecting the segmentation of moving organs, especially the Lungs, Liver, and Spleen. This case was excluded from cross-validation training, the trained models of tasks#1 and #2 were ensembled on the corresponding images. In other words, task#1 model was tested on PET-NC and task#2 model on PET-ASC images.
Tasks #3 and #4
The average of the Dice scores over 15 organs from five-fold cross-validation were 0.766 ± 0.171 and 0.788 ± 0.163 for tasks #3 and #4, respectively. The performance metrics for tasks #3 and #4 are summarized in Table 6 and Table 7. The detailed performance metrics are reported separately for each fold in supplementary Table 1. The Dice values for task #4 were significantly higher than those for task #3 with P-Values below 0.05 for most organs, except for larger organs with a clear objective contrast on 68Ga-NC images, such as the Hips, Sacrum, Vertebrae, Kidneys, and UB. The P-Values are reported in Table 8. The lowest Dice value was observed for AG whereas the highest was achieved for the Brain. Figure 6 displays the box plot of Dice scores for the included organs in tasks #3 and #4.
Figure 7 illustrates an example with strong alignment between CT and PET within the 5-fold cross-validation data split for tasks #3 and #4.
Figure 8 depicts an image from the excluded studies demonstrating the mismatch between PET and CT images. This case shows the unreliable CT generated masks and the excellent performance of our model in delineating organs.
Discussion
Automatic, fast, and accurate segmentation of medical images has become one of the hottest topics in precision medicine for personalized dosimetry and image quantification [41–45]. In nuclear medicine, where PET/CT and SPECT/CT are commonly performed for diagnostic and therapeutic purposes, automated organ segmentation is crucial. Recent studies showed the importance of Organomics and organ information in overall survival prediction [15; 20]. as well as pre-therapy dose prediction in PSMA theragnostic procedures [14]. The common method used for automated organ delineation in hybrid PET/CT and SPECT/CT relies on the co-registered CT images. However, this approach is subject to limitations, such as the highly prevalent mismatch between emission and CT images and the low quality of low-dose CTAC images. Urinary bladder filling and bowel movements are inevitable and could cause more problematic mismatch between the CT generated masks and realistic organ position, shape and size depicted on PET images. Additionally, not all PET scanners are equipped with CT for attenuation and scatter correction, and as such, an approach that doesn’t rely on CT is necessary.
We targeted two commonly used tracers for PET imaging and developed a comprehensive DL segmentation pipeline for the automated delineation of multiple organs to be used in different clinical scenarios. To ensure a strong match between emission and CT images in the training set and to prevent the trained DL models from being affected by mismatch, we first cleaned our data by excluding PET/CT image pairs presenting with respiratory, cardiac, and bulk motion mismatches. After visual assessment of our initially included dataset, we excluded more than 65% of the 2062 images from our study, emphasizing the high prevalence of mismatch in PET/CT imaging.
We included different numbers of organs depending on the tracer as 68Ga-PSMA PET images contain less anatomical information. Our goal was to develop a reliable model based on clinical studies for potential implementation in clinical setting. The first scenario for using our models involve performing segmentation on PET ASC images corrected either by CT or any other method, such as MLAA or DL-based ASC techniques. In this scenario, the CT image is either unavailable or too noisy to be segmented through DL . To address this limitation, we provided models from tasks #2 and #4 for delineation on PET ASC images, as they outperformed the models of tasks #1 and #3, which use NC images. PET ASC images benefit from better contrast and contain more information due to corrections for degrading factors, such as attenuation, scatter and point-spread function (PSF).
Additionally, we considered the second scenario of performing CT-less PET segmentation where the PET ASC image is corrected with a mismatched CT, or PET-ASC images are not available. Such corrections can induce unacceptable mismatch artifacts on PET ASC images, removing the useful information e.g., in areas such as chest abdomen interval or causing halo artifacts which are very common in 68Ga-PSMA PET/CT imaging. In this scenario, as shown in Figure 5 and Figure 8, the chest area was affected, and the DL model trained on PET ASC images only on cases without mismatch, may identify it as lung tissue. To address this issue, we implemented two strategies including tasks #1 and #3 to provide a reliable segmentation solution for all potential clinical scenarios. The performance of our models in the second scenario is lower than those in the first scenario as the NC images suffer from multiple artifacts, and are not corrected for attenuation and scatter, and usually do not include time-of-flight (TOF) and PSF correction. We hypothesize that this would be a versatile solution by considering real clinical needs and could tackle the issue more effectively.
We employed state-of-the-art nnU-Net V2 pipeline, which has shown promising results in recent medical image segmentation studies. Our model achieved excellent accuracy in segmenting organs, such as the Lungs, Brain, and Liver. However, it achieved lower performance in a few smaller organs with lower objective contrast and visibility in PET images, especially when using NC images as input, such as AGs. The overall performance was superior in tasks #1 and #2 using FDG PET images compared to tasks #3 and #4 using 68Ga-PET, as anticipated. The difference can be attributed to the lower structural information in 68Ga-PET images because of specific uptake patterns of 68Ga-PSMA tracer. As shown in supplementary figure 2, our models have shown acceptable performance even on very noisy dynamic FDG PET images. It should be noted that these images are acquired shortly after tracer injection, thus a different radiopharmaceutical uptake pattern can be observed compared to delayed PET images (usually acquired around 60 minutes post-injection). Despite this fact, our model showed robust performance on dynamic, noisy frames.
While the Dice coefficient alone may have limitations for evaluation of image segmentation performance [46], we extensively evaluated the performance of our models using multiple metrics, including Dice, Jaccard, mean surface distance, and the volume difference between the reference and the predicted masks. Our study achieved significantly better results compared to the study by Yazdani et al. [47] where few organs were included for segmentation. Our Dice scores were 0.82 vs. 0.80, 0.90 vs. 0.88, 0.84 vs. 0.79, and 0.83 vs. 0.81 for kidneys, liver, spleen, and urinary bladder organs in task #4. The improved performance could be due to our approach of excluding cases with mismatches, which could mislead the DL model during training and underestimate the Dice value when unreliable segmentation masks are used as reference in those cases. Klyuzhin et al. [48] developed a multi-organ segmentation model for 68Ga-PSMA images using both PET and CT images as inputs in their UNET model. Our model, however, utilizes only the emission images to overcome the aforementioned limitations.
This work inherently bears a number of limitations. First, we developed PET organ segmentation models for two common tracers and suggested training new models for other tracers. However, transfer learning is one option that should be considered. CT segmentation does not share this dependency on the tracer used. Another limitation of our study is the limited number of training dataset acquired on only two PET/CT scanners from the same manufacturer., Potential end-users would need to perform finetuning on our publicly available models using their own local datasets.
Conclusion
We developed DL-powered CT-less automated organ segmentation models from PET images for two common tracers used in PET/CT imaging to overcome the limitations of CT segmentation in delineation and quantification. Our model showed acceptable performance; however, the models using PET-ASC images as input achieved a better performance. The method can be used in the clinical to enable a number of clinical and research applications.
Code and data availability
These models and inference instructions will be available on GitHub upon publication of this article.
Competing interests
The authors have no relevant financial or non-financial interests to disclose, and the authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
The study was approved by the Ethics committee of the Canton of Geneva, Switzerland. Consent forms were waived owing to the retrospective nature of the study.
Data Availability
Datasets are not publicly available. The nnU-Net trained models will be available on Github oage of the first author.
Abbreviations
- PET
- Positron Emission Tomography
- CT
- Computed Tomography
- NC
- Non-Corrected
- ASC
- Attenuation and Scatter Corrected
- PSMA
- Prostate Specific Membrane
- Antigen FDG
- Fluorodeoxyglucose
- NET
- Neuroendocrine tumor
- UB
- Urinary Bladder
- AG
- Adrenal Glands
- CTDIvol
- Volumetric Computed Tomography Dose Index