MedSegBench: A Comprehensive Benchmark for Medical Image Segmentation in Diverse Data Modalities

Musa Aydin; Zeki Kuş

doi:10.1101/2024.08.26.24312619

ABSTRACT

MedSegBench is a comprehensive benchmark designed to evaluate deep learning models for medical image segmentation across a wide range of modalities.. This benchmark includes 35 datasets with over 60,000 images, covering modalities such as ultrasound, MRI, and X-ray. It addresses challenges in medical imaging, such as variability in image quality and dataset imbalances, by providing standardized datasets with train/validation/test splits. The benchmark supports binary and multi-class segmentation tasks with up to 19 classes. Evaluations are conducted using the U-Net architecture with various encoder/decoder networks, including ResNets, EfficientNet, and DenseNet, to evaluate model performance. MedSegBench serves as a valuable resource for developing robust and flexible segmentation algorithms. It allows for fair comparisons across different models and promotes the development of universal models for medical tasks. The datasets and source code are publicly available, encouraging further research and development in medical image analysis.

Background & Summary

Deep learning has become essential in medical image analysis and segmentation, offering powerful methods to help doctors and researchers better understand and diagnose diseases¹. Deep learning can identify patterns and details in medical images that might be difficult for human eyes to detect using complex networks such as convolutional neural networks². These techniques are precious for finding tumors in X-rays, classifying different cell types in whole-slide images, or segmenting different brain parts in MRI scans. However, working with biomedical datasets presents unique challenges, including variability of image quality and resolution, the need for well-annotated examples, imbalances of the datasets, and different modalities. Addressing these challenges and ensuring the effectiveness of deep learning methods in real-world medical settings requires large and diverse datasets³. These comprehensive collections of medical images help train the algorithms to handle different modalities and medical tasks. They also allow researchers to compare deep learning methods fairly, determine the most effective approaches for specific medical tasks, and develop universal models for different medical tasks.

There are limited benchmark studies in the literature focused on medical imaging, with most concentrating on medical image classification problems^4–8. Gelasca et al.⁴ present a comprehensive biomedical segmentation benchmark that evaluates bioimage analysis methods. It includes six datasets with associated ground truth and validation methods, covering different scales from subcellular to tissue levels. Rebuffi et al.⁵ propose the Visual Decathlon Challenge, a benchmark that evaluates models across ten diverse visual classification domains, including datasets such as Aircraft, CIFAR-100, and ImageNet. Medical Segmentation Decathlon⁶ supports the creation and benchmarking of semantic segmentation algorithms. It includes 2633 3D images from ten different anatomical sites and modalities collected from multiple institutions and annotated by experts. Yang et al.⁷ introduce the MedMNIST Benchmark, a collection of ten pre-processed medical image datasets standardized to 28×28 pixels. It covers various medical image modalities and support multiple classification tasks. Yang et al.⁸ extend MedMNIST with MedMNIST v2, a standardized collection of biomedical image datasets. This includes 12 datasets for 2D images and 6 for 3D images, covering various data modalities, scales, and classification tasks,

This study introduces a comprehensive benchmark dataset for medical image segmentation (Figure 1). It includes 35 distinct datasets with over 60,000 images covering various data modalities such as ultrasound, dermoscopy, MRI, X-ray, OCT, and more. It provides a diverse resource for evaluating the performance of deep learning models in medical image segmentation tasks. The dataset includes a wide range of scales, from small collections with just a few dozen images to extensive datasets containing tens of thousands of samples. The segmentation tasks cover both binary and multi-class problems, with some datasets featuring up to 19 different classes. This benchmark offers several powerful advantages as a robust and versatile tool for the research community:

Diversity of modalities: The benchmark includes datasets from various imaging modalities such as Ultrasound, MRI, X-Ray, OCT, Dermoscopy, Endoscopy, and various types of microscopy.
Task complexity: It covers both binary segmentation tasks and multi-class segmentation tasks with up to 19 classes.
Dataset sizes: There’s a wide range in the number of images per dataset, from as few as 28 to as many as 21,165.
Data split: All datasets follow a standard train/validation/test split, which is crucial for the proper evaluation of machine learning models.
Standardization: All datasets are standardized to enhance comparability and ease of use. Samples across all datasets have been resized to three standard resolutions - 128, 256, and 512 pixels - and stored in a uniform format.
Application areas: The datasets cover various medical applications, including cancer detection, COVID-19 diagnosis, cell and nuclei segmentation, and organ segmentation.

Figure 1. Summary of the MedSegBench

We have evaluated each dataset on state-of-the-art segmentation model (U-Net⁹) with different encoder/decoder network types (ResNet-18, ResNet-50, Efficient-Net, MobileNet-v2, DenseNet-121, Mix Vision Transformer)¹⁰. Each experiment are performed 3 times and average results are reported.

This benchmark is carefully designed to assess how well deep learning models can generalize across different medical domains, perform on small and large datasets, and handle varying task complexities. By including such a wide array of medical imaging challenges, this benchmark is a powerful tool for comprehensively evaluating the robustness, flexibility, and overall efficacy of segmentation algorithms in the medical imaging field.

Methods

Data Preparation

The MedSegBench dataset comprises 35 distinct 2D medical image segmentation datasets, some of which are extracted from 3D slices. These datasets cover various data modalities such as Ultrasound, OCT, Chest X-ray, MR, and more. The original datasets differ in scales, segmentation tasks (binary/multi-class), classes, imaging modalities, and annotation styles. Hence, we have selected a standardized format and performed pre-processing to ensure a consistent format across all datasets.

Numerous medical image segmentation datasets are available in the literature, each presenting various challenges in- cluding variations in annotations, image sizes, and file formats. Additionally, many of these datasets lack officially shared train/test/validation splits, making it challenging to fairly compare different methods. To address these issues, we performed pre-processing steps. All image and label pairs are resized to 128×128, 256×256, and 512×512 pixels using the bicubic interpolation method. Although we used 512×512 sized images in our experiments, we have made the 128×128 and 256×256 sized versions publicly available for researchers with limited GPU memory. Also, we have applied a mapping to labels; pixels with values of 0 and 255 are mapped to 0 and 1 for binary segmentation tasks, and for multi-class segmentation tasks, pixels are mapped to integer values between 0 and (#Classes - 1). No additional augmentation or pre-processing steps are applied to the images and labels. We have followed three different scenarios based on MedMNIST v2⁸ to create train/test/validation splits: (1) Utilizing the source train/test/validation splits if published by the authors; (2) Using the source validation set as the test set and splitting the source training set into 90% training and 10% validation (9:1 ratio) if the source training and validation splits are published by the authors; (3) Randomly splitting the dataset into 70% training, 10% validation, and 20% test sets if no public train/test/validation splits are available (7:1:2 ratio). Most of these datasets are publicly published under Creative Commons Licenses, some of which are CC-BY-NC, CC-BY-SA, and CC-BY-NC-SA, permitting the redistribution of datasets. We have published datasets in MedSegBench under Creative Commons Licences, and source codes have been published under Apache License 2.0.

Table 1 presents the summary information for all MedSegBench datasets. In addition, Table 2 shows the data-modality-based overview for MedSegBench datasets. Furthermore, Table 3 provides an overview of various datasets, detailing their sub-categories and the number of samples for training, validation, and testing. In the following sections, we will describe the details of each dataset.

View this table:

Table 1. Overview of the MedSegBench datasets, including source references, modality, task types (binary or multi-class) with number of classes, total sample sizes and train/validation/test splits.

View this table:

Table 2. Medical imaging modality and corresponding image counts

View this table:

Table 3.

Overview of datasets and their sub-categories with Train/Validation/Test splits. Each dataset is split into specific sub-categories by authors, and the corresponding number of samples for each sub-category is listed in Train/Val/Test format.

Details

AbdomenUSMSBench

The AbdomenUSMSBench created from AbdomenUS^11,12 consists of 926 ultrasound images of the abdominal region, each with a resolution of 449×464 pixels. This dataset is designed for multi-class segmentation tasks and includes eight distinct classes. We have used the official train and test splits, and the train set is split into a training and validation set with a ratio of 9:1. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

Bbbc010MSBench

The Bbbc010MSBench dataset derived from Bbbc010^13,14, contains 100 microscopy images, each with a resolution of 696×520 pixels. These images are created for binary segmentation tasks and are originally captured for a screen in Fred Ausubel’s Massachusetts General Hospital (MGH) lab. The dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to 0 and 1.

Bkai-Igh-MSBench

The Bkai-Igh-MSBench dataset is derived from the BKAI-IGH NeoPolyp dataset^15–17 and consists of 1,200 endoscopy images, each with a resolution of 1280×995 pixels. It is designed for multi-class segmentation tasks, with three distinct classes. We can not use publicly shared test sets because of a lack of ground truth annotations. The dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

BriFiSegMSBench

The BriFiSegMSBench, which originates from the BriFiSeg dataset^18,19, includes 1,360 microscopy images with a resolution of 512 512 pixels. This dataset is intended for binary segmentation tasks and contains two classes. The images are single-channel samples derived from various cell lines, such as A549, HeLa, MCF7, and RPE1. The dataset is divided into training and validation sets with a 9:1 ratio. Additionally, task-specific images and annotations are provided in npz file format (see Table 3). The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

BusiMSBench

The BusiMSBench dataset is derived from the Breast Ultrasound Images Dataset^20,21 and contains 647 ultrasound images with an average resolution of 500×500 pixels. This dataset is designed for binary segmentation tasks, categorizing data into two classes: benign and malignant. It is split into three parts: train/val/test, in a 7:1:2 ratio. Additionally, class-based images (benign and malignant) and annotations are provided in .npz file format (see Table 3). The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and 1.

CellNucleiMSBench

The CellNucleiMSBench comes from the 2018 Data Science Bowl^22,23 and consists of 670 nuclei images with a resolution of 320 256 pixels. This dataset is specifically designed for binary segmentation tasks. We could not use 65 test images because ground truths are not published officially. Therefore, the source dataset split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

ChaseDB1MSBench

ChaseDB1MSBench is based on the CHASE_DB1 dataset²⁴, released in 2012 by Kingston University, London, and St. George’s, University of London, consists of 28 fundus images with a resolution of 999×960 pixels. This dataset is designed for binary segmentation tasks, including two classes. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

ChuacMSBench

The ChuacMSBench, derived from the CHUAC dataset²⁵, includes 28 fundus images with 189 189 pixels. It is designed for binary segmentation tasks. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

Covid19RadioMSBench

The COVID-19 Radiography Database^26–28 is the source of the Covid19RadioMSBench dataset, which consists of 21,165 chest X-ray images, each with a resolution of 299×299 pixels. This dataset is designed for binary segmentation tasks. We divide the source dataset into three parts: train/val/test sets with a ratio of 7:1:2. It is developed by a collaborative effort of researchers from Qatar University, the University of Dhaka, and partners from Pakistan and Malaysia, working alongside medical professionals. It includes chest X-ray images for COVID-19 positive cases and Normal and Viral Pneumonia images. The authors have also categorized the images into four groups: COVID, Lung_Opacity, Normal, and Viral Pneumonia. We provide these category-based images and their corresponding annotations in .npz file format (see Table 3). The samples are resized to 1 512 512 pixels, and the labels are mapped to integer values between 0 and 1.

CovidQUExMSBench

The CovidQUExMSBench, based on the COVID-QU-Ex Dataset^29,30, consists of 2,913 chest X-ray images, each with a resolution of 256×256 pixels. This dataset is specifically designed for binary segmentation tasks. We use only infection segmentation samples. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

MosMedPlusMSBench

The MosMedPlusMSBench, based on the MosMedDataPlus^57,58 dataset, comprises 2,729 Covid-19 CT images, each sized 512×512 pixels. This dataset is designed for binary segmentation tasks. We split source data into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

CystoFluidMSBench

The CystoFluidMSBench is based on Intraretinal Cystoid Fluid dataset^31–33, comprises 1,006 OCT (Optical Coherence Tomography) images, most of which are sized at 512×512 pixels. This dataset is designed for binary segmentation tasks. The images are carefully chosen by medical experts at Liaquat University of Medical and Health Sciences (LUMHS) Jamshoro, who are trained to identify Cystoid Macular Edema (CME) and its progression, providing a confirmatory diagnosis of CME. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

Dca1MSBench

The Dca1MSBench is derived from the DCA1 dataset^34,35 and contains 134 fundus images, each with a resolution of 300×300 pixels. The images are provided by the Cardiology Department of the Mexican Social Security Institute, UMAE T1-León. This dataset is specifically created for binary segmentation tasks. The dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

DeepbacsMSBench

The DeepbacsMSBench, based on the DeepBacs dataset^36,37, consists of 34 samples of fundus images, each with a size of 1024×1024 pixels. It is designed for binary segmentation tasks. We use official train/validation/test splits published officially by authors. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

DriveMSBench

The DriveMSBench dataset, based on the DRIVE dataset^38,39, includes 40 fundus images, each with dimensions of 565×584 pixels. The images are obtained from a diabetic retinopathy screening program in the Netherlands. It is designed for binary segmentation and uses official splits for training, validation, and testing. The samples are resized to 3 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and 1.

DynamicNuclearMSBench

The DynamicNuclearMSBench, created from the DynamicNuclearNet Segmentation dataset40, ⁴¹, consists of 7084 samples of nuclear cell images, each 128×128 pixels in size. This dataset is utilized for a binary segmentation task. Training, validation, and test splits that are officially published are used. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

FHPsAOPMSBench

The FHPsAOPMSBench dataset is based on a prior dataset^42,43 and comprises 4,000 ultrasound images, each with a resolution of 256×256 pixels. This dataset is designed for a multi-class segmentation task, including three distinct classes. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

IdribMSBench

The IdribMSBench is based on the Indian Diabetic Retinopathy Image Dataset^{44, 45} and includes 80 high-resolution fundus images (4288×2848 pixels) for a binary segmentation task. We use official train/validation/test splits published officially by authors. The authors have also categorized the labels into four categories: Microaneurysms, hemorrhages, Hard Exudates, and Optic Discs. These category-based labels and annotations are provided in a npz file (see Table 3). The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

Isic2016MSBench

The Isic2016MSBench is derived from the ISIC 2016 Challenge^46,47, which consisted of 1,279 dermoscopy samples of varying sizes designed for binary segmentation tasks. We use official training, validation, and test splits published by authors. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

Isic2018MSBench

The Isic2018MSBench is derived from the ISIC 2018 Challenge^48–50, which consisted of 3,694 dermoscopy samples of varying sizes designed for binary segmentation tasks. We use official training, validation, and test splits published by authors. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

KvasirMSBench

The KvasirMSBench, derived from the Kvasir-SEG dataset^51,52, consists of 1,000 endoscopy images with resolutions ranging from 332×487 to 1920×1072 pixels. The dataset includes images of gastrointestinal polyps and their segmentation masks, which have been annotated and verified by an experienced gastroenterologist. It is structured for a binary classification task. The source dataset is divided into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

M2caiSegMSBench

M2caiSegMSBench is based on a prior dataset^53,54 comprising 614 pathology samples and specifically designed for multi-class segmentation tasks, which include 19 distinct classes. The images within this dataset exhibit variable dimensions, and we use official train/validation/test splits. The samples are resized to 3 × 512 ×512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

MonusacMSBench

MonusacMSBench is based on the MoNuSAC challenge^55,56. It consists of 310 samples and is designed for multi-class segmentation with 6 classes. The images in this dataset are H&E stained digitized tissue images from several patients acquired at multiple hospitals using a standard 40x scanner magnification. The annotations are provided by expert pathologists. We use the officially published train/validation/test splits from the challenge. The samples are resized to 3 ×512 ×512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

NucleiMSBench

The NucleiMSBench is based on a prior dataset⁵⁹, which consisting of 141 pathology samples each with an image size of 2000 2000 pixels. This source dataset is designed for binary segmentation tasks. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3 ×512 ×512 pixels, and the labels are mapped to integer values between 0 and 1.

NusetMSBench

The NusetMSBench, derived from the NuSet dataset^60,61, contains 3,408 pathology samples designed for binary segmentation problems. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and 1.

PandentalMSBench

The PandentalMSBench is created from the Panoramic Dental X-rays dataset^{62, 63} and contains 116 X-ray samples of varying sizes. It is specifically intended for binary segmentation tasks. The dataset comprises anonymized and deidentified panoramic dental X-rays of 116 patients taken at Noor Medical Imaging Center in Qom, Iran. The source dataset is divided into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

PolypGenMSBench

The PolypGenMSBench is based on a prior endoscopy dataset^64,65 consisting of 1,412 endoscopy samples, each with an image size of 1920 1080 pixels. It is designed for binary segmentation tasks. It includes colonoscopy video frames captured from a diverse patient population at six different centers in Egypt, France, Italy, Norway, and the United Kingdom. We provide these images and annotations are captured from these centers in a npz file. The source dataset is divided into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

Promise12MSBench

The Promise12MSBench, derived on the PROMISE12 dataset^66,67, contains 1,473 MR samples, each with an image size of 512×512 pixels. It is designed for binary classification. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

RoboToolMSBench

The RoboToolMSBench, based on the RoboTool dataset³¹, consisting of 500 samples, designed for binary segmentation tasks. The source dataset is divided into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

TnbcnucleiMSBench

The TnbcnucleiMSBench is based on a prior dataset^68,69, consisting of 50 pathology samples, each with an image size of 512×512 pixels. This dataset is based on the merging of two different datasets: the first dataset, generated at the Curie Institute, consists of annotated H&E stained histology images at 40× magnification, and the second dataset, provided by the Indian Institute of Technology Guwahati, also consists of annotated H&E stained histology images captured at 40× magnification. It is designed for binary segmentation tasks. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

JUltrasoundNerveMSBench

The UltrasoundNerveMSBench, derived from prior dataset⁷⁰, contains 2,323 ultrasound samples, each with an image size of 580×420 pixels and designed for binary segmentation tasks. The primary task in this dataset is to segment a collection of nerves known as the Brachial Plexus (BP) in ultrasound images. Due to the lack of test image annotations, we split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and 1.

USforKidneyMSBench

The USforKidneyMSBench is derived from the CT2USforKidneySeg dataset^{71, 72}, comprised of 4,586 ultrasound samples, each with an image size of 256×256 pixels, and designed for binary segmentation tasks. The source dataset is split into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

UWSkinCancerMSBench

The UWSkinCancerMSBench is based on the Skin Cancer Detection dataset⁷³, consisting of 206 dermoscopy samples, designed for binary classification tasks The dataset includes images extracted from the public databases DermIS and DermQuest, along with manual segmentations of the lesions. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The authors have also categorized the labels into two categories: Melenoma and Not-Melenoma. These category-based labels and annotations are provided in a .npz file (see Table 3). The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and 1.

WbcMSBench

The WbcMSBench, based on prior datasets^74,75, is a microscopy imaging dataset consisting of 80 samples, with image sizes of 120×120 and 300×300 pixels. It is designed for multi-class segmentation tasks including 3 classes. The dataset is based on two sources: Dataset 1, obtained from Jiangxi Tecom Science Corporation, China, contains 300 images of white blood cells with a resolution of 120×120 pixels. Dataset 2 consists of 100 color images with a resolution of 300 300 pixels, collected from the CellaVision blog. The authors have grouped the samples into four categories: Lymphocyte, Monocyte, Neutrophil, and Eosinophil, and we provide these category-based images and corresponding labels in npz file format (see Table 3). The source dataset is divided into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 3×512×512 pixels, and the labels are mapped to integer values between 0 and (#Classes - 1).

YeazMSBench

The YeazMSBench, derived from the YeaZ dataset^76,77, consists of 707 microscopy images with varying sizes and is designed for binary segmentation tasks. We split the source dataset into three parts: train/val/test, in a 7:1:2 ratio. The samples are resized to 1 × 512 × 512 pixels, and the labels are mapped to integer values between 0 and 1.

Data Records

We have publicly shared each dataset with varying sizes (128, 256, and 512 sized) in MedSegBench at Zenodo (Link). The MedSegBench consists of 35 pre-processed 2D medical image segmentation datasets (some of them extracted 3D slices) from various data modalities and tasks (binary/multi-class). The data storage format published by MedMNISTv2⁸ is followed. We save each dataset in Numpy npz format, named as {dataset}_{size}.npz. Each npz file contains following keys: [“{train,val,test}_images”, “{train,val,test}_label”]. Also, some authors have published class- or category-based images and labels. We have also added this information with the following keys into the npz file and explain them in source code files: [“{train,val,test}_images_ {classno}”, “{train,val,test}_label_{classno}”]. All images and labels are stored in uint8 data type. {train,val,test}_images: Numpy array contains train, validation and test images with N×W×H×C shape for RGB datasets, and N×W×H for gray-scale datasets. Here, N refers to the number of samples, W is the width, H is the height, and C denotes the number of channels. {train,val,test}_label: It contains train, validation and test labels with N×W×H shape. {train,val,test}_images_{classno} and {train,val,test}_label_{classno}: These contain class or category-based train, validation, and test images and labels with shapes N×W×H×C (for RGB images, and N×W×H for gray-scale images), respectively.

Technical Validation

Baseline methods

In this study, we chose the U-Net architecture as the baseline structure for image segmentation tasks. We have selected six encoder/decoder networks to enhance performance and adaptability. These include ResNet18, ResNet50, and DenseNet121, commonly used as benchmarks in segmentation research. Additionally, we have selected EfficientNet and MobileNetv2 because they are lightweight models that offer a more computationally efficient alternative to ResNets and DenseNet. Furthermore, we have added a transformer-based approach using the Mix Vision Transformer, acknowledging the growing interest in transformer models for vision tasks.

The U-Net structure and diverse encoder/decoder networks are implemented using the qubvel-segmentation framework¹⁰. We have not used pre-trained ImageNet weights; we train each model from scratch on our datasets. We have trained each model with three randomly selected seed values to ensure the robustness of our results. All images are resized to 512×512 pixels, a standardized dimension for the training, validation, and testing phases. Training is conducted over 200 epochs using the Adam Optimizer with a learning rate 1e-3. For binary segmentation tasks, we used dice loss, while categorical cross-entropy loss is used for multi-class tasks. A batch size of 128 is selected throughout the training process. We have not applied weight decay methods or any data augmentation techniques, focusing on the raw performance of the models. The model weights corresponding to the best validation IOU are recorded for each network configuration. Further details regarding the model implementation, training, and evaluation steps are available in our code repository.

Performance Measures

We have evaluated each model on 35 different datasets using four performance measures: Precision (PREC), Recall (REC), F1-score (F1), and Intersection over Union (IOU). Precision measures the accuracy of positive predictions, highlighting its ability to avoid false positives, while Recall evaluates the model’s capacity to identify all relevant positive instances, minimizing false negatives. The F1-Score, as the harmonic mean of Precision and Recall, provides a balanced view, which is especially useful when there is an unbalanced class distribution. IoU, primarily used in image segmentation and object detection, evaluates the overlap between predicted and actual regions, ensuring accurate localization and identification of objects. We have individually reported PREC, REC, F1, and IOU scores for each dataset and averaged the results.

Results

The average PREC and REC results obtained from three different run are showed in Table 4 and average F1 and IOU scores are reported in Table 5 for each individual datasets. Also, the average results for each baseline methods are shown in Table 5,

View this table:

Table 4.

The average precision and recall results for six different encoder networks. RN-18: ResNet-18; RN-50: ResNet-50; EN: Efficient-Net; MN-v2: Mobile-Net-v2; DN-121: DenseNet-121; MVT: Mix Vision Transformer

View this table:

Table 5.

The average F1-score and IOU results for six different encoder networks. RN-18: ResNet-18; RN-50: ResNet-50; EN: Efficient-Net; MN-v2: Mobile-Net-v2; DN-121: DenseNet-121; MVT: Mix Vision Transformer

Table 4 presents a comprehensive overview of the average precision and recall results for six different encoder networks across various datasets. These networks include ResNet-18 (RN-18), ResNet-50 (RN-50), Efficient-Net (EN), Mobile-Net-v2 (MN-v2), DenseNet-121 (DN-121), and Mix Vision Transformer (MVT). The results are divided into two main categories: precision and recall. In terms of precision, DenseNet-121 consistently demonstrated strong performance across numerous datasets. For example, it achieved the highest precision scores in datasets such as BusiMSB (0.794), ChuahMSB(0.870) and Dca1MSB (0.801). Similarly, Efficient-Net also demonstrated strong precision, particularly in datasets like Isic2016MSB and Isic2018MSB, where it scored 0.912 and 0.857, respectively. Although the Mix Vision Transformer is not evaluated on all datasets because it only accepts at least three channel images as input, it performed competitively where applicable, achieving high precision in datasets like Bkai-Igh-MSB (0.983). Regarding Recall, DenseNet-121 has emerged as a top performer, achieving the highest recall in datasets such as Bbbbc010MSB (0.920) and WbcMSB (0.970). Efficient-Net also performed well in recall metric, particularly in datasets like DynamicNuclearMSB (0.966) and USforKidneyMSB (0.982). The results indicate that DenseNet-121 and Efficient-Net are particularly robust across precision and recall metrics, suggesting their effectiveness in various applications. Overall, the analysis highlights DenseNet-121’s consistently high performance across multiple datasets, making it a reliable choice for tasks requiring high precision and recall. Efficient-Net also stands out, especially in recall, indicating its potential for applications where recall is critical.

Table 5 provides a comprehensive evaluation of six encoder networks across various datasets, using F1-score and Intersection over Union (IOU) as performance metrics. DenseNet-121 consistently performs well, frequently achieving the top F1 and IOU metrics scores across numerous datasets. For example, in the Bbbc010MSB and CellNucleiMSB datasets, DenseNet-121 records the highest F1-scores of 0.920 and 0.907, respectively, and similarly high IOU scores, indicating its robustness in handling diverse data types. Efficient-Net also shows significant performance, particularly in datasets like Isic2016MSB and USforKidneyMSB, where it achieves the highest F1-scores of 0.903 and 0.981, respectively. This indicates that Efficient-Net is particularly effective in scenarios requiring high precision and recall, as showed in its F1-scores. ResNet-50 performs best with an F1-score of 0.931 and an IOU of 0.870 for the DeepbacsMSB. Additionally, it has also achieved the highest F1-score of 0.786 and an IOU of 0.648 in the DriveMSB dataset. For the FHPsAOPMSB dataset, ResNet-18 has achieved the highest F1-score of 0.961 and an IOU of 0.929. While Mix Vision Transformer does not frequently perform as well as DenseNet-121, it shows competitive performance in specific datasets such as UWSkinCancerMSB, achieving the second-highest F1 Score of 0.881. This indicates its potential in specialized applications, particularly in medical imaging contexts. Overall, DenseNet-121 is the most robust and effective network, frequently outperforming other networks in achieving high F1-scores and IOU values.

Table 6 shows the average performance metrics for six different encoder networks. Efficient-Net (EN) and DenseNet-121 (DN-121) demonstrate the highest F1 scores, both achieving a value of 0.772. This indicates that these models have a balanced performance in terms of precision and recall. DenseNet-121 also achieves the highest precision at 0.848, suggesting it is effective at minimizing false positives. On the other hand, Efficient-Net leads in recall with a score of 0.788, indicating its strength in capturing true positives. Additionally, DenseNet-121 achieves the highest IOU with 0.702, closely followed by Efficient-Net with 0.700 This suggests that these two models provide the most accurate predictions. Overall, DenseNet-121 and Efficient-Net achieve similar high-performance metrics, with both models performing well in F1 score, precision, recall, and IOU. However, DenseNet-121’s complex architecture causes higher computational demands, whereas Efficient-Net provides a more efficient design, making it suitable for resource-constrained applications.

View this table:

Table 6.

The average results for six different encoder networks. RN-18: ResNet-18; RN-50: ResNet-50; EN: Efficient-Net; MN-v2: Mobile-Net-v2; DN-121: DenseNet-121; MVT: Mix Vision Transformer.

Usage Notes

The MedSegBench datasets are freely available at Zenodo We kindly request that users of the MedSegBench dataset cite this paper, along with the relevant source dataset files, in their publications. This dataset is created in order to fairly compare different models over various segmentation models from different data modalities and to create universal models. It is not suitable for clinical or medical use.

Data Availability

All data produced are available online at Zenodo

https://zenodo.org/records/13359660

Code availability

The Python data API, source code files and evaluation scripts for binary and multi-class segmentation tasks can be found at https://github.com/zekikus/MedSegBench.

Author contributions statement

M.A. conducted data collection, cleaning and pre-processing steps. Z.K. performed the evaluation tests for binary and multi-class task for each network and datasets. All authors wrote and reviewed the manuscript.

Competing interests

The authors state that they have no conflicting interests.

Acknowledgements

We would like to express our gratitude for the authors of the MedMNIST⁸, which served as the baseline for our study, and for the shared source code that we referenced to develop our own code.

References

1.↵
Han, K. et al. Deep semi-supervised learning for medical image segmentation: A review. Expert. Syst. with Appl. 245, 123052, doi:10.1016/j.eswa.2023.123052 (2024).
OpenUrl CrossRef
2.↵
Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, doi:10.1038/s41467-024-44824-z (2024).
OpenUrl CrossRef
3.↵
Carriero, A., Groenhoff, L., Vologina, E., Basile, P. & Albera, M. Deep learning in breast cancer imaging: State of the art and recent advancements in early 2024. Diagnostics 14, 848, doi:10.3390/diagnostics14080848 (2024).
OpenUrl CrossRef
4.↵
Drelie Gelasca, E., Obara, B., Fedorov, D., Kvilekval, K. & Manjunath, B. A biosegmentation benchmark for evaluation of bioimage analysis methods. BMC Bioinforma. 10, doi:10.1186/1471-2105-10-368 (2009).
OpenUrl CrossRef PubMed
5.↵
1. Guyon, I., et al.
Rebuffi, S.-A., Bilen, H. & Vedaldi, A. Learning multiple visual domains with residual adapters. In Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., 2017).
6.↵
Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. CoRR abs/1902.09063 (2019). 1902.09063.
7.↵
Yang, J., Shi, R. & Ni, B. Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), doi:10.1109/isbi48211.2021.9434062 (IEEE, 2021).
OpenUrl CrossRef
8.↵
Yang, J. et al. Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 10, doi:10.1038/s41597-022-01721-8 (2023).
OpenUrl CrossRef
9.↵
Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation, 234–241 (Springer International Publishing, 2015).
10.↵
Iakubovskii, P. Segmentation models pytorch. https://github.com/qubvel/segmentation_models.pytorch (2019).
11.↵
Vitale, S., Orlando, J. I., Iarussi, E. & Larrabide, I. Improving realism in patient-specific abdominal ultrasound simulation using cyclegans. Int. J. Comput. Assist. Radiol. Surg. 15, 183–192, doi:10.1007/s11548-019-02046-5 (2019).
OpenUrl CrossRef
12.↵
Orlando, J. I. Us simulation & segmentation (2020).
13.↵
Ljosa, V., Sokolnicki, K. L. & Carpenter, A. E. Annotated high-throughput microscopy image sets for validation. Nat. Methods 9, 637–637, doi:10.1038/nmeth.2083 (2012).
OpenUrl CrossRef PubMed Web of Science
14.↵
Broad Bioimage Benchmark Collection — bbbc.broadinstitute.org. https://bbbc.broadinstitute.org/BBBC010. [Accessed 06-08-2024].
15.↵
Ngoc Lan, P. et al. NeoUNet: Towards Accurate Colon Polyp Segmentation and Neoplasm Detection, 15–28 (Springer International Publishing, 2021).
16.
An, N. S. et al. Blazeneo: Blazing fast polyp segmentation and neoplasm detection. IEEE Access 10, 43669–43684, doi:10.1109/access.2022.3168693 (2022).
OpenUrl CrossRef
17.↵
Duc, N. T., Oanh, N. T., Thuy, N. T., Triet, T. M. & Dinh, V. S. Colonformer: An efficient transformer based method for colon polyp segmentation. IEEE Access 10, 80575–80586, doi:10.1109/access.2022.3195241 (2022).
OpenUrl CrossRef
18.↵
Mathieu, G. M. L. A., & Bachir, E. D. Brifiseg: a deep learning-based method for semantic and instance segmentation of nuclei in brightfield images, doi:10.48550/ARXIV.2211.03072 (2022).
OpenUrl CrossRef
19.↵
Gendarme, M. & Debs, B. E. Brifiseg datasets, doi:10.5281/ZENODO.7195636 (2022).
OpenUrl CrossRef
20.↵
Al-Dhabyani, W., Gomaa, M., Khaled, H. & Fahmy, A. Dataset of breast ultrasound images. Data Brief 28, 104863, doi:10.1016/j.dib.2019.104863 (2020).
OpenUrl CrossRef
21.↵
Breast Ultrasound Images Dataset — kaggle.com. https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset. [Accessed 06-08-2024].
22.↵
Caicedo, J. C. et al. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nat. Methods 16, 1247–1253, doi:10.1038/s41592-019-0612-7 (2019).
OpenUrl CrossRef
23.↵
2018 Data Science Bowl — kaggle.com. https://www.kaggle.com/competitions/data-science-bowl-2018/data. [Accessed 06-08-2024].
24.↵
Carballal, A. et al. Automatic multiscale vascular image segmentation algorithm for coronary angiography. Biomed. Signal Process. Control. 46, 1–9, doi:10.1016/j.bspc.2018.06.007 (2018).
OpenUrl CrossRef
25.↵
Angiographics — figshare.com. https://figshare.com/s/4d24cf3d14bc901a94bf. [Accessed 06-08-2024].
26.↵
Chowdhury, M. E. H. et al. Can ai help in screening viral and covid-19 pneumonia? IEEE Access 8, 132665–132676, doi:10.1109/access.2020.3010287 (2020).
OpenUrl CrossRef
27.
Rahman, T. et al. Exploring the effect of image enhancement techniques on covid-19 detection using chest x-ray images. Comput. Biol. Medicine 132, 104319, doi:10.1016/j.compbiomed.2021.104319 (2021).
OpenUrl CrossRef
28.↵
COVID-19 Radiography Database — kaggle.com. https://www.kaggle.com/datasets/tawsifurrahman/ covid19-radiography-database. [Accessed 06-08-2024].
29.↵
Tahir, A. M. et al. Covid-19 infection localization and severity grading from chest x-ray images. Comput. Biol. Medicine 139, 105002, doi:10.1016/j.compbiomed.2021.105002 (2021).
OpenUrl CrossRef
30.↵
Anas M. Tahir et al. Covid-qu-ex dataset, doi:10.34740/KAGGLE/DSV/3122958 (2022).
OpenUrl CrossRef
31.↵
Garcia-Peraza-Herrera, L. C. et al. Image compositing for segmentation of surgical tools without manual annotations. IEEE Transactions on Med. Imaging 40, 1450–1460, doi:10.1109/tmi.2021.3057884 (2021).
OpenUrl CrossRef
32.
Zeeshan Ahmed, Munawar Ahmed, Attiya Baqai & Fahim Aziz Umrani. Intraretinal cystoid fluid, doi:10.34740/KAGGLE/DS/2277068 (2022).
OpenUrl CrossRef
33.↵
Ahmed, Z. et al. Deep learning based automated detection of intraretinal cystoid fluid. Int. J. Imaging Syst. Technol. 32, 902–917, doi:10.1002/ima.22662 (2021).
OpenUrl CrossRef
34.↵
Cervantes-Sanchez, F., Cruz-Aceves, I., Hernandez-Aguirre, A., Hernandez-Gonzalez, M. A. & Solorio-Meza, S. E. Automatic segmentation of coronary arteries in x-ray angiograms using multiscale analysis and artificial neural networks. Appl. Sci. 9, 5507, doi:10.3390/app9245507 (2019).
OpenUrl CrossRef
35.↵
Ivan Cruz Aceves CIMAT — personal.cimat.mx. http://personal.cimat.mx:8181/~ivan.cruz/DB_Angiograms.html. [Accessed 06-08-2024].
36.↵
Spahn, C. et al. Deepbacs for multi-task bacterial image analysis using open-source deep learning approaches. Commun. Biol. 5, doi:10.1038/s42003-022-03634-z (2022).
OpenUrl CrossRef
37.↵
Spahn, C. & Heilemann, M. Deepbacs – escherichia coli bright field segmentation dataset, doi:10.5281/ZENODO.5550934 (2021).
OpenUrl CrossRef
38.↵
Staal, J., Abramoff, M., Niemeijer, M., Viergever, M. & van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Med. Imaging 23, 501–509, doi:10.1109/tmi.2004.825627 (2004).
OpenUrl CrossRef
39.↵
DRIVE - Grand Challenge — drive.grand-challenge.org. https://drive.grand-challenge.org/. [Accessed 06-08-2024].
40.
Van Valen, D. A. et al. Deep learning automates the quantitative analysis of individual cells in live-cell imaging experiments. PLOS Comput. Biol. 12, e1005177, doi:10.1371/journal.pcbi.1005177 (2016).
OpenUrl CrossRef PubMed
41.↵
DeepCell Datasets — datasets.deepcell.org. https://datasets.deepcell.org/data. [Accessed 06-08-2024].
42.↵
Lu, Y. et al. The jnu-ifm dataset for segmenting pubic symphysis-fetal head. Data Brief 41, 107904, doi:10.1016/j.dib.2022.107904 (2022).
OpenUrl CrossRef
43.↵
Jieyun, B. & ZhanHong, O. Pubic symphysis-fetal head segmentation and angle of progression, doi:10.5281/ZENODO.7851338 (2024).
OpenUrl CrossRef
44.↵
Porwal, P. et al. Indian diabetic retinopathy image dataset (idrid): A database for diabetic retinopathy screening research. Data 3, 25, doi:10.3390/data3030025 (2018).
OpenUrl CrossRef
45.↵
Prasanna Porwal, S. P. Indian diabetic retinopathy image dataset (idrid), doi:10.21227/H25W98 (2018).
OpenUrl CrossRef
46.↵
Codella, N. C. F. et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), doi:10.1109/isbi.2018.8363547 (IEEE, 2018).
OpenUrl CrossRef
47.↵
ISIC Challenge — challenge.isic-archive.com. https://challenge.isic-archive.com/data/#2016. [Accessed 07-08-2024].
48.↵
Tschandl, P., Rosendahl, C. & Kittler, H. The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, doi:10.1038/sdata.2018.161 (2018).
OpenUrl CrossRef
49.
Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic), doi:10.48550/ARXIV.1902.03368 (2019).
OpenUrl CrossRef
50.↵
ISIC Challenge — challenge.isic-archive.com. https://challenge.isic-archive.com/data/#2018. [Accessed 07-08-2024].
51.↵
Jha, D. et al. Kvasir-SEG: A Segmented Polyp Dataset, 451–462 (Springer International Publishing, 2019).
52.↵
Simula Datasets - Kvasir SEG — datasets.simula.no. https://datasets.simula.no/kvasir-seg/. [Accessed 06-08-2024].
53.↵
Maqbool, S., Riaz, A., Sajid, H. & Hasan, O. m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks, doi:10.48550/ARXIV.2008.10134 (2020).
OpenUrl CrossRef
54.↵
m2caiSeg — kaggle.com. https://www.kaggle.com/datasets/salmanmaq/m2caiseg. [Accessed 07-08-2024].
55.↵
Verma, R. et al. Monusac2020: A multi-organ nuclei segmentation and classification challenge. IEEE Transactions on Med. Imaging 40, 3413–3423, doi:10.1109/tmi.2021.3085712 (2021).
OpenUrl CrossRef
56.↵
MoNuSAC 2020 - Grand Challenge — monusac-2020.grand-challenge.org. https://monusac-2020.grand-challenge.org/Data/. [Accessed 07-08-2024].
57.↵
Morozov, S. P. et al. Mosmeddata: Chest ct scans with covid-19 related findings dataset, doi:10.48550/ARXIV.2005.06465 (2020).
OpenUrl CrossRef
58.↵
COVID-19 CT scan lesion segmentation dataset — kaggle.com. https://www.kaggle.com/datasets/maedemaftouni/ covid19-ct-scan-lesion-segmentation-dataset. [Accessed 06-08-2024].
59.↵
Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J. Pathol. Informatics 7, 29, doi:10.4103/2153-3539.186902 (2016).
OpenUrl CrossRef
60.↵
Yang, L. et al. Nuset: A deep learning tool for reliably separating and analyzing crowded cells. PLOS Comput. Biol. 16, e1008193, doi:10.1371/journal.pcbi.1008193 (2020).
OpenUrl CrossRef
61.↵
Linfeng Yang. Nuset training dataset/model weights from (nuset: A deep learning tool for reliably separating and analyzing crowded cells), doi:10.5281/ZENODO.3996369 (2020).
OpenUrl CrossRef
62.↵
Abdi, A. H., Kasaei, S. & Mehdizadeh, M. Automatic segmentation of mandible in panoramic x-ray. J. Med. Imaging 2, 044003, doi:10.1117/1.jmi.2.4.044003 (2015).
OpenUrl CrossRef
63.↵
Abdi, A. Panoramic dental x-rays with segmented mandibles, doi:10.17632/HXT48YK462.1 (2017).
OpenUrl CrossRef
64.↵
Ali, S. et al. Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge, doi:10.48550/ARXIV.2202.12031 (2022).
OpenUrl CrossRef
65.↵
Ali, S. et al. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med. Image Analysis 70, 102002, doi:10.1016/j.media.2021.102002 (2021).
OpenUrl CrossRef
66.↵
Litjens, G. et al. Evaluation of prostate segmentation algorithms for mri: The promise12 challenge. Med. Image Analysis 18, 359–373, doi:10.1016/j.media.2013.12.002 (2014).
OpenUrl CrossRef PubMed
67.↵
Litjens, G. et al. Promise12: Data from the miccai grand challenge: Prostate mr image segmentation 2012, doi:10.5281/ZENODO.8014040 (2023).
OpenUrl CrossRef
68.↵
Jack, N. P., Thomas, W., Laé Marick & Reyal Fabien. Segmentation of nuclei in histopathology images by deep regression of the distance map, doi:10.5281/ZENODO.1175282 (2018).
OpenUrl CrossRef
69.↵
Naylor, P., Laé, M., Reyal, F. & Walter, T. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Transactions on Med. Imaging 38, 448–459, doi:10.1109/tmi.2018.2865709 (2019).
OpenUrl CrossRef
70.↵
Ultrasound Nerve Segmentation — kaggle.com. https://www.kaggle.com/competitions/ultrasound-nerve-segmentation. [Accessed 07-08-2024].
71.↵
Song, Y. et al. Ct2us: Cross-modal transfer learning for kidney segmentation in ultrasound images with synthesized data. Ultrasonics 122, 106706, doi:10.1016/j.ultras.2022.106706 (2022).
OpenUrl CrossRef
72.↵
CT2USforKidneySeg — kaggle.com. https://www.kaggle.com/datasets/siatsyx/ct2usforkidneyseg/data. [Accessed 07-08-2024].
73.↵
Skin Cancer Detection | Vision and Image Processing Lab — uwaterloo.ca. https://uwaterloo.ca/ vision-image-processing-lab/research-demos/skin-cancer-detection. [Accessed 07-08-2024].
74.↵
Zheng, X., Wang, Y., Wang, G. & Liu, J. Fast and robust segmentation of white blood cell images by self-supervised learning. Micron 107, 55–71, doi:10.1016/j.micron.2018.01.010 (2018).
OpenUrl CrossRef
75.↵
Acevedo, A., Alférez, S., Merino, A., Puigví, L. & Rodellar, J. Recognition of peripheral blood cell images using convolutional neural networks. Comput. Methods Programs Biomed. 180, 105020, doi:10.1016/j.cmpb.2019.105020 (2019).
OpenUrl CrossRef PubMed
76.↵
Dietler, N. et al. A convolutional neural network segments yeast microscopy images with high accuracy. Nat. Commun. 11, doi:10.1038/s41467-020-19557-4 (2020).
OpenUrl CrossRef PubMed
77.↵
Data and Software — epfl.ch. https://www.epfl.ch/labs/lpbs/data-and-software/. [Accessed 07-08-2024].

View the discussion thread.

Posted August 28, 2024.

Download PDF

Data/Code

Citation Tools

Subject Area

Health Informatics

Subject Areas

All Articles

Addiction Medicine (383)
Allergy and Immunology (699)
Anesthesia (192)
Cardiovascular Medicine (2853)
Dentistry and Oral Medicine (326)
Dermatology (244)
Emergency Medicine (430)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1011)
Epidemiology (12562)
Forensic Medicine (10)
Gastroenterology (807)
Genetic and Genomic Medicine (4436)
Geriatric Medicine (402)
Health Economics (716)
Health Informatics (2851)
Health Policy (1049)
Health Systems and Quality Improvement (1050)
Hematology (375)
HIV/AIDS (893)
Infectious Diseases (except HIV/AIDS) (13979)
Intensive Care and Critical Care Medicine (830)
Medical Education (413)
Medical Ethics (114)
Nephrology (464)
Neurology (4196)
Nursing (222)
Nutrition (617)
Obstetrics and Gynecology (786)
Occupational and Environmental Health (723)
Oncology (2204)
Ophthalmology (624)
Orthopedics (254)
Otolaryngology (318)
Pain Medicine (268)
Palliative Medicine (82)
Pathology (486)
Pediatrics (1172)
Pharmacology and Therapeutics (489)
Primary Care Research (483)
Psychiatry and Clinical Psychology (3656)
Public and Global Health (6784)
Radiology and Imaging (1490)
Rehabilitation Medicine and Physical Therapy (868)
Respiratory Medicine (900)
Rheumatology (430)
Sexual and Reproductive Health (433)
Sports Medicine (369)
Surgery (473)
Toxicology (57)
Transplantation (202)
Urology (174)

[1] 1.↵
Han, K. et al. Deep semi-supervised learning for medical image segmentation: A review. Expert. Syst. with Appl. 245, 123052, doi:10.1016/j.eswa.2023.123052 (2024).
OpenUrl CrossRef

[2] 2.↵
Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, doi:10.1038/s41467-024-44824-z (2024).
OpenUrl CrossRef

[3] 3.↵
Carriero, A., Groenhoff, L., Vologina, E., Basile, P. & Albera, M. Deep learning in breast cancer imaging: State of the art and recent advancements in early 2024. Diagnostics 14, 848, doi:10.3390/diagnostics14080848 (2024).
OpenUrl CrossRef

[4] 4.↵
Drelie Gelasca, E., Obara, B., Fedorov, D., Kvilekval, K. & Manjunath, B. A biosegmentation benchmark for evaluation of bioimage analysis methods. BMC Bioinforma. 10, doi:10.1186/1471-2105-10-368 (2009).
OpenUrl CrossRef PubMed

[5] 5.↵
Guyon, I., et al.
Rebuffi, S.-A., Bilen, H. & Vedaldi, A. Learning multiple visual domains with residual adapters. In Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., 2017).

[6] Guyon, I., et al.

[7] 6.↵
Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. CoRR abs/1902.09063 (2019). 1902.09063.

[8] 7.↵
Yang, J., Shi, R. & Ni, B. Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), doi:10.1109/isbi48211.2021.9434062 (IEEE, 2021).
OpenUrl CrossRef

[9] 8.↵
Yang, J. et al. Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 10, doi:10.1038/s41597-022-01721-8 (2023).
OpenUrl CrossRef

[10] 9.↵
Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation, 234–241 (Springer International Publishing, 2015).

[11] 10.↵
Iakubovskii, P. Segmentation models pytorch. https://github.com/qubvel/segmentation_models.pytorch (2019).

[12] 11.↵
Vitale, S., Orlando, J. I., Iarussi, E. & Larrabide, I. Improving realism in patient-specific abdominal ultrasound simulation using cyclegans. Int. J. Comput. Assist. Radiol. Surg. 15, 183–192, doi:10.1007/s11548-019-02046-5 (2019).
OpenUrl CrossRef

[13] 12.↵
Orlando, J. I. Us simulation & segmentation (2020).

[14] 13.↵
Ljosa, V., Sokolnicki, K. L. & Carpenter, A. E. Annotated high-throughput microscopy image sets for validation. Nat. Methods 9, 637–637, doi:10.1038/nmeth.2083 (2012).
OpenUrl CrossRef PubMed Web of Science

[15] 14.↵
Broad Bioimage Benchmark Collection — bbbc.broadinstitute.org. https://bbbc.broadinstitute.org/BBBC010. [Accessed 06-08-2024].

[16] 15.↵
Ngoc Lan, P. et al. NeoUNet: Towards Accurate Colon Polyp Segmentation and Neoplasm Detection, 15–28 (Springer International Publishing, 2021).

[17] 16.
An, N. S. et al. Blazeneo: Blazing fast polyp segmentation and neoplasm detection. IEEE Access 10, 43669–43684, doi:10.1109/access.2022.3168693 (2022).
OpenUrl CrossRef

[18] 17.↵
Duc, N. T., Oanh, N. T., Thuy, N. T., Triet, T. M. & Dinh, V. S. Colonformer: An efficient transformer based method for colon polyp segmentation. IEEE Access 10, 80575–80586, doi:10.1109/access.2022.3195241 (2022).
OpenUrl CrossRef

[19] 18.↵
Mathieu, G. M. L. A., & Bachir, E. D. Brifiseg: a deep learning-based method for semantic and instance segmentation of nuclei in brightfield images, doi:10.48550/ARXIV.2211.03072 (2022).
OpenUrl CrossRef

[20] 19.↵
Gendarme, M. & Debs, B. E. Brifiseg datasets, doi:10.5281/ZENODO.7195636 (2022).
OpenUrl CrossRef

[21] 20.↵
Al-Dhabyani, W., Gomaa, M., Khaled, H. & Fahmy, A. Dataset of breast ultrasound images. Data Brief 28, 104863, doi:10.1016/j.dib.2019.104863 (2020).
OpenUrl CrossRef

[22] 21.↵
Breast Ultrasound Images Dataset — kaggle.com. https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset. [Accessed 06-08-2024].

[23] 22.↵
Caicedo, J. C. et al. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nat. Methods 16, 1247–1253, doi:10.1038/s41592-019-0612-7 (2019).
OpenUrl CrossRef

[24] 23.↵
2018 Data Science Bowl — kaggle.com. https://www.kaggle.com/competitions/data-science-bowl-2018/data. [Accessed 06-08-2024].

[25] 24.↵
Carballal, A. et al. Automatic multiscale vascular image segmentation algorithm for coronary angiography. Biomed. Signal Process. Control. 46, 1–9, doi:10.1016/j.bspc.2018.06.007 (2018).
OpenUrl CrossRef

[26] 25.↵
Angiographics — figshare.com. https://figshare.com/s/4d24cf3d14bc901a94bf. [Accessed 06-08-2024].

[27] 26.↵
Chowdhury, M. E. H. et al. Can ai help in screening viral and covid-19 pneumonia? IEEE Access 8, 132665–132676, doi:10.1109/access.2020.3010287 (2020).
OpenUrl CrossRef

[28] 27.
Rahman, T. et al. Exploring the effect of image enhancement techniques on covid-19 detection using chest x-ray images. Comput. Biol. Medicine 132, 104319, doi:10.1016/j.compbiomed.2021.104319 (2021).
OpenUrl CrossRef

[29] 28.↵
COVID-19 Radiography Database — kaggle.com. https://www.kaggle.com/datasets/tawsifurrahman/ covid19-radiography-database. [Accessed 06-08-2024].

[30] 29.↵
Tahir, A. M. et al. Covid-19 infection localization and severity grading from chest x-ray images. Comput. Biol. Medicine 139, 105002, doi:10.1016/j.compbiomed.2021.105002 (2021).
OpenUrl CrossRef

[31] 30.↵
Anas M. Tahir et al. Covid-qu-ex dataset, doi:10.34740/KAGGLE/DSV/3122958 (2022).
OpenUrl CrossRef

[32] 31.↵
Garcia-Peraza-Herrera, L. C. et al. Image compositing for segmentation of surgical tools without manual annotations. IEEE Transactions on Med. Imaging 40, 1450–1460, doi:10.1109/tmi.2021.3057884 (2021).
OpenUrl CrossRef

[33] 32.
Zeeshan Ahmed, Munawar Ahmed, Attiya Baqai & Fahim Aziz Umrani. Intraretinal cystoid fluid, doi:10.34740/KAGGLE/DS/2277068 (2022).
OpenUrl CrossRef

[34] 33.↵
Ahmed, Z. et al. Deep learning based automated detection of intraretinal cystoid fluid. Int. J. Imaging Syst. Technol. 32, 902–917, doi:10.1002/ima.22662 (2021).
OpenUrl CrossRef

[35] 34.↵
Cervantes-Sanchez, F., Cruz-Aceves, I., Hernandez-Aguirre, A., Hernandez-Gonzalez, M. A. & Solorio-Meza, S. E. Automatic segmentation of coronary arteries in x-ray angiograms using multiscale analysis and artificial neural networks. Appl. Sci. 9, 5507, doi:10.3390/app9245507 (2019).
OpenUrl CrossRef

[36] 35.↵
Ivan Cruz Aceves CIMAT — personal.cimat.mx. http://personal.cimat.mx:8181/~ivan.cruz/DB_Angiograms.html. [Accessed 06-08-2024].

[37] 36.↵
Spahn, C. et al. Deepbacs for multi-task bacterial image analysis using open-source deep learning approaches. Commun. Biol. 5, doi:10.1038/s42003-022-03634-z (2022).
OpenUrl CrossRef

[38] 37.↵
Spahn, C. & Heilemann, M. Deepbacs – escherichia coli bright field segmentation dataset, doi:10.5281/ZENODO.5550934 (2021).
OpenUrl CrossRef

[39] 38.↵
Staal, J., Abramoff, M., Niemeijer, M., Viergever, M. & van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Med. Imaging 23, 501–509, doi:10.1109/tmi.2004.825627 (2004).
OpenUrl CrossRef

[40] 39.↵
DRIVE - Grand Challenge — drive.grand-challenge.org. https://drive.grand-challenge.org/. [Accessed 06-08-2024].

[41] 40.
Van Valen, D. A. et al. Deep learning automates the quantitative analysis of individual cells in live-cell imaging experiments. PLOS Comput. Biol. 12, e1005177, doi:10.1371/journal.pcbi.1005177 (2016).
OpenUrl CrossRef PubMed

[42] 41.↵
DeepCell Datasets — datasets.deepcell.org. https://datasets.deepcell.org/data. [Accessed 06-08-2024].

[43] 42.↵
Lu, Y. et al. The jnu-ifm dataset for segmenting pubic symphysis-fetal head. Data Brief 41, 107904, doi:10.1016/j.dib.2022.107904 (2022).
OpenUrl CrossRef

[44] 43.↵
Jieyun, B. & ZhanHong, O. Pubic symphysis-fetal head segmentation and angle of progression, doi:10.5281/ZENODO.7851338 (2024).
OpenUrl CrossRef

[45] 44.↵
Porwal, P. et al. Indian diabetic retinopathy image dataset (idrid): A database for diabetic retinopathy screening research. Data 3, 25, doi:10.3390/data3030025 (2018).
OpenUrl CrossRef

[46] 45.↵
Prasanna Porwal, S. P. Indian diabetic retinopathy image dataset (idrid), doi:10.21227/H25W98 (2018).
OpenUrl CrossRef

[47] 46.↵
Codella, N. C. F. et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), doi:10.1109/isbi.2018.8363547 (IEEE, 2018).
OpenUrl CrossRef

[48] 47.↵
ISIC Challenge — challenge.isic-archive.com. https://challenge.isic-archive.com/data/#2016. [Accessed 07-08-2024].

[49] 48.↵
Tschandl, P., Rosendahl, C. & Kittler, H. The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, doi:10.1038/sdata.2018.161 (2018).
OpenUrl CrossRef

[50] 49.
Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic), doi:10.48550/ARXIV.1902.03368 (2019).
OpenUrl CrossRef

[51] 50.↵
ISIC Challenge — challenge.isic-archive.com. https://challenge.isic-archive.com/data/#2018. [Accessed 07-08-2024].

[52] 51.↵
Jha, D. et al. Kvasir-SEG: A Segmented Polyp Dataset, 451–462 (Springer International Publishing, 2019).

[53] 52.↵
Simula Datasets - Kvasir SEG — datasets.simula.no. https://datasets.simula.no/kvasir-seg/. [Accessed 06-08-2024].

[54] 53.↵
Maqbool, S., Riaz, A., Sajid, H. & Hasan, O. m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks, doi:10.48550/ARXIV.2008.10134 (2020).
OpenUrl CrossRef

[55] 54.↵
m2caiSeg — kaggle.com. https://www.kaggle.com/datasets/salmanmaq/m2caiseg. [Accessed 07-08-2024].

[56] 55.↵
Verma, R. et al. Monusac2020: A multi-organ nuclei segmentation and classification challenge. IEEE Transactions on Med. Imaging 40, 3413–3423, doi:10.1109/tmi.2021.3085712 (2021).
OpenUrl CrossRef

[57] 56.↵
MoNuSAC 2020 - Grand Challenge — monusac-2020.grand-challenge.org. https://monusac-2020.grand-challenge.org/Data/. [Accessed 07-08-2024].

[58] 57.↵
Morozov, S. P. et al. Mosmeddata: Chest ct scans with covid-19 related findings dataset, doi:10.48550/ARXIV.2005.06465 (2020).
OpenUrl CrossRef

[59] 58.↵
COVID-19 CT scan lesion segmentation dataset — kaggle.com. https://www.kaggle.com/datasets/maedemaftouni/ covid19-ct-scan-lesion-segmentation-dataset. [Accessed 06-08-2024].

[60] 59.↵
Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J. Pathol. Informatics 7, 29, doi:10.4103/2153-3539.186902 (2016).
OpenUrl CrossRef

[61] 60.↵
Yang, L. et al. Nuset: A deep learning tool for reliably separating and analyzing crowded cells. PLOS Comput. Biol. 16, e1008193, doi:10.1371/journal.pcbi.1008193 (2020).
OpenUrl CrossRef

[62] 61.↵
Linfeng Yang. Nuset training dataset/model weights from (nuset: A deep learning tool for reliably separating and analyzing crowded cells), doi:10.5281/ZENODO.3996369 (2020).
OpenUrl CrossRef

[63] 62.↵
Abdi, A. H., Kasaei, S. & Mehdizadeh, M. Automatic segmentation of mandible in panoramic x-ray. J. Med. Imaging 2, 044003, doi:10.1117/1.jmi.2.4.044003 (2015).
OpenUrl CrossRef

[64] 63.↵
Abdi, A. Panoramic dental x-rays with segmented mandibles, doi:10.17632/HXT48YK462.1 (2017).
OpenUrl CrossRef

[65] 64.↵
Ali, S. et al. Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge, doi:10.48550/ARXIV.2202.12031 (2022).
OpenUrl CrossRef

[66] 65.↵
Ali, S. et al. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med. Image Analysis 70, 102002, doi:10.1016/j.media.2021.102002 (2021).
OpenUrl CrossRef

[67] 66.↵
Litjens, G. et al. Evaluation of prostate segmentation algorithms for mri: The promise12 challenge. Med. Image Analysis 18, 359–373, doi:10.1016/j.media.2013.12.002 (2014).
OpenUrl CrossRef PubMed

[68] 67.↵
Litjens, G. et al. Promise12: Data from the miccai grand challenge: Prostate mr image segmentation 2012, doi:10.5281/ZENODO.8014040 (2023).
OpenUrl CrossRef

[69] 68.↵
Jack, N. P., Thomas, W., Laé Marick & Reyal Fabien. Segmentation of nuclei in histopathology images by deep regression of the distance map, doi:10.5281/ZENODO.1175282 (2018).
OpenUrl CrossRef

[70] 69.↵
Naylor, P., Laé, M., Reyal, F. & Walter, T. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Transactions on Med. Imaging 38, 448–459, doi:10.1109/tmi.2018.2865709 (2019).
OpenUrl CrossRef

[71] 70.↵
Ultrasound Nerve Segmentation — kaggle.com. https://www.kaggle.com/competitions/ultrasound-nerve-segmentation. [Accessed 07-08-2024].

[72] 71.↵
Song, Y. et al. Ct2us: Cross-modal transfer learning for kidney segmentation in ultrasound images with synthesized data. Ultrasonics 122, 106706, doi:10.1016/j.ultras.2022.106706 (2022).
OpenUrl CrossRef

[73] 72.↵
CT2USforKidneySeg — kaggle.com. https://www.kaggle.com/datasets/siatsyx/ct2usforkidneyseg/data. [Accessed 07-08-2024].

[74] 73.↵
Skin Cancer Detection | Vision and Image Processing Lab — uwaterloo.ca. https://uwaterloo.ca/ vision-image-processing-lab/research-demos/skin-cancer-detection. [Accessed 07-08-2024].

[75] 74.↵
Zheng, X., Wang, Y., Wang, G. & Liu, J. Fast and robust segmentation of white blood cell images by self-supervised learning. Micron 107, 55–71, doi:10.1016/j.micron.2018.01.010 (2018).
OpenUrl CrossRef

[76] 75.↵
Acevedo, A., Alférez, S., Merino, A., Puigví, L. & Rodellar, J. Recognition of peripheral blood cell images using convolutional neural networks. Comput. Methods Programs Biomed. 180, 105020, doi:10.1016/j.cmpb.2019.105020 (2019).
OpenUrl CrossRef PubMed

[77] 76.↵
Dietler, N. et al. A convolutional neural network segments yeast microscopy images with high accuracy. Nat. Commun. 11, doi:10.1038/s41467-020-19557-4 (2020).
OpenUrl CrossRef PubMed

[78] 77.↵
Data and Software — epfl.ch. https://www.epfl.ch/labs/lpbs/data-and-software/. [Accessed 07-08-2024].

MedSegBench: A Comprehensive Benchmark for Medical Image Segmentation in Diverse Data Modalities

ABSTRACT

Background & Summary

Methods

Data Preparation

Details

AbdomenUSMSBench

Bbbc010MSBench

Bkai-Igh-MSBench

BriFiSegMSBench

BusiMSBench

CellNucleiMSBench

ChaseDB1MSBench

ChuacMSBench

Covid19RadioMSBench

CovidQUExMSBench

MosMedPlusMSBench

CystoFluidMSBench

Dca1MSBench

DeepbacsMSBench

DriveMSBench

DynamicNuclearMSBench

FHPsAOPMSBench

IdribMSBench

Isic2016MSBench

Isic2018MSBench

KvasirMSBench

M2caiSegMSBench

MonusacMSBench

NucleiMSBench

NusetMSBench

PandentalMSBench

PolypGenMSBench

Promise12MSBench

RoboToolMSBench

TnbcnucleiMSBench

JUltrasoundNerveMSBench

USforKidneyMSBench

UWSkinCancerMSBench

WbcMSBench

YeazMSBench

Data Records

Technical Validation

Baseline methods

Performance Measures

Results

Usage Notes

Data Availability

Code availability

Author contributions statement

Competing interests

Acknowledgements

References

Citation Manager Formats

Subject Area