Deep learning algorithms for automatic segmentation of acute cerebral infarcts on diffusion-weighted images: Effects of training data sample size, transfer learning, and data features

Yoon-Gon Noh; Wi-Sun Ryu; Dawid Schellingerhout; Jonghyeok Park; Jinyong Chung; Sang-Wuk Jeong; Dong-Seok Gwak; Beom Joon Kim; Joon-Tae Kim; Keun-Sik Hong; Kyung Bok Lee; Tai Hwan Park; Sang-Soon Park; Jong-Moo Park; Kyusik Kang; Yong-Jin Cho; Hong-Kyun Park; Byung-Chul Lee; Kyung-Ho Yu; Mi Sun Oh; Soo Joo Lee; Jae Guk Kim; Jae-Kwan Cha; Dae-Hyun Kim; Jun Lee; Man Seok Park; Dongmin Kim; Oh Young Bang; Eung Yeop Kim; Chul-Ho Sohn; Hosung Kim; Hee-Joon Bae; Dong-Eog Kim

doi:10.1101/2023.07.02.23292150

Abstract

Background Deep learning-based artificial intelligence techniques have been developed for automatic segmentation of diffusion-weighted magnetic resonance imaging (DWI) lesions, but most have been trained on single-site data with modest sample sizes.

Objective To explore how the performance and generalizability of deep learning algorithms for infarct segmentation on DW images are affected by 1) the sample size of multi-site vs. single-site training data, 2) domain adaptation, i.e., the use of target-domain data to overcome domain shift, in which a model that performs well in the source domain performs poorly in the target domain, and 3) data sources and features.

Methods In this nationwide multicenter study, 10,820 DWI datasets from 10 hospitals (Internal dataset) were used to train, validate, and internally test 3D U-net algorithms for automatic DWI lesion segmentation. The Training-and-validation dataset was used in six progressively larger subsamples (n=217, 433, 866, 1,732, 4,330, and 8,661 sets), yielding six algorithms; the Internal test dataset comprised 2,159 sets with no overlapping samples. In addition, 476 DW images from one of the 10 hospitals (Single-site dataset) were used for training-and-validation (n=382) and internal test (n=94) of a seventh algorithm. Then, 2,777 DW images from a different hospital (External dataset) and two ancillary test datasets (I, n=50 from three different hospitals; II, n=250 from the Ischemic Stroke Lesion Segmentation Challenge 2022) were used for external validation of the seven algorithms, comparing each algorithm's output with the manual segmentation gold standard using the Dice similarity coefficient (DSC) as the figure of merit. Additional tests of the six algorithms were performed after stratification by infarct volume, infarct location, and stroke onset-to-imaging time. Domain adaptation was performed by fine-tuning the algorithms with subsamples (n=50, 100, 200, 500, and 1,000) of the 2,777-image External dataset, and its effect was tested using a) 1,777 DW images from the External dataset, with no overlapping samples, and b) the 2,159 DW images of the Internal test dataset.
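As an illustration of the figure of merit, the Dice similarity coefficient between a predicted lesion mask A and a manual mask B is 2|A∩B| / (|A| + |B|). The following is a minimal sketch, assuming binary voxel masks stored as NumPy arrays; the function name, the toy volumes, and the convention of returning 1.0 when both masks are empty are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 3D volumes (2 x 3 x 3 voxels); values are illustrative only.
truth = np.zeros((2, 3, 3), dtype=bool)
truth[0, :2, :2] = True  # 4 manually segmented lesion voxels
pred = np.zeros((2, 3, 3), dtype=bool)
pred[0, :2, :3] = True   # 6 predicted voxels, 4 of which overlap

print(dice_coefficient(pred, truth))  # 2*4 / (6+4) = 0.8
```

Because the denominator is the total lesion size in both masks, a disagreement of a few voxels lowers the score much more for a small lesion than for a large one.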

Results Mean age of the 8,661 patients in the Training-and-validation dataset was 67.9 years (standard deviation 12.9), and 58.9% (n = 4,431) were male. As the subsample size of the multi-site dataset increased from 217 to 1,732, algorithm performance improved sharply, with DSC rising from 0.58 to 0.65. When the sample size was further increased to 4,330 and 8,661, DSC increased only slightly (to 0.68 and 0.70, respectively). Similar results were seen in external tests. Although the deep learning algorithm developed using the Single-site dataset achieved a DSC of 0.70 (standard deviation 0.23) in the internal test, it showed substantially lower performance in the three external tests, with DSC values of 0.50, 0.51, and 0.33, respectively (all p < 0.001). Stratification of the Internal test dataset and the External dataset into small (< 1.7 ml; n = 994 and 1,046, respectively), medium (1.7-14.0 ml; n = 587 and 904, respectively), and large (> 14.0 ml; n = 446 and 825, respectively) infarct size groups showed the best performance (DSCs up to ∼0.8) in the large infarct group, lower performance (up to ∼0.7) in the medium infarct group, and the lowest (up to ∼0.6) in the small infarct group. Deep learning algorithms performed relatively poorly on brainstem infarcts and hyperacute (< 3h) infarcts. Domain adaptation, the use of a small subsample of external data to re-train the algorithm, improved algorithm performance. The algorithm trained with 217 DW images from the Internal dataset and fine-tuned with an additional 50 DW images from the External dataset performed equivalently to the algorithm trained on four times as many DW images (n=866) from the Internal dataset only (without domain adaptation).
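The volume stratification can be sketched as follows, using the thresholds stated above (small < 1.7 ml, medium 1.7-14.0 ml, large > 14.0 ml). The function names and the per-case (volume, DSC) pairs are hypothetical and serve only to show how a mean DSC per infarct size group might be tabulated; they are not data or code from the study.

```python
from collections import defaultdict
from statistics import mean


def volume_group(volume_ml, small_max=1.7, large_min=14.0):
    """Assign an infarct to a size group using the abstract's thresholds."""
    if volume_ml < small_max:
        return "small"
    if volume_ml > large_min:
        return "large"
    return "medium"  # 1.7 ml and 14.0 ml themselves fall in this group


def mean_dsc_by_group(cases):
    """cases: iterable of (infarct_volume_ml, dsc) pairs -> mean DSC per group."""
    grouped = defaultdict(list)
    for volume_ml, dsc in cases:
        grouped[volume_group(volume_ml)].append(dsc)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical per-case results, for illustration only.
toy_cases = [(0.5, 0.5), (1.0, 0.7), (5.0, 0.7), (30.0, 0.8)]
print(mean_dsc_by_group(toy_cases))
```

Reporting DSC per group rather than as a single pooled mean keeps the weaker performance on small infarcts from being masked by the larger lesions.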

Conclusion This study, using the largest DWI dataset to date, demonstrates that: a) multi-site data with ∼1,000 DW images are required for developing a reliable infarct segmentation algorithm, b) domain adaptation could contribute to the generalizability of the algorithm, and c) further investigation is required to improve segmentation performance for small, brainstem, or hyperacute infarcts.

Competing Interest Statement

Yoon-Gon Noh, Wi-Sun Ryu, and Jonghyeok Park are employees of JLK Inc. Oh Young Bang, Hee-Joon Bae, and Dong-Eog Kim are stockholders of JLK Inc.

Funding Statement

This study was supported by the National Priority Research Center Program Grant (NRF-2021R1A6A1A03038865), the Basic Science Research Program Grant (NRF-2020R1A2C3008295), the Multiministry Grant for Medical Device Development (KMDF_PR_20200901_0098), and the Bioimaging Data Curation Center Program Grant (2022M3H9A2083956) of the National Research Foundation, funded by the Korean government.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of Dongguk University Hospital approved this study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

Disclosure: Yoon-Gon Noh, Wi-Sun Ryu, and Jonghyeok Park are employees of JLK Inc. Oh Young Bang, Hee-Joon Bae, and Dong-Eog Kim are stockholders of JLK Inc.
Author affiliations updated.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.