RT Journal Article SR Electronic T1 Deep learning algorithms for automatic segmentation of acute cerebral infarcts on diffusion-weighted images: Effects of training data sample size, transfer learning, and data features JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2023.07.02.23292150 DO 10.1101/2023.07.02.23292150 A1 Noh, Yoon-Gon A1 Ryu, Wi-Sun A1 Schellingerhout, Dawid A1 Park, Jonghyeok A1 Chung, Jinyong A1 Jeong, Sang-Wuk A1 Gwak, Dong-Seok A1 Kim, Beom Joon A1 Kim, Joon-Tae A1 Hong, Keun-Sik A1 Lee, Kyung Bok A1 Park, Tai Hwan A1 Park, Sang-Soon A1 Park, Jong-Moo A1 Kang, Kyusik A1 Cho, Yong-Jin A1 Park, Hong-Kyun A1 Lee, Byung-Chul A1 Yu, Kyung-Ho A1 Oh, Mi Sun A1 Lee, Soo Joo A1 Kim, Jae Guk A1 Cha, Jae-Kwan A1 Kim, Dae-Hyun A1 Lee, Jun A1 Park, Man Seok A1 Kim, Dongmin A1 Bang, Oh Young A1 Kim, Eung Yeop A1 Sohn, Chul-Ho A1 Kim, Hosung A1 Bae, Hee-Joon A1 Kim, Dong-Eog YR 2023 UL http://medrxiv.org/content/early/2023/07/09/2023.07.02.23292150.abstract AB Background Deep learning-based artificial intelligence techniques have been developed for automatic segmentation of diffusion-weighted magnetic resonance imaging (DWI) lesions, but so far mostly with single-site training data of modest sample size. Objective To explore the effects of 1) various sample sizes of multi-site vs.
single-site training data, 2) domain adaptation, the utilization of target domain data to overcome the domain shift problem, where a model that performs well in the source domain performs poorly in the target domain, and 3) data sources and features on the performance and generalizability of deep learning algorithms for the segmentation of infarcts on DW images. Methods In this nationwide multicenter study, 10,820 DWI datasets from 10 hospitals (Internal dataset) were used for training-and-validation (Training-and-validation dataset with six progressively larger subsamples: n=217, 433, 866, 1,732, 4,330, and 8,661 sets, yielding six algorithms) and internal test (Internal test dataset: 2,159 sets with no overlap with the training-and-validation sample) of 3D U-net algorithms for automatic DWI lesion segmentation. In addition, 476 DW images from one of the 10 hospitals (Single-site dataset) were used for training-and-validation (n=382) and internal test (n=94) of another algorithm. Then, 2,777 DW images from a different hospital (External dataset) and two ancillary test datasets (I, n=50 from three different hospitals; II, n=250 from Ischemic Stroke Lesion Segmentation Challenge 2022) were used for external validation of the seven algorithms, testing each algorithm's performance against the manual segmentation gold standard, with the Dice similarity coefficient (DSC) as the figure of merit. Additional tests of the six algorithms were performed after stratification by infarct volume, infarct location, and stroke onset-to-imaging time. Domain adaptation was performed to fine-tune the algorithms with subsamples (50, 100, 200, 500, and 1,000) of the 2,777 DW images in the External dataset, and its effect was tested using a) 1,777 DW images (from the External dataset, with no overlap with the fine-tuning subsamples) and b) 2,159 DW images from the Internal test dataset. Results Mean age of the 8,661 patients in the Training-and-validation dataset was 67.9 years (standard deviation 12.9), and 58.9% (n = 4,431) were male.
As the subsample size of the multi-site dataset was increased from 217 to 1,732, algorithm performance increased sharply, with DSC rising from 0.58 to 0.65. When the sample size was further increased to 4,330 and 8,661, DSC increased only slightly (to 0.68 and 0.70, respectively). Similar results were seen in external tests. Although a deep learning algorithm that was developed using the Single-site dataset achieved a DSC of 0.70 (standard deviation 0.23) in the internal test, it showed substantially lower performance in the three external tests, with DSC values of 0.50, 0.51, and 0.33, respectively (all p < 0.001). Stratification of the Internal test dataset and the External dataset into small (< 1.7 ml; n = 994 and 1,046, respectively), medium (1.7-14.0 ml; n = 587 and 904, respectively), and large (> 14.0 ml; n = 446 and 825, respectively) infarct size groups showed the best performance (DSCs up to ∼0.8) in the large infarct group, lower performance (up to ∼0.7) in the medium infarct group, and the lowest (up to ∼0.6) in the small infarct group. Deep learning algorithms performed relatively poorly on brainstem infarcts and hyperacute (< 3h) infarcts. Domain adaptation, the use of a small subsample of external data to re-train the algorithm, improved algorithm performance.
The algorithm trained with the 217 DW images from the Internal dataset and fine-tuned with an additional 50 DW images from the External dataset had performance equivalent to that of the algorithm trained on a four-fold larger number (n=866) of DW images from the Internal dataset only (without domain adaptation). Conclusion This study using the largest DWI data to date demonstrates that: a) multi-site data with ∼1,000 DW images are required for developing a reliable infarct segmentation algorithm, b) domain adaptation could contribute to generalizability of the algorithm, and c) further investigation is required to improve the performance for segmentation of small or brainstem infarcts or hyperacute infarcts. Competing Interest Statement Yoon-Gon Noh, Wi-Sun Ryu, and Jonghyeok Park are employees of JLK Inc. Oh Young Bang, Hee-Joon Bae, and Dong-Eog Kim are stockholders of JLK Inc. Funding Statement This study was supported by the National Priority Research Center Program Grant (NRF-2021R1A6A1A03038865), the Basic Science Research Program Grant (NRF-2020R1A2C3008295), the Multiministry Grant for Medical Device Development (KMDF_PR_20200901_0098), and the Bioimaging Data Curation Center Program Grant (2022M3H9A2083956) of the National Research Foundation, funded by the Korean government. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Institutional Review Boards of Dongguk University Hospital approved this study. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify
individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors.
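
Note on the figure of merit: the abstract reports agreement between automatic and manual segmentation as a Dice score and stratifies results by infarct volume (< 1.7 ml, 1.7-14.0 ml, > 14.0 ml). The following is a minimal sketch of how such a score and grouping might be computed, not the authors' implementation. The function names, the flattened 0/1 mask representation, the treatment of two empty masks as perfect agreement, and the assignment of the boundary values 1.7 and 14.0 ml to the medium group are all assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |pred AND truth| / (|pred| + |truth|)
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p)
    truth_count = sum(1 for t in truth if t)
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if pred_count + truth_count == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


def infarct_size_group(volume_ml: float) -> str:
    """Assign an infarct to a size group using the abstract's cut-offs.

    Assumption: the boundary values 1.7 ml and 14.0 ml fall in 'medium'.
    """
    if volume_ml < 1.7:
        return "small"
    if volume_ml <= 14.0:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical four-voxel masks: one voxel overlaps, two are marked in each.
    predicted = [1, 1, 0, 0]
    manual = [1, 0, 1, 0]
    print(dice_coefficient(predicted, manual))
    print(infarct_size_group(0.9), infarct_size_group(30.0))
```

Because the denominator is the sum of the two mask sizes, a single mislabeled voxel shifts the score far more for a small lesion than for a large one, which is in line with the lower scores the abstract reports for the small infarct group.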