Biometry and volumetry in multi-centric fetal brain MRI: assessing the bias of super-resolution reconstruction
==============================================================================================================

* Thomas Sanchez
* Angeline Mihailov
* Mériam Koob
* Nadine Girard
* Aurélie Manchon
* Ignacio Valenzuela
* Marta Gómez-Chiari
* Gerard Martí Juan
* Alexandre Pron
* Elisenda Eixarch
* Gemma Piella
* Miguel A. González Ballester
* Oscar Camara
* Vincent Dunet
* Guillaume Auzias
* Meritxell Bach Cuadra

## Abstract

**Background** Super-resolution reconstruction (SRR) of fetal brain magnetic resonance imaging (MRI) has the potential to enable the development of new imaging biomarkers to better study *in utero* neurodevelopment. However, potential biases in 2D biometric and 3D volumetric measurements due to different SRR techniques remain understudied.

**Purpose** To assess the consistency of biometric and volumetric measurements across three hospitals using three widely used SRR pipelines.

**Materials and Methods** This retrospective study used T2-weighted (T2w) fetal brain MRI scans acquired in routine clinical practice at three hospitals. The MRI scans of each subject were reconstructed with each of the three SRR methods. Four experts performed biometric measurements on each SRR volume while blinded to the method used. Automated 3D volumetry was performed using a state-of-the-art segmentation method. A univariate analysis was first carried out with Friedman tests followed by post-hoc Wilcoxon rank-sum tests. The results were confirmed in a multivariate analysis that accounted for the effects of gestational age and rater, using a t-distributed generalized additive model. An additional qualitative evaluation assessed how likely clinicians would be to use the current SRR volumes in their practice, and whether they would prefer them to low-resolution T2w acquisitions. Differences were assessed with Friedman tests and post-hoc Wilcoxon rank-sum tests.
**Results** Eighty-four healthy subjects were included in three gestational age groups (gestational age in weeks: [21, 28): 25.4 ± 1.9; [28, 32): 29.3 ± 1.3; [32, 36): 33.5 ± 1.2). Statistically significant differences in biometric measurements were found, but they consistently remained below the voxel width (0.8 mm). Automated 3D volumetry revealed systematic but very small effects (<2.8%). The qualitative evaluation showed systematic differences between SRR methods in the perception of white matter intensity (p=0.02) and image sharpness (p=0.01).

**Conclusion** Variations in 2D and 3D quantitative measurements showed no large systematic bias when different SRR methods were used for radiological assessment in clinical routine across multiple centers, scanners, and raters.

**Summary** Different super-resolution reconstruction methods for fetal brain MRI volumes lead to negligible variations in 2D and 3D quantitative measurements; this may help achieve larger sample sizes in prenatal development studies.

**Key Results**

* In this multi-centric retrospective study, 252 super-resolution reconstruction (SRR) volumes from 84 healthy subjects showed negligible variations in 2D biometric measurements (below the voxel width of 0.8 mm; p<0.001).
* 3D measurements revealed small variations, ranging from 0.8% in supratentorial tissues (p<0.001) to 2.8% in the extra-cerebral cerebrospinal fluid (p<0.001).
* Clinicians favored having both low-resolution and SRR volumes available.

## Introduction

Fetal brain Magnetic Resonance Imaging (MRI) is increasingly used as a complement to ultrasound (US) imaging to confirm or rule out equivocal findings1. Its excellent soft tissue contrast and image resolution enable more accurate measurements of the fetal brain and a better parenchymal signal, which is critical for detecting cortical malformations and subtle white matter anomalies2. Routine antenatal brain MRI assessment combines qualitative morphological evaluation and biometric measurements.
In routine clinical practice, fetal brain MRI biometry is performed on T2-weighted (T2w) stacks of two-dimensional slices with 2-5 mm thickness and 0.5-1 mm in-plane resolution, usually acquired in three orthogonal planes. However, fetal and maternal motion can lead to oblique acquisition planes. Combined with the anisotropic image resolution, this can make precise biometric measurements difficult. Some studies have compared MRI measurements with the US reference values used to establish deviation from normality3–7. Even so, MRI-based biometric measurements are still not recommended in clinical practice, because it is difficult to acquire a precise slice orientation with MRI.

In the past decade, super-resolution reconstruction (SRR) methods8–14 have emerged. They combine motion-corrupted, low-resolution (LR) T2w series into a high-resolution 3D isotropic volume. These 3D volumes are valuable for fetal brain biometry because they allow flexible navigation in any plane, which facilitates the selection of optimal planes for precise biometric measurements15–17. Moreover, they enable volumetric (3D) analysis, supported by several automated pipelines8,10,12–14,18. These techniques pave the way towards a more accurate characterization of normal and pathological fetal neurodevelopment using MRI.

Early work on SRR 3D volumes compared the consistency of their biometric measurements with those from US and LR slices16,19–21. Kyriakopoulou et al.16 used SRR volumes reconstructed with the Slice-to-Volume Reconstruction method8,10 to build normative models of both biometric and volumetric measurements. Khawam et al.19 studied the inter-rater reliability of biometric measurements on T2w series and MIALSRTK-reconstructed volumes12,18. Lamon et al.20 focused on corpus callosum biometry, comparing US, T2w, and SRR volumes reconstructed using MIALSRTK12,18.
However, these works relied on a single SRR method, so their findings remain to be replicated with other SRR methods. Recently, Ciceri et al.21 were the first to compare 2D biometry across multiple SRR methods (MIALSRTK12,18, NiftyMIC13, and SVRTK10,22,23), focusing on the 20-21 gestational week period. They showed that MIALSRTK and NiftyMIC achieved a good reconstruction success rate and were consistent with T2w series measurements. SVRTK, by contrast, produced many failed reconstructions and was excluded.

All of these works were limited to mono-centric data. They did not consider whether SRR methods could improve inter-rater reliability, or whether they introduced systematic biases in quantitative measurements. Ciceri et al.21 also did not disentangle the effect of data quality from the impact of the SRR algorithm. Because they conflated the success rate of the compared SRR methods with the quality of the biometric measurements, they could not answer the following question: when different SRR methods yield good-quality results, do the biometric measurement values remain consistent? Framed differently: does the reconstruction process of any SRR method introduce alterations that systematically bias the biometric evaluation, even when the SRR is of good quality?

We hypothesized that, given high-quality reconstructions, 2D and 3D measurements would be consistent across different SRR methods. We also expected experts to remain cautious about using SRR volumes for clinical assessments, because of alterations in the intensity of the reconstructed images. The purpose of this study was to evaluate the clinical usefulness of SRR and to assess whether these methods introduce artifacts that systematically bias measurements taken from the reconstructed volumes.
## Materials and methods

### Dataset

#### Population

Brain MRI examinations were retrospectively collected from ongoing research studies at three hospitals: Hospital Clínic de Barcelona (Barcelona, Spain), La Timone (Marseilles, France) and Lausanne University Hospital (CHUV, Lausanne, Switzerland). Exclusion criteria included twin pregnancies and any pathology or malformation in the fetal MRI scans. The study received ethical approval from each center's institutional review board (CHUV: CER-VD 2021-00124; La Timone: Aix-Marseille University N°2022-04-14-003; Hospital Clínic: HCB/2022/0533). Fetal examinations were approximately equally distributed across three gestational age (GA) bins representing different stages of fetal brain development: [21, 28) weeks, [28, 32) weeks and [32, 36) weeks. A flow diagram of included and excluded MRI examinations is shown in Figure 1.a.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2024/09/29/2024.09.23.24313965/F1.medium.gif)

Figure 1. **(a)** Flowchart of the study sample showing inclusion and exclusion. A total of 219 pregnant patients were imaged across three centers. Seventy-four MRI examinations were excluded due to poor-quality reconstruction, leaving 145 MRI examinations that were annotated and automatically segmented. After selecting subjects in the relevant age bins, 84 MRI examinations were analyzed (27 for [21, 28) weeks, 31 for [28, 32) weeks and 26 for [32, 36) weeks). **(b)** Distribution of gestational ages across the different sites. **(c)** Design of the study. Subjects are nested within raters: each rater considered the subjects from their own center (NG and AM for La Timone, IV for Hospital Clínic, MK for CHUV) and performed the measurements on every reconstruction of each subject.
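The GA bins used for subject selection are half-open: each includes its lower bound and excludes its upper bound. A fetus scanned at exactly 28.0 weeks therefore belongs to the [28, 32) bin. The Python sketch below illustrates that assignment. The function name `ga_bin` and the label strings are hypothetical and are not taken from the study's code.

```python
from bisect import bisect_right
from typing import Optional

# Bin edges in weeks, defining [21, 28), [28, 32) and [32, 36).
BIN_EDGES = [21.0, 28.0, 32.0, 36.0]
BIN_LABELS = ["[21, 28)", "[28, 32)", "[32, 36)"]


def ga_bin(ga_weeks: float) -> Optional[str]:
    """Return the GA bin label of a scan, or None if it falls outside [21, 36) weeks."""
    if not (BIN_EDGES[0] <= ga_weeks < BIN_EDGES[-1]):
        return None
    # bisect_right places a value equal to an edge in the bin that starts at that
    # edge, which is exactly the half-open [lower, upper) convention.
    return BIN_LABELS[bisect_right(BIN_EDGES, ga_weeks) - 1]
```

For example, `ga_bin(27.9)` and `ga_bin(28.0)` fall into different bins, and scans outside 21-36 weeks return `None` and would be left out of the binned analysis.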
#### MRI Data

Fetal MRI data were acquired across hospitals with different Siemens scanners (Erlangen, Germany) at 1.5T or 3T. The fetal brain MRI protocol included T2w HASTE (Half-Fourier Acquisition Single-shot Turbo spin Echo) sequences acquired in three orthogonal directions (axial, coronal, sagittal). Details on the MRI acquisition parameters and the number of acquisitions per subject are available in Table 1.

Table 1. Metadata regarding the acquisition parameters, the gestational ages of participants, the resolution of the T2w series and the number of stacks used in the reconstruction algorithm.

#### MRI data processing

Clinical fetal brain MRI acquisitions have anisotropic resolution, so the data acquired in different orientations are reconstructed into a single high-resolution volume using SRR methods. Each subject was reconstructed with three widely used SRR toolkits: NeSVoR (v.0.5.0)14, NiftyMIC (v.0.9.0)13, and SVRTK (v.auto-2.2.0)10,22,23. Depending on the hospital, stacks with high levels of motion or signal dropout were excluded through visual inspection19 and/or automated quality control24. At La Timone and Hospital Clínic, stacks were processed with non-local means denoising25 and N4 bias field correction26. Each subject was then reconstructed at 0.8 mm isotropic resolution using the default parameters of each of the three SRR methods. The resulting SRR volumes were aligned to a standard orientation. For poor-quality reconstructions, different stack combinations were tested until the image quality was deemed sufficient by visual assessment (no evident artifacts or registration/reconstruction errors). If no combination produced a sufficiently high-quality reconstruction, the subject was excluded from the study.
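The fallback procedure for poor-quality reconstructions can be sketched as a search over stack subsets. This is a minimal illustration under stated assumptions:

* `reconstruct` and `is_acceptable` are hypothetical callbacks that stand in for an SRR toolkit run and for the visual quality check.
* Trying the largest subsets first is an assumption; the text does not say in which order combinations were tested.
* The minimum of three stacks is also an assumption; the text does not give a minimum.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # one LR stack (e.g., a file path)
V = TypeVar("V")  # one reconstructed volume


def reconstruct_with_fallback(
    stacks: Sequence[S],
    reconstruct: Callable[[Tuple[S, ...]], V],   # hypothetical SRR call
    is_acceptable: Callable[[V], bool],          # hypothetical quality check
    min_stacks: int = 3,                         # assumed minimum subset size
) -> Optional[Tuple[Tuple[S, ...], V]]:
    """Try stack subsets, largest first, until a reconstruction passes the check.

    Returns (selected_stacks, volume), or None if no subset is acceptable,
    in which case the subject would be excluded from the study.
    """
    for size in range(len(stacks), min_stacks - 1, -1):
        for subset in combinations(stacks, size):
            volume = reconstruct(subset)
            if is_acceptable(volume):
                return subset, volume
    return None
```

Searching from the full set downward keeps as many acquisitions as possible. Dropping a stack only happens when the larger combinations have all failed the check.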
### Biometric Measurements

Biometric measurements were performed on both the LR 2D stacks and the 3D SRR volumes using ITK-SNAP (University of Pennsylvania, PA, USA). At each site, measurements were performed by medical experts in obstetric and/or pediatric image analysis: IV (5 years of experience) for Hospital Clínic, NG (>20 years of experience) and AM (5 years of experience) for La Timone, and MK (15 years of experience) for CHUV. This resulted in a design where subjects are nested within raters (Fig. 1.c). Following established guidelines for fetal brain MRI biometry1,3,16,27, the following measurements were performed:

* length of the corpus callosum (LCC);
* height of the vermis (HV);
* brain and skull biparietal diameters (bBIP, sBIP);
* transverse cerebellar diameter (TCD).

An example of the measurements on a subject is shown in Figure 2. These measurements were then compared to the reference values obtained by Kyriakopoulou et al.16

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2024/09/29/2024.09.23.24313965/F2.medium.gif)

Figure 2. 2D measurement guidelines. **(a)** Measurements on a subject at 31 weeks of GA, reconstructed using SVRTK. Axial: brain and skull biparietal diameters (bBIP and sBIP). Sagittal: length of the corpus callosum (LCC) and height of the vermis (HV). Coronal: transverse cerebellar diameter (TCD). **(b)** Automated segmentation using BOUNTI. **(c)** Measurements on the T2w stacks. Each column represents a different stack; the stacks were re-oriented for visualization purposes. **(d)** Through-plane view of the low-resolution images of (c), showing the thick slices of the LR acquisitions.

On the LR stacks, each rater chose the stack best suited to each measurement, in terms of alignment and image quality.
On the 3D SRR volumes, raters had the option to re-align the images (manual rigid transformation) before performing the measurements. In total, each of the four raters performed around 550 measurements: 5 structures × 4 variants (1 LR + 3 SRR) × 26-29 subjects, i.e., 520-580 measurements per rater.

#### Automated volumetry

Automated volumetric evaluation was carried out on the SRR volumes using BOUNTI28, a recent deep learning segmentation method. BOUNTI segments the brain into 19 regions and was trained on a large corpus of manually segmented brain volumes. An illustration of the segmentations is provided in Figure 2b. Our analysis considered five volumetric measurements for which reference values are available16:

* extra-cerebral cerebrospinal fluid (eCSF);
* cortical gray matter (cGM);
* cerebellum (CBM);
* supratentorial brain tissue (ST);
* total lateral ventricles (VT).

cGM and CBM measurements were also compared to the growth curves from Machado-Rivas et al.29. That work reconstructed the T2w stacks with the method of Kainz et al.11 and performed automated segmentation with an atlas-based approach15.

#### Qualitative assessment

We aimed to obtain expert feedback on the appearance, particularly the intensity and visibility, of key anatomical structures used to assess fetal development. Four neuroradiologists (NG, >20 years of experience; AM, 5 years of experience; MG, 12 years of experience; MK, 15 years of experience) were asked to qualitatively assess the volumes reconstructed from six subjects using all three SRR methods. The subjects were selected to represent different GAs (26, 28, 29, 30, 32, and 34 weeks). They were also required to have high-quality 3D SRR volumes with every method, to avoid any bias. In a first round of evaluation, the clinicians visualized all SRR volumes from a given subject and were asked to assess how clearly different structures appeared in each SRR volume.
The details of the questions asked and the structures rated are available in supplementary Table S9. In a second stage, raters were asked to compare the SRR volumes from each subject with the corresponding LR stacks. They were first asked to rank the three SRR volumes of each subject by how likely they would be to use them (ties allowed). They were then asked whether they would choose the SRR volume over the LR stacks for their clinical assessment, and whether the SRR volume provided more information than the LR stacks for a radiological evaluation.

### Statistical analysis

A univariate analysis was first carried out to assess the influence of the SRR algorithm on the biometric (respectively volumetric) measurements. Because the data were not normally distributed, a Friedman test was used to test for differences across SRR methods (N=252, degrees of freedom=2). The Friedman test is the non-parametric equivalent of a repeated-measures ANOVA. We did not correct the Friedman tests for multiple comparisons, so that even small statistical effects related to the SRR techniques would be detected. A correction would have made it easier to support our hypothesis of no systematic bias. Post-hoc testing was done using pairwise Wilcoxon rank-sum tests, with Bonferroni correction applied to these pairwise comparisons. Effect sizes were reported as ![Graphic][1].

We confirmed these results with a multivariate regression that evaluated the impact of SRR on biometric (resp. volumetric) measurements while accounting for covariates. We fitted a t-distributed Generalized Additive Model for Location, Scale and Shape (GAMLSS)30,31 with the following terms:

* response: the biometric (resp. volumetric) measurement;
* fixed effect of interest: the SRR algorithm;
* covariate: GA;
* covariate, for the biometry only: rater (the volumetry is computed automatically);
* random effect: subject.
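The univariate step described above can be sketched in plain Python: a Friedman test across the three SRR methods, followed by Bonferroni correction of the post-hoc p-values. The study itself used R, and the function names below are hypothetical. The statistic is the standard Friedman chi-square with a correction for ties within a subject. The p-value uses the closed-form chi-square survival function for 2 degrees of freedom, exp(−Q/2), so the sketch is only valid for exactly three methods. The post-hoc Wilcoxon tests are not reimplemented; only the Bonferroni adjustment of their p-values is shown.

```python
import math
from typing import List, Sequence, Tuple


def _rank_with_ties(values: Sequence[float]) -> Tuple[List[float], int]:
    """Average 1-based ranks within one subject, plus the tie term sum(t**3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group of size t = j - i + 1).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def friedman_3(data: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Friedman statistic and p-value for k = 3 methods (df = 2).

    data[i][j] is the measurement of subject i reconstructed with method j.
    """
    n, k = len(data), 3
    if n == 0 or any(len(row) != k for row in data):
        raise ValueError("expected a non-empty n x 3 table")
    rank_sums = [0.0] * k
    ties = 0
    for row in data:
        ranks, tie_term = _rank_with_ties(row)
        ties += tie_term
        for j in range(k):
            rank_sums[j] += ranks[j]
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all measurements are tied within every subject")
    q /= correction
    # Chi-square survival function with 2 degrees of freedom: P(X > q) = exp(-q / 2).
    return q, math.exp(-q / 2.0)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values (multiply by the number of tests, cap at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Ranking happens within each subject, which is why the Friedman test suits this design: every subject is reconstructed by all three methods.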
We chose a GAMLSS model over a simpler t-distributed linear mixed-effects (LME) model after visually inspecting two diagnostics:

* the residual distribution (R function fitdistrplus::descdist);
* the cumulative distribution function (R function DHARMa::simulateResiduals).

Both the LME and the GAMLSS models had a well-aligned cumulative distribution function, but the GAMLSS model showed a less dispersed residual distribution, suggesting more stable estimates.

The qualitative analysis relied on a smaller sample. We nonetheless carried out a univariate analysis using a Friedman test (N=72, degrees of freedom=2). When significant results were found, post-hoc testing was done using pairwise Wilcoxon rank-sum tests with Bonferroni correction for multiple comparisons. All statistical analyses were carried out using R (version 4.2.2).

To simplify the analysis, the ratings of AM (the second rater at La Timone) were used only in a confirmatory analysis, reported as a supplementary experiment. The main analysis therefore simply has subjects nested within raters.

## Results

### Population

After application of the inclusion and exclusion criteria (Figure 1.a), 252 SRR volumes from 84 healthy fetuses were included: 29 from Hospital Clínic, 26 from La Timone and 29 from CHUV. The distribution of gestational age is shown in Figure 1.b and broken down by age bin in Table 1.

### Biometry measurements across SR reconstruction methods

Univariate and multivariate statistics are reported in Table 2. For LCC and HV, the univariate analysis found no significant difference induced by SRR methods. The multivariate analysis found only very small effects: −0.2±0.06 mm (p < 0.001) for the NeSVoR-NiftyMIC difference in LCC, and −0.09±0.94 (p < 0.05) for the NeSVoR-SVRTK difference in HV. When comparisons yielded statistically significant results, the effect sizes systematically remained small (at most 0.43±0.06 mm for the sBIP). These effects were smaller than a 0.1% variation and below the width of a voxel (0.8 mm).
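To make the voxel-width comparison concrete, the sketch below computes a landmark-based length in millimeters on the 0.8 mm isotropic grid. It then checks how much the length changes when one endpoint moves by a single voxel: by the triangle inequality, the change is at most 0.8 mm. This is a hypothetical illustration. It assumes an axis-aligned volume (identity orientation) with landmarks given as voxel indices, and it does not reproduce how ITK-SNAP reports distances. The landmark coordinates are made up.

```python
import math
from typing import Sequence

VOXEL_MM = 0.8  # isotropic SRR resolution used in the study


def landmark_distance_mm(
    p: Sequence[int], q: Sequence[int], spacing: Sequence[float] = (VOXEL_MM,) * 3
) -> float:
    """Euclidean distance in mm between two landmarks given as voxel indices.

    Assumes an axis-aligned grid, so physical offset = index offset * spacing.
    """
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))


# Hypothetical landmarks: a 75-voxel segment along the third axis (about 60 mm).
start, end = (10, 20, 30), (10, 20, 105)
base = landmark_distance_mm(start, end)

# Shift the end point by one voxel along each axis and report the change in length.
for shifted in [(11, 20, 105), (10, 21, 105), (10, 20, 106)]:
    print(f"{shifted}: {landmark_distance_mm(start, shifted) - base:+.3f} mm")
```

A one-voxel shift along the measured direction changes the length by the full 0.8 mm. A shift perpendicular to it barely changes the length.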
Table 2. Statistical analyses for biometry measurements. Univariate biometry analysis (N=252, df=2) and multivariate biometry analysis using a t-distributed GAMLSS model.

The multivariate analysis also estimated the effects of the raters. These were consistently larger than the SRR effects but remained small: at most 1.55 mm for sBIP (2.5% variability). These results were confirmed by an additional single-site analysis in which two raters annotated the same data (see Supplementary materials). Growth charts are provided in Figure 3 (top row) and are in line with the centiles estimated in previous works3,16,32. Further illustrations of the growth curves for the different raters and SRR methods are provided in Supplementary Figure S2.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2024/09/29/2024.09.23.24313965/F3.medium.gif)

Figure 3. **Top row.** Biometric measurements as a function of gestational age, for the different SRR methods and raters. The curves and dashed lines represent the normative 5th, 50th and 95th centiles from Kyriakopoulou et al.16. The exception is LCC, where the black curve comes from measurements on HASTE acquisitions by Tilea et al. (2009)3 and the red curve from ultrasound measurements by Pashaj et al. (2013)32. **Bottom row.** Volumetric measurements as a function of gestational age, for the different SRR methods and sites. The curves and dashed lines represent the normative 5th, 50th and 95th centiles from Kyriakopoulou et al.16; the additional blue curves are taken from Machado-Rivas et al.29.
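The volumetric measurements in Figure 3 (bottom row) are derived from the automated segmentations. At 0.8 mm isotropic resolution each voxel represents 0.8³ = 0.512 mm³, so a tissue volume is the number of voxels carrying its labels multiplied by 0.512 mm³. The sketch below assumes a flattened label map and a hypothetical `REGION_LABELS` grouping. BOUNTI's actual label numbering is not given here and would have to be taken from its documentation. The percentage difference between two methods is computed relative to a chosen reference method; this choice is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Set

VOXEL_MM3 = 0.8 ** 3  # 0.512 mm^3 per voxel at 0.8 mm isotropic resolution

# Hypothetical grouping of segmentation labels into analysis regions, for
# illustration only; the real BOUNTI label IDs must come from its documentation.
REGION_LABELS: Dict[str, Set[int]] = {
    "eCSF": {1},
    "cGM": {2, 3},
    "VT": {7, 8},
}


def region_volumes_ml(
    labels: Iterable[int],
    region_labels: Dict[str, Set[int]] = REGION_LABELS,
    voxel_mm3: float = VOXEL_MM3,
) -> Dict[str, float]:
    """Volume (mL) of each region from a flattened label map: voxel count x voxel volume."""
    counts = Counter(labels)  # labels absent from the map count as 0 voxels
    return {
        region: sum(counts[label] for label in ids) * voxel_mm3 / 1000.0  # mm^3 -> mL
        for region, ids in region_labels.items()
    }


def percent_difference(volume: float, reference: float) -> float:
    """Relative difference (%) of `volume` with respect to a reference method's volume."""
    return 100.0 * (volume - reference) / reference
```

Because volume is a voxel count, a reconstruction that crops slightly more or less extra-cerebral space shifts eCSF directly. This is consistent with the larger eCSF deviations discussed later.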
#### Brain tissue volumetry

Results for automated brain tissue volumetry are provided in Table 3. They show a small but consistent variability between SRR methods, on the order of 1%. The exception is eCSF, where differences of 2.7% were observed between NeSVoR and NiftyMIC.

Table 3. Statistical analyses for volumetry measurements. Univariate volumetry analysis (N=252, df=2) and multivariate volumetry analysis using a t-distributed GAMLSS model.

Growth curves for volumetry are provided in Figure 3 (bottom row). Their values generally align with previously estimated centiles16. The exception is the cortical gray matter, which was consistently overestimated compared to Kyriakopoulou et al.16 and underestimated compared to Machado-Rivas et al.29.

#### Qualitative feedback on SRR

The first qualitative experiment evaluated the presence and visibility of specific anatomical structures on SRR volumes. Clinicians rated most volumes from NeSVoR and NiftyMIC as insufficient for their radiological assessment. SVRTK images were rated as being of sufficiently good quality (better than NeSVoR, p=0.013), yet clinicians remained hesitant to use them in a radiological assessment. An excerpt of the results is shown in Table 4A. All SRR methods yield good cortical continuity and sharpness, but NeSVoR performed poorly on the white matter and is blurrier than SVRTK and NiftyMIC, leading to an overall worse perceived quality:

* white matter layering: SVRTK-NeSVoR=0.5 (p=0.004);
* white matter intensity: SVRTK-NeSVoR=0.63 (p=0.01), NiftyMIC-NeSVoR=0.54 (p=0.003);
* blurriness: SVRTK-NeSVoR=0.84 (p=0.001), NiftyMIC-NeSVoR=0.62 (p=0.02);
* overall quality: SVRTK-NeSVoR=0.63 (p=0.01).

Additional results on the corpus callosum, ventricles, internal capsule and posterior fossa are available in the supplementary material.
Figure 4 shows example reconstructions: the worst- and best-rated SRR volumes are presented side by side, along with the acquired LR stacks in the three orientations. Overall, NeSVoR was often graded lower than SVRTK and NiftyMIC because of the alterations it introduced in white matter homogeneity and intensity (Figure 4, subject 1). In contrast, the best-rated volume (Figure 4, subject 2, SVRTK) shows a very clear white matter, with a marked contrast between the white matter and the basal ganglia.

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2024/09/29/2024.09.23.24313965/F4.medium.gif)

Figure 4. Example of two subjects (GA = 26 and 30 weeks) with in-plane views of three different T2w acquisitions, along with the reconstructed volumes. Top: subject 1 reconstructed with NeSVoR, the worst-rated SR volume (global subjective quality = 0). Bottom: subject 2 reconstructed with SVRTK, the best-rated SR volume (global subjective quality = 1.54).

Table 4. **Top.** Subjective structural quality assessment. Scores range between 0 (bad), 1 (acceptable) and 2 (excellent). A single star means that the method is statistically significantly better than the worst-performing method in the column. **Bottom.** Qualitative comparison between SR and LR. Scores range from 0 to 2. The first column reflects a ranking. The second refers to whether the clinician would use SRR instead of LR volumes (choose only one). The last column refers to whether the SRR was judged more suitable than LR for their clinical examination. A score of 1 means that SRR is as useful as LR.

In the second experiment (Table 4B), the raters ranked the SRR volumes against each other and against the LR stacks.
NeSVoR reconstructions were consistently rated lower than NiftyMIC and SVRTK, and NiftyMIC was rated best in this experiment (SRR ranking: NiftyMIC-NeSVoR=0.86 (p=0.004)). When compared to the LR stacks, there was no unanimous preference for SRR volumes over LR images. Experts considered most of the NiftyMIC and SVRTK volumes to be as usable as the LR images. They were, however, rather hesitant to use NeSVoR instead of the LR images for their evaluation.

## Discussion

Advanced image processing techniques such as motion estimation and SRR now make it possible to navigate freely through the fetal brain in 3D and extract quantitative measurements. Our study assessed whether different state-of-the-art SRR methods induce systematic biases when the reconstructed volumes are used for biometric and volumetric analyses. Results from multi-centric, multi-scanner acquisitions show statistically significant differences in 2D biometry across SRR methods, but these differences consistently remain below the voxel width (0.8 mm). For 3D volumetric measurements, trends are similar, with deviations on the order of 1%. The exception is eCSF (2.5%), due to the different ways in which the SRR methods crop the brain. While small, the deviations in volumetry are systematic and might be a concern for future fine-grained analyses.

Larger deviations from reference growth curves were observed for the cortical gray matter. For this tissue, even the results of Kyriakopoulou et al.16 and Machado-Rivas et al.29 differ considerably from each other. This is likely due to differences in reconstruction and segmentation protocols between these two works, as well as in the data used to train the BOUNTI model28. Variations in the manual delineation of cGM are notoriously hard to control33. Our work supplements the study of Ciceri et al.21, who showed the consistency of measurements made on two SRR methods in a more restricted setting (20-21 weeks, mono-centric).
Our results are reassuring both for using SRR volumes in clinical practice and for leveraging and comparing results from different studies. Even if different SRR methods were deployed in clinical practice or used in multi-centric studies, biometric and volumetric measurements would remain consistent across sites. This opens the door to new biomarkers that cannot be obtained from US or LR stacks.

However, while SRR could readily be used for quantitative measurements, challenges remain. SRR methods introduce alterations such as textured noise and intensity variations, which can appear depending on the original resolution settings. In our experiments, this was particularly pronounced for NeSVoR. Training physicians to distinguish SR reconstruction artifacts from structural alterations would therefore be paramount when making SRR widely available.

Nevertheless, clinicians generally agreed on the benefits of having *both* LR and SRR volumes available. This could help in detecting cortical malformations, because gyrification is more clearly visible on SRR data. Navigating SRR data in 3D reduces the ambiguities caused by the uncontrolled sampling of 2D slices in LR stacks. This work also shows that the true benefits of SRR would be revealed for biometric measurements of structures that require a precise anatomical orientation. This is the case for midline structures such as the length of the corpus callosum or the height of the vermis.

Finally, despite its multi-centric and multi-rater design, our work should be extended to a holistic evaluation of the reconstructed volumes, notably of their quality and of their ability to reconstruct pathological subjects. Such an evaluation would be necessary to truly assess the potential of these reconstruction methods in clinical settings.
Overall, our study indicates that, when comparable 3D SR volumes of sufficient quality are obtained, the choice of SRR method does not introduce large systematic biases in 2D or 3D measurements.

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

## Supplementary material

### Intra-rater reliability between LR and SRR biometry measurements

#### Materials and methods

Intra-rater reliability was evaluated using Lin's Concordance Correlation Coefficient (CCC)34.

#### Results

Table S1 reports intra-rater reliability for the three raters considered. The CCC is very high for most structures (above 0.9), indicating very strong reliability. The lowest scores, although still high, are obtained for midline structures (length of the corpus callosum and height of the vermis). There is no major concern that a given SRR method would decrease the agreement between SRR and LR measurements. Figure S1 provides a visual comparison with the Pearson correlation coefficient. It clearly shows that LCC and HV measurements are more scattered than bBIP, sBIP and TCD measurements. Moreover, some bias can be observed in the LCC measurements of IV and MK, and in the HV measurements of NG. This is not surprising, given that obtaining precise measurement planes is challenging in LR stacks.

Table S1. Lin's Concordance Correlation Coefficient (CCC) between the LR and SR measurements for each rater. This supplements the results presented in Figure 3. Measurements with CCC below 0.9 are highlighted in blue.

![Figure S1.](https://www.medrxiv.org/content/medrxiv/early/2024/09/29/2024.09.23.24313965/F5.medium.gif)

Figure S1. Linear regression between the LR and SR measurements for each rater.
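Lin's CCC measures agreement with the identity line: ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²). Unlike the Pearson coefficient shown in Figure S1, it is therefore lowered by a systematic offset between LR and SR measurements. Below is a minimal standard-library Python sketch using population (1/n) moments, a common choice of estimator. The function name is hypothetical and the example values are made up.

```python
from typing import Sequence


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between paired measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The (mean_x - mean_y)^2 term penalizes a constant bias between the two raters/methods.
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2.0 * cov_xy / denom
```

For instance, `lins_ccc([1, 2, 3, 4], [2, 3, 4, 5])` is about 0.71, even though the Pearson correlation of the same pairs is exactly 1. This is why a bias such as the one noted for LCC and HV can lower the CCC while the regression in Figure S1 still looks tight.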
### Complete statistical results for volumetry and biometry

Tables S2 and S3 contain the univariate and multivariate analyses for the biometry experiment. Tables S4 and S5 contain the univariate and multivariate analyses for the volumetry experiment.

Table S2. Statistical analyses for biometry measurements. Univariate analysis (N=252, df=2).

Table S3. Statistical analyses for biometry measurements. Multivariate analysis using a t-distributed GAMLSS model.

Table S4. Statistical analyses for volumetry measurements. Univariate analysis (N=252, df=2).

Table S5. Statistical analyses for volumetry measurements. Multivariate analysis using a t-distributed GAMLSS model.

#### Single-site multi-rater analysis

Because the La Timone data were rated twice, we could carry out a more in-depth single-site analysis that removes potential confounders introduced by the nested design of the study. Tables S6, S7 and S8 show, respectively, the intra- and inter-rater reliability, the univariate biometric analysis and the multivariate analysis. The results are in line with those in the main paper, except that in this mono-centric evaluation the effect of SRR is non-significant (the effect size remains the same). The only additional result is the inter-rater reliability between AM and NG. It remains very high overall, although it is slightly lower on midline structures, especially for vermis height on the LR stacks.

Table S6. Intra- and inter-rater reliability.
Intra-rater reliability was evaluated using Lin's Concordance Correlation Coefficient (CCC), and inter-rater reliability using a two-way Intraclass Correlation Coefficient (ICC).

Table S7. Univariate analysis, single site and two raters (N = 156, df = 2). A Kruskal-Wallis test was chosen because the Friedman test does not allow for replicated measurements.

Table S8. Multivariate analysis, single site and two raters, using a t-distributed GAMLSS model.

### Rater-wise, SRR-wise regression predictions

Figure S2 shows the fits obtained using the data from the different raters and from the different SRR methods. It illustrates that more of the variability in the predictions originates from the rater than from the SRR method.

Figure S2. Quadratic fit split by rater (first row), by SRR method (second row), and global trend (third row). This illustrates the different sources of variability in the fits.

#### Additional results of the subjective rating experiment

Table S9. Details of the qualitative ratings requested from the raters in the first stage of the subjective evaluation.

##### Corpus callosum subjective rating

For the corpus callosum, all methods led to a good perception of sharpness and thickness. For the substructures (Table S10A), the quality ratings followed a consistent ordering (rostrum – genu – splenium/body), independently of the reconstruction method used.
For the ventricles, internal capsule and posterior fossa (Table S10B), there was also a consistent hierarchy: NeSVoR < NiftyMIC < SVRTK.

Table S10. Subjective structural quality assessment, additional results. **(A)** Assessment of the corpus callosum and the clarity of its substructures on the images. **(B)** Assessment of the ventricles (Is the presence of the germinal matrix compatible with age? Are the leaves of the cavum septum pellucidum present or absent? Is the ventricular wall regular?), the internal capsule (Are the basal ganglia (BG) and thalami clearly discernible from the white matter?) and the posterior fossa (Is the cerebellar foliation clearly visible?).

## Acknowledgements

This work was funded by the Era-net NEURON MULTIFACT project (TS: Swiss National Science Foundation grant 31NE30_203977; AM, GA: French National Research Agency, Grant ANR-21-NEU2-0005; IV, EE: Instituto de Salud Carlos III (ISCIII) grant AC21_2/00016; GM, MG, OC, GP: Ministry of Science, Innovation and Universities: MCIN/AEI/10.13039/501100011033/), and by the SulcalGRIDS Project (GA: French National Research Agency Grant ANR-19-CE45-0014).

## Footnotes

* Correction of typos in some author names
* Correction of references to SVRTK

## Abbreviations

* LR: Low-resolution
* GA: Gestational Age
* SRR: Super-resolution reconstruction
* US: Ultrasound
* T2w: T2-weighted contrast
* LCC: Length of the corpus callosum
* HV: Vermis Height
* bBIP: Brain biparietal diameter
* sBIP: Skull biparietal diameter
* TCD: Transverse cerebellar diameter

* Received September 23, 2024.
* Revision received September 27, 2024.
* Accepted September 29, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1. Prayer D, Malinger G, De Catte L, et al. ISUOG Practice Guidelines (updated): performance of fetal magnetic resonance imaging. Ultrasound Obstet Gynecol. 2023;61(2):278–287. doi:10.1002/uog.26129
2. Papaioannou G, Klein W, Cassart M, Garel C. Indications for magnetic resonance imaging of the fetal central nervous system: recommendations from the European Society of Paediatric Radiology Fetal Task Force. Pediatr Radiol. 2021;51(11):2105–2114. doi:10.1007/s00247-021-05104-w
3. Tilea B, Alberti C, Adamsbaum C, et al. Cerebral biometry in fetal magnetic resonance imaging: new reference data. Ultrasound Obstet Gynecol. 2009;33(2):173–181. doi:10.1002/uog.6276
4. Garel C, Chantrel E, Sebag G. Le Développement Du Cerveau Foetal: Atlas IRM et Biométrie. Sauramps médical; 2000.
5. Mckinnon K, Kendall GS, Tann CJ, et al. Biometric assessments of the posterior fossa by fetal MRI: A systematic review. Prenat Diagn. 2021;41(2):258–270. doi:10.1002/pd.5874
6. Cai S, Zhang G, Zhang H, Wang J. Normative linear and volumetric biometric measurements of fetal brain development in magnetic resonance imaging. Childs Nerv Syst. 2020;36(12):2997–3005. doi:10.1007/s00381-020-04633-3
7. Dovjak GO, Schmidbauer V, Brugger PC, et al. Normal human brainstem development *in vivo*: a quantitative fetal MRI study. Ultrasound Obstet Gynecol. 2021;58(2):254–263. doi:10.1002/uog.22162
8. Jiang S, Xue H, Glover A, Rutherford M, Rueckert D, Hajnal JV. MRI of Moving Subjects Using Multislice Snapshot Images With Volume Reconstruction (SVR): Application to Fetal, Neonatal, and Adult Brain Studies. IEEE Trans Med Imaging. 2007;26(7):967–980. doi:10.1109/TMI.2007.895456
9. Rousseau F, Kim K, Studholme C, Koob M, Dietemann JL. On Super-Resolution for Fetal Brain MRI. In: Jiang T, Navab N, Pluim JPW, Viergever MA, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2010. Vol 6362. Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2010:355–362. doi:10.1007/978-3-642-15745-5_44
10. Kuklisova-Murgasova M, Quaghebeur G, Rutherford MA, Hajnal JV, Schnabel JA. Reconstruction of fetal brain MRI with intensity matching and complete outlier removal. Med Image Anal. 2012;16(8):1550–1564. doi:10.1016/j.media.2012.07.004
11. Kainz B, Steinberger M, Wein W, et al. Fast Volume Reconstruction From Motion Corrupted Stacks of 2D Slices. IEEE Trans Med Imaging. 2015;34(9):1901–1913. doi:10.1109/TMI.2015.2415453
12. Tourbier S, Bresson X, Hagmann P, Thiran JP, Meuli R, Cuadra MB. An efficient total variation algorithm for super-resolution in fetal brain MRI with adaptive regularization. NeuroImage. 2015;118:584–597. doi:10.1016/j.neuroimage.2015.06.018
13. Ebner M, Wang G, Li W, et al. An automated framework for localization, segmentation and super-resolution reconstruction of fetal brain MRI. NeuroImage. 2020;206:116324. doi:10.1016/j.neuroimage.2019.116324
14. Xu J, Moyer D, Gagoski B, et al. NeSVoR: Implicit Neural Representation for Slice-to-Volume Reconstruction in MRI. IEEE Trans Med Imaging. Published online 2023. Accessed March 8, 2024. https://ieeexplore.ieee.org/abstract/document/10015091/
15. Gholipour A, Rollins CK, Velasco-Annis C, et al. A normative spatiotemporal MRI atlas of the fetal brain for automatic segmentation and analysis of early brain growth. Sci Rep. 2017;7(1):476. doi:10.1038/s41598-017-00525-w
16. Kyriakopoulou V, Vatansever D, Davidson A, et al. Normative biometry of the fetal brain using magnetic resonance imaging. Brain Struct Funct. 2017;222(5):2295–2307. doi:10.1007/s00429-016-1342-6
17. Pier DB, Gholipour A, Afacan O, et al. 3D Super-Resolution Motion-Corrected MRI: Validation of Fetal Posterior Fossa Measurements. J Neuroimaging. 2016;26(5):539–544. doi:10.1111/jon.12342
18. Tourbier S, De Dumast P, Kebiri H, Hagmann P, Bach Cuadra M. Medical-Image-Analysis-Laboratory/mialsuperresolutiontoolkit: MIAL Super-Resolution Toolkit v2.0.3. Published online December 24, 2020. doi:10.5281/zenodo.5803816
19. Khawam M, de Dumast P, Deman P, et al. Fetal Brain Biometric Measurements on 3D Super-Resolution Reconstructed T2-Weighted MRI: An Intra- and Inter-observer Agreement Study. Front Pediatr. 2021;9:639746. doi:10.3389/fped.2021.639746
20. Lamon S, De Dumast P, Dunet V, et al. Assessment of Fetal Corpus Callosum Biometry by 3D Super-Resolution Reconstructed T2-Weighted MRI. Obstetrics and Gynecology; 2023. doi:10.1101/2023.06.08.23291142
21. Ciceri T, Squarcina L, Pigoni A, et al. Geometric Reliability of Super-Resolution Reconstructed Images from Clinical Fetal MRI in the Second Trimester. Neuroinformatics. Published online June 7, 2023. doi:10.1007/s12021-023-09635-5
22. Uus AU, Hall M, Payette K, et al. Combined Quantitative T2* Map and Structural T2-Weighted Tissue-Specific Analysis for Fetal Brain MRI: Pilot Automated Pipeline. In: Link-Sourani D, Abaci Turk E, Macgowan C, Hutter J, Melbourne A, Licandro R, eds. Perinatal, Preterm and Paediatric Image Analysis. Lecture Notes in Computer Science. Springer Nature Switzerland; 2023:28–38. doi:10.1007/978-3-031-45544-5_3
23. Uus AU, Neves Silva S, Aviles Verdera J, et al. Scanner-based real-time 3D brain+body slice-to-volume reconstruction for T2-weighted 0.55T low field fetal MRI. Published online April 23, 2024. doi:10.1101/2024.04.22.24306177
24. Sanchez T, Esteban O, Gomez Y, et al. FetMRQC: an open-source machine learning framework for multi-centric fetal brain MRI quality control. Published online November 8, 2023. doi:10.48550/arXiv.2311.04780
25. Manjón JV, Coupé P, Martí-Bonmatí L, Collins DL, Robles M. Adaptive non-local means denoising of MR images with spatially varying noise levels. J Magn Reson Imaging. 2010;31(1):192–203. doi:10.1002/jmri.22003
26. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: Improved N3 Bias Correction. IEEE Trans Med Imaging. 2010;29(6):1310–1320. doi:10.1109/TMI.2010.2046908
27. Garel C. MRI of the Fetal Brain. Springer Berlin Heidelberg; 2004. doi:10.1007/978-3-642-18747-6
28. Uus AU, Kyriakopoulou V, Makropoulos A, et al. BOUNTI: Brain vOlumetry and aUtomated parcellatioN for 3D feTal MRI. Neuroscience; 2023. doi:10.1101/2023.04.18.537347
29. Machado-Rivas F, Gandhi J, Choi JJ, et al. Normal Growth, Sexual Dimorphism, and Lateral Asymmetries at Fetal Brain MRI. Radiology. 2022;303(1):162–170. doi:10.1148/radiol.211222
30. Rigby RA, Stasinopoulos DM. Generalized additive models for location, scale and shape. J R Stat Soc Ser C Appl Stat. 2005;54(3):507–554.
31. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw. 2008;23:1–46. doi:10.18637/jss.v023.i11
32. Pashaj S, Merz E, Wellek S. Biometry of the fetal corpus callosum by three-dimensional ultrasound. Ultrasound Obstet Gynecol. 2013;42(6):691–698. doi:10.1002/uog.12501
33. Valabregue R, Girka F, Pron A, Rousseau F, Auzias G. Comprehensive analysis of synthetic learning applied to neonatal brain MRI segmentation. Hum Brain Mapp. 2024;45(6). doi:10.1002/hbm.26674
34. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989:255–268.