Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI
==============================================================================================

* Diego Fajardo-Rojas
* Megan Hall
* Daniel Cromb
* Mary A. Rutherford
* Lisa Story
* Emma Robinson
* Jana Hutter

## Abstract

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated from comprehensive multi-modal morphological and functional fetal MRI data from 176 control cases and 67 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. The pipeline achieves an R2 score of 0.51 and a mean absolute error of 2.22 weeks. It also achieves a 0.88 accuracy, 0.86 sensitivity, and 0.89 specificity, outperforming previous classification efforts in the literature. The predominant features selected by the pipeline include cervical length and various placental T2* values. The confluence of fast, motion-robust and multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of the gestation at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth had only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort.

**Author summary** Preterm birth is defined as the birth of a baby before the 37th week of pregnancy. It poses a serious risk to the life of a newborn and it is associated with a variety of severe lifelong health problems. Currently, the causes of preterm birth are not completely understood and therefore predicting when a baby will be born prematurely remains a challenging problem. Fetal MRI is an imaging technique that can provide detailed information about the development of the fetus and it is used to support the care of pregnancies at high-risk of preterm birth. Our work combines machine learning techniques with fetal MRI to predict gestational age at birth. The ability to predict this information is crucial for providing adequate care and effective delivery planning. The main contribution of our study is demonstrating that it is possible to make use of all the information obtained from fetal MRI to estimate the delivery date of a baby. To the best of our knowledge, this is the first study to combine machine learning with such a rich data set to produce these important predictions.

## Introduction

Preterm birth is defined as a live birth before 37 completed weeks of gestation [1]. It is estimated that every year 13.4 million babies are born prematurely, corresponding to a global preterm rate of around 9.9% [2]. Prematurity is the leading cause of mortality among children under 5 years accounting for 17.7% of the 5.3 million yearly deaths in this age group [3]. Complications associated with preterm birth are also the leading cause of neonatal mortality, accounting for 36% of these deaths [3]. The chances of survival of preterm babies are directly related to their gestational age (GA) at birth, with survival chances increasing from less than 18% for babies born at 22 weeks to over 95% for babies born at 29 weeks or later [4–6]. Despite advances in perinatal and neonatal care [4–9] survival critically depends on every additional week in-utero.

A continuous rise in survival rate has not translated into a decrease of the short- or long-term morbidity associated with preterm birth [8–10]. Short-term outcomes of premature birth include infections, bronchopulmonary dysplasia, retinopathy, necrotising enterocolitis, and brain disorders [11]. Long-term consequences include an increased risk of neuropsychiatric disorders such as psychosis, neurodevelopmental disabilities such as cerebral palsy and neuromotor dysfunction, adverse sensory outcomes such as hearing and visual impairment, as well as disabilities encompassing learning, cognition, and behaviour [10, 12, 13]. Similar to mortality rates, the incidence and severity of short- and long-term consequences of preterm birth are inversely related to GA at birth [11, 14, 15]. GA at birth is also correlated to social aspects later in life such as income and education level [15].

Reducing the incidence of preterm birth and the impact of its consequences would not only alleviate the burden on individual patients and their families, but also on entire healthcare systems, since the lifetime cost of preterm births in the USA (in 2016) was estimated to be $25.2 billion [16]. Unsurprisingly, a review of the literature on the economic consequences of preterm birth found a prevailing inverse relation between economic costs and GA at birth, regardless of methodology, date, or country of publication [17].

Preterm birth is classified into three subcategories: extremely preterm (less than 28 weeks), very preterm (28 to 32 weeks), and late preterm (32 to 37 weeks) [1], with further categorisation by clinical presentation: medically induced (or iatrogenic) and spontaneous [18]. While maternal and fetal indicators for iatrogenic preterm birth are well characterised and include conditions such as pre-eclampsia and fetal growth restriction (associated with 30.1% of cases) [19, 20], the aetiologies underlying spontaneous preterm birth are complex, varied, and poorly understood [21]. Causes include, but are not restricted to, infection or inflammation, vascular disease (leading to uterine ischaemia), uterine overdistention, and cervical injury. The latter can be a consequence of LLETZ procedures, cervical cone biopsies for abnormal smear tests, and injuries resulting from emergency C-sections in previous pregnancies [19, 22]. However, definitive causes are registered for only 50% [23, 24] of cases. As such, spontaneous preterm birth should more broadly be considered a syndrome resulting from multiple intricate causes [19, 25].

Despite this complexity, several risk factors have been identified [19, 21, 26] (see Table 1) and are useful both to provide insights and to help identify at-risk women. The wide variety of factors matches the aetiological complexity of preterm birth. Even within the same clinical subtype, some factors can have opposite effects. For example, low maternal body mass index (BMI) is a risk factor for fetal growth restriction but protective against pre-eclampsia, whereas these roles are reversed for maternal obesity [27].

[Table 1.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T1)

Table 1. Most common risk factors for preterm birth [19, 21, 26].

Currently there are three leading indicators used in clinical practice to identify women at high risk. The strongest predictor is a history of previous preterm birth or cervical surgery or injury (32% chance of recurrent preterm birth) [19, 28]. The other two biomarkers are mid-trimester cervical length below 25 mm [28, 29], measured via vaginal ultrasonography; and the presence of more than 50 ng/mL fetal fibronectin, a glycoprotein usually absent in cervicovaginal fluid from 18 weeks of gestation and an indicator of choriodecidual disruption. The absence of any of these factors suggests the likelihood of delivering within the following 7 days is only around 1% [19, 28]. These factors have been combined within clinical practice to improve their predictive capabilities [30, 31]. In other analyses, the combination of these predictors also reduced the average cost of high-risk pregnancies [32, 33].

Another modality with good potential to investigate preterm birth is Magnetic Resonance Imaging (MRI). Fetal MRI is used as a complementary modality to the commonly used ultrasound screening due to its higher resolution, operator independence and suitability for use on women with a higher BMI. It is also non-invasive with no evidence indicating any risk to the fetus or mother [34–36]. Another key advantage of MRI is that it offers multiple complementary contrasts that can support comprehensive functional evaluation of fetal and maternal tissues [37]. Available contrasts include T2-weighted anatomical imaging, T2* relaxometry (which provides an indirect measure of oxygenation [38]), diffusion MRI (which can quantify alterations in tissue microstructure [39–42]), flow measurements and T2 relaxometry. Past studies have largely focused on individual organs, such as investigating changes to lung [43] or thymus volumes [44], or assessing placental microstructure by measuring T2* and ADC values [45]. One study measured umbilical vein T2 values as a potential marker of intrauterine growth restriction [46].

While predictive machine learning (ML) models have enjoyed an ever-increasing popularity, preterm birth has only been addressed as a classification problem. Models based on electronic health records, uterine electromyography and transvaginal ultrasound [47] reported accuracies of approximately 0.77 [48–50], while studies based on electrohysterography reported values above 0.94 [51–53], with the latter, however, only including records of women with recorded contractile activity [53, 54].

Machine learning applied to structural MR measurements has been successful at predicting GA at the time of the scan during pregnancy. For example, Convolutional Neural Networks trained on fetal brain MRI have been able to outperform current clinical methods to estimate GA at the time of scan [55, 56]. A different study managed to obtain a mean absolute error of 6.1 days by developing bespoke features from 3D ultrasound and using a regression forest for prediction [57].

For this work, a stacking approach was chosen to predict GA at the time of birth. Stacking is an ensembling technique that consists of combining the predictions of individual base models by training a meta-model [58]. Stacking was introduced by Wolpert in 1992 to improve the predictions and generalisability of individual classification models [59]. In 1996 Breiman [60] showed that stacking was also suitable for regression problems, while in 1999 Ting and Witten [61] generalised the technique further by stacking three different types of base models and exploring different meta-models than the ones used in previous work. Ensemble methods such as stacking have the statistical advantage of reducing the risk of overfitting to the training data by taking into account the predictions of all the base models, as well as the representational advantage of expanding the space of available models by combining the base models into meta-models [62].

In recent years, stacking has been successful at various tasks such as genomic prediction [63], protein interactions prediction [64], or prostate cancer detection [65]. These works take advantage of more recent ML models, e.g., Yi et al. [64] use Support Vector Machines and XGBoost models as part of their base models, while Wang et al. explore using a Random Forest as their meta-model [65].

The present study combines a uniquely rich MR data acquisition including both anatomical and functional scans of multiple fetal organs, and multimodal MRI of the placenta, with a ML pipeline based on stacking. To the best of our knowledge, this is the first work to leverage the advantages of stacking methods together with a comprehensive multi-modal data set to predict GA at birth.

## Methods

This section contains a detailed outline of the development of the ML pipeline introduced in this work. The pipeline was designed to address the challenges presented by the data. These include: a large number of derived features relative to the number of training examples, data imbalance, and missing data. These problems were addressed through feature selection, balanced training, and feature imputation. Throughout the development of the pipeline, different design options were investigated, including changing the missing-data threshold for imputation and the models used for feature selection and regression. The end product is a meta-model where predictions are stacked to obtain a final predicted GA at birth. The last subsection describes an ablation study, which investigates the impact of each component. Fig 1 illustrates the workflow of the project. The reader is invited to refer to it repeatedly to complement the description that follows.

![Fig 1.](https://www.medrxiv.org/content/medrxiv/early/2024/02/18/2024.02.17.24302791/F1.medium.gif)

[Fig 1.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/F1)

Fig 1. Schematic representation of the project pipeline.
The boxes in a darker shade denote steps of the pipeline where different design options were explored. The top part of the figure shows the flow of the pipeline with fixed design options, while the bottom part explicitly indicates the different design choices available for each step.

### Data

The data set used for this work comprises clinical records, MR data, and parameters manually extracted from ultrasound from 313 singleton pregnancies, acquired as part of four ethically approved studies: 14/LO/1169 (Placenta Imaging Project, Fulham Research Ethics Committee, approval received September 23, 2016), 19-SS-0032 (Inflammation study in pregnancy, South East Scotland Ethics Committee, approval received March 7, 2019), 21/WA/0075 (Congenital Heart Imaging Programme, Wales Research Ethics Committee, approval received March 8, 2021), and 21/SS/0082 (Individualised Risk prediction of adverse neonatal outcome in pregnancies that deliver preterm using advanced MRI techniques and machine learning, South East Scotland Ethics Committee, approval received March 2022). Informed consent was obtained in all instances.

From the 313 cases originally considered, 59 cases were excluded as they were lacking GA at delivery. We also removed 11 cases scanned after 37 weeks since, in the context of predicting preterm birth, these would bias training of the model. This resulted in a final data set of 243 cases (see S1 Fig).

Recruitment for all the considered studies was opportunistic, with two studies specifically recruiting women at high risk of preterm birth based on obstetric history, ultrasound and biomarker findings. However, because preterm birth is difficult to predict, targeted recruitment of women who go on to deliver preterm is itself difficult; as a result, the available data set is biased towards term birth.

In all experiments data was split into train, validation, and test sets with an 8:1:1 ratio, keeping an equal proportion of term and preterm birth cases in each set.
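The stratified 8:1:1 split described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `stratified_split`, the rounding rule, and the random seed are assumptions, and only the cohort sizes (176 term, 67 preterm) come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: validation and test sizes are rounded per class,
        # and the remainder goes to the training set.
        n_val = round(ratios[1] * len(indices))
        n_test = round(ratios[2] * len(indices))
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Cohort sizes taken from the text: 176 term and 67 preterm cases.
labels = ["term"] * 176 + ["preterm"] * 67
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately is what keeps the term to preterm proportion approximately equal across the three sets.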

### Image Acquisition and Processing

Imaging protocols were similar for each study: MR scans were performed on a clinical 3T Philips Achieva scanner between 15 and 40 weeks of gestation using a 32-channel cardiac coil (as is standard process for fetal imaging). For maternal comfort, padding was provided, imaging time was limited to under 90 minutes, and there was frequent verbal interaction and monitoring of heart rate and blood pressure. The protocol included both anatomical T2-weighted imaging and functional MR sequences. For the work presented here only T2* relaxometry among the functional sequences was used.

Anatomical information was acquired with a 2D multi-slice Turbo-Spin-Echo sequence in four to ten planes covering the fetal brain and uterus. Next, to allow for image-based shimming on the 3T scanner, a map of the B0 field was obtained; then shimming was performed for the organ of interest. Afterwards, functional MRI of the entire uterus was performed in coronal orientation using free-breathing multi-echo Gradient Echo with Echo Planar read-out.

In addition to the MRI, two ultrasound scans were performed: an anomaly scan (clinically performed between 19 and 21 weeks of gestation) and a second growth scan (including Doppler ultrasound) that was generally performed within one week of the MRI. In both cases, morphological measurements were manually extracted including abdominal and head circumference, bi-parietal diameter (i.e. the cross-sectional diameter of the skull), femur length, and expected weight. From the growth scan blood flow pulsatility indices were also estimated for the umbilical, uterine and mid-cerebral arteries.

The obtained MR images were processed to obtain quantitative values. For the anatomical data, slice-to-volume reconstruction [66] and learning-based segmentation [67] were applied separately for the brain, body and the placenta. Then regional volumes were calculated.

For the functional data, no motion correction was applied, since all echoes for each slice were acquired within 200 ms. Using the method described in [45], quantitative T2* values were obtained by fitting the signal across successive echo times for the entire uterine field of view (FOV). Values over 300 ms were clipped to limit partial volume effects, following common practice. Segmentation of the placenta, brain and lungs was performed manually. From these segmentations, regional volumes were calculated, as well as the mean, kurtosis, and skewness of their T2* distributions. These data acquisition steps are represented by index 1 in Fig 1.
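To make the relaxometry step concrete, the sketch below fits T2* voxel-wise and summarises the resulting distribution. It assumes a mono-exponential decay, S(TE) = S0 exp(-TE / T2*), fitted by log-linear least squares; the actual fitting procedure is the one in [45] and may differ. The echo times, the function names, and the use of excess kurtosis are illustrative assumptions; only the 300 ms clipping value comes from the text.

```python
import numpy as np


def fit_t2star(signal, echo_times_ms, clip_ms=300.0):
    """Voxel-wise log-linear fit of S(TE) = S0 * exp(-TE / T2*).

    signal: strictly positive array of shape (n_echoes, n_voxels).
    Returns T2* in ms, clipped to clip_ms.
    """
    te = np.asarray(echo_times_ms, dtype=float)
    log_signal = np.log(np.asarray(signal, dtype=float))
    # Model: log S = log S0 - TE * (1 / T2*), solved for all voxels at once.
    design = np.column_stack([np.ones_like(te), -te])
    coef, *_ = np.linalg.lstsq(design, log_signal, rcond=None)
    decay_rate = coef[1]  # 1 / T2* per voxel
    with np.errstate(divide="ignore"):
        t2star = np.where(decay_rate > 0, 1.0 / decay_rate, np.inf)
    return np.minimum(t2star, clip_ms)


def distribution_stats(values):
    """Mean, skewness, and excess kurtosis of a T2* distribution."""
    v = np.asarray(values, dtype=float)
    centred = v - v.mean()
    m2 = np.mean(centred**2)
    return {
        "mean": v.mean(),
        "skewness": np.mean(centred**3) / m2**1.5,
        "kurtosis": np.mean(centred**4) / m2**2 - 3.0,
    }


# Noise-free synthetic example with hypothetical echo times (ms).
echo_times_ms = [10.0, 50.0, 100.0, 150.0]
true_t2star = np.array([40.0, 80.0, 500.0])
signal = 1000.0 * np.exp(-np.asarray(echo_times_ms)[:, None] / true_t2star[None, :])
t2star_map = fit_t2star(signal, echo_times_ms)  # third voxel is clipped to 300 ms
stats = distribution_stats(t2star_map)
```

In practice the statistics would be computed over the voxels inside each manual segmentation (placenta, brain, lungs) rather than over three synthetic voxels.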

### Summary of Derived Features

In addition to imaging derived features, demographics, obstetric and medical history of the patients (including previous pregnancies, miscarriages and preterm births) were recorded as well as any relevant information from the current pregnancy such as diagnosis of pre-eclampsia, gestational diabetes, fetal growth restriction or any other fetal or maternal pathology. Finally, the outcomes of the pregnancy were obtained, including gestational age at birth, birth weight-centile and any occurrence of major complications. Collectively the full set of features used by our models was summarised as follows (see S1 Table for more details): 

1.  **Clinical variables**: e.g. number of previous preterm deliveries, and maternal body mass index.

2.  **Structural MRI metrics**: describing sizes of structures, e.g. volumes of different brain regions or bi-parietal diameter of the fetal head; these were extracted from both the anatomical and functional scans.

3.  **Functional MRI metrics**: statistics derived from T2* distributions of the placenta, brain and lungs.

4.  **Ultrasound metrics**: from both anomaly and growth ultrasounds, manually extracted by a trained sonographer, e.g. the fetal head circumference and femur length.

### Feature cleaning

Prior to training, it was vital to address the confounding effect of gestational age at scan, as well as the impact of missing data.

### Deconfounding

While GA at scan is a feature that would normally be available in a clinical setting, its impact on any learning model could lead to data leakage (e.g. by acting as a lower bound for GA at delivery). Moreover, as all features change dramatically with age [43, 44, 68, 69], it is necessary to disentangle the dominant effect of GA from more subtle signatures that might robustly predict preterm birth. For these reasons, GA was linearly regressed from all features using the method of internally studentised residuals [70]. See index 2 in Fig 1.
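A minimal sketch of this deconfounding step is given below, assuming a simple linear model (intercept plus GA at scan) fitted separately to each feature. Internally studentised residuals divide each raw residual by its estimated standard deviation, r_i = e_i / (s sqrt(1 - h_ii)), where h_ii is the leverage of case i and s^2 is the residual variance estimate. The function name and the handling of missing values (fit on observed cases, leave the rest missing for later imputation) are assumptions made for illustration.

```python
import numpy as np


def studentised_residuals(feature, ga_at_scan):
    """Regress GA at scan out of one feature; return internally studentised residuals.

    Missing feature values (NaN) are ignored in the fit and stay NaN in the output.
    """
    y = np.asarray(feature, dtype=float)
    x = np.asarray(ga_at_scan, dtype=float)
    out = np.full(y.shape, np.nan)
    observed = ~np.isnan(y)

    X = np.column_stack([np.ones(observed.sum()), x[observed]])
    beta, *_ = np.linalg.lstsq(X, y[observed], rcond=None)
    residuals = y[observed] - X @ beta

    n, p = X.shape
    # Leverage h_ii: diagonal of the hat matrix X (X^T X)^-1 X^T.
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    sigma = np.sqrt(residuals @ residuals / (n - p))
    out[observed] = residuals / (sigma * np.sqrt(1.0 - leverage))
    return out


# Synthetic feature that grows linearly with GA at scan.
rng = np.random.default_rng(0)
ga_at_scan = rng.uniform(20.0, 36.0, size=200)   # weeks
volume = 5.0 + 3.0 * ga_at_scan + rng.normal(size=200)
volume[::10] = np.nan                             # simulate missing measurements
deconfounded = studentised_residuals(volume, ga_at_scan)
```

After this step the feature is approximately uncorrelated with GA at scan and has roughly unit variance, so it no longer carries the lower bound on GA at delivery that motivates the deconfounding.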

### Data Imputation

There was significant heterogeneity in the availability of features across the data set. Fetal and maternal motion, maternal discomfort, and clerical errors led to loss of data, with different features available for each of the 243 cases. For this reason, a regression-based approach to imputation, known as Multivariate Imputation by Chained Equations (MICE) [71], was investigated (see S1 Algorithm). Following guidance from the literature, ten iterations of the model were performed [71, 72], with two different regression models: weighted *K*-Nearest Neighbours (KNR) [73] and Random Forests [74]. Both models were implemented in the standard way using scikit-learn [75]. Imputation should not be applied to features with arbitrarily large amounts of missing data [72, 76]. Thus the impact of discarding features with more than 30%, 40%, or 50% missing values was investigated (see S2 Table for the missing percentages of each feature). Features with a greater percentage of missing values than the respective threshold were discarded. All remaining features were normalised (mean 0, std 1) afterwards.
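The thresholding, imputation, and normalisation steps can be approximated with scikit-learn's `IterativeImputer`, which implements a MICE-style chained-equations scheme. The sketch below is an assumption-laden illustration and not the authors' S1 Algorithm: the helper name `fit_imputation`, the choice of `n_neighbors=5`, and `weights="distance"` as the meaning of "weighted" KNR are all hypothetical. Fitting on the training split only is one way to avoid leaking information from held-out cases; the text does not state how the authors handled this.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_imputation(X_train, max_missing=0.5, seed=0):
    """Drop sparse features, then fit chained-equations imputation and scaling.

    Returns the boolean mask of kept features and the fitted pipeline.
    """
    X_train = np.asarray(X_train, dtype=float)
    # Keep features whose missing fraction does not exceed the threshold.
    keep = np.isnan(X_train).mean(axis=0) <= max_missing
    cleaner = make_pipeline(
        IterativeImputer(
            estimator=KNeighborsRegressor(n_neighbors=5, weights="distance"),
            max_iter=10,  # ten iterations, as in the text
            random_state=seed,
        ),
        StandardScaler(),  # mean 0, std 1
    )
    cleaner.fit(X_train[:, keep])
    return keep, cleaner


# Synthetic data: one feature with 20% missing values, one with 70%.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
X[:, 1] += X[:, 0]
X[::5, 0] = np.nan    # 20% missing: imputed
X[:42, 3] = np.nan    # 70% missing: discarded at the 50% threshold
keep, cleaner = fit_imputation(X, max_missing=0.5)
X_clean = cleaner.transform(X[:, keep])
```

Swapping the `estimator` argument for a `RandomForestRegressor` gives the second imputation variant mentioned above, and `max_missing` can be set to 0.3, 0.4, or 0.5 to reproduce the three thresholds.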

Data imputation corresponds to index 3 in Fig 1. The boxes corresponding to this step are emphasised by a darker shade to represent that different options were investigated as part of the pipeline design process. The top part of the figure shows the flow of the pipeline with fixed design choices (e.g. if the choice is made to investigate a pipeline using a RF within the MICE algorithm to impute features with less than 40% of missing data). Conversely, the bottom part of the figure explicitly indicates the design choices that were investigated for this step.

### Training

Training was performed using a stacking approach in which several different classes of machine learning model were trained and then ensembled through the training of a meta-model [59]. Base models consisted of: Random Forests (RF) [74], Support Vector Regression (SVR) [77], and XGBoost [78]. Each was chosen due to unique strengths: RF are interpretable and robust to overfitting [79]; SVR are robust to outliers and well-suited to small data sets [80, 81]; XGBoost offers state-of-the-art performance from sparse data sets [78]. Importantly they are all capable of capturing non-linear relationships but approach regularisation in different ways [80]. This suggests that they will perform differently on boundary cases, to produce diverse predictions that could benefit from ensembling.

Since a key challenge of training models on our data set has been the high number of features relative to examples (see S1 Table), feature selection was also performed to discourage overfitting. Two simple models were explored: linear regression and Random Forests. For each model trained, 10 features were selected. These two different design options are indicated by the boxes with a darker shade with index 4 in Fig 1.
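As an illustration of the Random Forest option, the sketch below ranks features by impurity-based importance and keeps the top ten. The function name, the number of trees, and the synthetic data are assumptions; the text specifies only that ten features were selected.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_top_features(X, y, n_select=10, seed=0):
    """Return indices of the n_select most important features and all importances."""
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ is the normalised mean decrease in impurity.
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return order[:n_select], importances


# Synthetic data: 30 features, of which only features 0 and 5 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))
y = 38.0 + 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.normal(scale=0.5, size=150)
selected, importances = select_top_features(X, y, n_select=10)
```

The linear-regression alternative would rank features by a coefficient-based criterion instead; the exact criterion used is not described in the text, so it is not sketched here.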

Models were trained using the scikit-learn framework [75], with hyperparameters (see S3 Table) optimised using 3-fold cross-validated grid search [82]. The metric used for optimisation was the coefficient of determination (*R*2) [83]. Given fixed design choices on the previous steps, training was carried out on every non-empty subset of the selected features. Since there are 1023 non-empty subsets of the ten selected features and 3 regression models, 3069 different regression models were trained in total (index 5 in Fig 1). These were then combined via the training of a meta-model, for which two different methods were explored: Linear Regression and Random Forests (index 6 in Fig 1). Meta-models were trained on the *m* best performing base models, as validated through their *R*2 score on the validation set. The value of *m* was also optimised using the validation set.
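The subset enumeration and stacking steps can be sketched as below. Several simplifications are assumptions made to keep the example short and dependency-free: XGBoost and the grid search are omitted, base models use default hyperparameters, only four synthetic features are used, and the meta-model is fitted on training-set predictions of the selected base models (the text does not specify which split was used for this). The function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.svm import SVR


def non_empty_subsets(n_features):
    """All non-empty index subsets of n_features features."""
    indices = range(n_features)
    return [list(c) for k in range(1, n_features + 1) for c in combinations(indices, k)]


def fit_stack(X_tr, y_tr, X_val, y_val, m=3, seed=0):
    """Fit base models on every feature subset, keep the m best by validation R2,
    and train a Random Forest meta-model on their predictions."""
    factories = {
        "svr": lambda: SVR(),
        "rf": lambda: RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    candidates = []
    for cols in non_empty_subsets(X_tr.shape[1]):
        for make_model in factories.values():
            model = make_model().fit(X_tr[:, cols], y_tr)
            score = r2_score(y_val, model.predict(X_val[:, cols]))
            candidates.append((score, cols, model))
    best = sorted(candidates, key=lambda c: c[0], reverse=True)[:m]

    def base_predictions(X):
        return np.column_stack([model.predict(X[:, cols]) for _, cols, model in best])

    # Assumption: the meta-model is fitted on training-set base predictions.
    meta = RandomForestRegressor(n_estimators=100, random_state=seed)
    meta.fit(base_predictions(X_tr), y_tr)
    return lambda X: meta.predict(base_predictions(X))


# Synthetic example with 4 features (15 subsets) and a 160/20/20 split.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 38.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
predict = fit_stack(X[:160], y[:160], X[160:180], y[160:180], m=3)
y_hat = predict(X[180:])
test_r2 = r2_score(y[180:], y_hat)
test_mae = mean_absolute_error(y[180:], y_hat)

# With ten selected features: 2**10 - 1 = 1023 subsets, times 3 model types.
n_fits_in_paper = 3 * len(non_empty_subsets(10))  # 3069
```

Choosing *m* would amount to repeating `fit_stack` for several values of `m` and keeping the one with the best validation score.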

### Ablation Study

An ablation study was conducted to validate the design of the proposed pipeline, with results compared against the best performing meta-model. Since XGBoost models may be trained with incomplete data, and without variance normalisation of the features (since the base learners are decision trees) the first two experiments consist of a single XGBoost model trained on unnormalised data. All experiments are described as follows: 

1.  Out-of-the-box XGBoost: XGBoost without preprocessing.

2.  XGBoost with deconfounding: one XGBoost model was trained after linear deconfounding of features.

3.  Imputation: all base predictive models were trained with deconfounding and imputation (using the imputation approach used in the best meta-model), without performing any upsampling or feature selection.

4.  Correcting data imbalance: base models were trained with imputation and upsampling preterm cases in the training set.

5.  Combining feature selection with upsampling: this equates to evaluating the best performing base model, obtained without ensembling.

6.  Meta-model without upsampling: the impact of upsampling in the whole pipeline was explored by turning it off. This is equivalent to the final meta-model without upsampling.

7.  Meta-model: reporting the performance of the proposed meta-model, obtained from the whole pipeline.
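The upsampling used in experiments 4), 5), and 7) is not specified in detail in the text. One common option, shown here purely as an assumption, is random oversampling with replacement of the preterm cases in the training set until both classes are equally represented. The function name and the class sizes in the example are hypothetical.

```python
import numpy as np


def upsample_preterm(X, ga_birth, threshold=37.0, seed=0):
    """Randomly resample preterm training cases until classes are balanced."""
    X = np.asarray(X)
    ga_birth = np.asarray(ga_birth, dtype=float)
    rng = np.random.default_rng(seed)
    preterm = np.flatnonzero(ga_birth < threshold)
    term = np.flatnonzero(ga_birth >= threshold)
    n_extra = max(len(term) - len(preterm), 0)
    if len(preterm) == 0 or n_extra == 0:
        return X, ga_birth
    extra = rng.choice(preterm, size=n_extra, replace=True)
    idx = np.concatenate([term, preterm, extra])
    return X[idx], ga_birth[idx]


# Hypothetical training split: 140 term and 53 preterm cases, 5 features.
rng = np.random.default_rng(1)
ga_train = np.concatenate([rng.uniform(37.0, 42.0, 140), rng.uniform(24.0, 36.9, 53)])
X_train = rng.normal(size=(193, 5))
X_balanced, ga_balanced = upsample_preterm(X_train, ga_train)
```

Resampling should be restricted to the training set, so that validation and test cases are never duplicated into the data used for fitting.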

## Results

### Data Exploration

Table 2 shows key demographics, clinical information, and outcomes, divided into preterm and control cohorts. Specifically, the data set consisted of 176 control cases and 67 preterm cases. The distribution of the data according to the four temporal categories was 72.4% term, 14.8% late preterm, 7% very preterm, and 5.8% extremely preterm. Fig 2 shows the distribution of the five continuous features and outcomes included in Table 2, namely GA at scan, maternal BMI at scan, maternal age, GA at birth, and birth weight centile. The pairwise relationship between these is also plotted. For a statistical summary of the data set see S2 Table.

![Fig 2.](https://www.medrxiv.org/content/medrxiv/early/2024/02/18/2024.02.17.24302791/F2.medium.gif)

[Fig 2.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/F2)

Fig 2. Data exploration.
Distribution and pairwise relationship between GA at scan, maternal BMI at scan, maternal age, GA at birth, and birth weight centile in the data set.

[Table 2.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T2)

Table 2. Key demographics, clinical information, and outcomes of participants.

### Meta-model

The best performing meta-model achieved an *R*2 score of 0.45 and a mean absolute error (MAE) of 2.55 weeks on the validation set. This performance was achieved using the following settings in the pipeline. First, features with more than 50% missing values were discarded. Then, features with 50% or less missing values were imputed using the MICE algorithm with a KNR as its regression model. Finally, a RF was used for feature selection. In order of importance, the selected features were: 

1.  Cervical length measured from the sagittal plane of MR scan.

2.  Mean whole placental T2* value measured from the MR scan.

3.  End-diastolic flow measured from the growth ultrasound.

4.  T2* brain to placenta ratio measured from the MR scan.

5.  Bi-parietal diameter measured from the anomaly ultrasound.

6.  Placental T2* kurtosis value measured from the MR scan.

7.  Fetal head circumference measured from the anomaly ultrasound.

8.  Brain T2* kurtosis value measured from the MR scan.

9.  Estimated fetal weight at growth ultrasound.

10. Whole brain T2* volume value measured from the MR scan.

Fig 3 a) shows the mean decrease in impurity corresponding to each of these features. This is the metric used by Random Forests to assign importances to each feature [84].

![Fig 3.](https://www.medrxiv.org/content/medrxiv/early/2024/02/18/2024.02.17.24302791/F3.medium.gif)

[Fig 3.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/F3)

Fig 3. Feature importances and predictions of the meta-model.
a) Feature importance score of the 10 most important features used to train the base models in the pipeline. b) Predictions made by the meta-model 50-KNR-RF on the test set, colourised according to their true preterm temporal category.

After training RF, SVR, and XGBoost models with every non-empty subset of these 10 features, the 14 models with the highest *R*2 score on the validation set were used as input for a RF meta-model. Specifically, the 14 base models that provided the features used by the meta-model were all SVRs, trained with different subsets of the selected features. In what follows, this meta-model will be referred to by abbreviating its components, i.e. 50-KNR-RF.

The metrics used for evaluation on the test set were the *R*2 score and the mean absolute error (MAE) measured in weeks. On the test set 50-KNR-RF achieved an *R*2 score of 0.51 and a MAE of 2.22 weeks. Each case was then labelled as term (≥ 37 weeks) or preterm (< 37 weeks) according to the GA at birth predicted by the meta-model, and accuracy, sensitivity, and specificity were computed from these labels: the meta-model achieved a 0.88 accuracy, 0.86 sensitivity, and 0.89 specificity. The predictions made by 50-KNR-RF on the test set are depicted in Fig 3 b). It is worth noting that the model only misclassified one of the preterm cases and two of the term ones.
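The conversion from regression output to classification metrics can be sketched as follows, treating preterm as the positive class. The test-set composition is not stated in the text; the example uses 7 preterm and 18 term cases because these counts, with one preterm and two term cases misclassified, reproduce the reported figures (6/7 ≈ 0.86, 16/18 ≈ 0.89, 22/25 = 0.88). The individual GA values and the function name are hypothetical.

```python
def term_preterm_metrics(ga_true, ga_pred, threshold=37.0):
    """Label cases as preterm (< threshold weeks) and score the labels.

    Preterm is the positive class, so sensitivity is the fraction of
    true preterm cases that are also predicted preterm.
    """
    tp = fn = tn = fp = 0
    for true_ga, pred_ga in zip(ga_true, ga_pred):
        if true_ga < threshold:
            if pred_ga < threshold:
                tp += 1
            else:
                fn += 1
        elif pred_ga < threshold:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical GA values (weeks): 7 preterm and 18 term cases,
# with one preterm and two term cases misclassified.
ga_true = [30.0] * 7 + [39.5] * 18
ga_pred = [33.0] * 6 + [38.0] + [39.0] * 16 + [36.0] * 2
metrics = {name: round(value, 2) for name, value in term_preterm_metrics(ga_true, ga_pred).items()}
print(metrics)  # {'accuracy': 0.88, 'sensitivity': 0.86, 'specificity': 0.89}
```

Note that a prediction can be several weeks off and still be classified correctly, which is why the regression metrics and the classification metrics are reported side by side.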

### Ablation study

The performance of each of the models resulting from the experiments of the ablation study is reported in Table 3. 50-KNR-RF outperformed every model in the ablation study in every evaluation metric. The best performing model among the experiments that train different versions of the base models, i.e. experiments 3), 4), and 5), was a SVR, which coincides with the type of models used as features for 50-KNR-RF.

[Table 3.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T3)

Table 3. Evaluation of the models in the ablation study.

## Discussion

Comprehensive multi-modal fetal data and ML models combine synergistically to predict GA at delivery. The developed pipeline acknowledges and addresses key challenges such as imbalances and missing features in the data set, both of which are common when investigating preterm birth.

As mentioned in the Introduction, preterm birth had so far only been addressed as a classification problem. The pipeline constructed in this work attempts to make more precise predictions by developing a regression model to predict GA at birth. In turn, these predictions can be classified as term or preterm to compare the performance of 50-KNR-RF to other classification models in the literature. Table 4 shows a comparison between 50-KNR-RF and the best performing models obtained by other recent studies. Out of the 5 models, 50-KNR-RF has the highest accuracy and specificity. The only model that achieves a higher sensitivity than 50-KNR-RF is one of the models developed by Esty et al. [48]. However, the 0.93 sensitivity achieved by that model comes with the trade-off of having an accuracy and specificity lower than 0.73. In contrast, 50-KNR-RF achieved a score greater than 0.85 in every metric.

[Table 4.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T4)

Table 4. Classification performance of the meta-model obtained by this work and other models from recent studies.

To the best of our knowledge, the only other study on predicting GA at delivery using ML is the one by Heinsalu et al. [85], where they investigated models using a simpler version of the pipeline displayed in this work. Their best performing model achieves an *R*2 score of 0.66 and a MAE of 1.60, but their implementation suffers from data leakage at the imputation stage which makes these results unreliable. Nevertheless, the framework they established is valuable and served as the basis of the present work.

This study provides a proof of concept, but the clinical implementation of a reliable model that could predict the timing of delivery would have important benefits. These include ensuring women are transferred to appropriate neonatal care facilities. A timely transfer helps reduce neonatal mortality and decrease costs [37]. Another crucial example is the targeting of therapies to mitigate the effects of prematurity. Specifically, corticosteroid administration can help reduce intraventricular haemorrhage and promote lung maturity. The timing of this therapy is highly important, since it works best when administered within a week before delivery, and repeated doses increase the risk of adverse effects such as reduction in birthweight [86].

The results of the ablation study in Table 3 help to clarify the importance of every element of the pipeline. Experiments 1) and 2) show that, even though XGBoost has a built-in mechanism for training on data sets with missing values, the results achieved with minimal preprocessing are among the worst in the study. In experiments 3) and 4), RF, SVR, and XGBoost models were trained on the data set obtained after imputation, without a feature selection step, and with upsampling in the case of experiment 4). On average, results in these experiments were better than in experiments 1) and 2), but upsampling did not translate into a significant difference between experiments 3) and 4). In contrast, the impact of upsampling on the whole pipeline can be appreciated by comparing the meta-model obtained in experiment 5) with 50-KNR-RF. The only difference between these two experiments is the use of upsampling during training. The sensitivity score of 0.43 obtained by the meta-model in experiment 5) shows that it fails to address data imbalance, while the 0.86 score achieved by 50-KNR-RF shows that upsampling is effective in tackling this problem. Lastly, 50-KNR-RF generalises well, i.e. it is not overfitting: it achieves an *R*² score of 0.4530 and a MAE of 2.5489 weeks on the validation set and an *R*² score of 0.5143 and a MAE of 2.2202 weeks on the test set. This is in line with the literature, which suggests that ensemble models tend to generalise well [59]. The SVR model in experiment 6) has lower scores than its counterparts in experiments 3) and 4), in spite of being the best model prior to the ensembling of 50-KNR-RF, because it does not generalise well. It is the best performing model on the validation set, where it achieves an *R*² score of 0.4878 and a MAE of 2.6418 weeks, but its performance on the test set is significantly worse. Combining the predictions of the 14 best models via 50-KNR-RF helps to solve this problem.
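The rebalancing discussed above can be sketched with plain random upsampling. The example below is an assumption-laden illustration: it treats preterm cases (GA at birth below 37 weeks) as one group, duplicates samples from the smaller group with replacement, and touches the training split only. The exact upsampling scheme of the pipeline is described in the Methods and may differ; the function name and the data are hypothetical.

```python
import random


def upsample_preterm(samples, threshold=37.0, seed=0):
    """Duplicate minority-group samples at random until both groups match in size.

    `samples` is a list of (features, ga_at_birth) pairs from the TRAINING
    split only; validation and test data must stay untouched.
    """
    rng = random.Random(seed)
    preterm = [s for s in samples if s[1] < threshold]
    term = [s for s in samples if s[1] >= threshold]
    minority, majority = sorted([preterm, term], key=len)
    if not minority:
        return list(samples)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra


# Illustrative training split: four term subjects and one preterm subject.
train = [([1.0], 39.0), ([2.0], 40.1), ([3.0], 38.2), ([4.0], 41.0), ([5.0], 30.5)]
balanced = upsample_preterm(train)
n_preterm = sum(1 for _, ga in balanced if ga < 37.0)
```

Keeping the resampling inside the training split matters for the same reason as in imputation: duplicated subjects in a validation or test set would make the reported scores optimistic.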

The features selected as the most important are in line with the literature. The importance of cervical length as a predictor in clinical practice is reflected by its consistent use in the models. Placental features obtained from MRI scans were other prominent features, which is in line with the current understanding of the mechanisms leading to iatrogenic preterm birth [45, 87]. The most common clinical indicator, number of previous preterm births, was not a predominant feature. This could be explained by the decision to approach the problem as a regression problem rather than a classification one.

While this data set could be considered large given the comprehensive data acquisition, including a fetal MR scan in a cohort of women requiring a high level of medical care, its size is an important limitation for ML methods. The few examples of extremely preterm subjects available during training help explain the poor performance on the test set for this category. Taking into account the performance of 50-KNR-RF in the validation and test sets, the meta-model seems to generalise well. However, the size of the data set makes it hard to predict if the performance of the meta-model would generalise to new data.

Another limitation is the lack of information on the clinical presentation of preterm birth for every patient in the data set. Iatrogenic and spontaneous preterm births have different aetiologies and training separate models for each case could not only yield better predictions, but also help improve the understanding of each clinical presentation by differentiating their most predictive features. Future work will focus on such subgroups and on extracting relevant phenotypes associated with the different types of preterm birth.

Data obtained on a 1.5T and a 0.55T scanner were available for this study. However, these were not included as there is not a straightforward way to extrapolate the signals acquired by scanners with different magnetic field strengths [88]. Future experiments that include these types of data could test the adequacy of the elements of the pipeline, such as the method of internally studentised residuals, to make accurate predictions regardless of field strength.
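For reference, the textbook definition of an internally studentised residual in a linear regression is given below. The notation is standard and is not taken from this paper, and how the pipeline applies these residuals is described in the Methods rather than restated here.

```latex
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}},
\qquad
e_i = y_i - \hat{y}_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n - p}\sum_{j=1}^{n} e_j^2
```

Here `n` is the number of observations, `p` the number of fitted parameters, and `h_ii` the leverage of observation `i` (the i-th diagonal entry of the hat matrix). The estimate of the residual variance uses all observations, including observation `i` itself, which is what distinguishes the internal from the external variant.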

There are other directions future research can take to expand or improve the methodology presented in this work. The implementation of the models is ready to benefit from larger or more complete data sets. Adding features known for their predictive power, such as quantitative fibronectin measurements, could improve the results. The performance of the meta-model demonstrates that structural and functional information obtained from MRI can be used to predict GA at delivery. An interesting direction is to make predictions directly from the images using deep learning techniques, bypassing the problem of missing data and the need for time-consuming measurements made by experienced clinicians. These techniques have been explored to classify preterm and term patients by automatic measurements of cervical length from transvaginal ultrasound [49] and to estimate GA at scan from fetal brain MRI [55, 56].

MRI remains an expensive modality. However, with the increasing use of fetal MRI, the pipeline presented in this study helps to address a question essential for any pregnancy and can find an application regardless of the indication for the scan. One of the fundamental contributions of this work is that it shows that fetal MR data acquired as part of diagnostic care or research can be used to obtain useful predictions of the GA at delivery, which in turn can inform the care provided to all pregnancies.

## Data Availability

All code and data are available online at https://github.com/dfajardorojas/ml-for-preterm-birth-

## Author Contributions

**Conceptualization:** Diego Fajardo-Rojas, Emma Robinson, Jana Hutter.

**Data Curation:** Megan Hall, Daniel Cromb, Lisa Story.

**Formal Analysis:** Diego Fajardo-Rojas.

**Investigation:** Diego Fajardo-Rojas.

**Methodology:** Diego Fajardo-Rojas, Emma Robinson, Jana Hutter.

**Software:** Diego Fajardo-Rojas.

**Supervision:** Mary A. Rutherford, Lisa Story, Emma Robinson, Jana Hutter.

**Validation:** Diego Fajardo-Rojas.

**Visualization:** Diego Fajardo-Rojas.

**Writing – Original Draft Preparation:** Diego Fajardo-Rojas.

**Writing – Review & Editing:** Emma Robinson, Jana Hutter.

## Supporting information

[S1 Table.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T5)

S1 Table. Description of the features.
Features and outcomes available in the original data set. The first column is the name of the feature, the second column its type (C = continuous, Cat = categorical, D = discrete), the third column provides a short description, and the last column registers how each feature was acquired: clinical background (CB), clinical outcome (CO), structural MRI (sMRI), functional MRI (fMRI), growth ultrasound (GUS), and anomaly ultrasound (AUS).

[S2 Table.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T6)

S2 Table. Statistical summary of the features.
Statistics of the data set after the first preprocessing steps and before imputation.

[S3 Table.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/T7)

S3 Table. Hyperparameters of the base models.
Hyperparameters investigated by the grid search for each of the base models.

**S1 Fig. Details of First Preprocessing Steps.** Number of subjects kept after each initial preprocessing step.

![S1 Algorithm.](https://www.medrxiv.org/content/medrxiv/early/2024/02/18/2024.02.17.24302791/F4.medium.gif)

[S1 Algorithm.](http://medrxiv.org/content/early/2024/02/18/2024.02.17.24302791/F4)

S1 Algorithm. Pseudocode of the MICE algorithm.

## Acknowledgments

The authors acknowledge the invaluable help of the radiographers and midwives while acquiring the data presented here.

*   Received February 17, 2024.
*   Revision received February 17, 2024.
*   Accepted February 18, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1.  World Health Organization. Preterm birth; 2018. Available from: [https://www.who.int/news-room/fact-sheets/detail/preterm-birth](https://www.who.int/news-room/fact-sheets/detail/preterm-birth).

2.  Ohuma E, Moller AB, Bradley E, Chakwera S, Hussain-Alkhateeb L, Lewin A, et al. National, regional, and global estimates of preterm birth in 2020, with trends from 2010: a systematic analysis. The Lancet. 2023;402:1261–1271. doi:10.1016/S0140-6736(23)00878-4.

3.  Perin J, Mulick A, Yeung D, Villavicencio F, Lopez G, Strong K, et al. Global, regional, and national causes of under-5 mortality in 2000–19: an updated systematic analysis with implications for the Sustainable Development Goals. The Lancet Child Adolescent Health. 2021;6. doi:10.1016/S2352-4642(21)00311-4.

4.  Ancel PY, Goffinet F, Kuhn P, Langer B, Matis J, Hernandorena X, et al. Survival and Morbidity of Preterm Children Born at 22 Through 34 Weeks’ Gestation in France in 2011: Results of the EPIPAGE-2 Cohort Study. JAMA pediatrics. 2015;169. doi:10.1001/jamapediatrics.2014.3351.

5.  Santhakumaran S, Statnikov E, Gray D, Battersby C, Ashby D, Modi N. Survival of very preterm infants admitted to neonatal care in England 2008-2014: time trends and regional variation. Archives of disease in childhood Fetal and neonatal edition. 2017;103. doi:10.1136/archdischild-2017-312748.

6.  Bell E, Hintz S, Hansen N, Bann C, Wyckoff M, Demauro S, et al. Mortality, In-Hospital Morbidity, Care Practices, and 2-Year Outcomes for Extremely Preterm Infants in the US, 2013-2018. JAMA. 2022;327:248–263. doi:10.1001/jama.2021.23580.

7.  Blencowe H, Cousens S, Oestergaard M, Chou D, Moller AB, Narwal R, et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: A systematic analysis and implications. Lancet. 2012;379:2162–72. doi:10.1016/S0140-6736(12)60820-4.

8.  Cheong JL, Spittle AJ, Burnett AC, Anderson PJ, Doyle LW. Have outcomes following extremely preterm birth improved over time? In: Seminars in Fetal and Neonatal Medicine. vol. 25. Elsevier; 2020. p. 101114.

9.  Boland R, Cheong J, Doyle L. Changes in long-term survival and neurodevelopmental disability in infants born extremely preterm in the post-surfactant era. Seminars in Perinatology. 2021;45:151479. doi:10.1016/j.semperi.2021.151479.

10. Allen M, Cristofalo E, Kim C. Outcomes of Preterm Infants: Morbidity Replaces Mortality. Clinics in perinatology. 2011;38:441–54. doi:10.1016/j.clp.2011.06.011.

11. Costeloe KL, Hennessy EM, Haider S, Stacey F, Marlow N, Draper ES. Short term outcomes after extreme preterm birth in England: comparison of two birth cohorts in 1995 and 2006 (the EPICure studies). The BMJ. 2012;345.

12. Vanes L, Murray R, Nosarti C. Adult outcome of preterm birth: Implications for neurodevelopmental theories of psychosis. Schizophrenia Research. 2021;247. doi:10.1016/j.schres.2021.04.007.

13. Jarjour I. Neurodevelopmental Outcome After Extreme Prematurity: A Review of The Literature. Pediatric Neurology. 2014;52. doi:10.1016/j.pediatrneurol.2014.10.027.

14. Moore T, Hennessy E, Myles J, Johnson S, Draper E, Costeloe K, et al. Neurological and Developmental Outcome in Extremely Preterm Children Born in England in 1995 and 2006: The EPICure Studies. BMJ (Clinical research ed). 2012;345:e7961. doi:10.1136/bmj.e7961.

15. Moster D, Markestad T. Long-Term Medical and Social Consequences of Preterm Birth. The New England journal of medicine. 2008;359:262–73. doi:10.1056/NEJMoa0706475.

16. Waitzman N, Jalali A, Grosse S. Preterm Birth Lifetime Costs in the United States in 2016: An Update. Seminars in Perinatology. 2021;45:151390. doi:10.1016/j.semperi.2021.151390.

17. Petrou S, Yiu HH, Kwon J. Economic consequences of preterm birth: a systematic review of the recent literature (2009–2017). Archives of Disease in Childhood. 2019;104(5):456–465. doi:10.1136/archdischild-2018-315778.

18. Moutquin JM. Classification and heterogeneity of preterm birth. BJOG : an international journal of obstetrics and gynaecology. 2003;110 Suppl 20:30–3. doi:10.1016/S1470-0328(03)00021-1.

19. Goldenberg R, Culhane J, Iams J, Romero R. Epidemiology and Causes of Preterm Birth. Lancet. 2008;371:75–84. doi:10.1016/S0140-6736(08)60074-4.

20. Morken NH, Kallen K, Jacobsson B. Fetal growth and onset of delivery: A nationwide population-based study of preterm infants. American journal of obstetrics and gynecology. 2006;195:154–61. doi:10.1016/j.ajog.2006.01.019.

21. Frey H, Klebanoff M. The epidemiology, etiology, and costs of preterm birth. Seminars in Fetal and Neonatal Medicine. 2016;21. doi:10.1016/j.siny.2015.12.011.

22. Suff N, Xu VX, Glazewska-Hallin A, Carter J, Brennecke S, Shennan A. Previous term emergency caesarean section is a risk factor for recurrent spontaneous preterm birth; a retrospective cohort study. European Journal of Obstetrics Gynecology and Reproductive Biology. 2022;271:108–111. doi:10.1016/j.ejogrb.2022.02.008.

23. Menon R. Spontaneous preterm birth, a clinical dilemma: Etiologic, pathophysiologic and genetic heterogeneities and racial disparity. Acta obstetricia et gynecologica Scandinavica. 2008;87:590–600. doi:10.1080/00016340802005126.

24. Muglia LJ, Katz M. The enigma of spontaneous preterm birth. The New England journal of medicine. 2010;362(6):529–35. doi:10.1056/NEJMra0904308.

25. Romero RJ, Espinoza J, Kusanovic JP, Gotsch F, Hassan SS, Erez O, et al. The preterm parturition syndrome. BJOG: An International Journal of Obstetrics & Gynaecology. 2006;113.

26. Cobo T, Kacerovsky M, Jacobsson B. Risk factors for spontaneous preterm delivery. International Journal of Gynecology Obstetrics. 2020;150:17–23. doi:10.1002/ijgo.13184.

27. Kramer M, Papageorghiou AT, Culhane J, Bhutta Z, Goldenberg R, Gravett M, et al. Challenges in defining and classifying the preterm birth syndrome. American journal of obstetrics and gynecology. 2011;206:108–12. doi:10.1016/j.ajog.2011.10.864.

28. Suff N, Story L, Shennan A. The prediction of preterm delivery: What is new? Seminars in Fetal and Neonatal Medicine. 2018;24. doi:10.1016/j.siny.2018.09.006.

29. Romero R, Dey S, Fisher S. Preterm Labor: One Syndrome, Many Causes. Science (New York, NY). 2014;345:760–765. doi:10.1126/science.1251816.

30. Carter J, Seed P, Watson H, David A, Sandall J, Shennan A, et al. Development and validation of prediction models for the QUiPP App v.2: a tool for predicting preterm birth in women with symptoms of threatened preterm labor. Ultrasound in Obstetrics Gynecology. 2019;55. doi:10.1002/uog.20422.

31. Watson HA, Carlisle N, Seed PT, Carter J, Kuhrt K, Tribe RM, et al. Evaluating the use of the QUiPP app and its impact on the management of threatened preterm labour: A cluster randomised trial. PLoS Medicine. 2021;18.

32. Desplanches T, Lejeune C, Jonathan C, Sagot P, Quantin C. Cost-effectiveness of diagnostic tests for threatened preterm labor in singleton pregnancy in France. Cost Effectiveness and Resource Allocation. 2018;16. doi:10.1186/s12962-018-0106-y.

33. Baaren GJ, Vis J, Grobman W, Bossuyt P, Opmeer B, Mol BW. Cost-effectiveness analysis of cervical length measurement and fibronectin testing in women with threatened preterm labor. American journal of obstetrics and gynecology. 2013;209. doi:10.1016/j.ajog.2013.06.029.

34. Tocchio S, Kline-Fath B, Kanal E, Schmithorst V, Panigrahy A. MRI Evaluation and Safety in the Developing Brain. Seminars in perinatology. 2015;39. doi:10.1053/j.semperi.2015.01.002.

35. Lum M, Tsiouris A. MRI safety considerations during pregnancy. Clinical Imaging. 2020;62. doi:10.1016/j.clinimag.2020.02.007.

36. Ray J, Vermeulen M, Bharatha A, Montanera W, Park A. Association Between MRI Exposure During Pregnancy and Fetal and Childhood Outcomes. JAMA. 2016;316:952–961. doi:10.1001/jama.2016.12126.

37. Story L, Hutter J, Zhang T, Shennan A, Rutherford M. The use of antenatal fetal Magnetic Resonance Imaging in the assessment of patients at high risk of preterm birth. European Journal of Obstetrics Gynecology and Reproductive Biology. 2018;222. doi:10.1016/j.ejogrb.2018.01.014.

38. Sørensen A, Hutter J, Seed M, Grant PE, Gowland P. T2*-weighted placental MRI: basic research tool or emerging clinical test for placental dysfunction? Ultrasound in Obstetrics & Gynecology. 2020;55(3):293–302. doi:10.1002/uog.20855.

39. Avena-Zampieri CL, Hutter J, Rutherford M, Milan A, Hall M, Egloff A, et al. Assessment of the fetal lungs in utero. American Journal of Obstetrics Gynecology MFM. 2022;4(5):100693. doi:10.1016/j.ajogmf.2022.100693.

40. Lee W, Krisko A, Shetty A, Yeo L, Hassan S, Gotsch F, et al. Noninvasive Fetal Lung Assessment Using Diffusion Weighted Imaging. Ultrasound in obstetrics gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology. 2009;34:673–7. doi:10.1002/uog.7446.

41. Slator PJ, Hutter J, Palombo M, Jackson LH, Ho A, Panagiotaki E, et al. Combined diffusion-relaxometry MRI to identify dysfunction in the human placenta. Magnetic Resonance in Medicine. 2019;82(1):95–106. doi:10.1002/mrm.27733.

42. Kristi B A, Ditte N H, Caroline H, Marianne S, Astrid P, Jens B F, et al. Placental diffusion-weighted MRI in normal pregnancies and those complicated by placental dysfunction due to vascular malperfusion. Placenta. 2020;91:52–58. doi:10.1016/j.placenta.2020.01.009.

43. Story L, Zhang T, Steinweg J, Hutter J, Matthew J, Dassios T, et al. Foetal lung volumes in pregnant women who deliver very preterm: a pilot study. Pediatric Research. 2019;87. doi:10.1038/s41390-019-0717-9.

44. Story L, Zhang T, Uus A, Hutter J, Egloff A, Gibbons D, et al. Antenatal thymus volumes in fetuses that delivered <32 weeks gestation: An MRI pilot study. Acta Obstetricia et Gynecologica Scandinavica. 2020;100. doi:10.1111/aogs.13983.

45. Hutter J, Slator PJ, Jackson L, Gomes ADS, Ho A, Story L, et al. Multi-modal functional MRI to explore placental function over gestation. Magnetic Resonance in Medicine. 2019;81(2):1191–1204. doi:10.1002/mrm.27447.

46. Zhu MY, Milligan N, Keating S, Windrim R, Keunen J, Thakur V, et al. The hemodynamics of late onset intrauterine growth restriction by MRI. American journal of obstetrics and gynecology. 2015;214. doi:10.1016/j.ajog.2015.10.004.

47. Wlodarczyk T, Plotka S, Szczepański T, Rokita P, Sochacki-Wójcicka N, Wójcicki J, et al. Machine Learning Methods for Preterm Birth Prediction: A Review. Electronics. 2021;10:586. doi:10.3390/electronics10050586.

48. Esty A, Frize M, Gilchrist J, Bariciak E. Applying Data Preprocessing Methods to Predict Premature Birth. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2018. p. 6096–6099.

49. Wlodarczyk T, Plotka S, Trzcinski T, Rokita P, Sochacki-Wójcicka N, Lipa M, et al. Estimation of Preterm Birth Markers with U-Net Segmentation Network. 2019.

50. N S P, Pushpalatha M. In: Machine Learning Approach for Preterm Birth Prediction Based on Maternal Chronic Conditions; 2019. p. 581–588.
    
51. Chen L, Xu H. Deep neural network for semi-automatic classification of term and preterm uterine recordings. Artificial Intelligence in Medicine. 2020;105:101861. doi:10.1016/j.artmed.2020.101861.

52. Despotović D, Zec A, Mladenovic K, Radin N, Loncar-Turukalo T. A Machine Learning Approach for an Early Prediction of Preterm Delivery; 2018. p. 000265–000270.

53. Sadi-Ahmed N, Kacha B, Taleb H, Kedir-Talha M. Relevant Features Selection for Automatic Prediction of Preterm Deliveries from Pregnancy ElectroHysterograhic (EHG) records. Journal of Medical Systems. 2017;41. doi:10.1007/s10916-017-0847-8.

54. Fele-Zorz G, Kavsek G, Novak-Antolic Z, Jager F. A comparison of various linear and non-linear signal processing techniques to separate uterine EMG records of term and pre-term delivery groups. Medical biological engineering computing. 2008;46:911–22. doi:10.1007/s11517-008-0350-y.

55. Kojita Y, Matsuo H, Kanda T, Nishio M, Sofue K, Nogami M, et al. Deep learning model for predicting gestational age after the first trimester using fetal MRI. European Radiology. 2021;31. doi:10.1007/s00330-021-07915-9.

56. Shen L, Zheng J, Lee E, Shpanskaya K, McKenna E, Atluri M, et al. Attention-guided deep learning for gestational age prediction using fetal brain MRI. Scientific Reports. 2022;12. doi:10.1038/s41598-022-05468-5.

57. Namburete A, Stebbing R, Kemp B, Yaqub M, Papageorghiou AT, Noble J. Learning-Based Prediction of Gestational Age from Ultrasound Images of the Fetal Brain. Medical Image Analysis. 2015;30. doi:10.1016/j.media.2014.12.006.

58. Zhou ZH. Ensemble methods: foundations and algorithms. CRC press; 2012.

59. Wolpert DH. Stacked generalization. Neural Networks. 1992;5(2):241–259. doi:10.1016/S0893-6080(05)80023-1.

60. Breiman L. Stacked Regressions. Mach Learn. 1996;24(1):49–64. doi:10.1007/BF00117832.

61. Ting KM, Witten IH. Issues in Stacked Generalization. Journal of Artificial Intelligence Research. 1999;10:271–289. doi:10.1613/jair.594.

62. Dietterich TG. Ensemble Methods in Machine Learning. In: International Workshop on Multiple Classifier Systems; 2000. Available from: [https://api.semanticscholar.org/CorpusID:56776745](https://api.semanticscholar.org/CorpusID:56776745).

63. Liang M, Chang T, An B, Duan X, Du L, Wang X, et al. A Stacking Ensemble Learning Framework for Genomic Prediction. Frontiers in Genetics. 2021;12. doi:10.3389/fgene.2021.600040.

64. Yi HC, You ZH, Wang MN, Guo ZH, Wang YB, Zhou JR. RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information. BMC bioinformatics. 2020;21(1):60.

65. Wang Y, Wang D, Geng N, Wang Y, Yin Y, Jin Y. Stacking-based ensemble learning of decision trees for interpretable prostate cancer detection. Applied Soft Computing. 2019;77:188–204. doi:10.1016/j.asoc.2019.01.015.

66. Uus A, Zhang T, Jackson LH, Roberts TA, Rutherford MA, Hajnal JV, et al. Deformable slice-to-volume registration for motion correction of fetal body and placenta MRI. IEEE transactions on medical imaging. 2020;39(9):2750–2759. doi:10.1109/tmi.2020.2974844.

67. Uus AU, Kyriakopoulou V, Makropoulos A, Fukami-Gartner A, Cromb D, Davidson A, et al. BOUNTI: Brain vOlumetry and aUtomated parcellatioN for 3D feTal MRI. bioRxiv. 2023; p. 2023–04.

68. Story L, Davidson A, Patkee P, Fleiss B, Kyriakopoulou V, Colford K, et al. Brain volumetry in fetuses that deliver very preterm: An MRI pilot study. NeuroImage: Clinical. 2021;30:102650. doi:10.1016/j.nicl.2021.102650.

69. 69.Papageorghiou AT, Ohuma EO, Altman DG, Todros T, Ismail LC, Lambert A, et al. International standards for fetal growth based on serial ultrasound measurements: the Fetal Growth Longitudinal Study of the INTERGROWTH-21st Project. The Lancet. 2014;384:869–879.
    
    
70. 70.Cook RD, Weisberg S. Residuals and Influence in Regression. Monographs on statistics and applied probability. Chapman and Hall; 1986. Available from: [https://books.google.co.uk/books?id=aMDpswEACAAJ](https://books.google.co.uk/books?id=aMDpswEACAAJ).
    
    
71. 71.Azur M, Stuart E, Frangakis C, Leaf P. Multiple Imputation by Chained Equations: What is it and how does it work? International journal of methods in psychiatric research. 2011;20:40–9. doi:10.1002/mpr.329.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/mpr.329&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21499542&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F02%2F18%2F2024.02.17.24302791.atom) 

72. 72.Jäger S, Allhorn A, Biessmann F. A Benchmark for Data Imputation Methods. Frontiers in Big Data. 2021;4. doi:10.3389/fdata.2021.693674.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3389/fdata.2021.693674&link_type=DOI) 

73. 73.Bicego M, Loog M. Weighted K-Nearest Neighbor revisited; 2016. p. 1642–1647.
    
    
74. 74.Breiman L. Random Forests. Machine Learning. 2001;45:5–32. doi:10.1023/A:1010950718922.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1010933404324/METRICS&link_type=DOI) 

75. 75.Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1524/auto.2011.0951&link_type=DOI) 

76. 76.Bertsimas D, Pawlowski C, Zhuo Y. From predictive methods to missing data imputation: An optimization approach. Journal of Machine Learning Research. 2018;18:1–39.
    
    
77. 77.Smola A, Schölkopf B. A tutorial on support vector regression. Statistics and Computing. 2004;14:199–222. doi:10.1023/B
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/B&link_type=DOI) 

78. 78.Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. CoRR. 2016; abs/1603.02754.
    
    
79. 79.Gzar DA, Mahmood AM, Abbas MK. A Comparative Study of Regression Machine Learning Algorithms: Tradeoff Between Accuracy and Computational Complexity. Mathematical Modelling of Engineering Problems. 2022;9(5):1217–1224. doi:10.18280/mmep.090508.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.18280/mmep.090508&link_type=DOI) 

80. 80.Fernández-Delgado M, Sirsat MS, Cernadas E, Alawadi S, Barro S, Febrero-Bande M. An extensive experimental survey of regression methods. Neural Networks. 2019;111:11–34. doi:10.1016/j.neunet.2018.12.010.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neunet.2018.12.010&link_type=DOI) 

81. 81.Kinaneva D, Hristov G, Kyuchukov P, Georgiev G, Zahariev P, Daskalov R. Machine Learning Algorithms for Regression Analysis and Predictions of Numerical Data. In: 2021 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA); 2021. p. 1–6.
    
    
82. 82.Krstajic D, Buturovic L, Leahy D, Thomas S. Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of cheminformatics. 2014;6:10. doi:10.1186/1758-2946-6-10.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1758-2946-6-10&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24678909&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F02%2F18%2F2024.02.17.24302791.atom) 

83. 83.Casella G, Berger RL, Company BP. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Thomson Learning; 2002. Available from: [https://books.google.co.uk/books?id=0x\_vAAAAMAAJ](https://books.google.co.uk/books?id=0x_vAAAAMAAJ).
    
    
84. 84.Nembrini S, König I, Wright M. The revival of the Gini Importance? Bioinformatics (Oxford, England). 2018;34. doi:10.1093/bioinformatics/bty373.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty373&link_type=DOI) 

85. 85.Heinsalu R, Williams L, Ranjan A, Avena Zampieri C, Uus A, Robinson E, et al. In: Predicting preterm birth using multimodal fetal imaging. Springer; 2021.
    
    
86. 86.Story L, Simpson N, David A, Z Z, Bennett P, Jolly M, et al. Reducing the Impact of Preterm Birth: Preterm Birth Commissioning in the United Kingdom. European Journal of Obstetrics Gynecology and Reproductive Biology: X. 2019;3. doi:10.1016/j.eurox.2019.100018.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.eurox.2019.100018&link_type=DOI) 

87. 87.Purisch S, Gyamfi C. Epidemiology of preterm birth. Seminars in Perinatology. 2017;41. doi:10.1053/j.semperi.2017.07.009.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.semperi.2017.07.009&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F02%2F18%2F2024.02.17.24302791.atom) 

88. 88.Garcia-Eulate R, Garcia-Garcia D, Domínguez P, Noguera J, Luis E, Rodriguez-Oroz M, et al. Functional bold MRI: Advantages of the 3 T vs. the 1.5 T. Clinical imaging. 2011;35:236–41. doi:10.1016/j.clinimag.2010.07.003.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.clinimag.2010.07.003&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21513865&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F02%2F18%2F2024.02.17.24302791.atom)