Abstract
Objective Evaluate clinical utility and cost effectiveness of identifying pregnancies at increased risk of preterm birth using a validated proteomic biomarker risk predictor to enable proactive intervention.
Study Design Pregnancies at elevated risk (≥ 15%) of preterm birth were identified in a cohort from TREETOP (NCT02787213), a study independent of biomarker development. In the screening arm, higher-risk subjects received simulated interventions based on published efficacy of multimodal treatment or care-management alone. Subjects in the non-screening arm received no interventions. Neonatal and maternal length of stay, neonatal mortality and morbidity and neonatal costs were compared between arms.
Results Multimodal/care-management modeled treatments predicted reductions in neonatal (30%/22%) and maternal (9.2%/8.5%) hospital stays, neonatal morbidity and mortality (41%/29%), and neonatal costs (34%/16%) for the screening vs. non-screening arm.
Conclusion Modeled interventions applied to pregnancies identified as higher-risk by a proteomic biomarker risk predictor demonstrate clinically and economically meaningful improvements in neonatal and maternal outcomes.
Introduction
Preterm birth (PTB) occurs in approximately 10% of all births and is the leading cause of neonatal death in the United States (1, 2). In addition to neonatal morbidity and mortality, the economic impact of PTB on the healthcare system is enormous and estimated to be greater than 25 billion U.S. dollars annually (3). Direct medical care accounts for the majority of these costs which are significantly higher for individuals who are delivered at earlier gestational ages (GA). The effective application of interventions such as progesterone, cervical cerclage and intensive case management to ameliorate PTB and its sequelae requires tools to identify pregnancies at risk. Clinical factors used to assess the risk of PTB are poorly predictive and are present in only a small minority of pregnancies. A history of previous PTB is a traditional predictor of recurrent PTB but is seen in only approximately 4% of all pregnancies and predicts only 11% of all spontaneous PTBs (sPTBs) (4, 5). Similarly, a short cervical length measured by transvaginal ultrasound is a widely used predictor of sPTB, but accounts for only 2% of all pregnancies and an additional 6% of all sPTBs beyond those who are identified by a history of prior preterm delivery (6, 7). Biomarker-based risk stratification tools offer an important solution to predict and potentially ameliorate PTB and its sequelae. Recent years have seen the onset of precision diagnostic tools that enable the targeting of interventions. Of particular interest here are tools that predict PTB in women with no apparent risk well in advance of delivering in an effort to improve outcomes (8). 
Saade and colleagues described and clinically validated a novel serum proteomic biomarker risk predictor for sPTB based on a combination of the expression ratio of insulin-like growth factor-binding protein 4 (IBP4, gene symbol IGFBP4) to sex hormone-binding globulin (SHBG) and clinical variables using samples from the Proteomic Assessment of Preterm Risk (PAPR) study (NCT01371019) (9). In a follow up study, the proteomic biomarker risk predictor was observed to be predictive of medically indicated PTB (10). Additionally, the mass spectrometry assay measuring the protein ratio IBP4/SHBG in pregnant serum has been analytically validated (11) and a threshold at which the probability of sPTB <37 weeks of gestation is at least twice the baseline rate has been established (9).
In addition to clinical and analytical validation, establishing the clinical and economic utility of a test is important to determine its value in clinical practice for all healthcare stakeholders including patients, physicians, healthcare workers and insurers (12). Studies of clinical utility are frequently classified as those establishing efficacy or those establishing comparative effectiveness (13-16). Studies of efficacy establish a causal link between a novel protocol and positive outcomes, typically in a highly controlled and less generalizable setting or population. Randomized control trials (RCTs) are the gold standard for establishing efficacy. Comparative effectiveness studies, in contrast, establish a correlative, but not causal, link between the test and clinical outcomes in more generalizable settings or populations, and so, provide ‘real world’ evidence of clinical utility. Ideally, a predictive test has evidence of both efficacy and comparative effectiveness. Recent years have seen much work on the evolution of clinical utility thinking and application (14-16).
The purpose of the current analysis, named ACCORDANT (Analyses aCross COngruent studies ReDucing Adverse pregNancy ouTcomes), is to determine the clinical and economic utility of the proteomic biomarker risk predictor when used as a screening test for a low-risk pregnant population in improving neonatal outcomes when coupled with established interventions (17-24), such as progesterone and care-management. The ACCORDANT study utilized a cohort of pregnancies from the TREETOP study (25), which is independent of the sources of biomarker development and intervention efficacy measurement. TREETOP was an observational study that included only standard care and has not previously been utilized in modeling studies. In contrast to pure modeling studies, the TREETOP subjects were all assigned to higher or lower risk of preterm delivery < 37 weeks of gestation based on actual proteomic biomarker mass spectrometry measurements performed as samples were collected. Modeled interventions were applied to higher-risk pregnancies in the ACCORDANT screening arm. The effectiveness of ACCORDANT’s test-and-treat approach was quantified in terms of reductions in neonatal hospital length of stay, neonatal morbidity and mortality, maternal hospital length of stay, and neonatal costs. Furthermore, the relative impact on neonatal outcomes of two interventions, namely multimodal intervention and care-management alone, was assessed.
Methods
Study design overview
An overview of the ACCORDANT simulation study is depicted in Figure 1. Subjects originate from the Multicenter Assessment of a Spontaneous Preterm Birth Risk Predictor (TREETOP) study (NCT02787213) (25). Subjects are randomized into screening and non-screening arms. Patients in the screening arm identified as higher-risk have proteomic biomarker risk predictor results, determined in the TREETOP study, that are at or above a pre-defined 15% risk of sPTB < 37 weeks of gestation. Higher-risk patients receive simulated treatment with either of two intervention models (described in detail below), resulting in shifts in gestational age at birth. For higher-risk patients with shifted GA at birth, outcomes are calculated according to their new GA at birth. Unscreened and lower-risk screened patients retain the original GA at birth observed in the TREETOP study. The two arms of the study are then compared to assess differences in neonatal and maternal length of hospital stay, neonatal morbidity and mortality index and associated neonatal costs. To estimate variability in outcomes, the study was repeated 500 times, applying variation to GA shifts in each iteration. These sources of variation and all assumptions for implementation of the study are detailed in Table 1. The routing of all subjects to each arm allows for a demographically unbiased comparison of the two arms.
The ACCORDANT analysis thus has the following components, each of which is detailed below: TREETOP study subjects randomized into two arms; application of the proteomic biomarker risk predictor to TREETOP subjects to identify those with preterm birth risk ≥ 15%; and estimation of outcomes/costs resulting from the above steps in the two simulated study arms (Figure 1).
TREETOP study subjects in this analysis are a randomly selected subset (14 sites, n=847) comprising 34% of all TREETOP subjects who underwent serum sample collection in the protein ratio’s validated GA window (19 1/7 – 20 6/7 weeks), as reported in Markenson et al. (25). These subjects were randomly selected by a third-party statistician for analysis and reflect the clinical and demographic factors of the overall study. We note that the non-selected subjects in TREETOP remain blinded for future studies. All TREETOP clinical data were recorded electronically, monitored centrally on site and were subject to source document verification as pre-specified. Neonatal outcomes were collected through 28 days of life.
Application of the proteomic biomarker risk predictor
The proteomic biomarker measurements were assayed on all TREETOP serum samples as they were collected, using a standardized lab process previously validated and documented (11), consistent with clinical intended use. A previously validated probability threshold (34) was used to assign higher and lower preterm risk. As per Saade et al. (9), proteomic measurements were translated into risk predictions for each subject and a threshold corresponding to 15% risk of sPTB < 37 weeks of gestation was employed to stratify higher vs. lower risk in each pregnancy. Higher-risk subjects were routed for intervention (see intervention models below).
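The stratification step can be illustrated with a minimal sketch. It assumes that each subject already carries a predicted probability of sPTB < 37 weeks produced by the predictor; the `Subject` class, its field names and the example risks are hypothetical and are not taken from the TREETOP data. Only the 15% threshold and the "at or above" rule come from the text.

```python
from dataclasses import dataclass

# 15% predicted risk of sPTB < 37 weeks, as stated in the text.
RISK_THRESHOLD = 0.15


@dataclass
class Subject:
    subject_id: str        # hypothetical identifier
    predicted_risk: float  # hypothetical field: predictor output as a probability


def is_higher_risk(subject, threshold=RISK_THRESHOLD):
    """Return True when the predicted risk is at or above the threshold."""
    return subject.predicted_risk >= threshold


def route_for_intervention(subjects):
    """Split screened subjects into (higher_risk, lower_risk) lists."""
    higher = [s for s in subjects if is_higher_risk(s)]
    lower = [s for s in subjects if not is_higher_risk(s)]
    return higher, lower


# Hypothetical example cohort; a risk of exactly 0.15 counts as higher-risk.
cohort = [Subject("A", 0.08), Subject("B", 0.15), Subject("C", 0.31)]
higher, lower = route_for_intervention(cohort)
```

In this sketch only the subjects in `higher` would go on to receive a modeled intervention; those in `lower` would keep their observed GA at birth.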
The intervention models
Intervention models provided the means and standard deviations of shifts in GA at birth with treatment relative to a subject’s untreated GA at birth. Specifically, for each treated subject in TREETOP, her original GA at birth is shifted by an amount selected randomly from the normal distribution defined by the mean and standard deviation of the change for the GA at birth with treatment that is expected for pregnancies with the untreated GA at birth observed for that subject in TREETOP. Two intervention models are assessed in this analysis: multimodal (17) and care-management (18, 26, 27).
The multimodal intervention model is based on the observed effect size of the combination of 17-α-hydroxyprogesterone caproate and care-management applied to higher-risk pregnancies as determined by the proteomic biomarker risk predictor in the Prediction and Prevention of Preterm Birth (PREVENT-PTB, NCT03530332) randomized control trial (17).
The care-management intervention model is based on the combination of three controlled studies (18, 26, 27) of the effect of care-management applied to higher-risk pregnancies as determined by clinical and demographic data. The shifts in GA at birth for the care-management intervention model, as indexed by untreated GA at birth, were derived using the metafor R package (35).
Intervention effects were formalized by the following two functions defining means and standard deviations, respectively, of shift in GA at birth. For each observation x, where x is the untreated GA at birth in weeks, calculation using the functions shown establishes sigmoid curves relating untreated GA at birth to the magnitude (mean) and variability (standard deviation) of shift in GA at birth with intervention, written here in the standard three-parameter logistic form:

$$\mathrm{mean}(x) = \frac{max_m}{1 + e^{-rate_m (x - mid_m)}}$$

and

$$\mathrm{SD}(x) = \frac{max_{SD}}{1 + e^{-rate_{SD} (x - mid_{SD})}}$$

where max_m, max_SD are the upper asymptotes; mid_m, mid_SD are the inflection points; and rate_m, rate_SD are the slopes of the three-parameter sigmoid functions, respectively. Here m signifies mean, SD signifies standard deviation. Table 1 specifies the numerical values for coefficients used in the two functions above for calculating the means and standard deviations of shifts in GA at birth representing the multimodal and care-management intervention models. For illustration, magnitudes of shift in GA at birth with intervention are presented in Table 2 for weeks 28 to 36.
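As an illustration of how a shifted GA at birth could be drawn, the following sketch assumes the standard three-parameter logistic form for both sigmoid functions described above and samples the shift from a normal distribution. The coefficient values are hypothetical placeholders chosen only so that shifts are larger at earlier untreated GA; the actual coefficients are those in Table 1.

```python
import math
import random


def sigmoid3(x, upper, mid, rate):
    """Three-parameter sigmoid: upper asymptote, inflection point, slope."""
    return upper / (1.0 + math.exp(-rate * (x - mid)))


# HYPOTHETICAL coefficients for illustration only (not the Table 1 values).
# A negative rate makes the curve fall as untreated GA increases.
MEAN_PARAMS = {"upper": 3.0, "mid": 34.0, "rate": -0.8}
SD_PARAMS = {"upper": 1.5, "mid": 34.0, "rate": -0.8}


def sample_shifted_ga(untreated_ga_weeks, rng):
    """Return untreated GA plus one random draw of the treatment shift (weeks)."""
    mean_shift = sigmoid3(untreated_ga_weeks, **MEAN_PARAMS)
    sd_shift = sigmoid3(untreated_ga_weeks, **SD_PARAMS)
    shift = rng.gauss(mean_shift, sd_shift)  # normal draw, as described in the text
    return untreated_ga_weeks + shift


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
new_ga = sample_shifted_ga(30.0, rng)
```

Repeating the draw for every higher-risk subject in each of the 500 iterations would give the iteration-to-iteration variation in GA shifts that the study design describes.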
Derivation of outcomes and costs
We describe the derivation of neonatal and maternal length of stay, neonatal morbidity and mortality, and cost for each subject in the screening and non-screening arm of this analysis. For all subjects, these four outcomes are either within error of the values observed in the TREETOP study if unaffected by intervention, mortality or missingness; or are assigned as a function of GA at birth.
Derivation of neonatal and maternal length of hospital stay
Information from two datasets, Beam (29) and Phibbs (28), was used as the source of post-intervention lengths of stay, providing generalizability as these datasets include over 750,000 and 1,000,000 pregnancies, respectively. In addition, because TREETOP truncated collection of neonatal length of stay information for each subject at 28 days with information missing for some subjects, missing or truncated TREETOP neonatal length of stay information was augmented using information from Beam or Phibbs. Missing/truncated values and values altered by interventions were calculated by randomized sampling from these datasets based on distributions defined by the published mean and variance estimates using the method of moments and confirmed by nonparametric estimates also supplied and by distributions in TREETOP. The published neonatal length of stay data and thus also the derived distributions were indexed by GA at birth, allowing for unbiased assignment of neonatal length of stay to each treated and untreated subject in the study based on subject GA at birth. Based on the observation in Beam that late preterm distributions are differently shaped than early preterm distributions, we applied distinct distributions to early and late preterm GAs at birth (see Table 1). Length of stay error with respect to TREETOP observations was set at ±1 day. Derivation of maternal length of stay was conducted in the same manner except only using the Phibbs dataset (maternal length of stay was not available in the Beam dataset).
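The method-of-moments step can be sketched as follows. The distribution family is not named in this section, so a gamma distribution is assumed here purely for illustration, and the means and variances keyed by GA week are hypothetical rather than values from Beam or Phibbs.

```python
import random


def gamma_from_moments(mean, variance):
    """Method of moments for a gamma distribution: match mean and variance."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


# HYPOTHETICAL (mean, variance) of neonatal length of stay in days, by GA week.
LOS_MOMENTS_BY_GA = {32: (25.0, 100.0), 36: (6.0, 16.0)}


def sample_los_days(ga_week, rng):
    """Draw one length of stay (days) for a neonate born at the given GA week."""
    mean, variance = LOS_MOMENTS_BY_GA[ga_week]
    shape, scale = gamma_from_moments(mean, variance)
    # random.gammavariate takes (shape, scale), so its mean is shape * scale.
    return rng.gammavariate(shape, scale)


rng = random.Random(7)
draws = [sample_los_days(32, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Using separate entries per GA week mirrors the indexing by GA at birth described above; distinct families or parameters for early and late preterm births could be added in the same way.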
Derivation of neonatal morbidity and mortality
The assessment of neonatal morbidity and mortality was based on an established scoring system (6). Affected infants were assigned a morbidity and mortality index score that increases from 1 (mild) to 2 (moderate) to 3 (severe) for each additional diagnosis of respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage grade III or IV, all stages of necrotizing enterocolitis, periventricular leukomalacia or proven severe sepsis; with a score of 4 assigned to perinatal mortality. The scale uses hospital stays to determine index scores if the length of stay gives a higher score than concomitant diagnoses: 1-4 days give a score of 1, 5-20 days a score of 2 and >20 days a score of 3. For example, level 1 includes those neonates with at most one adverse outcome and up to four days of hospital stay. While the published scale accounts for neonatal intensive care unit stays, here total length of neonatal hospital stay is used instead to leverage the universality of hospital admission and greater simplicity of calculation of hospital length of stay vs. level of care. Subjects in this analysis are assigned an initial neonatal morbidity and mortality index based on TREETOP observations. If the subject’s GA at birth changes, the neonatal morbidity and mortality score for that subject is updated probabilistically in accordance with multinomial distributions indexed by GA at birth, shown in Supplemental Table 1. As the neonatal morbidity and mortality index is an ordinal scale without numeric error, modeled neonatal morbidity and mortality was restricted to the original observation in untreated subjects and to values no greater than the original observation in treated subjects.
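A minimal sketch of the index as described in this section follows. The function names are hypothetical; the treatment of a neonate with no diagnoses and no recorded stay (score 0) and of fractional days is an assumption, since the text does not address those cases.

```python
def diagnosis_score(n_diagnoses):
    """Score rises by one per qualifying diagnosis, capped at 3 (severe)."""
    return min(n_diagnoses, 3)


def los_score(los_days):
    """Score implied by total neonatal hospital length of stay."""
    if los_days <= 0:
        return 0  # assumption: no recorded stay contributes no score
    if los_days <= 4:
        return 1  # 1-4 days
    if los_days <= 20:
        return 2  # 5-20 days
    return 3      # more than 20 days


def nmi_index(n_diagnoses, los_days, died):
    """Index: 4 for perinatal mortality, else the higher of the two scores."""
    if died:
        return 4
    return max(diagnosis_score(n_diagnoses), los_score(los_days))


def constrain_treated(original_index, modeled_index):
    """Treated subjects may not receive a worse index than originally observed."""
    return min(original_index, modeled_index)
```

For example, a neonate with one diagnosis and a 10-day stay receives an index of 2 because the length of stay gives the higher score.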
Estimation of neonate cost
Neonate costs were not collected in the TREETOP study and are wholly derived from the Phibbs and Beam cost datasets for each subject in this analysis in either arm of the study. Costs in these datasets are indexed by GA at birth, as is neonatal length of stay, and so can be derived from neonatal length of stay as a mean cost per day of stay. To maintain the known tight association of cost and length of stay and to utilize the variance already incorporated into the neonatal length of stay calculation, the neonate cost is obtained by first determining the neonatal length of stay for that subject (as defined above) and then determining the cost from the conversion factors indexed by GA at birth in the Phibbs and Beam datasets, respectively. Those subjects who underwent intervention were also assigned an additional cost from weeks 22 to 35 for progesterone treatment and/or care-management (see Table 1 for details).
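The cost derivation can be sketched as a lookup followed by a multiplication. The per-day conversion factors and the intervention cost below are hypothetical placeholders, not values from Phibbs, Beam or Table 1; the intervention cost is passed in as a single amount because its calculation is specified in Table 1 and is not reproduced here.

```python
# HYPOTHETICAL mean cost per day of neonatal stay (USD), indexed by GA week.
COST_PER_DAY_BY_GA = {32: 3000.0, 36: 1500.0, 39: 1000.0}


def neonate_cost(ga_week, los_days, intervention_cost=0.0):
    """Neonate cost: length of stay times the GA-indexed cost per day.

    intervention_cost is added for subjects who received a modeled
    intervention; pass 0.0 for untreated subjects.
    """
    return COST_PER_DAY_BY_GA[ga_week] * los_days + intervention_cost


untreated = neonate_cost(32, 10)                         # 3000 * 10
treated = neonate_cost(36, 4, intervention_cost=500.0)   # 1500 * 4 + 500
```

Because the cost is computed from the already-sampled length of stay, the variance in length of stay carries through to cost without a second random draw.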
Comparison of study arms and statistical methods
Statistical analyses were performed in R (3.5.1 or higher, MRAN). As this study is exploratory as per ISPOR (Professional Society for Health Economics and Outcomes Research) classification, p-values <0.05 were considered significant.
Changes in neonatal and maternal lengths of stay due to treatment were assessed by time-to-event analysis. While multiple approaches give similar results, we report Cox proportional hazards regression p-values for the main effect of treatment arm, either in the study as a whole or in the top 10% of subjects per arm.
Significance of changes in the number of subjects in each neonatal morbidity and mortality level was assessed using a one-sided Fisher’s Exact test pre-specified for reduction in higher levels of neonatal morbidity and mortality.
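For reference, a one-sided Fisher's exact test on a 2x2 table can be computed directly from the hypergeometric distribution. The sketch below assumes the table is arranged with arms as rows and index level (2 or higher vs. 1) as columns; that layout and the counts are hypothetical, not data from Table 4.

```python
from math import comb


def fisher_exact_less(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X <= a), where X is the top-left count under the
    hypergeometric null with all margins held fixed.
    """
    row1 = a + b               # subjects in the first row (screening arm)
    col1 = a + c               # subjects at level 2 or higher, arms pooled
    total = a + b + c + d
    k_min = max(0, row1 - (total - col1))  # smallest feasible top-left count
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(k_min, a + 1)
    )
    return numerator / comb(total, row1)


# HYPOTHETICAL counts: screening arm 1 of 10 at level >= 2,
# non-screening arm 5 of 10 at level >= 2.
p_value = fisher_exact_less(1, 9, 5, 5)
```

A small p-value here indicates fewer higher-level outcomes in the first row than would be expected by chance, which matches the pre-specified direction of the test.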
As cost data are not normally distributed (cost exponentially increases as GA at birth decreases) and no changes are expected in the large majority of babies who are born healthy and at term, changes in cost with treatment were assessed by a bootstrap test of total costs across the 500 iterations, either in each arm as a whole or in the top 10% of neonates per arm by cost.
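One possible reading of the bootstrap test is sketched below: the per-iteration difference in total cost between arms is resampled with replacement, and the p-value is the share of resampled mean differences showing no saving. This is an assumed, simplified procedure and not necessarily the one used in the study; the simulated differences are hypothetical.

```python
import random


def bootstrap_p_value(cost_differences, n_boot, rng):
    """One-sided bootstrap p-value for a positive mean cost saving.

    cost_differences: per-iteration (non-screening minus screening) total cost.
    Returns the share of bootstrap mean differences that are <= 0, with an
    add-one correction so the p-value is never exactly zero.
    """
    n = len(cost_differences)
    no_saving = 0
    for _ in range(n_boot):
        resample = [cost_differences[rng.randrange(n)] for _ in range(n)]
        if sum(resample) / n <= 0:
            no_saving += 1
    return (no_saving + 1) / (n_boot + 1)


rng = random.Random(11)
# HYPOTHETICAL savings for 500 simulation iterations (USD per pregnancy screened).
simulated_savings = [rng.gauss(1800.0, 600.0) for _ in range(500)]
p_value = bootstrap_p_value(simulated_savings, n_boot=1000, rng=rng)
```

The same function could be applied to differences computed only within the top 10% of neonates by cost, as the text describes for the subgroup comparison.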
Results
The sub-cohort of the TREETOP population used in this study is described in Markenson et al. (25), and consisted of 847 subjects enrolled from 14 sites in the United States. Of the 847 subjects, 36% were identified as higher-risk (≥ 15%) of sPTB < 37 weeks of gestation by the proteomic biomarker risk predictor. Here we summarize the impact of two modeled interventions (multimodal, care-management) on important clinical and economic outcomes on this cohort of 847 pregnancies as selected by the proteomic biomarker risk predictor. Application of the intervention models to the higher-risk subjects in the screening arm resulted in an average prolongation of gestation in the arm as a whole of under a day (0.8 multimodal / 0.3 care-management). However, as this average prolongation of gestation is dominated by term deliveries, we also examined the average prolongation of gestation modeled for the cohort of pregnancies in the bottom 10% of GA at birth in each arm. For this cohort the average prolongation of gestation is 8 days with the multimodal intervention and 3 days with the care-management intervention. We then examined the impact of these GA shifts on neonatal length of hospital stay, maternal length of hospital stay, neonatal morbidity and mortality, and neonate costs in the TREETOP population.
Tables 3 and 4 detail the impact of the two intervention models on the above outcomes for the TREETOP population. Across all subjects, using the Phibbs and Beam datasets, neonatal length of stay was reduced by 26% and 30% (multimodal) and by 19% and 22% (care-management), respectively. For subjects in the top 10% of neonatal length of hospital stay, neonatal length of stay was reduced by 46% and 47% (multimodal) and by 33% and 34% (care-management), respectively for the two datasets. In all cases, regardless of intervention model and outcome database used, the reductions were significant (Table 3).
Across all subjects, maternal length of stay was reduced 9% (multimodal and care-management) using the Phibbs outcome database. For subjects in the top 10% of neonatal length of hospital stay, maternal length of stay was reduced 17% (multimodal) and 16% (care-management). In all cases, regardless of intervention model, the reductions were significant (Table 3).
Across all subjects, neonatal costs were reduced by 34% and 26% (multimodal), with savings of $1,800 and $3,200 per pregnancy screened, and by 16% (care-management), with savings of $880 and $1,900 per pregnancy screened, respectively for the Phibbs and Beam databases. The two cost datasets differed in cost reduction, with the Beam (commercial payer data) savings being approximately double the Phibbs savings. Reductions of neonatal costs by the multimodal intervention were significant (Table 3).
Change in the neonatal morbidity and mortality index upon intervention was significant in reduction of the proportion of neonates with scores at or above level 2 (multiple negative outcomes and/or extended hospital stay) versus level 1 (at most 1 negative outcome and up to 4 days hospital stay), for both intervention models. Reductions in the proportion of neonatal morbidity and mortality level 2 or higher were 41% (multimodal) and 29% (care-management) respectively (Table 4).
Discussion
Preterm birth remains the most compelling issue in obstetrics and neonatology, creating adverse short- and long-term outcomes for newborns. Among efforts to identify and treat at-risk mothers-to-be with early interventions (progesterone, cerclage, and others) and with acute therapies in symptomatic patients (tocolytics, corticosteroids, magnesium sulfate for neuroprotection and antibiotics), acute maternal administration of antenatal corticosteroids and magnesium sulfate has been demonstrated to improve outcomes (36). Nevertheless, the overall frequency of premature delivery has not been materially reduced in the US. The strategy most used currently for lowering the rate of premature delivery, with resultant decreases in serious neonatal morbidities, is to identify specific risk factors or specific higher-risk groups and to apply to these women approaches that make sense for the specific etiology associated with their particular risk factor, e.g., cerclage for cervical shortening or premature cervical dilation. Progesterone therapy for previous preterm delivery or for a short cervix on endovaginal ultrasound is a further example, applicable to small numbers of patients with specific risk factors (20, 21). However, this approach is limited by the fact that such risk factors only capture a small percentage of the patients who will ultimately deliver prematurely. This approach of risk identification therefore would hold promise if we could identify a much larger proportion of those destined to deliver prematurely and apply an intervention or combination of interventions appropriate for the group chosen. There is substantial reason to believe that a benefit would be seen.
In this paper we assessed the clinical effectiveness and economic benefit of using a proteomic biomarker risk predictor for identifying higher-risk patients in the TREETOP study for treatment with established interventions for preterm birth including the use of medications and proactive care-management. Our innovative analytical approach combines real-world observational data with simulation of prolonged gestation based on published treatment efficacy to provide real-world effectiveness estimates. As reduction in adverse neonatal outcomes is in reality the compelling reason for delaying the GA at which the baby is born, we chose several critical metrics to determine whether the modeled studies resulted in improved neonatal outcomes. The results projected impressive potential improvements in neonatal morbidity and mortality, neonatal length of stay and in neonatal cost of care, which is also a surrogate for decreased morbidities requiring extended neonatal intensive care hospitalization. Encouragingly, neonatal improvements occurred without increasing, but rather decreasing maternal length of stay.
While one cannot necessarily assume efficacy of progesterone treatment in a broader population, and especially in patients who are delivered prematurely for medical or obstetrical indications, as it has in reality only been demonstrated to be effective in women with previous preterm deliveries and, by some studies, in those with a short cervix on ultrasound, a stronger assertion can be made of the potential benefit for care-management. Several studies in diverse populations have demonstrated reductions in prematurity, shortened durations of NICU stay and/or improvements in composite neonatal morbidity with such programs (18, 19, 26, 27). Since all of these were in populations with diverse reasons for identification of increased risk for prematurity, their combined findings lend support to the view that a marker which is not specific for any particular etiology or particular risk of prematurity may be similarly effective on all.
The results of this exercise are important, demonstrating not only the potential benefits for mothers and babies but also the utility of effectiveness assessments combining real-world data and published evidence. Furthermore, this study provides confidence that future studies and real-world use of test-and-treat strategies are worthwhile, ethically justified for pregnant women and are likely to demonstrate positive results. As well, this modeling exercise provides a compelling reason to proceed with well-designed and controlled prospective studies of progesterone and/or care-management in patients identified as higher-risk with such a well demonstrated, sensitive protein marker of risk for prematurity. Finally, these results are based on interventions that are currently used in practice, known to be safe and generally found to be acceptable to pregnant women (24, 26, 27, 36). Thus, there is a high potential for the proteomic biomarker risk predictor to be a clinically important component of risk stratification for pregnant women that could lead to tangible gains in reducing the impact of preterm birth.
Data Availability
Data supporting the results presented here can be requested at data-sharing@seraprognostics.com. Data will not be made publicly available, or in any format, that may violate a subject's right to privacy. For example, dating information or identifiers that would allow data to be integrated, thereby enabling the potential identification of study subjects, are protected.
Conflict of Interest
AF, AP, TF, GC, JB, JJB, TG and PK are stockholders, employees, or contract employees of Sera Prognostics. All other authors report no conflict of interest.
Funding
Sponsorship (funding, study design and execution) of this work was provided by Sera Prognostics, Inc.
Author Contributions
J.B., G.C.C., J.J.B., and P.E.K. conceptualized the study. G.R.M., G.R.S., L.C.L., K.D.H., D.V.C., C.N.S., J.K.B., D.M.H., S.L., S.A.S., C.A.M., S.M.W., L.M.P., E.J.S., K.A.B., A.F.H., and A.H.C. provided subjects, clinical data, and serum samples for analyses. A.C.F. curated data. Formal analyses were done by J.B. (including validation), A.D.P., and P.E.K., with methodology designed by J.B., J.J.B., and P.E.K. Visualization/data presentation and/or manuscript draft preparation was done by J.B., T.C.F., T.J.G., G.C.C., J.J.B., and P.E.K. The study was supervised by J.J.B. and P.E.K., with P.E.K. serving as project administrator. All authors reviewed, edited, and approved the manuscript.
Acknowledgements
We would like to acknowledge the study coordinators and research personnel at the study sites. We would also like to acknowledge the Sera clinical laboratory and clinical operations teams.