Abstract
Perinatal depression (PND) is one of the most common medical complications during pregnancy and the postpartum period, affecting 10-20% of pregnant individuals. Black and Latina women have higher rates of PND, yet they are less likely to be diagnosed and to receive treatment. Machine learning (ML) models based on Electronic Medical Records (EMRs) have been effective in predicting postpartum depression in middle-class White women but have rarely included sufficient proportions of racial and ethnic minorities, which has contributed to biases in ML models for minority women. Our goal was to determine whether ML models could predict depression in early pregnancy in racial/ethnic minority women by leveraging EMR data. We extracted EMRs (N=5,875) from a hospital in a large U.S. city that mostly served low-income Black and Hispanic women. Depressive symptom severity was assessed with a self-reported questionnaire, the PHQ-9. We investigated multiple ML classifiers, used Shapley Additive Explanations (SHAP) for model interpretation, and assessed model prediction bias with two metrics: Disparate Impact and Equal Opportunity Difference. While the performance of the ML model (Elastic Net) was low (ROC AUC = 0.67), we identified well-known factors associated with PND, such as unplanned pregnancy and being single, as well as underexplored factors, such as self-reported pain levels, lower levels of prenatal vitamin supplement intake, asthma, carrying a male fetus, and lower blood platelet levels. Our findings showed that, despite being based on a sample composed mostly (75%) of low-income minority women (54% Black and 27% Latina), model performance was lower for these communities. In conclusion, ML models based on EMRs could moderately predict depression in early pregnancy, but their performance is biased against low-income minority women.
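For readers unfamiliar with the two bias metrics named above, the following is a minimal sketch of how they are commonly defined: Disparate Impact as the ratio of positive-prediction rates between an unprivileged and a privileged group, and Equal Opportunity Difference as the difference in true positive rates between the same groups. The function names, the group labels "A" and "B", and the toy data are illustrative assumptions and are not taken from the study or its code; the exact definitions and group assignments used by the authors may differ.

```python
# Illustrative sketch only: names, groups, and data below are hypothetical.

def selection_rate(y_pred, group, value):
    """Fraction of members of `value` group predicted positive (1)."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    if not preds:
        raise ValueError(f"no samples in group {value!r}")
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred, group, value):
    """Fraction of truly positive members of `value` group predicted positive."""
    preds = [p for t, p, g in zip(y_true, y_pred, group) if g == value and t == 1]
    if not preds:
        raise ValueError(f"no positive samples in group {value!r}")
    return sum(preds) / len(preds)

def disparate_impact(y_pred, group, unprivileged, privileged):
    """Ratio of selection rates; 1.0 indicates parity."""
    return (selection_rate(y_pred, group, unprivileged)
            / selection_rate(y_pred, group, privileged))

def equal_opportunity_difference(y_true, y_pred, group, unprivileged, privileged):
    """Difference in true positive rates; 0.0 indicates parity."""
    return (true_positive_rate(y_true, y_pred, group, unprivileged)
            - true_positive_rate(y_true, y_pred, group, privileged))

# Toy example: group "A" is treated as unprivileged, "B" as privileged.
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

di = disparate_impact(y_pred, group, "A", "B")
eod = equal_opportunity_difference(y_true, y_pred, group, "A", "B")
print(f"Disparate Impact: {di:.3f}")
print(f"Equal Opportunity Difference: {eod:.3f}")
```

In this toy data the model flags group "A" less often than group "B" and detects fewer of its true cases, so Disparate Impact falls below 1 and Equal Opportunity Difference is negative. Both directions indicate a disadvantage for the unprivileged group under these definitions.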
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work is funded through a K12 BIRCWH Award (NICHD 101373-04) to BPB and supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, through Grant Award Number UL1TR002003.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
University of Illinois Chicago Institutional Review Board (IRB # 2020-0553)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
Data is available to the research community upon approval of the University of Illinois Chicago Institutional Review Board (IRB # 2020-0553). Code is available at https://github.com/Bealab.