ABSTRACT
Multiple sclerosis (MS) phenotypes provide useful disease descriptions but lack complete information regarding the continuing disease process. Disease activity and progression are meaningful modifiers of the MS phenotypes that can further guide prognosis, therapeutic decisions, and clinical trial designs and outcomes, yet they are not explicitly documented in patients’ electronic medical records (EMRs). We aimed to detect disease activity and progression in patients with MS from clinical notes in the EMR using Natural Language Processing (NLP) and machine learning models. Using randomly selected progress notes from MS patients at the University of Rochester MS clinic, we integrated NLP and machine learning technologies to predict selected phenotype modifiers that represent disease activity and progression. The method was evaluated on the performance of both the NLP models and the machine learning models, as well as on the interpretability of the integrated method. We identified 460 progress notes from 287 adult MS patients. For entity extraction, the NLP model averaged 0.92 in precision, 0.87 in recall, and 0.89 in F-score. For entity relation extraction, it averaged 0.85 in precision, 0.84 in recall, and 0.85 in F-score. The sensitivity and specificity of the classification algorithms in predicting phenotype modifiers were, respectively, 67% and 93% for the modifier “Active”, 61% and 82% for the modifier “Worsening”, 92% and 98% for the modifier “Progression”, and 80% and 94% for the modifier “New MRI Lesion”. We showed that the integrated method of NLP with machine learning classification is capable of detecting evidence of disease activity and clinical progression from clinical notes. The classification algorithms yielded interpretable and largely clinically relevant features (symptoms and clinical conditions) that were persistently associated with disease activity and progression.
This method holds promise for facilitating the screening of MS clinical trial participants and potentially identifying early evidence of disease progression.
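For readers less familiar with the metrics reported above, the following is a minimal sketch of how F-score, sensitivity, and specificity are conventionally computed. It is not the authors' evaluation code: the function names and the confusion-matrix counts are hypothetical and chosen only for illustration. Note also that the reported figures are averages, so an average F-score need not equal the F-score computed from the averaged precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive notes that the classifier flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative notes that the classifier leaves unflagged."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Precision and recall reported in the abstract for entity extraction.
    print(round(f_score(0.92, 0.87), 2))

    # Hypothetical counts (not from the study) for a single modifier.
    tp, fn, tn, fp = 67, 33, 93, 7
    print(sensitivity(tp, fn), specificity(tn, fp))
```

Sensitivity is the same quantity as recall computed on the positive class, which is why a modifier can show high specificity alongside a more modest sensitivity when positive notes are comparatively rare.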
Author Summary
Disease activity and progression of disability can be meaningful modifiers of the base MS phenotypes and can further impact prognosis, therapeutic decisions, and clinical trial designs and outcomes. However, studies have shown that neither MS phenotypes nor their modifiers are consistently documented in electronic medical record (EMR) chart notes. The evidence for disease activity and progression often resides in the clinical notes, which requires manual chart review by clinical experts and makes clinical research more difficult to conduct. In this paper, we developed a generalized information extraction, classification, and prediction pipeline that incorporates Natural Language Processing (NLP) technologies and shallow machine learning models to detect MS disease activity and progression in EMR clinical notes and to predict phenotype modifiers. Results demonstrated that this integrated method extracts clinically relevant information from progress notes that is persistently associated with disease activity and progression, and that it predicts MS phenotype modifiers with satisfactory performance and encouraging portability and interpretability. In the future, we aim to apply this method to facilitate high-throughput screening for MS clinical trials and to assess disease-modifying therapy utilization based on disease modifiers.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The project described in this publication was supported by the University of Rochester CTSA award number TL1 TR002000 from the National Center for Advancing Translational Sciences of the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All relevant ethical safeguards have been met in relation to patient privacy protection. Institutional review board (IRB) approval for this study was obtained through the Research Subjects Review Board (RSRB) under the Office of Human Subject Research at the University of Rochester. The approval number is STUDY00005629. The RSRB granted a waiver of informed consent and a waiver of HIPAA authorization for this study because the research involves medical record review only, poses no greater than minimal risk, and involves no recruitment or intervention.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
The Ethics statement was updated with more details, and an error was fixed in Table 2a (Demographic characteristics of study subjects).
Data Availability
The data that support the findings of this study will be publicly available from Dryad.