Abstract
Background Atopic dermatitis (AD) is a chronic skin condition that affects millions of people worldwide. Research into the causes and treatment of this disease has great potential to benefit these individuals. However, recruiting for AD clinical trials is a non-trivial task: diagnostic precision and phenotypic definitions vary among clinicians, and clinicians must spend time finding, recruiting, and enrolling patients as study subjects. Thus, there is a need for automatic and effective patient phenotyping for cohort recruitment.
Objective Our study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD.
Methods We created a vectorized representation of each patient and trained several supervised machine learning models to classify whether a patient has AD. Each patient is represented by a vector of either probabilities or binary values, where each value indicates whether the patient meets a different criterion for AD diagnosis.
Results The most accurate AD classifier, XGBoost (Extreme Gradient Boosting), achieved a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500.
Conclusions An automated approach for identifying patient cohorts has the potential to accelerate and standardize patient recruitment for AD studies, thereby reducing clinician burden and informing the discovery of better treatment options for AD.
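To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and class-balanced accuracy can be computed from confusion-matrix counts. It assumes that "class-balanced accuracy" means the mean of recall (sensitivity) and specificity, which is a common definition but is not confirmed by the text above. The `ConfusionCounts` type and the example counts are hypothetical: the counts were chosen only because they reproduce the three reported figures, and they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int


def precision(c: ConfusionCounts) -> float:
    # Fraction of predicted-AD patients who truly have AD.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Fraction of true AD patients the classifier found (sensitivity).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of non-AD patients correctly labeled as non-AD.
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    # Assumed definition: mean of per-class recall (sensitivity and specificity).
    return (recall(c) + specificity(c)) / 2


def negative_predictive_value(c: ConfusionCounts) -> float:
    # Fraction of predicted non-AD patients who truly do not have AD.
    return c.tn / (c.tn + c.fn)


# Hypothetical counts, NOT from the study; picked to match the reported metrics.
example = ConfusionCounts(tp=21, fp=4, tn=24, fn=7)

if __name__ == "__main__":
    print(f"precision         = {precision(example):.4f}")          # 0.8400
    print(f"recall            = {recall(example):.4f}")             # 0.7500
    print(f"balanced accuracy = {balanced_accuracy(example):.4f}")  # 0.8036
```

Under this assumed definition, balanced accuracy weights the AD and non-AD classes equally regardless of how many patients fall into each, which is why it is often preferred to plain accuracy when a cohort is imbalanced.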
Competing Interest Statement
David J. Margolis is or recently has been a consultant for Pfizer, Leo, and Sanofi with respect to studies of atopic dermatitis and served on an advisory board for the National Eczema Association.
Funding Statement
This study was partially funded by the National Institutes of Health (NIH) National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) P30-AR069589 as part of the Penn Skin Biology and Diseases Resource-Based Center (Core: David J. Margolis, Danielle L. Mowery).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of the University of Pennsylvania gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
To protect patient privacy, the clinical data is not available.
Abbreviations
- AD: Atopic Dermatitis
- BERT: Bidirectional Encoder Representations from Transformers
- EHR: Electronic Health Records
- ICD: International Classification of Diseases
- UKWP: United Kingdom Working Party
- HR: Hanifin and Rajka
- AI: Artificial Intelligence
- NLP: Natural Language Processing
- ML: Machine Learning
- MLP: Multi-layer Perceptron
- ReLU: Rectified Linear Unit
- SGD: Stochastic Gradient Descent
- KNN: K-Nearest Neighbors
- XGBoost: Extreme Gradient Boosting
- AdaBoost: Adaptive Boosting
- SVM: Support Vector Machines
- TP: True Positive
- TN: True Negative
- FP: False Positive
- FN: False Negative
- NPV: Negative Predictive Value