PT - JOURNAL ARTICLE
AU - Lorman, Vitaly
AU - Razzaghi, Hanieh
AU - Song, Xing
AU - Morse, Keith
AU - Utidjian, Levon
AU - Allen, Andrea J.
AU - Rao, Suchitra
AU - Rogerson, Colin
AU - Bennett, Tellen D.
AU - Morizono, Hiroki
AU - Eckrich, Daniel
AU - Jhaveri, Ravi
AU - Huang, Yungui
AU - Ranade, Daksha
AU - Pajor, Nathan
AU - Lee, Grace M.
AU - Forrest, Christopher B.
AU - Bailey, L. Charles
TI - A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program
AID - 10.1101/2022.12.22.22283791
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.12.22.22283791
4099 - http://medrxiv.org/content/early/2022/12/26/2022.12.22.22283791.short
4100 - http://medrxiv.org/content/early/2022/12/26/2022.12.22.22283791.full
AB - Background: As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. Methods and Findings: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. 
Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. Conclusions: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses. Funding Source: This research was funded by the National Institutes of Health (NIH) Agreement OT2HL161847-01 as part of the Researching COVID to Enhance Recovery (RECOVER) program of research. Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the RECOVER Program, the NIH or other funders. Competing Interest Statement: Dr. Rao reports prior grant support from GSK and Biofire and is a consultant for Seqirus. Dr. Jhaveri is a consultant for AstraZeneca, Seqirus, and Dynavax, and receives an editorial stipend from Elsevier and the Pediatric Infectious Diseases Society and royalties from UpToDate/Wolters Kluwer. Dr. Lee serves on the PASC Advisory Board for United Health Group. 
Dr. Bailey has received grants from the Patient-Centered Outcomes Research Institute. All other authors have nothing to disclose. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: BRANY IRB gave ethical approval for this work and waived documentation of informed consent. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors.
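
The abstract notes that the classification threshold can be set for higher precision (study inclusion) or higher recall (trial screening). The following is a minimal sketch of that trade-off only, not the authors' code: the function name `precision_recall`, the scores, the labels, and both threshold values are hypothetical and chosen purely for illustration.

```python
# Illustrative only: the scores and labels below are made up, not study data.

def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when cases with score >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores (predicted probability of PASC) and true labels.
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# A strict threshold flags fewer patients: fewer false positives, more misses.
strict = precision_recall(scores, labels, 0.75)
# A lenient threshold flags more patients: fewer misses, more false positives.
lenient = precision_recall(scores, labels, 0.30)

print("strict  (threshold 0.75): precision=%.2f recall=%.2f" % strict)
print("lenient (threshold 0.30): precision=%.2f recall=%.2f" % lenient)
```

With these made-up values, the stricter threshold gives up recall to gain precision, and the more lenient threshold does the reverse. In practice the scores would come from held-out folds of the cross-validated model rather than a hand-written list.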