EHR-based Case Identification of Pediatric Long COVID: A Report from the RECOVER EHR Cohort

Morgan Botdorf; Kimberley Dickinson; Vitaly Lorman; Hanieh Razzaghi; Nicole Marchesani; Suchitra Rao; Colin Rogerson; Miranda Higginbotham; Asuncion Mejias; Daria Salyakina; Deepika Thacker; Dima Dandachi; Dimitri A Christakis; Emily Taylor; Hayden Schwenk; Hiroki Morizono; Jonathan Cogen; Nathan M Pajor; Ravi Jhaveri; Christopher B. Forrest; L. Charles Bailey; the RECOVER Consortium

doi:10.1101/2024.05.23.24307492

Abstract

Objective Long COVID, marked by persistent, recurring, or new symptoms after COVID-19 infection, impacts children’s well-being yet lacks a unified clinical definition. This study evaluates the performance of an empirically derived Long COVID case identification algorithm, or computable phenotype, against manual chart review in a pediatric sample. This approach aims to facilitate large-scale research efforts to better understand the condition.

Methods The algorithm, composed of diagnostic codes empirically associated with Long COVID, was applied to a cohort of pediatric patients with SARS-CoV-2 infection in the RECOVER PCORnet EHR database. The algorithm classified 31,781 patients as having conclusive, probable, or possible Long COVID and 307,686 patients as having no evidence of Long COVID. A chart review was performed on a subset of patients (n=651) to determine the agreement between the two methods. Discordant cases were reviewed to understand the reasons for the differences.

Results The sample comprised 651 pediatric patients (339 females, mean age = 10.10 years) from 16 hospital systems. Agreement between the phenotype and chart review in identifying Long COVID was moderate (accuracy = 0.62, positive predictive value [PPV] = 0.49, negative predictive value [NPV] = 0.75), with numerous discordant cases. No notable differences were found when the analyses were stratified by age at infection or era of infection. Further examination of the discordant cases revealed that the most common cause of disagreement was the clinician reviewers’ tendency to attribute Long COVID-like symptoms to prior medical conditions. The performance of the phenotype improved when prior medical conditions were considered (accuracy = 0.71, PPV = 0.65, NPV = 0.74).
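For readers less familiar with the agreement metrics reported here, the following is a minimal sketch of how accuracy, PPV, and NPV are conventionally computed from paired phenotype and chart review labels, treating chart review as the reference. The class name, function name, and the counts in the usage lines are hypothetical illustrations; they are not the study's data, and this is not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Agreement:
    """2x2 agreement table with chart review as the reference standard."""
    tp: int  # phenotype positive, chart review positive
    fp: int  # phenotype positive, chart review negative
    fn: int  # phenotype negative, chart review positive
    tn: int  # phenotype negative, chart review negative

    @property
    def accuracy(self) -> float:
        # Share of all reviewed patients on whom the two methods agree.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def ppv(self) -> float:
        # Share of phenotype-positive patients confirmed by chart review.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Share of phenotype-negative patients also negative on chart review.
        return self.tn / (self.tn + self.fn)


def tabulate(pairs: Iterable[Tuple[bool, bool]]) -> Agreement:
    """Count (phenotype_positive, chart_review_positive) label pairs."""
    tp = fp = fn = tn = 0
    for phenotype, chart in pairs:
        if phenotype and chart:
            tp += 1
        elif phenotype and not chart:
            fp += 1
        elif chart:
            fn += 1
        else:
            tn += 1
    return Agreement(tp, fp, fn, tn)


# Hypothetical counts for illustration only (NOT the study's 2x2 table).
example = Agreement(tp=40, fp=10, fn=20, tn=30)
print(round(example.accuracy, 2), round(example.ppv, 2), round(example.npv, 2))
```

Because PPV and NPV are conditioned on the phenotype's classification, both depend on how common Long COVID is in the reviewed sample as well as on the algorithm itself.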

Conclusions Although there was moderate overlap between the two methods, the discrepancies between them are likely attributable to the lack of consensus on a clinical definition of Long COVID. It is essential to consider the strengths and limitations of each method when developing Long COVID classification algorithms.

Competing Interest Statement

Dr. Mejias reports funding from Janssen and Merck for research support, and from Janssen, Merck, and Sanofi-Pasteur for Advisory Board participation. Dr. Rao reports prior grant support from GSK and Biofire and is a consultant for Seqirus. Dr. Jhaveri is a consultant for AstraZeneca, Seqirus, and Dynavax, and receives an editorial stipend from Elsevier. All other authors have no conflicts of interest to disclose.

Funding Statement

This research was funded by the National Institutes of Health (NIH) Agreement OT2HL161847-01 as part of the Researching COVID to Enhance Recovery (RECOVER) program of research.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Institutional Review Board (IRB) approval was obtained under Biomedical Research Alliance of New York (BRANY) protocol #21-08-508. BRANY IRB waived the need for consent and HIPAA authorization.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

Authorship Statement: Authorship has been determined according to ICMJE recommendations.
Disclaimer: This content is solely the responsibility of the authors and does not necessarily represent the official views of the RECOVER Initiative, the NIH, or other funders.
Abbreviations: PASC—post-acute sequelae of SARS-CoV-2 infection; COVID-19—coronavirus disease 2019; SARS-CoV-2—severe acute respiratory syndrome coronavirus 2; PCR—polymerase chain reaction; EHR—electronic health record; MIS-C—multisystem inflammatory syndrome in children; ICD-10—International Classification of Diseases, 10th Revision.
Supplemental files have been added. These files contain supplemental tables, the chart review form, and diagnostic codes included in each cluster of conditions/symptoms.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.