PT - JOURNAL ARTICLE
AU - Azhir, Alaleh
AU - Hügel, Jonas
AU - Tian, Jiazi
AU - Cheng, Jingya
AU - Bassett, Ingrid V.
AU - Bell, Douglas S.
AU - Bernstam, Elmer V.
AU - Farhat, Maha R.
AU - Henderson, Darren W.
AU - Lau, Emily S.
AU - Morris, Michele
AU - Semenov, Yevgeniy R.
AU - Triant, Virginia A.
AU - Visweswaran, Shyam
AU - Strasser, Zachary H.
AU - Klann, Jeffrey G.
AU - Murphy, Shawn N.
AU - Estiri, Hossein
TI - Precision Phenotyping for Curating Research Cohorts of Patients with Post-Acute Sequelae of COVID-19 (PASC) as a Diagnosis of Exclusion
AID - 10.1101/2024.04.13.24305771
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.04.13.24305771
4099 - http://medrxiv.org/content/early/2024/04/16/2024.04.13.24305771.short
4100 - http://medrxiv.org/content/early/2024/04/16/2024.04.13.24305771.full
AB - Scalable identification of patients with the post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms and the suboptimal accuracy, demographic biases, and underestimation associated with the PASC diagnosis code (ICD-10 U09.9). In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying research cohorts of PASC patients, with PASC defined as a diagnosis of exclusion. We used longitudinal electronic health record (EHR) data from over 295 thousand patients from 14 hospitals and 20 community health centers in Massachusetts. The algorithm employs an attention mechanism to exclude sequelae that can be explained by prior conditions. We performed independent chart reviews to tune and validate our precision phenotyping algorithm. Compared to the U09.9 diagnosis code, our PASC phenotyping algorithm improves precision and prevalence estimation and reduces bias in identifying Long COVID patients.
Our algorithm identified a PASC research cohort of over 24 thousand patients (compared to about 6 thousand when using the U09.9 diagnosis code), with a precision of 79.9 percent (compared to 77.8 percent for the U09.9 diagnosis code). Our estimated prevalence of PASC was 22.8 percent, which is close to the national estimates for the region. We also provide an in-depth analysis of the clinical attributes of the cohort, including lingering effects by organ, comorbidity profiles, and temporal differences in the risk of PASC. The PASC phenotyping method presented in this study achieves higher precision, estimates the prevalence of PASC without underestimating it, and shows less bias in identifying Long COVID patients. The PASC cohort derived from our algorithm will serve as a foundation for investigating the genetic, metabolomic, and clinical aspects of Long COVID, overcoming the limitations of recent PASC cohort studies, which were constrained by small sample sizes and limited outcome data.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: This study has been supported by grants from the National Institutes of Health, National Institute of Allergy and Infectious Diseases (NIAID) R01AI165535, National Heart, Lung, and Blood Institute (NHLBI) OT2HL161847, and National Center for Advancing Translational Sciences (NCATS) UL1 TR003167, UL1 TR001881, and U24TR004111.
J. Hügel's work was partially funded by a fellowship within the IFI programme of the German Academic Exchange Service (DAAD), by the Federal Ministry of Education and Research (BMBF), and by the German Research Foundation (426671079).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Use of patient data in this study was approved by the Mass General Brigham Institutional Review Board.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes

Clinical data used in the study are not available publicly. All non-clinical data produced are available online at https://github.com/clai-group/long_covid_ai_scripts/pkgs/container/post_covid_ai_scripts
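The abstract defines PASC as a diagnosis of exclusion: a post-infection condition counts only if the patient's prior history cannot explain it. The sketch below illustrates that idea with a deliberately simplified, rule-based filter. It is not the authors' attention-based algorithm. The `Patient` record, the `EXPLAINED_BY` lookup table, the example ICD-10-CM codes, and the 28-day post-acute threshold are all hypothetical assumptions chosen for illustration. The published model learns which prior conditions explain a sequela rather than reading them from a fixed table.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Patient:
    # Hypothetical minimal record: diagnosis code -> dates on which it was recorded.
    patient_id: str
    infection_date: date
    diagnoses: dict[str, list[date]] = field(default_factory=dict)


# Hypothetical lookup: candidate sequela code -> pre-existing codes that could explain it.
# Illustrative only; this is not the study's concept list or its learned attention weights.
EXPLAINED_BY = {
    "R53.83": {"D50.9"},    # other fatigue <- iron deficiency anemia
    "R06.02": {"J45.909"},  # shortness of breath <- asthma
}


def candidate_sequelae(p: Patient, min_days: int = 28) -> set[str]:
    """Return codes recorded >= min_days after infection that prior history does not explain.

    min_days=28 is an assumed post-acute window, not a parameter from the study.
    """
    # Conditions already recorded before the infection date.
    prior = {code for code, dates in p.diagnoses.items()
             if any(d < p.infection_date for d in dates)}
    result = set()
    for code, dates in p.diagnoses.items():
        if code in prior:
            continue  # condition existed before infection, so it is not a new sequela
        if not any((d - p.infection_date).days >= min_days for d in dates):
            continue  # never recorded in the assumed post-acute window
        if EXPLAINED_BY.get(code, set()) & prior:
            continue  # a pre-existing condition can account for it: exclude
        result.add(code)
    return result


def is_pasc_candidate(p: Patient, min_days: int = 28) -> bool:
    # Flag the patient if at least one unexplained post-acute condition remains.
    return bool(candidate_sequelae(p, min_days))


if __name__ == "__main__":
    p = Patient("p1", date(2021, 1, 1), {
        "R53.83": [date(2021, 3, 1)],   # fatigue after infection
        "D50.9": [date(2020, 6, 1)],    # anemia before infection, which explains the fatigue
        "R06.02": [date(2021, 2, 15)],  # dyspnea with no explaining prior condition
    })
    print(candidate_sequelae(p))  # {'R06.02'}
    print(is_pasc_candidate(p))   # True
```

In this example the post-infection fatigue is excluded because pre-existing anemia could explain it. The dyspnea remains because no prior condition in the table accounts for it. A fixed table like this cannot weigh how strongly a history explains a symptom, and that limitation is the motivation the abstract gives for using a learned attention mechanism.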