RT Journal Article
SR Electronic
T1 Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2022.05.24.22275398
DO 10.1101/2022.05.24.22275398
A1 Reese, Justin
A1 Blau, Hannah
A1 Bergquist, Timothy
A1 Loomba, Johanna J.
A1 Callahan, Tiffany
A1 Laraway, Bryan
A1 Antonescu, Corneliu
A1 Casiraghi, Elena
A1 Coleman, Ben
A1 Gargano, Michael
A1 Wilkins, Kenneth
A1 Cappelletti, Luca
A1 Fontana, Tommaso
A1 Ammar, Nariman
A1 Antony, Blessy
A1 Murali, T. M.
A1 Karlebach, Guy
A1 McMurry, Julie A
A1 Williams, Andrew
A1 Moffitt, Richard
A1 Banerjee, Jineta
A1 Solomonides, Anthony E.
A1 Davis, Hannah
A1 Kostka, Kristin
A1 Valentini, Giorgio
A1 Sahner, David
A1 Chute, Christopher G.
A1 Madlock-Brown, Charisse
A1 Haendel, Melissa A
A1 Robinson, Peter N.
YR 2022
UL http://medrxiv.org/content/early/2022/05/25/2022.05.24.22275398.abstract
AB Accurate stratification of patients with Post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies and could enable more focussed investigation of the molecular pathogenetic mechanisms of this disease. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling long COVID phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Using unsupervised machine learning (k-means clustering), we found six distinct clusters of long COVID patients, each with distinct profiles of phenotypic abnormalities with enrichments in pulmonary, cardiovascular, neuropsychiatric, and constitutional symptoms such as fatigue and fever.
There was a highly significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. We show that the clusters we identified in one hospital system were generalizable across different hospital systems. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on long COVID.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the National Institutes of Health awards as follows: CD2H NCATS U24 TR002306, NHLBI RECOVER Agreement OT2HL161847-01, Office of the Director Monarch Initiative R24 OD011883, and NHGRI Center of Excellence in Genome Sciences RM1 HG010860; and was conducted under the N3C DUR RP-5677B5. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH. Additionally, Justin T. Reese was supported by the Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231; Peter N. Robinson was supported by the Donald A. Roux Family Fund at the Jackson Laboratory; and Melissa A. Haendel was supported by the Marsico Family at the University of Colorado Anschutz Medical Campus. Authorship was determined using ICMJE recommendations. The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave covid.cd2h.org/enclave and supported by NCATS U24 TR002306.
This research was possible because of the patients whose information is included within the data from participating organizations (covid.cd2h.org/dtas) and the organizations and scientists (covid.cd2h.org/duas) who have contributed to the ongoing development of this community resource. The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol #IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol #IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Data are available by application to the N3C Data Enclave, which is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources.