PT - JOURNAL ARTICLE
AU - Banda, Juan M.
AU - Adderley, Nicola
AU - Ahmed, Waheed-Ul-Rahman
AU - AlGhoul, Heba
AU - Alser, Osaid
AU - Alser, Muath
AU - Areia, Carlos
AU - Cogenur, Mikail
AU - Fišter, Krisitina
AU - Gombar, Saurabh
AU - Huser, Vojtech
AU - Jonnagaddala, Jitendra
AU - Lai, Lana YH
AU - Leis, Angela
AU - Mateu, Lourdes
AU - Mayer, Miguel Angel
AU - Minty, Evan
AU - Morales, Daniel
AU - Natarajan, Karthik
AU - Paredes, Roger
AU - Periyakoil, Vyjeyanthi S.
AU - Prats-Uribe, Albert
AU - Ross, Elsie G.
AU - Singh, Gurdas
AU - Subbian, Vignesh
AU - Vivekanantham, Arani
AU - Prieto-Alhambra, Daniel
TI - Characterization of long-term patient-reported symptoms of COVID-19: an analysis of social media data
AID - 10.1101/2021.07.13.21260449
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.07.13.21260449
4099 - http://medrxiv.org/content/early/2021/07/15/2021.07.13.21260449.short
4100 - http://medrxiv.org/content/early/2021/07/15/2021.07.13.21260449.full
AB - As SARS-CoV-2, the virus that causes COVID-19, continues to affect people across the globe, there is limited understanding of the long-term implications for infected patients [1–3]. While some of these patients have documented follow-ups in clinical records or participate in longitudinal surveys, these datasets are usually designed by clinicians and are not granular enough to capture the natural history or patient experience of 'long COVID'. To get a complete picture, patient-generated data are needed to track the long-term impact of COVID-19 on recovered patients in real time. There is a growing need to meticulously characterize these patients' experiences, from infection to months post-infection, using highly granular patient-generated data rather than clinician narratives. In this work, we present a longitudinal characterization of post-COVID-19 symptoms using social media data from Twitter.
Using a combination of machine learning, natural language processing techniques, and clinician review, we mined 296,154 tweets to characterize the post-acute infection course of the disease, creating detailed timelines of symptoms and conditions and analyzing symptomatology over a period of more than 150 days.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: JMB was funded by a grant from the National Institute on Aging (3P30AG059307-02S1). DPA is funded through an NIHR Senior Research Fellowship (Grant number SRF-2018-11-ST2-004). VH's contribution to this work was carried out with support from the National Library of Medicine, National Institutes of Health.

Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: IRB not needed
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Data will be made available after formal publication.