Abstract
Importance SARS-CoV-2 infection can result in ongoing, relapsing, or new symptoms or other health effects after the acute phase of infection; termed post-acute sequelae of SARS-CoV-2 infection (PASC), or long COVID. The characteristics, prevalence, trajectory and mechanisms of PASC are ill-defined. The objectives of the Researching COVID to Enhance Recovery (RECOVER) Multi-site Observational Study of PASC in Adults (RECOVER-Adult) are to: (1) characterize PASC prevalence; (2) characterize the symptoms, organ dysfunction, natural history, and distinct phenotypes of PASC; (3) identify demographic, social and clinical risk factors for PASC onset and recovery; and (4) define the biological mechanisms underlying PASC pathogenesis.
Methods RECOVER-Adult is a combined prospective/retrospective cohort currently planned to enroll 14,880 adults aged ≥18 years. Eligible participants either must meet WHO criteria for suspected, probable, or confirmed infection; or must have evidence of no prior infection. Recruitment occurs at 86 sites in 33 U.S. states, Washington, DC and Puerto Rico, via facility– and community-based outreach. Participants complete quarterly questionnaires about symptoms, social determinants, vaccination status, and interim SARS-CoV-2 infections. In addition, participants contribute biospecimens and undergo physical and laboratory examinations at approximately 0, 90 and 180 days from infection or negative test date, and yearly thereafter. Some participants undergo additional testing based on specific criteria or random sampling. Patient representatives provide input on all study processes. The primary study outcome is onset of PASC, measured by signs and symptoms. A paradigm for identifying PASC cases will be defined and updated using supervised and unsupervised learning approaches with cross– validation. Logistic regression and proportional hazards regression will be conducted to investigate associations between risk factors, onset, and resolution of PASC symptoms.
Discussion RECOVER-Adult is the first national, prospective, longitudinal cohort of PASC among US adults. Results of this study are intended to inform public health, spur clinical trials, and expand treatment options.
Introduction
Hundreds of millions of people worldwide have been infected with the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2).[1] Many have experienced ongoing, relapsing, or new symptoms or other health effects occurring after the acute phase of infection, termed post-acute sequelae of SARS-CoV-2 infection (PASC), or long COVID. While more than 200 symptoms have been associated with PASC,[2] there are no agreed upon criteria for the diagnosis of PASC, and estimates of PASC incidence and prevalence vary widely.[3–9]
The pathophysiology underlying PASC remains incompletely understood.[10, 11] Various mechanisms have been proposed, including viral persistence,[12–17] microvascular clotting and platelet dysregulation,[18–21] tissue damage from initial infection,[22, 23] inflammation and immune dysregulation,[17, 24-31] reactivation of other latent viral infections (e.g., Epstein-Barr virus),[17, 32, 33] microbial translocation and dysbiosis,[34, 35] and/or impacts of pandemic– related disruptions on health.[36–38] Further characterization of PASC clinical manifestations and underlying pathophysiologic mechanisms could facilitate identification and investigation of preventive and therapeutic interventions.
Materials and Methods
Objectives
The National Institutes of Health (NIH) initiative “Researching COVID to Enhance Recovery (RECOVER) Multi-site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults” (RECOVER-Adult) is intended to: (1) characterize the incidence and prevalence of PASC; (2) characterize the spectrum of clinical symptoms, subclinical organ dysfunction, natural history, and distinct phenotypes identified as PASC; (3) identify demographic, social determinants of health (SDoH), and clinical risk factors for PASC and PASC recovery, and (4) define the biological mechanisms underlying pathogenesis of PASC. This report describes the study design of the RECOVER-Adult study.
Study Design
RECOVER is an ambidirectional (combined retrospective and prospective) longitudinal cohort study that includes people infected or uninfected with SARS-CoV-2. Participants may be enrolled at the time of SARS-CoV-2 infection or a negative test (for uninfected group) and followed prospectively; or, may be enrolled after SARS-CoV-2 infection or a negative test, asked retrospectively about symptoms since infection or a negative test, and then followed prospectively. Participants may be followed until October 2025. An embedded cohort study of pregnant people will provide longitudinal follow-up of the birthing parent and offspring. The protocol for the pregnancy cohort is reported separately [39]. In addition, a series of nested case– control studies will be performed among participants with and without select symptoms or findings, who will undergo more intensive radiographic imaging, physiologic assessment, and tissue collection.
Protocol development
The protocol was developed and refined in a collaborative process involving site and core investigators, patient representatives and caregivers, and NIH staff (see S1 Fig for timeline and details). Patient representatives were recruited from COVID advocacy organizations such as the Patient-Led Research Collaborative, Survivor Corps, and Long COVID Families; patient organizations with expertise in post-viral syndromes; grass-roots activist organizations; and through nominations by enrolling sites. All patient representatives were compensated for their time. Refinements to the protocol continue to be made in response to participant and site feedback, new scientific evidence, and interim results.
Study organizational structure and study management
The study infrastructure includes four cores: (1) the Clinical Science Core (CSC) at New York University (NYU) Grossman School of Medicine oversees study sites and provides scientific leadership in collaboration with the site Principal Investigators, (2) the Data Resource Core (DRC) at Massachusetts General Hospital and Brigham and Women’s Hospital provides scientific and statistical leadership, and handles data management and storage, (3) the PASC Biorepository Core (PBC) at Mayo Clinic manages biospecimens obtained from study sites, and (4) the Administrative Coordinating Center (ACC) at RTI International (RTI) provides operational and administrative support; collectively these form the Core Operations Group. The four cores are supported by six Oversight Committees that oversee RECOVER-wide activities including publications, ancillary studies, clinical trial interventions selection, quality assurance, and study design. Twelve pathobiology task forces provide content-specific input. All RECOVER cohort studies receive inputs from the National Community Engagement Group composed of patient and community representatives; and are overseen by a Steering Committee composed of core and hub principal investigators, patient representatives and NIH program leadership; an Executive Committee composed of NIH Institute leaders, patient representatives and other federal leadership; and an Observational Study Monitoring Board (OSMB) (S2 Fig).[40]
Study setting and participating sites
RECOVER-Adult is designed as a hub and spoke model, with 16 hubs collectively overseeing 86 enrolling sites in the United States (U.S.) located in 33 states plus Washington, DC and Puerto Rico (S1 Table). Enrolling sites include hospitals, health centers, and community organizations drawing participants primarily from their surrounding communities. Two sites are mobile health vans enrolling in rural communities far from health centers. One hub is enrolling participants remotely across the country, with study procedures conducted through home visits and biospecimen collection at local laboratories.
Eligibility Criteria
Participants are eligible for RECOVER if they are at least 18 years old, have reached the age of majority in their state of residence, are not incarcerated, and are not terminally ill. Individuals with or without history of SARS-CoV-2 infection are eligible. Infected individuals must have suspected, probable, or confirmed SARS-CoV-2 infection as defined by World Health Organization (WHO) criteria[41] (see Table 1), or positive SARS-CoV-2 infection-specific antibody testing. Uninfected individuals must not meet any WHO criteria for infection and must have a documented negative SARS-CoV-2 nucleic acid and antibody test result (Table 1).
Individuals who were pregnant at the time of a SARS-CoV-2 infection and had a live birth, or who are pregnant at the time of enrollment in RECOVER, are only eligible to enroll in the pregnancy cohort of the adult study. Their offspring are eligible for enrollment into the congenital exposure cohort of the RECOVER pediatric study. Individuals who were pregnant at the time of a SARS-CoV-2 infection and had a pregnancy loss or termination prior to 20 weeks’ gestation, are eligible to enroll in either the pregnancy or the adult main cohort.
Sample size
Sample size determinations were performed for both aggregate and subgroup analyses based on characteristics such as age, sex, race/ethnicity, pregnancy, and vaccination status. The current version of the protocol targets enrollment of 12,200 participants with history of SARS-CoV-2 infection and 2,680 participants without history of SARS-CoV-2 infection (total of 14,880). Sample size targets are further specified by duration of time between infection (or negative test) and enrollment, and by pregnancy status (Table 2). Based on 90% power and a type-1 error rate of 0.01, the minimum detectable effect size for the difference in risk of PASC or a PASC symptom between participants with and without infection is 3.1% (6.4% in a 25% subgroup), assuming the risk among participants without infection is 15%. When restricting to acute infected and uninfected participants only, the minimum detectable risk difference is 4.7% (10.0% in a 25% subgroup). In logistic regression analyses investigating whether an infected participant develops PASC, the minimum detectable odds ratio for a risk factor is 1.22 (1.46 in a 25% subgroup), assuming 25% of all infected participants develop PASC and that the risk factor prevalence among participants who do not develop PASC is 20%. Finally, assuming that 50% of infected participants who develop PASC recover from it during follow-up, the minimum detectable odds ratio for the association between a risk factor and recovering from PASC is 1.40 (1.90 in a 25% subgroup), assuming the risk factor has 20% prevalence among participants who do not recover from PASC.
Sample size targets for race/ethnicity are intended to match the distribution of SARS-CoV-2 infection in the U.S. as of June 2021: 16% non-Hispanic Black; 27% Hispanic, 4% non-Hispanic Asian, American Indian/Alaska Native or Native Hawaiian/Other Pacific Islander; and 53% non– Hispanic White.[42]
Recruitment
Participants are recruited through outreach to patients cared for at the enrolling site, community outreach, use of public health test lists, and self-referrals from the RECOVER website (https://recovercovid.org). For participants without SARS-CoV-2 infection, sites are asked to draw from similar communities, demographics, and sites of care as those recruiting infected participants. Enrollment is tracked by enrollment category, race/ethnicity, sex, residence in rural or medically underserved areas, hospitalization status at time of initial infection, and referral source to allow for real time adjustments in enrollment to match protocol targets.
Assessments
The RECOVER-Adult schedule of assessments includes: surveys, collection of biologic specimens, physical examinations, laboratory tests, radiologic studies, and invasive procedures to measure study outcomes. The schedule starts at the time of first infection or at the negative test date (“index date”); follow-up visits are conducted at 90-day intervals for a maximum of 4 years. All participants undergo the same assessments at baseline enrollment. Thereafter, participants follow the assessment schedule corresponding to the appropriate time point relative to the index date. For example, participants who are enrolled 90 days after infection follow the 180-day assessment schedule at their first follow-up visit (Figure). Participants may remain in the study if they have missed a visit; after three missed visits they may be considered lost to follow-up, in which case no further information is obtained.
Figure: Schedule of assessments
Surveys
Participants complete surveys at 90-day intervals throughout the study. On enrollment, data are collected on demographics, SDoH, disability, characteristics of the initial SARS-CoV-2 infection (if applicable), pregnancy (if applicable), vaccination status, comorbidities, medications, and PASC symptoms. Subsequently, at 90-day intervals, data are collected on interim infections, time-varying social determinants, vaccinations, comorbidities, medications and symptoms. The PASC symptom survey was developed for RECOVER and includes an overall quality of life instrument (PROMIS-10) and screening for core symptoms (43 for biological males and 46 for biological females) drawn from existing literature plus input from patient representatives and investigators. Questions about depression, anxiety, post-traumatic stress disorder (PTSD), and grief are also included. Report of a symptom may trigger additional questions about that symptom. Wherever possible, pre-existing validated survey instruments are used. Details of survey instruments can be found at https://recovercovid.org/protocols and in S2 Table.
In-person assessments
Office-based assessments are performed on all participants at enrollment, at 180 days after index date, and then at yearly increments thereafter. These include: height, weight, waist circumference, seated vitals (heart rate, blood pressure, oxygen saturation), 30 second sit-to– stand, and a 10 minute active stand test during which heart rate and blood pressure are measured at 1, 3, 5 and 10 minute intervals (S3 Table).
Laboratory assessments
A core set of laboratory studies are obtained on all participants at enrollment, at 90 and 180 days after the index date (S3 Table). After 180 days, abnormal laboratory tests from the most recent prior visit are repeated annually. At enrollment only, participants enrolled as uninfected undergo SARS-CoV-2 PCR testing and SARS-CoV-2 antibody testing (nucleocapsid for all, and spike only for unvaccinated). These core studies are performed at each site in Clinical Laboratory Improvement Amendments of 1988 (CLIA)-certified laboratories.
Biospecimens
At enrollment, at 90 and 180 days after the index date, and then annually, participants are asked to provide blood and nasopharyngeal/nasal swab biospecimens for storage (Table 3). Saliva is collected once upon enrollment for genetic analysis. Urine and stool are collected biannually. Biospecimens are not collected from participants who decline use of samples for future research.
Triggered testing
Participants with infection experiencing specific symptoms or having abnormal study assessments may be eligible for additional assessments, each of which is triggered by qualifying criteria. In addition, participants with and without infection are randomly selected to complete additional assessments for comparison. These assessments are divided into Tier 2 and Tier 3 assessments. Tier 2 assessments are anticipated to be completed by approximately 30% of participants per assessment, and may be repeated yearly if abnormal (S4 Table). Tier 3 assessments are more invasive and/or burdensome, and are anticipated to be completed by not more than 20% of participants per assessment. Tier 3 assessments with more than minimal risk can only be performed once (S5 Table). Participants are eligible to begin Tier 2 and Tier 3 testing 90 to 180 days after index date, depending on the assessment. Individuals who are pregnant, 3-months postpartum or breastfeeding are ineligible for some of the assessments. Tier 2 and 3 assessments include additional surveys, blood tests, clinical examinations, imaging, and procedures. Specialized blood tests are run by a central laboratory (ARUP Laboratories, Salt Lake City, Utah) for consistency. Imaging is acquired via pragmatic standard clinical protocols to maximize testing availability across sites. Several of the imaging studies are overseen by reading centers charged with protocol development, quality assurance/control, and certifying performance sites, in addition to centralized review of a portion of studies. DICOM images are uploaded and shared using specialized cloud-based image storage software provided by Ambra Health. Clinical examinations and procedures are performed by clinically certified personnel at each site following standard clinical protocols.
Data collection and management
Study data are collected by sites and entered into a centralized REDCap (Research Electronic Data Capture) database hosted by the DRC in a Federal Information Security Modernization Act moderate environment.[43] REDCap includes data validation and audit capabilities.[44, 45] Protected health information in the central REDCap database is limited to zip code and birthdate. Automated queries are generated by the DRC for missing or implausible data and sent to sites for near real-time correction. Monthly study monitoring reports are provided to sites to optimize fidelity to protocol. Periodic audits are conducted independent of the site investigators and sponsor.
Outcomes
The primary endpoints of this study are the presence of composite incident or prevalent PASC symptoms and progression of PASC; since there is not yet an agreed-upon definition of PASC, a working definition will be developed as part of the study (see statistical analysis). Secondary endpoints include recovery trajectories from SARS-CoV-2 infection, documentation of organ injury, and incident clinical diagnoses.
Statistical analysis
Point prevalence, defined as the proportion of participants reporting a symptom at a given follow-up time point among those remaining in RECOVER, will be calculated for participants with and without infection separately. Odds ratios (ORs) adjusted for demographic factors will be reported. Machine learning approaches will be used to select combinations of symptoms among the 40+ included in the symptom survey (i.e., variable selection) that differentiate participants with and without a history of infection [46]. Prevalent symptoms and select severity scores will be used as input to the model. Balancing weights will be applied to account for any differences between infected and uninfected participants [47]. Analyses will be iteratively refined as new data modalities (e.g. laboratory, radiology and other Tier 2/3 tests) and additional longitudinal assessments become available. Within PASC positive individuals, consensus clustering will be applied to identify PASC subgroups.[48]
Logistic regression analyses will be conducted to investigate associations between clinical factors, demographics, and SDoH, and the cumulative incidence of PASC among participants with infection. Multinomial regression models will also be used to investigate specific associations between these risk factors and PASC subgroups. A Cox proportional hazards regression will be fitted to model the hazard of developing PASC given the risk factors. A Fine–Gray model for the sub-distributional hazard of developing each PASC subgroups, accounting for the competing and semi-competing risks of each sub-phenotype as well as study dropout as a censoring event, will be fit among infected participants to estimate the association between risk factors and the hazard of each PASC subgroups.[49] Additional strategies that account for time–varying covariates (e.g., vaccination status, pharmaceutical or clinical interventions) will be considered.[50]
A Cox proportional hazard model will model PASC resolution, defined based on longitudinal assessments, to evaluate associations with baseline factors, including markers of illness severity during the acute phase of infection. Point estimates and 95% Wald confidence intervals will be estimated for each risk factor and a large-sample score test will be conducted to test against the null hypothesis that the hazard of resolution is independently associated with each risk factor.
Observational Study Monitoring Board
The RECOVER OSMB appointed by the NIH provides data and safety oversight, meeting at least twice annually. The purpose of the OSMB is to assure independent review of unreasonable risk exposure because of study participation, monitor study progress and integrity, and advise on significant protocol modifications. The OSMB is composed of experts in longitudinal studies, manifestations of COVID-19, biostatistics and bioethics, and patient/caregiver representatives. As RECOVER-Adult does not involve any interventions, early stopping rules for efficacy or futility are not indicated.
Major changes to the protocol
A planned flexible study design allows modifications to PASC case definition, tiered phenotyping assessments, comparator groups, and statistical plan after study initiation to optimize public health impact without undermining validity and integrity of study findings. Table 4 lists key modifications to the protocol to date.
Data sharing and dissemination
Scientific data will be de-identified and shared pursuant to the NIH policy for Data Management and Sharing policy.[51] The RECOVER Ancillary Study Oversight Committee and Biospecimen Access Committee govern access to data and biospecimens for ancillary studies. Study results will be disseminated via scientific publication, presentation at national meetings, public-facing webinars, community briefings, RECOVER newsletters, social media, and other means of communication to both scientific and lay audiences. Additionally, results of all tests performed in CLIA-certified laboratories or read by clinically-certified personnel are returned to study participants.[52] Tests performed in research laboratories that are not CLIA-certified are not returned to participants, following federal regulations.
Ethics
The study was approved by the NYU Grossman School of Medicine Institutional Review Board (IRB), which serves as the single IRB for the majority of the study sites. A few pre-existing consortia use their own IRBs through an exemption granted by the NIH. All participants provide signed, informed consent to participate in the main protocol. Participants are reconsented if there are major changes to the study design or to anticipated risks. For high-risk Tier 3 procedures, a separate procedure-specific consent is obtained prior to the procedure.
Discussion
The overall goal of RECOVER is to rapidly improve understanding of, and ability to predict, treat, and prevent PASC. RECOVER-Adult’s large sample size and breadth of representation across geographic region, age, sex, race/ethnicity and other SDoH and pregnancy status are expected to produce broadly applicable and actionable results and support numerous subgroup analyses. Additionally, nested case-control studies will occur among participants with certain PASC phenotypes who have triggered assessments. RECOVER-Adult includes numerous strengths. Participants include many uninfected and asymptomatic infected individuals for comparison that is often missing from other large studies.[11, 53-56] All participants are followed prospectively from time of enrollment, allowing longitudinal analyses of disease trajectory. By contrast, most studies to date have been single time point assessments or serial cross-sectional studies.[3-9, 57] Acute participants enrolled at the time of first infection will provide a prospective estimate of PASC rates that is less biased than the most studies, which have enrolled subjects after PASC status is known.[3–9] The strong focus on patient-reported symptom outcomes allows capture of a broader range of sequelae than studies relying on electronic health records or claims data.[53, 58-62] The adaptive nature of the protocol allows for rapid responsiveness to new discoveries and changes in the nature of the pandemic. The extensive biospecimen collection and clinical, laboratory, and radiology assessments will generate a wealth of deep phenotyping data that can be used for pathophysiologic analyses.Finally, multi-omics analyses are proposed and have potential to provide molecular mechanistic insights into the pathophysiology of PASC.
RECOVER-Adult is also unique in the extent to which patients experiencing PASC and representatives from patient advocacy communities contributed to the protocol’s development and ongoing operations. For example, the PASC symptom survey was drawn in part from lists of symptoms generated by members of the patient community,[2] allowing measurement of symptoms overlooked by other studies, including post-exertional malaise and menstrual cycle changes. At the urging of patient representatives, participants with a clinical diagnosis of COVID were included even without a positive test history, to permit inclusion of individuals affected in the earliest stages of the pandemic when testing was not widely available. Among the many other significant design aspects credited to input from patient representatives are: wording of the consent document, including clinical assessments specific to dysautonomia, sharing clinically certified lab results with participants, ensuring accommodations for participants with myalgic encephalomyelitis/chronic fatigue syndrome, and selection of SDoH instruments.
In summary, RECOVER-Adult is a large, national, longitudinal, retrospective and prospective cohort that will answer key questions about the epidemiology and pathophysiology of PASC. Results will support clinical trial development by defining PASC and sub-phenotypes, natural history, risk factors, biomarkers, and mechanistic pathways for potential therapeutic targets. Results of this study will also inform public health efforts, prevention, and clinical care.
Supporting information
S1 Fig: Protocol Development Timeline
S2 Fig: RECOVER Consortium Oversight Structure
S1 Table: Hubs and Enrolling Sites
S2 Table: Survey Topics as of Protocol Version 7.0
S3 Table: Tier 1 Assessments
S4 Table: Tier 2 Assessments
S5 Table: Tier 3 Assessments
S6 Table: Writing Committee
S7 Table: RECOVER-Adult Consortium Members
S8 Table: RECOVER-Adult Committees and Task Forces
Acknowledgements
We would like to thank the National Community Engagement Group (NCEG), all patient, caregiver and community representatives, and all the participants enrolled in the RECOVER initiative.
Footnotes
^ Membership of the RECOVER Initiative is provided in the Supporting Information.
Registration: NCT05172024