Abstract
Given the continued burden of severe acute respiratory syndrome coronavirus 2 (SARS CoV-2) disease (COVID-19) across the U.S., there is a high unmet need for data to inform decision-making regarding social distancing and universal masking. We examined the association of community-level social distancing measures and individual masking with risk of predicted COVID-19 in a large prospective U.S. cohort study of 198,077 participants. Individuals living in communities with the greatest social distancing had a 31% lower risk of predicted COVID-19 compared with those living in communities with poor social distancing. Self-reported masking was associated with a 63% reduced risk of predicted COVID-19 even among individuals living in a community with poor social distancing. These findings provide support for the efficacy of mask-wearing even in settings of poor social distancing in reducing COVID-19 transmission. In the current environment of relaxed social distancing mandates and practices, universal masking may be particularly important in mitigating risk of infection.
Introduction
In March 2020, most U.S. states implemented community social distancing interventions, including shelter-in-place orders, school closures, bans on large gatherings, and suspension of non-essential businesses, in an attempt to limit transmission of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19).1 In some areas, these measures appear to have successfully reduced the pace and severity of COVID-19 burden during the initial wave of infections2,3, thereby “flattening the curve.” However, lockdowns are not viable as a long-term solution 4,5. In addition, despite growing evidence showing that masking can reduce disease transmission6,7, adherence to masking recommendations by public health authorities have been variable across the U.S. Given the continued rising burden of COVID-19 across many U.S. communities, there is a high unmet need for data to inform decision-making regarding universal masking in settings in which social distancing is not widely observed.
To date, most evidence on the efficacy of social distancing and universal masking is based on modeling using community-level data in relation to disease burden as assessed through testing, hospitalizations, or mortality7–9. Such studies are unable to concurrently account for personal risk factors for infection or optimally assess the latency between social distancing or masking interventions and infection rates given the significant lag between the onset of symptoms, testing, and medical care. Here, we conducted a prospective study in the U.S using a smartphone-based application that collected self-reported, individual-level information on COVID-19 like symptoms, masking and other personal risk factors, in combination with community-level social distancing measures to investigate the relative effectiveness of social distancing and masking policies with the risk of COVID-19.
Methods
Study Population
Our study population includes all participants enrolled in the COVID Symptom Study smartphone application (“app”) from March 29, 2020 to July 16, 2020 in the U.S. The app is a freely available program developed by Zoe Global Ltd. in collaboration with researchers and clinicians at Massachusetts General Hospital and King’s College London. Participants using this app reported demographic information and comorbidities at baseline and were encouraged to report on their current health condition daily to allow for the longitudinal, prospective collection of symptoms and COVID-19 testing results10. Participants were recruited through general and social media outreach, as well as direct invitations from the investigators of long-running prospective cohorts to study participants11. At enrollment, participants provided informed consent to the use of aggregated information for research purposes and agreed to applicable privacy policies and terms of use. This research study was approved by the Partners Human Research Committee (Institutional Review Board Protocol 2020P000909). This protocol is registered with ClinicalTrials.gov (NCT04331509).
Assessment of predicted COVID-19 and personal risk factors
The information collected through the app has been provided in detail previously10. Upon first use, participants were asked to provide baseline demographic factors, including their zip code of residence, and answered separate questions about a suspected risk factors for COVID-19 (Table 1). On first use and upon daily reminders, participants were asked if they felt physically normal, and if not, their symptoms, including fever, persistent cough, fatigue, loss of smell/taste, and diarrhea, among others10. Participants were also asked if they had been tested for COVID-19, and if yes, the results (none, negative, waiting, or positive). Beginning on June 12, 2020, participants were also asked if they had worn a face mask when outside the house in the last week (never, sometimes, most of the time, or always). Population density was calculated from Census data for all Zip Code Tabulation Areas (ZCTA) in the U.S. The daily estimated effective reproductive number (Rt), the average number of secondary cases arising from a single case for a given day in each state, was extracted from rt.live12. The method and adaptation for estimation of Rt was previously described13,14. Because a report of a positive COVID-19 test depends on access to testing and incorporates a variable delay between symptoms and testing, we used a previously published symptom-based classifier that predicts COVID-19 as our primary outcome measure15. Between March 24 and April 21, 2020, 2,450,569 UK and 168,293 U.S. individuals reported symptoms and 6,452 UK and 726 U.S. individuals reported a positive COVID-19 test. To build a prediction model, the UK participants were randomly divided into a training set and a test set (ratio: 80:20). Based on the training set, a logistic model generated to predict symptomatic COVID-19 was: Log odds (Predicted COVID-19) = −1.32 - (0.01 x age) + (0.44 x male sex) + (1.75 x loss of smell or taste) + (0.31 x severe or significant persistent cough) + (0.49 x severe fatigue) + (0.39 x skipped meals). The prediction model achieved a sensitivity of 0.65 (95% CI 0.62-0.67) and specificity of 0.78 (95% CI 0.76-0.80) in the test set. In additional validation in the U.S. participants, the prediction model achieved a sensitivity of 0.66 (95% CI 0.62-0.69) and specificity of 0.83 (95% CI 0.82-0.85). To examine the influence of COVID-19 incidence on our results, we included the daily county-level test positive COVID-19 incidence estimated by the Center for Systems Science and Engineering at Johns Hopkins University as a covariate16,17.
Assessment of community social distancing and personal face mask use
We assigned each individual participant a social distancing grade within their communities based on their zip code of residence. We used data provided by Unacast18 that estimated county-level social distancing for each calendar day according to the GPS activity of all devices assigned to their longest recorded location. Compared to the same day of the week during the pre-COVID-19 period (defined by Unacast as the four weeks prior to March 8, 2020), Unacast estimated, for each day, the percent reduction in three metrics – metric 1) average distance traveled per device; metric 2) non-essential visitation (e.g., restaurants, department stores, hair salons); and metric 3) human encounters calculated as two devices in close proximity (i.e., spatial distance of ≤50 m and temporal distance of ≤60 minutes)18. Unacast assigned grades (A, B, C, D, and F) using predefined cutoff points for each metric and calculated an overall social distancing grade (Supplementary Methods), with grade A indicating the greatest social distancing and F the poorest social distancing. For all analyses, we combined grades A and B due to a limited number of individuals living in counties assigned to grade A. For personal face mask use, we used the individual-level information collected through the app. Beginning on June 12, app users received supplementary questions regarding use of face masks based on the query “In the last week, did you wear a face mask when outside the house?”.
Statistical Analysis
We conducted prospective analyses after excluding participants who had any symptom related to COVID-19 or who had tested positive for COVID-19 prior to start of follow-up to minimize collider bias. Follow-up time started when participants first reported on the app and accrued until they developed predicted COVID-19, or the time of last data entry prior to July 16th, whichever occurred first. We used updated, time-varying community social distancing exposure data as our primary independent variable. Community-level social distancing exposure data and corresponding follow-up time was mapped to each individual and updated each time they logged in the app to provide updated symptom information. We also used time-varying masking exposure data for the association between self-reported personal use of masks and predicted COVID-19. Cox proportional hazards regression models stratified by age, state, and calendar date at study entry were used to calculate unadjusted and multivariable adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) of predicted COVID-19. Covariates were selected a priori based on putative risk factors and included race (white, Black, Asian, other race), sex (male, female), population density (quartiles), current smoking, work as a frontline healthcare worker, interaction with suspected or documented COVID-19, and history of diabetes, heart disease, lung disease, and kidney disease (each yes/no). Missing data for categorical variables was included as a missing indicator.
To minimize any variation of estimated daily social distancing grade associated with day of the week (e.g. Sunday vs. Monday), we used a seven-day average of community social distancing grade as the exposure for each participant. We first examined the latency between community social distancing grade and predicted COVID-19 using varying lag times (0 days, 7 days, 14 days, 21 days, and 28 days). For example, for a latency of 7 days, we used social distancing grade exposure on April 1 for predicted COVID-19 outcome measures on April 8, grade on April 2 for follow-up on April 9 and so forth (Supplemental Figure 1). For subgroup analysis according to daily state-level Rt, we used a 21-day latency since this corresponded to the start of the seven-day average social distancing exposure with a 14-day latency. Two-sided p-values of <0.05 were considered statistically significant for main analyses. All statistical analyses were performed using R software, version 3.6.1 (R Foundation).
Results
Between March 29 and July 16, 2020, we enrolled 277,798 participants who provided baseline information. We excluded 79,721 individuals who did not live in a county with available Unacast data, reported any symptoms or a positive COVID 19 test at enrollment, had <24 hours of follow-up time or who reported a positive COVID-19 test or symptoms of predicted COVID-19 within 24 hours of enrollment. This left 198,077 participants in our prospective inception cohort, in which we subsequently documented 4,488 cases of predicted COVID-19 over 11,403,773 person-days of follow-up. Compared to others, individuals who lived in communities with poor social distancing (Grade=F) at baseline were younger, more likely to be male, more likely to smoke currently, have less lung disease, and had more interaction with suspected or documented COVID-19 individuals (Table 1). In contrast, individuals living in communities with excellent social distancing (Grade=A/B) were older and more likely to live in areas with lower population density (Table 1).
Risk of predicted COVID-19 according to overall community social distancing grade at various time lags
To test the association between community-level social distancing and risk of subsequent predicted COVID-19, we evaluated lag times of 7 to 28 days. Living in a community with greater social distancing grade (F to A/B) was associated with a lower risk of predicted COVID-19 for all lag times evaluated (Table 2). The maximal association was first observed with a fourteen-day lag and the benefit plateaued beyond that time period (Figure 1). Compared to participants living in communities with overall poor social distancing (Grade=F), the multivariable HRs for predicted COVID-19 at 14 days were 0.85 (95% CI 0.77-0.95) for fair (Grade=D), 0.80 (95% CI 0.70-0.91) for good (Grade=C), and 0.69 (95% CI 0.55-0.86) for excellent (Grade=A/B) social distancing (Plinear-trend <.001) after adjusting for personal risk factors for COVID-19 (Table 2). There was a negative but not statistically significant association with a 0-day lag. When we further adjusted for county-level test positive COVID-19 incidence in the community at the time of assessment for the social-distancing measures, we observed similar results (multivariable HR, 0.67; 95% CI 0.53-0.85) for excellent social distancing (Grade=A/B) compared to participants living in communities with overall poor social distancing (Grade=F). For subsequent analyses, we focused on models using a fourteen-day latency since the reduction in predicted COVID-19 appeared maximal at 14 days, and this is considered a plausible interval for exposure to symptom-based disease prediction.
Risk of predicted COVID-19 according to community social distancing metrics and demographics
We also assessed the three individual components of the Unacast social distancing grade: including average distance traveled, non-essential visitation, and human encounters (Table 3). Reduction in average distance traveled (HR, 0.78; 95% CI 0.65-0.92 <25% versus >55%) and non-essential visitation (HR, 0.79; 95% CI 0.70-0.89 <55% versus >65%) were both associated with lower risk of predicted COVID-19. The reduction in human encounters, based on phone-to-phone proximity measures, was not associated with lower risk of predicted Covid-19. In subgroup analyses, the association of social distancing grade and COVID-19 appeared to differ according to age (Pinteraction=0.001). The benefit of increasing social distancing from Poor (F) to Excellent (A/B) was greatest among the middle-age participants (35-55 years, HR, 0.47; 95% CI 0.26-0.86), than among younger (age <35 years) or older participants (>55). We assessed for effect modification by other demographic including race, sex, and health problems limiting activities, and found no significant interactions between social distancing grades and these factors (all Pinteraction >0.05; Supplemental Table 2).
Furthermore, to evaluate whether the impact of social distancing on risk of predicted COVID-19 was modified by local transmissibility, we performed subgroup analysis according to Rt. During epidemic slowing/maintenance period (Rt ≤1.0), compared to participants living in communities with overall poor social distancing (Grade=F), the multivariable HRs for predicted COVID-19 were 0.88 (95% CI 0.75-1.02) for fair (Grade=D), 0.79 (95% CI 0.66-0.96) for good (Grade=C), and 0.64 (95% CI 0.48-0.87) for excellent (Grade=A/B) social distancing (Plinear-trend =0.004) after adjusting for personal risk factors for COVID-19 (Supplemental Table 5). This trend was also observed with similar magnitudes albeit with borderline significance (Plinear-trend =0.08) during the epidemic growth period (Rt >1.0).
Risk of predicted COVID-19 according to personal face mask use
We examined the association between self-reported personal use of a face mask and risk of predicted COVID-19 among the 139,690 participants who provided this information. Compared to individuals who reported never using a face mask, individuals who reported using (sometimes, most of the time, or always) a face mask had a multivariable HR for predicted COVID-19 of 0.35 (95% CI 0.30-0.42; Table 4).
Ever using a face mask was associated with reduced predicted COVID-19, with adjusted HRs of 0.31 (95% CI 0.11-0.82) among individuals living in communities with excellent or good social distancing grade, 0.29 (95% CI 0.19-0.45) for those living in communities with fair social distancing grade, and 0.37 (95% CI, 0.30–0.44) for those living in communities with poor social distancing grade (Pinteraction=0.57). The results remained similar after additional adjustment for actual COVID-19 incidence. Furthermore, observed associations were not substantially different when analyses were restricted to participants living in Texas, Arizona, California, and Florida (HR, 0.35; 95% CI 0.26-0.47), states which were among the states in which social distancing policy was relaxed earlier during the initial phase of the pandemic. Finally, the association of personal use of a face mask and predicted COVID-19 did not appear to substantially different according to Rt (Supplemental Table 6).
Discussion
In this prospective study of 198,077 participants using a real-time mobile phone application in U.S., we observed that individuals living in communities with the greatest social distancing had a 31% lower risk of predicted COVID-19 compared with those living in communities with poor social distancing, with maximum benefit evident after a latency period of 14 days. Furthermore, among individuals living in communities with poor social distancing, personal use of a face mask outside of the home at least sometimes was associated with a 63% reduced risk of predicted COVID-19 compared to individuals who never wore a face mask.
Notably, a reduction in average distance traveled and non-essential visitation in the community was protective for predicted COVID-19. In contrast, close contact as measured by human encounters was not associated with predicted COVID-19. This suggests that average distance traveled and non-essential visitation, as measures of independent mobility, may be more reflective of effective social distancing than measures based on assessing proximity between two devices. It is also possible that the criterion to define human encounters based on devices <50 meters apart may not be optimal to study COVID-19 transmission. In subgroup analysis, we did not observe the inverse associations between living in communities with the greater social distancing and risk of COVID-19 among individuals aged greater than 55 years, having health problems requiring stay-at-home, and regularly using mobility aids. For those individuals, living in a community with the greatest social distancing may not play an important role in reducing COVID-19 risk due to their limited mobility and lower likelihood of social interaction in crowded spaces. Noticeably, the inverse association between living in a community with greater social distancing and risk of predicted COVID-19 was most consistently observed among younger individuals without significant health problems or limitations in mobility.
We observed that the disease burden of COVID-19 at the start of the social distancing measurement did not influence the association of social distancing and personal use of a face mask with risk of predicted COVID-19. We also observed that protective effect of social distancing on predicted COVID-19 was present both in areas where the epidemic was slowing or maintained (Rt ≤1.0) as well as in areas where COVID-19 was actively spreading (Rt>1.0). We similarly observed that the benefit of personal use of a face mask was observed in regions and time periods in which there was epidemic slowing/maintenance or growth. These findings imply that baseline risk did not impact the relative benefits of social distancing policies and/or face mask use, although it is remains possible that the absolute reduction in risk is greater in areas with higher burden of COVID-19.
In our study, we used predicted COVID-19 as a proxy for a positive COVID-19 test due to the small number of COVID-19 test positive app users during the study period. The small fraction of positive COVID-19 tests among all participants (0.17%) may be largely influenced by the limited availability of COVID-19 testing during the study period. A recent study demonstrated that more than 80% of individuals with a COVID-19 infection in the U.S. went undetected in March 202019. Moreover, another study in 10 sites across U.S. reported that the estimated number of COVID-19 infections was 6 to 24 times greater per site than the number of reported from March 23 to May 1220. Nevertheless, despite the limited power, we found a protective but not statistically significant association between community social distancing and risk of a positive COVID-19 test (Supplemental table 3). Therefore, this association between the social distancing observed within one’s community and a positive COVID-19 test should be further investigated in studies in which there was a higher prevalence of testing.
Our findings are consistent with previous ecological studies investigating the effect of social distancing on risk of COVID-1921–28. In one recent study that also used estimates of social distancing based on Unacast data, each one-unit increase in social distancing was associated with a 29% reduced risk of COVID-19 incidence and a 35% reduced risk of COVID-19 mortality22 at the county-level. In a separate study, COVID-19 epidemic case growth rates declined by approximately 1% per day beginning four days after statewide social distancing measures were implemented21. In addition, estimated rates of COVID-19 cases were increased in border counties in Iowa which did not issue a stay-at-home order compared with border counties in Illinois which did issue a stay-at-home order23. Another study based on 149 countries demonstrated that any physical distancing intervention was associated with 13% reduced risk of COVID-19 incidence3. These finding add to this body of evidence as we estimate the impact of social distancing in the community on individual-level outcomes.
Other studies have shown that masking is associated with a lower risk of COVID-19 on a population-scale6,7,13,25,29. In one recent study among health care workers, universal masking in a hospital setting was associated with a lower rate of COVID-196,30. A recent meta-analysis demonstrated that face mask use was associated with a 85% reduced risk of viral infection causing COVID-19, SARS (severe acute respiratory syndrome), or MERS (Middle East respiratory syndrome) in both health care and non-health care settings7. While the role of a face mask in protecting other individuals is well-recognized, we observed that a face mask may also protect individuals who wear them, as has been described by others29. Alternatively, participants who generally are willing to wear a face (or self-report such) mask may also engage in overall healthier behaviors.
This study has several strengths. First, we used a mobile application to rapidly collect prospective data from a large population on known or suspected COVID-19 personal risk factors, such as mask wearing. This is a significant advantage over existing studies which cannot concurrently examine the impact of personal interventions to reduce exposure risk with community-scale data. Second, we collected data from participants initially free of a positive COVID-19 test and any symptoms, which allowed a prospective assessment of incident symptoms with minimal recall or collider bias. Third, we assessed COVID-19 incidence according to a validated symptom assessment which minimizes the biases associated with geographic variation in access31 to COVID-19 testing on estimates of COVID-19 incidence, which may bias effect estimates away from or towards the null (e.g. social distancing associated with reduced test access or increased test seeking behavior). This also allows us to better assess the impact of social distancing on COVID-19 according to different latency periods since it minimizes the time delay between onset of infection, obtaining a test, and reporting of the result, which has been estimated to be delayed by as long as a week in some areas of the U.S.32,33.
There are several limitations to our study. First, our information on risk factors and symptoms are collected by self-report. Although information based on clinical records and testing would be more accurate, given the rapid pace of the pandemic and the limited availability of medical care and testing, self-reported information is more feasible to collect longitudinally and prospectively among a large number of participants and minimizes recall bias or selection bias (e.g. preferentially capturing severe cases through hospitalization records or death reports). Second, since our cohort is not a random sampling of the population, there remains a possibility for selection or collider bias, as well as generalizability. We tried to minimize collider bias in this study by excluding participants who reported having COVID-19 symptoms prior to joining the app. Although this limitation is inherent to any study requiring voluntary provision of information, we acknowledge that data collection through smartphone adoption has comparatively lower penetrance among certain socioeconomic groups and older adults and that participants of an app study may have differential likelihood of reporting symptoms34. Third, it is possible that the personal risk factors for COVID-19 that we assessed here, such as wearing a face mask, may be confounded by other behaviors that reduce infection risk, as well as whether users are accurately self-reporting these behaviors. Fourth, the social distancing metrics used as an exposure are not reflective of actual user mobility. There may be non-differential misclassification of exposure status by region, if county-level factors are correlated with the individual-level heterogeneity of each mobility metric (e.g. younger app users in an urban area with high mobility). Fifth, our analysis was focused on symptomatic COVID-19. However, it is likely that an association between social distancing and masking with risk of asymptomatic spread would be similar. Lastly, while personal face mask use and other covariates were based on individual level data reported through the app, the social distancing measures are based on regionally aggregated data assigned to each app user.
In conclusion, within a large population-based sample of individuals in the U.S, we demonstrated a significantly reduced risk of predicted COVID-19 infection among individuals living in communities with a greater social distancing grade at 14 days either in regions or time periods experiencing either epidemic slowing or growth.
Among participants who lived in a community with poor social distancing, wearing a face mask was associated with reduced risk. These findings provide additional support for the efficacy of non-pharmaceutical interventions in reducing COVID-19 incidence and spread and suggest that the benefits of such interventions will become most evident at 14 days after implementation. Encouraging universal masking may be particularly important to limit the continued spread of COVID-19 as social distancing mandates continue to be relaxed.
Data Availability
Data collected in the app are being shared with other health researchers through the NHS-funded Health Data Research UK (HDRUK)/SAIL consortium, housed in the UK Secure e-Research Platform (UKSeRP) in Swansea. Anonymized data can be shared with bonafide researchers via HDRUK, provided the request is made according to their protocols and is in the public interest (see https://healthdatagateway.org/detail/9b604483-9cdc-41b2-b82c-14ee3dd705f6). Data updates can be found at https://covid.joinzoe.com.
DATA AVAILABILITY
Data collected in the app are being shared with other health researchers through the NHS-funded Health Data Research UK (HDRUK)/SAIL consortium, housed in the UK Secure e-Research Platform (UKSeRP) in Swansea. Anonymized data can be shared with bonafide researchers via HDRUK, provided the request is made according to their protocols and is in the public interest (see https://healthdatagateway.org/detail/9b604483-9cdc-41b2-b82c-14ee3dd705f6). Data updates can be found at https://covid.joinzoe.com.
ACKNOWLEDGEMENTS
We would like to thank to all of the participants who entered data into the app, including study volunteers enrolled in cohorts within the Coronavirus Pandemic Epidemiology (COPE) consortium. We thank the staff of Zoe Global Ltd., the Department of Twin Research at King’s College London, and the Clinical and Translational Epidemiology Unit at Massachusetts General Hospital.
Footnotes
CONFLICT OF INTEREST JW, RD, and JC are employees of Zoe Global Ltd. TDS is a consultant to Zoe Global Ltd. DAD, JM, and ATC previously served as investigators on a clinical trial of diet and lifestyle using a separate mobile application that was supported by Zoe Global Ltd. Other authors have no conflict of interest to declare.
FUNDING Zoe provided in kind support for all aspects of building, running and supporting the app and service to all users worldwide. ADJ is supported by the National Institute of Diabetes and Digestive and Kidney Diseases K01DK110267. DAD is supported by the National Institute of Diabetes and Digestive and Kidney Diseases K01DK120742. CMA is supported by the NIDDK K23-DK120899. LHN is supported by the American Gastroenterological Association Research Scholars Award. ATC is the Stuart and Suzanne Steele MGH Research Scholar and Stand Up to Cancer scientist. The Massachusetts Consortium on Pathogen Readiness (MassCPR) and Mark and Lisa Schwartz supported MGH investigators (SK, ADJ, CHL, DAD, LHN, CGG, WM, RSM, JM, ATC). King’s College of London investigators (KAL, MNL, TV, MG, CHS, MJC, SO, CJS, TDS) were supported by the Wellcome Trust and EPSRC (WT212904/Z/18/Z, WT203148/Z/16/Z, T213038/Z/18/Z), the NIHR GSTT/KCL Biomedical Research Centre, MRC/BHF (MR/M016560/1), UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare, and the Alzheimer’s Society (AS-JF-17-011). MNL is supported by an NIHR Doctoral Fellowship (NIHR300159). JM is partially supported by the European Commission Horizon 2020 program (H2020-MSCA-IF-2015-703787) and by the National Institutes of Health (P30 DK40561). JEH is supported by NIH/NIEHS P30 ES000002. TV is supported by NIH/NIDDK K01 DK125612. Sponsors had no role in study design, analysis, and interpretation of data, report writing, and the decision to submit for publication. The corresponding author had full access to data and the final responsibility to submit for publication.