Abstract
Purpose The United States has the highest number of confirmed COVID-19 cases in the world to date, with over 94,000 COVID-19-related deaths1. The true risk of a COVID-19 resurgence as states prepare to reopen businesses is unknown. This paper aims to classify businesses by their risk of transmission and quantify the relationship between the density of super-spreader businesses and COVID-19 cases.
Methods We constructed a COVID-19 Business Transmission Risk Index based on the frequency and duration of visits and the square footage of businesses before the pandemic (2019) in 8 states (Massachusetts, Rhode Island, Connecticut, New Hampshire, Vermont, Maine, New York, and California). We used this index to classify businesses as super-spreaders. We then analyzed the association between the density of super-spreader businesses in a county and the rate of COVID-19 cases, and performed significance testing using a negative binomial regression. The main outcome of interest is the cumulative number of COVID-19 cases each week.
Results We found a positive association between the density of super-spreader businesses and COVID-19 cases. A 1 percentage point increase in the density of super-spreader businesses is associated with a 5% higher rate of COVID-19 cases, all else equal.
Conclusion Higher densities of super-spreader businesses are associated with higher rates of COVID-19 cases. This may have important implications for how states reopen super-spreader businesses.
1. Introduction
The United States has the highest number of confirmed COVID-19 cases in the world to date, with over 94,000 COVID-19-related deaths1. One contributing factor has been the emergence of COVID-19 clusters from super-spreader events and establishments2,3,4,5,6,7. Identifying potential super-spreader businesses has important implications for policymakers as they decide when and how to safely reopen non-essential businesses8,9. Baicker et al. aimed to determine which industries or business establishments had a higher risk of transmission. Their study raised important questions that individuals may face as businesses reopen, including the comparative risk of visiting different business establishments7.
Events and places with a high risk of transmission, called super-spreaders, tend to follow a pattern: they are often indoors, with people in close proximity to one another for a long period of time. The risk of transmission in a closed environment has been estimated to be 18.7 times higher than in an open-air environment10. Even though many people feel ready to reopen the economy, experts have warned of a potential resurgence of the virus if the economy reopens prematurely11,12,13,14,15.
Given the empirical evidence of an association between super-spreaders and COVID-19 transmission, it is crucial to evaluate which businesses may carry the highest risk. In this study, we sought to identify businesses with the potential to be super-spreaders using pre-pandemic data on the frequency, duration, and density of visits. We tested the hypothesis that US counties with higher densities of super-spreader businesses, as defined by our index, were at higher risk of COVID-19 transmission and thus may require a careful reopening of businesses to minimize a resurgence of COVID-19 cases.
2. Data and Methods
2.1 Data
We use data from SafeGraph Monthly Patterns0 from January 1, 2019 to May 31, 2019 and SafeGraph Core Points of Interest data to measure business characteristics and traffic. Data on county-level COVID-19 cases and tests are from Johns Hopkins University and the New York Times. Socio-economic and demographic characteristics are from the 2018 American Community Survey of the United States Census Bureau. Businesses are classified by their 6-digit North American Industry Classification System (NAICS) code, as published by the United States Census Bureau.
2.2 Setting
This study focused on counties in 8 states (Massachusetts, Rhode Island, Connecticut, New Hampshire, Vermont, Maine, New York, and California). These comprise 187 counties with a total population of 73,894,989. We examine traffic to 918,094 businesses across 307 different 6-digit NAICS codes from January 1, 2019 to May 31, 2019. We analyze COVID-19 cases in these counties from January 22, 2020 to May 22, 2020.
2.3 Index Construction and Super-Spreader Classification
We constructed a COVID-19 Business Transmission Risk Index using data on business characteristics and traffic by NAICS code from January 1, 2019 to May 31, 2019. The index was built from three inputs: visitors per square foot, frequency of visits, and average duration of visits. Visitors per square foot captures how densely visitors are packed into a business, and more densely packed businesses may have a higher risk of COVID-19 transmission. The average duration of visits captures how long visitors spend in a business. Businesses where visitors linger for longer periods could be riskier for COVID-19 transmission than businesses where visitors are quickly in and out8,9,10.
The COVID-19 Business Transmission Risk Index is calculated for each 6-digit NAICS code in our sample by weighting the total visit time across all visitors from January 1, 2019 to May 31, 2019 by the square footage of the business establishment.
NAICS codes that fall in the top 5% of the Index are classified as super-spreader industries, and we classify all businesses in these industries as super-spreader businesses. This yields 156,307 super-spreader businesses out of a total of 918,094 businesses.
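The index calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that "weighting by square footage" means dividing visitor-minutes by floor area, and it approximates each establishment's total visit time as its visit count multiplied by a typical dwell time. It also pools minutes and floor area within each NAICS code. The `Establishment` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Establishment:
    # Hypothetical record layout; field names are illustrative only.
    naics: str              # 6-digit NAICS code
    total_visits: int       # visits over the study window
    dwell_minutes: float    # typical visit duration in minutes
    area_sqft: float        # establishment floor area in square feet


def risk_index_by_naics(establishments):
    """Assumed index: pooled visitor-minutes per square foot within each NAICS code."""
    visit_minutes = defaultdict(float)
    floor_area = defaultdict(float)
    for e in establishments:
        if e.area_sqft <= 0:
            continue  # skip records with a missing or invalid footprint
        visit_minutes[e.naics] += e.total_visits * e.dwell_minutes
        floor_area[e.naics] += e.area_sqft
    return {code: visit_minutes[code] / floor_area[code] for code in visit_minutes}


if __name__ == "__main__":
    # Toy records, not study data.
    sample = [
        Establishment("722511", 1000, 60.0, 2000.0),
        Establishment("722511", 500, 30.0, 1000.0),
        Establishment("445110", 3000, 15.0, 30000.0),
    ]
    print(risk_index_by_naics(sample))
```

Pooling across establishments gives large establishments more weight than averaging per-establishment ratios would. Which aggregation matches the published index is not stated in the text.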
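The top-5% rule can be expressed as below. This is a hedged sketch: the text does not specify how the cutoff is rounded or how ties are broken. Here the cutoff is assumed to be the ceiling of 5% of the number of codes, and ties are resolved by input order. The function names are hypothetical.

```python
import math


def super_spreader_codes(index_by_naics, top_share=0.05):
    """Return the NAICS codes in the top `top_share` of index values (assumed rounding: ceiling)."""
    ranked = sorted(index_by_naics, key=index_by_naics.get, reverse=True)  # stable: ties keep input order
    # round() guards against float noise such as 0.05 * 40 landing just above 2.0
    n_top = max(1, math.ceil(round(top_share * len(ranked), 6)))
    return set(ranked[:n_top])


def count_super_spreader_businesses(business_naics_codes, ss_codes):
    """Count individual businesses whose NAICS code is flagged as a super-spreader industry."""
    return sum(1 for code in business_naics_codes if code in ss_codes)
```

Under this rounding rule, 307 NAICS codes would yield 16 flagged codes. A different rule, such as a strict 95th-percentile threshold, could flag 15.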
2.4 Study Variables
The outcome measure is the cumulative number of COVID-19 cases per county, measured each week. The independent variable is the density of super-spreader businesses in a county, measured as the number of super-spreader businesses per 100 businesses in that county. Covariates include counties’ racial composition (percent Black and percent White), percent of the population above 65 years, percent of the population below the poverty line, and population density per square mile.
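The county-level exposure variable can be computed as below. This is a minimal sketch that assumes each business is available as a hypothetical `(county_fips, naics_code)` pair, together with the set of flagged super-spreader codes.

```python
from collections import Counter


def super_spreader_density(businesses, ss_codes):
    """Super-spreader businesses per 100 businesses, by county.

    businesses: iterable of (county_fips, naics_code) pairs (hypothetical layout).
    ss_codes: set of NAICS codes classified as super-spreader industries.
    """
    total = Counter()
    flagged = Counter()
    for county, naics in businesses:
        total[county] += 1
        if naics in ss_codes:
            flagged[county] += 1
    # Counter returns 0 for counties with no flagged businesses
    return {county: 100.0 * flagged[county] / total[county] for county in total}
```

Expressing density per 100 businesses makes a one-unit change in the regressor equal to one percentage point. This is the scale on which the main result is reported.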
2.5 Statistical Analysis
Univariate analyses were conducted to describe overall baseline characteristics and to identify the most common super-spreader businesses. Data were analyzed using a negative binomial regression at the county level. The natural log of the total county population was included as an offset term. The model was adjusted for counties’ racial composition, percent of the population above 65 years, percent of the population below the poverty line, and population density. Additionally, an indicator variable for each state was included to adjust for differences in testing practices across states. Note, however, that these indicators do not account for differences in testing across states that vary over time. Standard errors were clustered at the state level.
Coefficients were transformed into incidence rate ratios (IRRs) and are reported with 95% confidence intervals (CIs). Statistical significance was defined as p ≤ 0.05. All tests were two-tailed. Statistical analysis was performed using Stata SE version 14.2 (StataCorp).
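To make the role of the population offset concrete, here is a sketch of a negative binomial (NB2) regression with a log link and a log-population offset, fitted by iteratively reweighted least squares. It is a simplified stand-in for the Stata model and makes several assumptions. The dispersion parameter `alpha` is taken as known rather than estimated, and standard errors are model-based rather than clustered by state. State indicators and covariates would simply be extra columns of `X`. The function name and the simulated data in the demo are hypothetical.

```python
import numpy as np


def fit_nb_glm(X, y, exposure, alpha, max_iter=100, tol=1e-8):
    """NB2 GLM with log link, fixed dispersion `alpha`, and log(exposure) as an offset.

    Fisher scoring via IRLS. Column 0 of X is assumed to be the intercept.
    Returns (coefficients, model-based standard errors); no state clustering.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(exposure, dtype=float))
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start at the crude pooled rate
    for _ in range(max_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        w = mu / (1.0 + alpha * mu)           # (dmu/deta)^2 / Var(y) for NB2 with log link
        z = (eta - offset) + (y - mu) / mu    # working response with the offset removed
        XtW = X.T * w                         # X^T W without building a diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta + offset)
    w = mu / (1.0 + alpha * mu)
    se = np.sqrt(np.diag(np.linalg.inv((X.T * w) @ X)))
    return beta, se


if __name__ == "__main__":
    # Synthetic demo only: true IRR per percentage point of density is 1.05.
    rng = np.random.default_rng(0)
    n, alpha = 20000, 0.5
    density = rng.uniform(0.0, 30.0, n)
    pop = rng.integers(10_000, 1_000_000, n)
    mu = pop * np.exp(np.log(0.003) + np.log(1.05) * density)
    r = 1.0 / alpha
    y = rng.negative_binomial(r, r / (r + mu))  # mean mu, variance mu + alpha * mu^2
    beta, se = fit_nb_glm(np.column_stack([np.ones(n), density]), y, pop, alpha)
    print("IRR per pp:", np.exp(beta[1]))
```

Because the offset enters with a fixed coefficient of 1, the model describes cases per person. Exponentiated slopes are therefore rate ratios rather than count ratios. In Stata, `nbreg` supports an analogous setup through its `exposure()` and `vce(cluster ...)` options.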
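Converting a log-link coefficient to an IRR and a Wald 95% CI is a matter of exponentiation. The sketch below assumes normal-theory (z) intervals and two-sided p-values. Any coefficient and standard error passed to it are hypothetical inputs, not values from this study.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def irr_with_ci(coef, se, z=Z_975):
    """Exponentiate a log-link coefficient and its Wald CI bounds into an IRR and CI."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


def two_sided_p(coef, se):
    """Two-sided Wald p-value: 2 * (1 - Phi(|coef / se|)) written via erfc."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))
```

Because the interval is symmetric on the log scale, the resulting IRR interval is asymmetric around the point estimate.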
3. Results
3.1 Summary Statistics
Summary statistics are reported in Table 1. In our sample, counties had an average of 28.83 cumulative COVID-19 cases per 10,000 population by May 22, 2020. The average density of super-spreader businesses in a county was 16.29 per 100 businesses. On average, 18.55% of a county's population was above the age of 65, 84.86% was White, 2.95% was Black, and 8.96% was below the federal poverty line. The average population density of a county was 299.49 people per square mile. Our study covered 8 states and 187 counties, with a total population of 73,894,989 and 918,094 businesses.
Figure 1 displays a map of Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, and California, showing cumulative COVID-19 cases for each county as of May 22, 2020 and the locations of super-spreader businesses.
3.2 Main Results
Table 2 reports the most common super-spreader business types by NAICS code. The most common type of super-spreader business in our sample is full-service restaurants. At these restaurants, customers are seated, typically served by a server, and pay after the meal. There are 116,605 full-service restaurants in our sample. The second most common type of super-spreader business is limited-service restaurants, with 26,196 in our sample. At these restaurants, customers may pay at a counter before eating. This category includes fast food restaurants, delicatessens, sandwich shops, takeout restaurants, and pizza delivery. The third most common type of super-spreader business in our sample is hotels (except casino hotels) and motels, with 13,432 in our sample.
Table 3 reports the main results of our negative binomial regression measuring the association between super-spreader density and COVID-19 cases. We find a positive association between the density of super-spreader businesses and COVID-19 cases (adjusted IRR=1.05; 95% CI: 1.02-1.07). Our results suggest that a 1 percentage point increase in the density of super-spreader businesses is associated with a 5% increase in the rate of COVID-19 cases, all else equal.
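Under a log link, the IRR compounds multiplicatively for larger changes in density. The worked illustration below uses the rounded estimates reported above, so the figures are approximate. It shows the IRR for a 5 percentage point difference in super-spreader density, all else equal:

$$
\mathrm{IRR}_{5\,\mathrm{pp}} = e^{5\hat{\beta}} = \left(\mathrm{IRR}_{1\,\mathrm{pp}}\right)^{5} \approx 1.05^{5} \approx 1.28,
\qquad 95\%\ \mathrm{CI} \approx \left(1.02^{5},\ 1.07^{5}\right) \approx (1.10,\ 1.40).
$$

On this basis, a 5 percentage point difference in density corresponds to roughly 28% more cases per capita.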
4. Discussion
4.1 Super-spreader Businesses and COVID-19
Our index attempts to quantify the risk of COVID-19 transmission at businesses based upon the frequency, duration, and density of visits. Businesses with more visitors that stay for longer and are more densely packed are likely to have higher risks of transmission8,9,10.
Knowing the density of super-spreader businesses can help policymakers plan to reopen these businesses as safely as possible. Under our index, restaurants are the most common type of super-spreader business. When planning to reopen, policymakers can consider options that help restaurants reopen while mitigating the risk to the public. These could include more outdoor seating, limits on the number of visitors at a time, and monitoring traffic to potential super-spreader businesses.
This study may also be useful for hospital decision makers. Knowing the density of super-spreader businesses in their service area, and monitoring traffic to these businesses, may help hospitals prepare for a potential second wave if traffic to these businesses increases rapidly.
4.2 Limitations
There are several limitations to this study. First, COVID-19 cases are based on a positive COVID-19 test. Thus, our case counts do not capture individuals who may have been COVID-19 positive but did not receive a test, either because tests were scarce or because they were asymptomatic. To help mitigate this bias, we continue to explore other measures of COVID-19 incidence at the county level.
Second, while we control for population density at the county level, population density also varies within counties, and this variation is likely correlated with both super-spreader business density and COVID-19 cases within counties. We are therefore seeking more granular data on COVID-19 cases to adjust more accurately for potential confounding by population density.
4.3 Future Work
Incorporating airflow and outdoor options, such as outdoor seating at restaurants, into the Index will be an important next step10.
Additionally, we are currently building an online decision-support tool that will allow policymakers and hospital decision makers to visualize potential super-spreader businesses in their area and monitor weekly traffic to these businesses. This can help policymakers and hospital decision makers plan for a potential second wave.
Finally, as states begin to reopen non-essential businesses in phases, we plan to evaluate the effects of these re-openings on COVID-19 transmission. We plan to measure the dynamic effects of reopening on COVID-19 cases. Knowing the effects of reopening can help future policymakers and hospital decision makers plan for the potential impact of reopening.
5. Conclusion
In conclusion, we built a COVID-19 Business Transmission Risk Index based on the frequency, density, and duration of visitors to businesses in 8 states. We find a positive association between the density of super-spreader businesses and COVID-19 cases in a county. We control for several socio-demographic and economic characteristics of counties, population density, and attempt to account for differences in testing across states.
This study may have important implications for policymakers as they consider how to reopen these potential super-spreader businesses most safely. We continue to work on acquiring more granular data to better account for confounding from population density. We are also building a tool that will let policymakers and hospital decision makers monitor traffic to potential super-spreader businesses in their community as those businesses reopen.
Data Availability
The Johns Hopkins University COVID-19 data and the New York Times COVID-19 data are publicly available and are linked below. SafeGraph's Monthly Patterns data and Core POI data are restricted access. The documentation for SafeGraph's datasets and the link to apply for access are linked below.
https://github.com/CSSEGISandData/COVID-19
https://github.com/nytimes/covid-19-data
https://docs.safegraph.com/docs/places-schema#section-patterns
Acknowledgements
We’d like to thank the MIT COVID-19 Challenge Datathon, where this project began.
Footnotes
↵0 SafeGraph is a data company that aggregates anonymized location data from numerous applications in order to provide insights about physical places. To enhance privacy, SafeGraph excludes census block group information if fewer than five devices visited an establishment in a month from a given census block group.