Abstract
Importance Optimizing the public health response to reduce coronavirus disease 2019 (COVID-19) burden necessitates characterizing population-level heterogeneity of COVID-19 risks. However, heterogeneity in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing may introduce biased estimates depending on analytic design.
Objective To characterize individual, environmental, and social determinants of SARS-CoV-2 testing and COVID-19 diagnosis.
Design We conducted cross-sectional analyses among 14.7 million people comparing individual, environmental, and social determinants among individuals who were tested versus not yet tested. To examine determinants of diagnosis, we used three analytic designs to compare predictors of: 1) individuals testing positive versus negative; 2) symptomatic individuals testing positive versus negative; and 3) individuals testing positive versus individuals not testing positive (i.e., testing negative or not being tested). Analyses included tests conducted between March 1 and June 20, 2020.
Setting Ontario, Canada.
Participants All individuals with ≥1 healthcare system contact since March 2012, excluding individuals deceased before, or born after, March 1, 2020, or residing in a long-term care facility.
Exposures Individual-level characteristics (age, sex, underlying health conditions, prior healthcare use), area-based environmental (air pollution) exposures, and area-based social determinants of health (income, education, housing, marital status, race/ethnicity, and recent immigration).
Main Outcomes and Measures Odds of SARS-CoV-2 test, and of COVID-19 diagnosis.
Results Of a total of 14,695,579 individuals, 758,691 had been tested, of whom 25,030 (3.3%) tested positive. The further the odds of testing from the null, the more variability was observed in the odds of diagnosis across analytic designs, particularly among individual factors. There was less variability in testing by social determinants across analytic designs. Residing in areas with highest household density (adjusted odds ratio: 2.08; 95%CI: 1.95-2.21), lowest educational attainment (adjusted odds ratio: 1.52; 95%CI: 1.44-1.60), and highest proportion of recent immigrants (adjusted odds ratio: 1.12; 95%CI: 1.07-1.16) were consistently related to increased odds of COVID-19 across analytic designs.
Conclusions and Relevance Where testing is limited, risk factors may be better estimated using population comparators rather than test-negative comparators. Optimizing COVID-19 responses necessitates investment in, and sufficient coverage of, structural interventions tailored to heterogeneity in social determinants of risk, including household crowding and systemic racism.
Question What are the social determinants of health that contextualize individual-level risks for coronavirus disease 2019 (COVID-19), and how do selection biases affect our understanding of these risks?
Findings In this province-wide observational study of 14.7 million Canadians, social determinants related to housing, education, and recent immigration were associated with increased COVID-19 risks, with little evidence of selection bias. Individual factors, such as underlying health conditions, were more prone to selection bias using certain analytic approaches.
Meaning Social determinants of health appear to drive COVID-19 incidence in Ontario, Canada. Interventions aiming to prevent COVID-19 transmission should address these empiric structural risks.
Introduction
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19), has resulted in a pandemic with heterogeneity in exposure and transmission risks.1-4 There has been greater focus on characterizing individual determinants of COVID-19, such as age,2,5 sex,6,7 and underlying health conditions,2,8 than on social determinants of health.9
Heterogeneity in social determinants of COVID-19 may exist at the individual and network levels (for example, by housing density10-12). In addition, social determinants including barriers to healthcare, systemic racism, and xenophobia have been implicated in COVID-19 risk.13,14 Environmental determinants such as ambient air pollution may also play a role, as existing evidence indicates that higher ambient air pollution increases risk for infection with other respiratory viruses.15 While previous studies suggest that air pollution may be biologically related to COVID-19,16 including severe COVID-19,17,18 it may also play a role in COVID-19 risk by operating within the context of low-quality housing and environmental racism.19,20
Using observational data to identify determinants of COVID-19 relies on SARS-CoV-2 testing, which is not equally distributed.7 Differential testing introduces the potential for selection biases,21,22 including collider bias.22 Collider bias may be introduced into observational studies if the determinants under investigation are related to both COVID-19 and the likelihood of testing.22-24
Thus, to identify heterogeneity across potential determinants of COVID-19, we examined individual, environmental, and social determinants associated with SARS-CoV-2 testing and diagnoses among 14.7 million individuals in Ontario, Canada. We compared three analytic approaches to examine the role of selection biases.22
Methods
Study population, setting, and design
We conducted an observational study using population-based laboratory and health administrative databases from Ontario, Canada. Ontario has a single-payer health system that provides universal access to hospital and physician services25 and laboratory testing.26 We used data from individuals tested between March 1 and June 20, 2020 to identify determinants associated with testing, and then used three analytic designs to identify determinants associated with laboratory-confirmed diagnosis of COVID-19.
Data sources, linkages, and inclusion criteria
We linked individual-level SARS-CoV-2 testing data from the Ontario Laboratories Information System (OLIS) to relevant health-related datasets containing demographic, healthcare use, and area-level information (eTable 1). These datasets were linked using unique encoded identifiers and analyzed at ICES, a not-for-profit research institute in Ontario.27
OLIS captured approximately 88% of all laboratory-confirmed COVID-19 cases reported by the province during the study period. OLIS records included specimen collection date, results, and a text field for symptoms completed by healthcare providers at the time of sampling. We obtained individual- and area-level demographic and environmental information from the Registered Persons Database; the Canadian Institute for Health Information’s Discharge Abstract Database, Same Day Surgery Database, and National Ambulatory Care Reporting System; the Ontario Health Insurance Program; the Ontario Mental Health Reporting System; the Ontario Population Health and Environment Cohort; and the 2016 Canadian Census.28 Further details are available below and in eTable 1.
To assess community transmission risks, we included individuals who were tested for SARS-CoV-2 by polymerase chain reaction tests and were not residing in a long-term care facility as of March 1, 2020. For individuals with more than one test in OLIS, we used the first positive or indeterminate test, or the first negative test if all tests during the study period were negative. Individuals never tested during the study period were included if they were not recorded as deceased before, or born after, March 1, 2020, and were not residing in a long-term care facility. A diagram of individuals included and excluded is shown in Figure 1.
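The rule for choosing one test per individual can be sketched as follows. This is a minimal illustration, not the code used in the study (which was run in SAS); the function name, the tuple layout, and the result labels are hypothetical, and the input is assumed to be already restricted to tests within the study period.

```python
from datetime import date

def select_index_test(tests):
    """Pick one test per person: the earliest positive or indeterminate
    result if any exists, otherwise the earliest negative result.

    `tests` is a list of (collection_date, result) tuples, where result is
    "positive", "indeterminate", or "negative" (hypothetical coding).
    Returns None for a person with no tests in the study period.
    """
    if not tests:
        return None
    non_negative = [t for t in tests if t[1] in ("positive", "indeterminate")]
    candidates = non_negative if non_negative else tests
    # min() on the date picks the earliest test among the candidates
    return min(candidates, key=lambda t: t[0])

# Hypothetical person with one negative test followed by two positives
example = [
    (date(2020, 3, 20), "negative"),
    (date(2020, 4, 2), "positive"),
    (date(2020, 5, 1), "positive"),
]
index_test = select_index_test(example)
```

Under this rule the earlier negative test is ignored once any positive or indeterminate result exists, so each person contributes exactly one record to the analysis.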
Individuals were excluded if they had a listed age >105 years, had a listed postal code outside of Ontario, or had no record of contact with the healthcare system in the past 8 years (since March 2012).
Selection and definition of potential determinants of COVID-19 diagnoses
Individual-level determinants included sex, age group, underlying health conditions, and prior healthcare use (eTable 1). Underlying health conditions were selected as: a) health conditions identified in the literature as associated with COVID-19 severity2,29-32 or associated with symptoms similar to those caused by COVID-19, because severity and symptoms may lead to differential testing and thus, selection bias;33-38 or b) health conditions such as dementia that increase the need for personal care support, thereby reflecting network-level contact patterns that intersect with occupational risks among essential care providers.39,40
Healthcare use was hypothesized to increase access to testing and/or to serve as a marker of comorbidities, and was measured by the number of hospital admissions in the past 3 years; the number of outpatient physician visits in the past year; and influenza vaccination in the 2019-2020 season. We also included ACG® System41 Aggregated Diagnostic Groups (ADGs)42 as a composite measure of comorbidities.
Environmental determinants included fine particulate matter (PM2.5) using satellite-derived estimates43 and land-use regression model for NO2,44 at the postal-code level.
Social determinants were conceptualized as area-based variables that may signal contact rates within and/or outside households [household density, uncoupled (for example, not married) status];45,46 socio-economic barriers to healthcare access and/or housing or proxies of occupation (household income, educational attainment, apartment building density, and social assistance);47,48 and unmeasured confounders related to race/ethnicity (visible minority status and recent immigration).13,14 These determinants were derived from available measures within the long-form 2016 Canadian Census at the level of dissemination areas (DAs), which are the smallest geographic units for which Census data are disseminated.49 DAs were ranked at the city level (for median per-person income equivalent) or at the province level (for all other social determinants), and then categorized into quintiles. For apartment building density and recent immigration, the high frequency of zeros permitted the creation of only three categories (i.e., comparing the fourth and fifth quintiles with the lowest 60%). Variable definitions are detailed in eTable 1.
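The quintile categorization, and the collapse to three categories for zero-heavy variables, can be illustrated with a short sketch. The function names, the tie-breaking convention, and the example values are assumptions made for illustration; the sketch ranks a single list and does not reproduce the city-level versus province-level ranking described above.

```python
def quintile_ranks(values):
    """Assign each value a quintile (1 = lowest 20%, 5 = highest 20%)
    by rank position. Ties are broken by input order, which is only
    one of several possible conventions."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles

def collapse_to_three(quintile):
    """Collapse quintiles 1-3 into a single 'lowest 60%' category."""
    if quintile <= 3:
        return "lowest 60%"
    return "4th quintile" if quintile == 4 else "5th quintile"

# Hypothetical DA-level values with many zeros (e.g., % recent immigrants)
da_values = [0, 0, 0, 0, 0, 0, 2, 5, 9, 14]
da_quintiles = quintile_ranks(da_values)
da_categories = [collapse_to_three(q) for q in da_quintiles]
```

In this toy example the six zero-valued areas are spread arbitrarily across quintiles 1 to 3, which shows why the lower quintiles cannot be meaningfully distinguished and are merged into one reference category.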
Statistical analysis
To identify determinants of testing, the outcome was defined as receipt of at least one SARS-CoV-2 test during the study period. The comparator group comprised Ontario residents who did not have a record of testing during the study period. Determinants of testing were examined in unadjusted, age/sex-adjusted, and fully-adjusted logistic regression models that included all determinants. The fully-adjusted model also included a fixed-effect covariate for public health region. These are geographic areas in which public health measures were differentially applied50 and along which there may be considerable variability in measured and unmeasured social determinants.51 We opted to include public health regions a priori in the model due to these predicted variations in public health response.
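As a reading aid, a fully-adjusted logistic model of this kind can be written in the general form below. The notation is assumed here for illustration and is not taken from the study's analysis plan: $T_i$ indicates whether individual $i$ was tested, $x_{ik}$ are the individual, environmental, and social determinants, and $\gamma_{r(i)}$ is the fixed effect for the public health region $r(i)$ in which individual $i$ resides.

```latex
\operatorname{logit}\,\Pr(T_i = 1)
  = \beta_0 + \sum_{k} \beta_k x_{ik} + \gamma_{r(i)},
\qquad
\mathrm{aOR}_k = \exp(\beta_k)
```

The adjusted odds ratio for determinant $k$ is the exponentiated coefficient $\beta_k$, holding the other determinants and public health region fixed.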
To identify determinants of COVID-19, the outcome variable was a laboratory-confirmed diagnosis. To explore how testing patterns might bias the results, we compared findings across three comparison groups defined by characteristics that may have influenced the probability of being tested. First, we compared individuals who tested positive to individuals who tested negative (“pseudo-test-negative design”). Second, we restricted this comparison to individuals with symptomatic illness, mirroring the test-negative design commonly used to assess influenza vaccine effectiveness (the “true test-negative design”).52 Third, we compared all individuals who tested positive to all individuals who did not test positive (i.e., to individuals who tested negative or were not tested; the “case-control design”). Potential mechanisms of selection biases in each analytic design are outlined in eFigure 1A-C.
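The three comparison groups can be expressed as a simple classification rule. The sketch below is illustrative only: the record layout, field names, and design labels are hypothetical, and a missing symptom field (`None`) is treated as not symptomatic for the purpose of the restriction, which is an assumption rather than a documented rule.

```python
def classify(person, design):
    """Return 'case', 'control', or None (excluded) for one individual.

    `person` is a dict with keys 'tested' (bool), 'positive' (bool), and
    'symptomatic' (True, False, or None when symptom data are missing).
    """
    is_case = person["tested"] and person["positive"]
    if design == "pseudo_test_negative":
        # Tested individuals only: positive versus negative
        if not person["tested"]:
            return None
    elif design == "true_test_negative":
        # Tested individuals with a recorded symptom only
        if not person["tested"] or person["symptomatic"] is not True:
            return None
    elif design == "case_control":
        # Everyone: positive versus tested negative or never tested
        pass
    else:
        raise ValueError(f"unknown design: {design}")
    return "case" if is_case else "control"

# Five hypothetical individuals
people = [
    {"tested": True, "positive": True, "symptomatic": True},
    {"tested": True, "positive": True, "symptomatic": None},
    {"tested": True, "positive": False, "symptomatic": True},
    {"tested": True, "positive": False, "symptomatic": False},
    {"tested": False, "positive": False, "symptomatic": None},
]

def group_counts(design):
    labels = [classify(p, design) for p in people]
    return labels.count("case"), labels.count("control")
```

The designs differ only in who is eligible to be a control, which is the feature that determines how strongly differential testing can distort the estimated associations.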
We then conducted unadjusted, age/sex-adjusted, and fully-adjusted logistic regression models (including all determinants and public health region, as with testing) using each of the three analytic approaches to identify heterogeneity in individual, environmental, and social determinants of COVID-19 diagnosis. We interpreted each set of determinants (individual, environmental, social) as independent analyses based on the directed acyclic graphs (eFigure 1A-C). Statistical analyses were conducted using SAS v. 9.4 (Cary, NC).
Ethical review
The use of data in this project was authorized under section 45 of Ontario’s Personal Health Information Protection Act, which does not require review by a Research Ethics Board.
Results
Of 758,691 individuals tested during the study period, 25,030 (3.3%) received a laboratory-confirmed diagnosis (Figure 1). Only 11.8% of those tested had a symptom recorded by the provider, 13.6% were considered asymptomatic, and 74.6% were missing symptom information. Table 1 describes the characteristics of the study population.
Determinants of SARS-CoV-2 testing
In fully-adjusted analyses, odds of testing increased with age (Table 2). Males had lower odds of testing than females. Nearly every underlying health condition was associated with increased odds of testing, as were most measures of prior healthcare use. In contrast, higher measures of ambient air pollution were associated with reduced odds of testing. There was little variability in the odds of testing by most area-based social determinants. However, areas with higher visible minority populations had lower odds of testing, whereas areas with higher household income and greater percentages of uncoupled individuals had higher odds of testing. Effect measures across nearly all determinants appeared to be progressively attenuated from unadjusted, to age/sex-adjusted, to fully-adjusted regression models (eTable 2).
Table 2. Odds of ever being tested for SARS-CoV-2 and of COVID-19 diagnosis in fully-adjusted analyses, using three analytic approaches, in Ontario, Canada between March 1 and June 20, 2020.
Variability in determinants of COVID-19 diagnosis across three analytic approaches
Choice of analytic design and comparison group restrictions had substantial influences on the magnitude and direction of the association between individual-level determinants and diagnosis (Table 2). Determinants associated with odds of testing further from the null appeared to be more variable in terms of odds of diagnosis across analytic designs. For example, the odds of testing for adults aged ≥85 years compared to those aged <5 years deviated considerably from the null (adjusted odds ratio [aOR]=5.57; 95%CI, 5.45-5.70), and the odds of diagnosis differed between the pseudo-test-negative design (aOR=1.72; 95%CI, 1.48-2.01), the true test-negative design (aOR=5.95; 95%CI, 3.57-9.92), and the case-control design (aOR=7.09; 95%CI, 6.08-8.26). Some health conditions associated with higher odds of testing, such as chronic respiratory conditions, and indicators of prior healthcare use appeared protective against COVID-19 using the pseudo-test-negative design, but reverted to the null or showed increased odds of diagnosis using the case-control design. As with testing, effect measures for COVID-19 diagnosis across all determinants appeared to be progressively attenuated from unadjusted, to age/sex-adjusted, to fully-adjusted regression models (eTables 3-5).
Individual determinants of COVID-19 diagnosis using the case-control design
Using the case-control design, age, certain comorbidities (e.g., hypertension, diabetes, dementia), and increased prior healthcare use were associated with increased odds of COVID-19. In contrast, other comorbidities (e.g., cancer, substance abuse) and receipt of influenza vaccine in the 2019-2020 season were associated with reduced odds of diagnosis (Table 2).
Environmental determinants of COVID-19 diagnosis using the case-control design
The two highest categories of PM2.5 exposure were associated with increased odds of diagnosis, whereas no categories of exposure to NO2 were associated with increased odds of diagnosis consistently across analytic approaches.
Social determinants of COVID-19 diagnosis using the case-control design
Area-level social determinants independently associated with COVID-19 included higher household income, increased receipt of social assistance, lower educational attainment, greater percentages of uncoupled individuals, higher household density, increased apartment building density, and greater percentages of recent immigrants.
Figures 2A and 2B highlight the changes from unadjusted to fully-adjusted models for the associations between area-level social determinants and testing and diagnosis, respectively. For testing, the associations were attenuated after adjustment for nearly all determinants. The association for household income reversed direction following adjustment. For diagnosis, the associations were attenuated after adjustment for all social determinants except for household density. Similar to testing, the association with COVID-19 for household income reversed direction following adjustment.
Figure 2A. Forest plots show the odds of being tested for SARS-CoV-2, compared to the reference group (1st quintile or 1st category). In all cases, the first quintile/category represents the lowest quintile of values for the variable at the dissemination area (DA) level. For example, the lowest quintile of household income represents individuals living in the lowest 20% of DAs by income quintile. Age/sex-adjusted models are shown in Table 2. Fully-adjusted models control for all variables listed in Table 2, including (but not limited to) all variables shown in this figure; effect estimates and 95% CIs for all models shown here can be found in eTable 2.
Figure 2B. Forest plots show the odds of COVID-19 diagnosis, compared to the reference group (1st quintile or 1st category). In all cases, the first quintile/category represents the lowest quintile of values for the variable at the dissemination area (DA) level. For example, the lowest quintile of DA-level income represents individuals living in the lowest 20% of DAs by income quintile. Fully-adjusted models are shown in the rightmost column of Table 2 and control for all variables listed in Table 2, including (but not limited to) all variables shown in this figure. Effect estimates and 95% CIs for all models shown here can be found in eTable 5.
Discussion
In this large-scale, population-based study, we identified social determinants as key potential determinants for a diagnosis of COVID-19. We identified variability in associations, likely due to selection biases, by analytic design across individual determinants of COVID-19 diagnosis.
We identified increased odds of diagnosis associated with household density, educational attainment, uncoupled status, and recent immigration, consistent with findings from other settings.51,53,54 In particular, the consistency of findings across different analytic approaches suggests that the associations between these social determinants and COVID-19 risk are not likely artifacts of selection bias from study design. Lower educational attainment or uncoupled status may be associated with higher exposure risk through lower-paying jobs in the service industry55 and/or other high-exposure occupations, either because those jobs cannot be done feasibly with proper protections, or because protective policies fail to be issued, leaving workers at high risk.56,57 Higher percentages of recent immigrants in an area were associated with COVID-19 diagnosis, even after adjustment, although the percentage of visible minorities was not. Both variables might represent residual measures of mediators of systemic racism, potentiating relative risks of SARS-CoV-2 exposure and COVID-19 severity,58-60 including COVID-19-related hospitalization/death.14,21,30,54 We found that the association between visible minority status and COVID-19 diagnosis in Ontario was attenuated after adjustment for individual, environmental, and other social determinants. These findings likely reflect what is already known about race and ethnicity as social constructs and social determinants of health.61 This is further supported by the attenuation of effect estimates from the unadjusted, to age/sex-adjusted, to fully-adjusted models. Finally, the fact that there was little variation in the odds of testing across levels of most of the social determinants suggests that testing resources may not be prioritizing those who are most at risk.62 However, even with this finding, our results indicate that social determinants of health play important roles in determining COVID-19 risk.
Taken together, our findings suggest a need to increase and/or redirect resources that specifically address social determinants such as household density48,63 (e.g., voluntary temporary isolation hotels64), proxies for occupational risk56,60 (e.g., paid sick leave65), and other mediators of systemic racism62,66,67 (e.g. community-led outreach testing68).
We also identified the size and direction of influence that selection biases may have, adding to the ongoing conversation on the challenges of interpreting determinants for COVID-19 due to collider bias.22 Although we did not directly explore each potential collider mechanism, the pattern that emerged was that underlying health conditions associated with COVID-19 severity2 may have been prone to collider bias, as evidenced by higher testing and a seemingly protective effect when using the test-negative designs that reversed when using the case-control design. It is, however, also possible that a high number of covariates in fully-adjusted models caused instability in effect measure estimates for these covariates. Age also demonstrated a similar potential susceptibility to collider bias, possibly mediated by COVID-19 severity and symptoms (i.e., older age groups are more likely to develop severe/symptomatic infection if infected2). Over the study period, the testing criteria in Ontario shifted from returning symptomatic travellers to severely symptomatic individuals and those with occupational exposure to additional testing of asymptomatic individuals.33-36,38 In our study, the restriction of the test-negative design to symptomatic individuals did not yield substantially different results than the test-negative design including symptomatic and asymptomatic individuals, but that may have been partly due to the extensive (74.6%) missingness of symptom reporting in the testing data. We opted not to use multiple imputation methods to determine symptom information because, at the time the analysis was conducted, reported symptoms of COVID-19 were highly variable and the extent to which individuals may have asymptomatic or pre-symptomatic illness was unclear, limiting our confidence in the generalizability of existing information to individuals whose information was missing. 
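The direction of the bias described above can be reproduced with a small worked example. All probabilities below are hypothetical and chosen only for illustration; they are not estimates from this study. The sketch assumes a health condition that has no effect on infection (true odds ratio of 1) but increases the chance of being tested, and it assumes that every infected person who is tested returns a positive result.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: exposed cases a, exposed controls b,
    unexposed cases c, unexposed controls d."""
    return (a * d) / (b * c)

# Hypothetical inputs (illustration only)
P_INFECTED = 0.02  # identical with and without the condition: true OR = 1
P_TESTED = {       # P(tested | has_condition, infected)
    (True, True): 0.60,
    (True, False): 0.10,   # condition mimics symptoms, so more testing
    (False, True): 0.50,
    (False, False): 0.03,
}

def expected_fractions(has_condition):
    """Within-group fractions: tested positive, tested negative, untested."""
    pos = P_INFECTED * P_TESTED[(has_condition, True)]
    neg = (1 - P_INFECTED) * P_TESTED[(has_condition, False)]
    return pos, neg, 1 - pos - neg

pos1, neg1, untested1 = expected_fractions(True)
pos0, neg0, untested0 = expected_fractions(False)

# Test-negative comparison: positives versus negatives among the tested
or_test_negative = odds_ratio(pos1, neg1, pos0, neg0)
# Case-control comparison: positives versus everyone not testing positive
or_case_control = odds_ratio(pos1, neg1 + untested1, pos0, neg0 + untested0)
```

With these inputs the test-negative odds ratio is about 0.36, so the condition appears protective, whereas the case-control odds ratio is about 1.20, which is closer to the true value of 1 but still shifted upward by the higher detection of infections among people with the condition.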
Thus, while there is an intuitive desire to compare test positivity rates among those tested (a common metric of comparison across surveillance reports and dashboards69,70), the risk of erroneously identifying a “risk/protective factor” is high, particularly for individual determinants such as underlying health conditions, sex, and age. Such comparisons deserve careful interpretation by examining, as best possible, the reasons for testing.22 In the context of low overall levels of testing, the case-control design may have mitigated some potential sources of selection bias, under the assumption that those not tested are similar to those who tested negative.21,22 Additionally, we found that some underlying health conditions and prior healthcare use remained associated with diagnoses, reflecting either unmeasured confounders or possible biological susceptibility to infection if exposed.15,16,71-73 This suggests that strategies tailored to reduce exposures among individuals characterized by these individual determinants could also be important in risk-tailored prevention.
Finally, these analyses identified PM2.5 as being related to the odds of COVID-19. It is likely that some of this effect is due to underlying social determinants of health and access to tests. Air pollution is often more intense, and results in worse health outcomes, in areas with higher social deprivation.74,75 However, existing studies have also implicated environmental pollution as having a biological relationship to the risk and severity of COVID-19 and other respiratory infections.15,16,19 Taken together with other findings regarding social determinants of health, these results indicate the importance of considering air pollution and particulate matter in both individual- and area-directed COVID-19 interventions.
Limitations
Our diagnoses were restricted to laboratory-confirmed cases and to the 88% of diagnoses available via OLIS, and thus could miss probable cases as well as the remaining laboratory-confirmed cases that may have different determinants of infection.
Results are also conditioned on the assumption that determinants remained constant across the study period, whereas surveillance data suggest shifts in how infections propagate between social networks.76 Future analyses should examine changes in the direction and magnitude of determinants over the course of the outbreak. Although we generated directed acyclic graphs for the general categories of determinants to help conceptualize and mitigate selection bias while also identifying plausible determinants, it is possible that we over-adjusted when interpreting individual variables within each category.77 Our models also adjusted for public health region, within which many social determinants cluster,51 and thus we cannot infer from the results presented how social determinants of COVID-19 risk may vary between and within these geographic regions. Social determinants were measured at the area level and were not available at the individual level; however, by describing individuals’ neighbourhoods, analyses reflect the role of structural and environmental determinants for individuals living amongst them. Some important determinants identified in the prior literature, such as obesity,8,78 were not available for our study.79 Others, such as occupation, were not included due to uncertainty related to quantifying and appropriately grouping the occupational classifications available in the Census data at the area level.
Conclusion
Individual-level risks for COVID-19 defined by demographic and health-related determinants representing general targets of current response strategies appear to be subject to selection bias, including collider bias. Moving forward and advancing the response necessitates characterizing and addressing the social determinants potentiating heterogeneity in COVID-19 acquisition and transmission risks with risk-tailored, community-based interventions to reduce COVID-19 burden.
Funding Statement
This study was funded by the St. Michael’s Hospital Research Innovation Council COVID-19 Research Grant, and a research operating grant (VR5-172683) from the Canadian Institutes of Health Research. SM is supported by a Tier 2 Canada Research Chair in Mathematical Modeling and Program Science. JCK is supported by a Clinician-Scientist Award from the University of Toronto Department of Family and Community Medicine. This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The study sponsors did not participate in the design and conduct of the study; collection, management, analysis and interpretation of the data; preparation, review or approval of the manuscript; or the decision to submit the manuscript for publication.
Disclaimers
The opinions, results, and conclusions reported in this paper are those of the authors and are independent from the funding sources. Parts of this material are based on data and/or information compiled and provided by the Canadian Institute for Health Information (CIHI) and by Cancer Care Ontario (CCO). However, the analyses, conclusions, opinions, and statements expressed herein are those of the authors, and not necessarily those of CIHI or CCO. No endorsement by ICES, MOHLTC, CIHI, or CCO is intended or should be inferred.
Author contributions
JK, SM, and SB designed the study. ACa conducted all data analyses (dataset and variable creation and statistical modeling). JK, SM, SB, RK, and ACa designed the analysis plans and conducted variable selection, with input from HC and ACK on variable selection and definitions. MAH, MD, LCR, and TW contributed to analytic plans related to collider bias. BC contributed to data analyses and data preparation for the symptomatic dataset. MS prepared the figures. MS, JK, SB, and SM wrote the manuscript. All authors interpreted the data and critically reviewed and edited the manuscript.
Acknowledgements
We thank IMS Brogan Inc. for use of their Drug Information Database. Finally, we are grateful to the 14.7 million Ontario residents without whom this research would be impossible.