Abstract
Wastewater-based epidemiology is a promising public health tool that can yield a more representative view of the population than case reporting. However, only about 80% of the U.S. population is connected to public sewers, and the characteristics of populations missed by wastewater-based epidemiology are unclear. To address this gap, we used publicly available datasets to assess sewer connectivity in the U.S. by location, demographic groups, and economic groups. Data from the U.S. Census’ American Housing Survey revealed that sewer connectivity was lower than average for American Indian and Alaska Native, White, non-Hispanic, older, not in poverty, and larger households, but smaller geographic scales revealed local variations from this national connectivity pattern. For example, data from the U.S. Environmental Protection Agency showed that sewer connectivity was positively correlated with income in Minnesota, Florida, and California. Data from the U.S. Census’ American Community Survey and Environmental Protection Agency also revealed geographic areas with low sewer connectivity, such as Alaska, the Navajo Nation, Minnesota, Michigan, and Florida. However, with the exception of the U.S. Census data, there were inconsistencies across datasets. Using mathematical modeling to assess the impact of wastewater sampling inequities on inferences about epidemic trajectory at a local scale, we found that in some situations, even weak connections between communities may allow wastewater monitoring in one community to serve as a reliable proxy for an interacting community with no wastewater monitoring, when cases are widespread. A systematic, rigorous assessment of sewer connectivity will be important for ensuring an equitable and informed implementation of wastewater-based epidemiology as a public health monitoring system.
Introduction
Wastewater-based epidemiology (WBE) plays an important role in surveillance of SARS-CoV-2 [1,2], poliovirus [3,4], and other pathogens [5,6] and has applications to monitoring a variety of other public health concerns [7], including opioid usage [8]. One proposed benefit of wastewater-based epidemiology is that wastewater data are more representative of the population than case reporting, which can be biased towards those with health-seeking behavior or access to healthcare. For example, the populations served by the Health and Human Services SARS-CoV-2 National Wastewater Surveillance System were more representative of the age distribution and of the Black and Hispanic populations of the entire US than the vaccinated population was [9].
While WBE offers convenient sampling of populations served by public sewers, about 20% of individuals in the US live in homes not connected to public sewers [10,11]. This includes those on decentralized wastewater systems and those with no wastewater treatment systems. The most common decentralized wastewater system is the septic tank [12], which collects and treats wastewater onsite, typically in the yard of the home. Households without wastewater treatment systems may have outhouses or privies, chemical toilets, or no plumbing. Sewer connectivity varies across the U.S., with more connectivity in urban areas than rural areas [10,13].
As with any emerging public health tool, it is important to ask to what extent the tool exacerbates or alleviates inequities. A few studies have evaluated equity of sewer connectivity on broad geographic scales. A 2017 study from the Environmental Protection Agency Office of Water showed that households in the U.S. that earned less than the national median household income (MHI, $61,000) were almost 10% more likely to have a decentralized wastewater system or no wastewater treatment system compared to households that earned more than the MHI [14]. Additionally, this study found that as household income decreased, decentralized wastewater system usage increased in Florida, Hawaii, and Delaware, but not in Rhode Island. Based on the 2019 U.S. Census Bureau’s American Housing Survey, rural areas are less connected to sewers, and while the income in rural areas was lower than in metropolitan areas, households connected to septic tanks were wealthier than those connected to sewers in both rural and metropolitan areas [10].
However, we have an incomplete understanding of the factors associated with sewer connectivity across the US and the implications for the interpretation of wastewater data. In this study, we sought to address the following questions, focusing our analyses on the U.S.: (1) To what extent is there demographic and economic inequity in sewer connectivity? (2) Which geographic areas have low sewer connectivity? (3) What is the applicability of WBE data to neighboring unsampled communities? To address the first question, we analyzed household-level data of sewer connectivity stratified by geographic, demographic, and economic variables. For the second question, we evaluated datasets aggregated at the county or county subdivision levels to qualitatively identify geographic areas that have low sewer connectivity. To address the third question, we used a mathematical model to simulate WBE in two interacting populations.
Methods
Dataset compilation
We used a convenience sample of publicly available datasets on sewer connectivity, septic connectivity, or no plumbing at the household level. The types of data included household-level data, locations of sewer systems and population served, and population served by or lacking wastewater systems aggregated by county or county subdivision. All datasets are summarized in Table 1 and described in more detail in the Supplementary Information.
Calculation of weights, error bars, and summary statistics in American Housing Survey (AHS) data
All summary statistics were calculated by applying the weights reported in the AHS Public Use File [19]. Because not every household was sampled and there was uneven sampling of households across the U.S., the weight estimates the number of similar households that each surveyed household represented. Error bars on summary statistics represent the middle 95% of values calculated using the reported replicate weights (160 replicate weights per household). Each summary statistic was calculated for each replicate separately before taking the middle 95% of values. Only categories with at least 5 households were used.
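The weighting and replicate-weight procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the toy data, and the linear-interpolation percentile are hypothetical choices, and the 5-household minimum is assumed here to apply to the number of surveyed records in a category.

```python
def weighted_share(flags, weights):
    """Weighted fraction of households for which the flag is True."""
    total = sum(weights)
    return sum(w for f, w in zip(flags, weights) if f) / total


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def share_with_interval(flags, weights, replicate_weights, min_households=5):
    """Point estimate plus the middle 95% of the replicate estimates.

    replicate_weights[r][i] is the r-th replicate weight of household i
    (the AHS reports 160 replicates; the toy example below uses 4).
    Returns None for categories with fewer than min_households records
    (assumption: the minimum refers to surveyed records).
    """
    if len(flags) < min_households:
        return None
    estimate = weighted_share(flags, weights)
    # The statistic is recomputed for each replicate before taking percentiles.
    reps = [weighted_share(flags, rw) for rw in replicate_weights]
    return estimate, percentile(reps, 2.5), percentile(reps, 97.5)


# Toy data: six surveyed households, True = connected to a public sewer.
connected = [True, True, False, True, False, True]
weights = [100, 200, 100, 300, 200, 100]
replicates = [
    [100, 200, 100, 300, 200, 100],
    [120, 200, 80, 300, 200, 100],
    [100, 180, 100, 300, 220, 100],
    [100, 200, 100, 300, 200, 100],
]
print(share_with_interval(connected, weights, replicates))
```

Computing the statistic once per replicate and only then taking percentiles mirrors the order of operations stated in the text.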
Analysis of sewer connectivity by urban or rural areas in AHS data
We used the most recent AHS National Sample with publicly available data on the urban or rural location of the household (2013), which categorized households as being in a central city of a Metropolitan Statistical Area (MSA), in an urban area of an MSA but not in a central city, in a rural area of an MSA but not in a central city, in an urban area outside of an MSA, or in a rural area outside of an MSA [27]. Central cities are defined as having either (1) ≥250,000 population or at least 100,000 people working within corporate limits, (2) ≥25,000 population, at least 75 jobs for each 100 residents who were employed, and 60% or fewer of the city’s resident workers commuting to jobs outside, or (3) 15,000-25,000 population, at least one third the size of the metropolitan statistical area’s largest city, and meeting the two commuting requirements in (2). Metropolitan Statistical Areas were defined as whole counties that have significant levels of commuting and contiguous urban areas in common, and they include rural areas. Urban areas were defined as having ≥2500 people, with at least 1500 residing outside of institutional group quarters. Rural areas were defined as those not in urban areas.
Analyses of geographic areas with low connectivity to sewer
Data on individual locations of septic tanks or sewer collection systems, when they existed, were aggregated based on latitude and longitude into county or county subdivision boundaries as defined by the U.S. Census’ TIGER shapefiles of the corresponding year [28]. To calculate the fraction of a geographic area connected to septic or sewer, the total population size or total number of households in that geographic area was taken from the corresponding year in the American Community Survey (ACS) 5-year estimate [20]. For county subdivision and census designated place analyses, only areas with at least 5 households and a population of at least 20 were used for the analyses. Detailed data analyses of the state and island area datasets are described in the Supplementary Information.
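As a rough illustration of the last two steps, the sketch below divides a per-area count of septic or sewer connections by the ACS household total and drops areas below the size thresholds. It assumes the point locations have already been assigned to areas; the function name, the area identifiers, and the dictionary layout are hypothetical, and no cap is applied when a count exceeds the ACS total.

```python
def fraction_connected(served_by_area, acs_totals, min_households=5, min_population=20):
    """Fraction of households in each area that are connected.

    served_by_area: {area_id: number of connected households}
    acs_totals:     {area_id: (total households, total population)} from the ACS
    Areas missing from the ACS table or below either threshold are skipped.
    """
    fractions = {}
    for area, served in served_by_area.items():
        totals = acs_totals.get(area)
        if totals is None:
            continue
        households, population = totals
        if households < min_households or population < min_population:
            continue
        fractions[area] = served / households
    return fractions


# Hypothetical county subdivisions.
served = {"subdiv_A": 40, "subdiv_B": 3, "subdiv_C": 10, "subdiv_D": 7}
acs = {"subdiv_A": (100, 250), "subdiv_B": (4, 30), "subdiv_C": (50, 15)}
print(fraction_connected(served, acs))
```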
Counties and county subdivisions were classified into Metropolitan statistical areas (at least one urbanized area of 50,000 or more population), Micropolitan statistical areas (at least one urban cluster of 10,000-50,000 population), or Rural areas (all other areas) using the U.S. Census classification of the encompassing Core Based Statistical Area (core area containing a substantial population nucleus together with adjacent territory that has a high degree of social and economic integration with the core as measured by commuting ties) [29].
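The thresholds in this classification can be written as a small helper. This is only a sketch of the population cutoffs quoted above: in practice the category would be read from the Census delineation of the encompassing Core Based Statistical Area, and the function name and its single-argument interface are hypothetical.

```python
def classify_area(largest_core_population):
    """Classify an area by the population of the largest urban core of its CBSA.

    Pass None when the area is not part of any Core Based Statistical Area.
    """
    if largest_core_population is None:
        return "rural"
    if largest_core_population >= 50_000:
        return "metropolitan"
    if largest_core_population >= 10_000:
        return "micropolitan"
    return "rural"


for core in (120_000, 25_000, 4_000, None):
    print(core, classify_area(core))
```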
Demographic and economic variable correlations with sewer or septic connectivity
To assess correlations of demographic and economic variables with sewer and septic connectivity when aggregating at the county, county subdivision, or census designated place levels, we used the ACS 5-year estimates from the corresponding year and geographic scale. Only geographic subdivisions with at least 5 households and a population of at least 20 were used. The Pearson correlation coefficient between demographic or economic variables of interest and the percentage of households in a geographic subdivision connected to septic tanks or sewer was calculated using the stats.pearsonr function in the scipy package (version 1.10.1) in Python (version 3.11.0). A Bonferroni correction was applied to correct for multiple hypothesis testing to calculate the q values.
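A minimal sketch of this correlation step is given below, using scipy's stats.pearsonr as named in the text. The Bonferroni-corrected q value is taken to be the p value multiplied by the number of variables tested and capped at 1; the size of the test family, the function name, and the toy values are assumptions made for illustration.

```python
from scipy import stats


def correlate_with_connectivity(connectivity, variables):
    """Pearson r, p value, and Bonferroni-corrected q value for each variable.

    connectivity: fraction (or percentage) of households connected, one entry per area
    variables:    {variable name: list of values, one entry per area}
    """
    n_tests = len(variables)  # assumption: one test per variable in the family
    results = {}
    for name, values in variables.items():
        r, p = stats.pearsonr(values, connectivity)
        q = min(p * n_tests, 1.0)  # Bonferroni correction, capped at 1
        results[name] = (r, p, q)
    return results


# Hypothetical values for six county subdivisions.
sewer_fraction = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
acs_variables = {
    "pct_hispanic": [30, 28, 22, 15, 12, 8],
    "household_size": [2.2, 2.9, 2.4, 2.8, 2.3, 2.7],
}
for name, (r, p, q) in correlate_with_connectivity(sewer_fraction, acs_variables).items():
    print(f"{name}: r={r:.3f}, p={p:.3g}, q={q:.3g}")
```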
Simulations of interacting populations and wastewater sampling
Two interacting populations (A and B) were modeled in a deterministic compartmental model with susceptible, infected, and recovered (SIR) compartments for each population and a fraction, ε, of cross-population contacts. The base parameters were set at population size NA=NB=5000, recovery rate γI=0.18 per day (corresponding to an infectious period of 5.6 days), and basic reproduction number R0=1.5, and were varied in sensitivity analyses. The differential equations were solved using the integrate.odeint function in the scipy package (version 1.6.2) in Python (version 3.11.0) using a timestep of 0.1 days.
Wastewater sampling was modeled as the number of copies of pathogen genetic material shed per day by each population divided by the volume of sampled wastewater produced per day by each population. Both numerator and denominator depended on the fraction of waste produced (including shed pathogen genetic material) by each population that is sampled by wastewater, fA and fB. A detailed description of the simulations is provided in S1 Appendix.
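The model and the wastewater signal described in this section can be sketched as follows. The exact equations are given in S1 Appendix and are not reproduced here, so several parts of this sketch are assumptions: the symmetric mixing form in which a fraction ε of each individual's contacts is with the other population, seeding by a single infection in population A, and the per-capita shedding and wastewater volume constants (shed, volume), which cancel when only peak timing is of interest. Function names are hypothetical.

```python
import numpy as np
from scipy.integrate import odeint


def two_population_sir(y, t, beta, gamma, eps, n_a, n_b):
    """Right-hand side of two coupled SIR models (assumed mixing form)."""
    s_a, i_a, r_a, s_b, i_b, r_b = y
    # Force of infection: (1 - eps) of contacts within, eps across populations.
    lam_a = beta * ((1 - eps) * i_a / n_a + eps * i_b / n_b)
    lam_b = beta * ((1 - eps) * i_b / n_b + eps * i_a / n_a)
    return [-lam_a * s_a, lam_a * s_a - gamma * i_a, gamma * i_a,
            -lam_b * s_b, lam_b * s_b - gamma * i_b, gamma * i_b]


def simulate(eps, f_a=1.0, f_b=0.0, n_a=5000, n_b=5000, r0=1.5, gamma=0.18,
             days=400, dt=0.1, shed=1.0, volume=1.0):
    """Return time, infections in B, and pathogen concentration in wastewater."""
    t = np.arange(0, days + dt, dt)
    y0 = [n_a - 1, 1, 0, n_b, 0, 0]  # assumption: one initial infection in A
    sol = odeint(two_population_sir, y0, t, args=(r0 * gamma, gamma, eps, n_a, n_b))
    i_a, i_b = sol[:, 1], sol[:, 4]
    # Copies shed per day by the sampled fractions, divided by the sampled volume.
    conc = shed * (f_a * i_a + f_b * i_b) / (volume * (f_a * n_a + f_b * n_b))
    return t, i_b, conc


def peak_lag(eps, **kwargs):
    """Days by which the infection peak in B follows the wastewater peak."""
    t, i_b, conc = simulate(eps, **kwargs)
    return t[np.argmax(i_b)] - t[np.argmax(conc)]


for eps in (0.5, 0.05, 0.01):
    print(eps, round(peak_lag(eps), 1))
```

Comparing peak_lag against the generation time 1/γ (about 5.6 days at the base parameters) gives the kind of discrepancy measure used in the Results; the numerical values from this sketch need not match those of the study, since the mixing and seeding assumptions may differ.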
Code availability
All code and data to reproduce the analyses and simulations are available at https://github.com/gradlab/wastewater_equity.
Results
Is there inequity in who is sampled by wastewater?
To address this question, we used the American Housing Survey (AHS), a biennial longitudinal survey of representative housing units within the U.S. that reports household-level data of sewer connectivity, demographic factors, and economic factors (Table 1). We focused our analyses on geographic scales with a reliable sample size for statistical analyses (census division, urban/rural at the national level, metropolitan statistical areas) [30].
According to the 2021 AHS, 83% of US households were connected to public sewers and 16% to standard septic tanks (Fig S 1). Each of the other forms of sewage disposal (non-standard septic tanks, chemical toilets, outhouses or privies, other, none, or not reported) was more than an order of magnitude less common (Fig S 1). Across census divisions in 2021, the western US was overall better connected to sewers than the eastern US, with the highest levels of sewer connection in the Pacific (91.3% of households, CI: 90.8%-91.7%) and Mountain (87.6% of households, CI: 86.8%-88.4%) divisions and the lowest levels of sewer connection in the New England (72.9% of households, CI: 72.1%-74.0%) and East South Central (74.9% of households, CI: 73.7%-76.3%) divisions (Fig 1).
Across census divisions, households with an Asian, Black or African American, or Native Hawaiian and Other Pacific Islander householder (the owner or renter of the unit) were on average 14.0%, 13.1%, and 12.5% more connected to sewers, respectively, than the overall census division; however, specifically in the Pacific census division, households with a Native Hawaiian and Other Pacific Islander householder were 5.3% less connected to sewers than the overall census division (Fig 2). Households with an American Indian and Alaska Native or White householder were both on average 2.6% less connected to sewers than the overall census division. Across census divisions, households with a Hispanic householder were on average 11.7% more connected to sewers whereas households with a non-Hispanic householder were 1.1% less connected to sewers than the overall census division.
Connection to sewers decreased with the age of the householder (Fig 2), with the exception that households with a householder aged 75+ were better connected in most census divisions than those with a householder aged 65-74 (79.7% vs 77.9% of households). Households with 1 person were on average 4.2% more connected whereas households with 2 people were on average 3.6% less connected to sewers than the overall census division. There were no consistent trends across census divisions for households with more than 2 people. Additionally, households that were in poverty were better connected to sewers (average of 87.5% of households across census divisions) than households not in poverty (82.7% of households across census divisions; Fig 2). The threshold used to determine whether a household was in poverty depended on family size and age of persons in the household; for example, in 2021 the threshold was $14,097 for 1 person under 65 years and $27,479 for 4 persons with two children [31].
We sought to describe how the differences in sewer connection between demographic and economic groups observed at the census division level were affected by household location in urban or rural areas (see Methods). The qualitative trends observed at the census division level for race, ethnicity, age, household size, and poverty level were largely preserved across national urban and rural areas (Fig 3). One notable exception was that lower sewer connectivity for households with an American Indian and Alaska Native or Native Hawaiian and Other Pacific Islander householder compared to those with a Black or Asian householder was only observed in locations inside of a Metropolitan Statistical Area (MSA) but outside of a central city (both urban and rural). In central cities and locations outside of an MSA (both urban and rural), households with an American Indian and Alaska Native or Native Hawaiian and Other Pacific Islander householder had similar levels of sewer connectivity as households with a Black or Asian householder.
Next, we assessed sewer connectivity in large metropolitan areas, which was the smallest geographic scale resolved by the AHS due to sample size. In 35 of the largest metropolitan areas in 2019 and 2021, household connectivity to sewers ranged from 69.2% (Birmingham-Hoover, AL) to 99.4% (Los Angeles-Long Beach-Anaheim, CA) (Fig S 2). In each metropolitan area, households with a White or American Indian and Alaska Native householder, with an older householder, not in poverty, with a larger household size, and with a non-Hispanic householder were less connected to sewers than the average household, which was similar to what was seen when stratifying by census division. Exceptions were lower connectivity for households with a Hispanic householder in San Jose, CA (Hispanic householder: 96.1% of households, CI: 95.1%-97.2%; non-Hispanic householder: 97.8% of households, CI: 97.5%-98.2%) and lower connectivity for households in poverty in San Francisco, CA (in poverty: 97.7%, CI: 96.7%-98.7%; not in poverty: 99.3%, CI: 99.1%-99.5%); Dallas, TX (in poverty: 92.5%, CI: 90.5%-94.5%; not in poverty: 95.1%, CI: 94.7%-95.6%); Chicago, IL (in poverty: 92.8%, CI: 91.2%-94.3%; not in poverty: 94.8%, CI: 94.3%-95.3%); and Tampa, FL (in poverty: 83.9%, CI: 81.2%-87.0%; not in poverty: 88.8%, CI: 88.1%-89.6%) (Fig S 2).
In summary, at the census division level, broad trends of sewer connectivity by racial group revealed higher connectivity for households with an Asian, Black or African American, Native Hawaiian and Other Pacific Islander, or Hispanic householder and lower connectivity for households with an American Indian and Alaska Native or White householder. Sewer connectivity decreased with age, was highest in households with one person, and was higher for households in poverty. These trends were similar when stratifying by urban, rural, central city, and MSA status, except that the broad racial disparities described above were only observed in urban and rural areas that are inside of an MSA but outside of a central city. Broad trends were also similar for individual large metropolitan areas, with some exceptions for households with a Hispanic householder and households in poverty in particular MSAs.
Which geographic areas have low connectivity to sewers and what are their demographics?
Next, we sought to assess which geographic areas in the U.S. have low connectivity to sewer systems. These are areas where wastewater-based epidemiology may provide little direct information on disease dynamics. Additionally, we assessed correlations of sewer connectivity at the county or county subdivision level with demographic and economic factors. The compilation of the datasets and an assessment of data completeness are described in the S2 Appendix, Table 1, and Table S 3.
Using data from the 2021 American Community Survey on the percentage of occupied housing units lacking complete plumbing facilities by county, we defined the lower bound on the percentage of the population not connected to sewers. In 14 counties, ≥10% of occupied housing units lacked complete plumbing facilities; of these counties, 12 were in Alaska and 2 were in the Navajo Nation. In 5 counties, ≥20% of occupied housing units lacked complete plumbing facilities, all in Alaska. The Yukon-Koyukuk Census Area in Alaska had the highest percentage of occupied housing units lacking complete plumbing facilities at 36% (Fig S 3, Table S 1).
The Florida Department of Health onsite sewage treatment and disposal systems inspection data (Fig 4) revealed that the percentage of households not on septic tanks (expected to correlate with the percentage of households connected to sewer) was higher in metropolitan than in micropolitan county subdivisions and higher in micropolitan than in rural county subdivisions, but neither difference was significant (p = 0.052 and p = 0.37, respectively) (Fig S 4a-b). The percentage of households not on septic tanks was significantly lower in county subdivisions in the panhandle (74.0% vs 92.0%, p < 10⁻⁹) and in county subdivisions that did not border the coast (86.6% vs 95.7%, p < 10⁻⁹) (Fig S 4). By combining this dataset with the 2012 American Community Survey (ACS) describing demographic and economic characteristics of county subdivisions, we found that county subdivisions with less septic usage (suggesting more sewer connectivity) were significantly more Hispanic, more Asian, and had a smaller household size (Table 2 and Fig S 5), consistent with the nation-wide results from the AHS. These trends were driven by metropolitan county subdivisions (Fig S 6), and an additional trend in metropolitan county subdivisions not observed for all county subdivisions was that county subdivisions with less septic usage had a significantly lower percentage of American Indian and Alaska Native residents. There were too few micropolitan and rural county subdivisions to observe significant trends (Fig S 7 and Fig S 8).
In the 8 states with robust data in the EPA Clean Watersheds Needs Survey, multiple neighboring counties in Florida, Michigan, and Minnesota had fewer than 20% of residents receiving sewage collection, whereas this was not the case in California, Iowa, Maryland, New Jersey, and New York (Fig S 9). Across multiple states, sewer connectivity by county was positively correlated with percent Asian and percent Black or African American and negatively correlated with percent White (Table 3), consistent with the nationwide trends from the AHS data. While counties more connected to sewers appeared to have a significantly lower percent of American Indian and Alaska Natives in California, this is affected by missing data from Indian reservations. Age and percent without health insurance were negatively correlated with sewer connectivity. Counties more connected to sewers had significantly higher income in Minnesota, Florida, and California, in contrast to the national results from the AHS data. Significant trends of sewer connectivity with demographic and economic factors were predominantly driven by metropolitan counties (Table S 2).
Analysis of the Island Areas of Guam, the Northern Mariana Islands, the Virgin Islands, and American Samoa revealed lower levels of sewer connectivity than in the states, with considerable spatial variability in connectivity (see S2 Appendix).
Factors influencing applicability of wastewater-based epidemiology to communities lacking sewer connection
The inequities in sewer connections among communities raise the question of the applicability of inferences drawn from wastewater data in one community to communities lacking sewer connections. To explore this question, we used a deterministic compartmental model of two interacting populations with tunable levels of interaction, each with susceptible-infected-recovered (SIR) dynamics, and with wastewater sampling in each population into a common sample (Fig 5a). We set population A to be entirely connected to sewers and population B to have a tunable level of sampling by wastewater.
A common application of wastewater data is to aid in determining when an outbreak has peaked, which can inform policy decisions on when to ease restrictions. We first asked how well the wastewater data could predict the time of peak infections in population B when it is completely unconnected to sewer systems and thus the only sampling is from population A. We found that when the two populations have similar sizes, the wastewater concentration peak and the infection peak in population B were within one generation time of each other except when the interactions between the two populations were weak (Fig 5b). The generation time is the average time between an individual’s infection and transmission, which in an SIR model is the same as the infectious period. For example, in two populations with 5000 individuals each, a disease basic reproduction number of 1.5, and an infectious period of 5.6 days, the percentage of contacts that occur across populations must drop below 4% before the peak in wastewater concentration occurs more than one generation time apart from the peak in infections in the unconnected population (Fig 5c).
We then relaxed the assumption that population B had no contribution to the sampled wastewater and allowed a fraction of population B to be sampled by wastewater. For example, individuals in population B may commute to a workplace sampled by wastewater, or a fraction of population B may be connected at home. As the fraction of population B that is sampled by wastewater increases, the time between infection and wastewater peaks decreases; however, the fraction of B that is sampled has a much weaker effect on the wastewater peak time than the interaction parameter between the communities. For example, with the above parameters and equal population sizes, the cross-population contacts must drop below 4% with 10% sampling of population B, below 3% with 50% sampling, and below 2% with 100% sampling before the wastewater concentration peak and the peak in the number of infections in population B differ by more than one generation time.
Varying the population size of both populations (with an equal population size between the two populations), the recovery rate or generation time, and the R0 of both populations (with an equal R0 in the two populations) has only a weak effect on the discrepancy between wastewater concentration and population B infection peaks (Fig S 10).
When the unconnected population has a smaller size than the connected population, even substantial interactions between the two populations can lead the wastewater concentration to peak before the infections in the unconnected population peak (Fig S 10). Both very weak and very strong interactions cause the wastewater peak to coincide with the peak number of infections in the unconnected population.
Additionally, when the unconnected population has a smaller R0 than the connected population (which could occur for instance if the unconnected population was more spread out geographically and thus had a lower contact rate), the outbreak in population B peaks later than the wastewater peak leading to a discrepancy in the peak times (Fig S 10).
Discussion
Equity of sewer connectivity
Our analysis of available datasets revealed considerable variability in sewer connectivity within and between communities, across locations, demographics, and economic statuses in the United States. The western US had higher levels of sewer connectivity than the eastern US. Across the US, lower connectivity to sewers was observed for American Indian and Alaska Native householders; White householders; non-Hispanic householders; older populations; and larger households in most census divisions. In the Pacific census division, lower connectivity to sewers was observed for Native Hawaiian and Other Pacific Islander householders specifically in areas inside of a metropolitan statistical area but outside of a central city. Across census divisions, households not in poverty were less connected to sewers, consistent with the observation that, nation-wide, higher income households were less connected to sewers [10]. Aggregating by large metropolitan areas yielded similar trends as when aggregating by census divisions, except for lower connection for Hispanic householders in the San Jose area and for households in poverty in the San Francisco, Dallas, Chicago, and Tampa metropolitan areas, whereas these groups were better connected to sewers at the census division level. The differences in the MSA-level data suggested heterogeneity in connectivity across locations.
Geographic variation in sewer connectivity
Large parts of Alaska and the Navajo Nation lacked plumbing. This is consistent with the finding from the Annual Report to Congress on Sanitation Deficiency Levels for Indian Homes and Communities in 2019 that approximately 20% of Indian homes in Alaska do not have sewage disposal and about 10% of Indian homes in the Navajo Nation do not have sewage disposal [33].
State-level datasets revealed that sewer connectivity varied between and within communities. While in some states most areas had sewer connectivity (California, Iowa, Maryland, New Jersey, New York), in others there were large regions that lacked connectivity (Minnesota, Michigan, Florida). Asian and Hispanic populations were less connected to septic tanks across county subdivisions in Florida; however, there was considerable variability across the state and potential biases in the data. In contrast to the national data, in Minnesota, Florida, and California, counties with lower median household incomes were less connected to sewers, suggesting variability across the nation. This result in Florida is consistent with the observation that as income levels rose, household decentralized system usage declined in Florida [14]. However, we did not observe any significant correlation of income with septic usage in the Florida Department of Health onsite sewage treatment and disposal systems inspection permitting dataset, possibly due to not all septic tanks having a permit that is reported in this dataset.
While these correlations may be largely driven by differences between urban and rural areas within a county or county subdivision, they may still be very useful in assessing inequities in sewer connectivity in large catchments that span neighboring counties or county subdivisions.
Data limitations
Substantial data gaps and biases prevented a comprehensive analysis of disparities in sewer system connectivity within and between communities across the US, particularly for small communities, tribal lands and Alaska Native Villages, and state and local geographic scales. The current design of the American Housing Survey has too small a sample size to study state and local geographic scales [30]. Additionally, the AHS may have underestimated the fraction of households using septic systems due to its survey design (Table 1). The differences among the Florida state datasets support the need for detailed and reproducible longitudinal studies of sewer connectivity (both in developing sampling and analysis methods).
Public health officials may find it beneficial to conduct local equity studies of sewer connectivity. Data on the catchment area sizes of the wastewater treatment plants would allow a better understanding of the geographic extent of the sampled population and the level of geographic aggregation needed to study sewer connectivity inequities. A recent study mapping wastewater treatment plant catchment areas and their population sizes served in New York state provides a template for how to create these sewer catchment maps from a combination of permitting, survey, and tax record data [13]. If possible, it would be helpful to consolidate and standardize existing data (for instance, sewage disposal permitting at the local or state levels). Additionally, if the EPA’s request to expand the question on the ACS about access to plumbing facilities to additionally ask about the type of plumbing facility [14,34] is granted, then this would be a valuable dataset with the geographic and temporal resolution to study equity in sewer connectivity moving forward.
Generalizability of wastewater-based epidemiology data in light of heterogeneities in sewer connectivity
Our modeling results suggest that even weak interactions between two communities allow wastewater monitoring in one community to serve as a reliable proxy for the time of maximum infections in the other community when the population sizes and R0 of the two populations are comparable, but not when the unconnected population has a substantially lower population size or R0. In the scenario with unequal population size, outbreaks that are seeded at the same time in the two populations in the absence of interactions peak earlier in the unconnected population due to its smaller size. With weak interactions, seeding of infections from population A to B occurs more slowly, leading to more coincidence in peaks; with strong interactions, the dynamics of B are dominated by those in A, leading to more coincidence in peaks; at intermediate interaction strengths, infections are seeded earlier and peak early, leading to the largest discrepancy in wastewater and infection peaks.
As the purpose of our model was to explore the impact of factors on the generalizability of wastewater rather than to accurately capture the dynamics in all scenarios, we made simplifying assumptions, including that all infected individuals shed the same amount of pathogen genetic material into wastewater and shed only during the infected period; that all individuals contribute equally to wastewater; and that pathogen detection in wastewater has perfect sensitivity.
Our results suggest that in assessing the generalizability of wastewater data, it would be useful to estimate the extent of mobility between connected and unconnected communities. Interestingly, Ref. [35] found no correlation between the size of a catchment area and the correlation of wastewater with case data for SARS-CoV-2, consistent with the result that interactions between connected and unconnected communities cause the disease dynamics to look similar to the wastewater data. In regions without sewer connectivity or with little interaction between connected and unconnected communities, wastewater data from neighboring communities will be less informative. In areas with low sewer connectivity in households, sampling wastewater outflow at frequently visited non-household locations (e.g., schools, offices, and malls) may capture a more representative population.
Additional considerations for equity in WBE
In addition to inequities in sewer connectivity, the sewer locations used for wastewater sampling should be considered to promote demographic equity and ensure the ability to capture spatiotemporal trends [36,37]. While we have focused our analysis on the US due to data availability, internationally, disadvantaged populations are associated with lower access to sewers [38]. The ongoing development of wastewater sampling in non-sewered settings [39], for example in water channels in Las Vegas Valley [40], onsite sanitation facilities in Bangladesh, a refugee camp in Lebanon [41], and various non-sewered settings in low and middle income countries [42], represents a critical area for research and development.
Conclusions
In summary, while wastewater-based epidemiology is a useful tool to monitor disease burden and dynamics, our analyses suggest that access to this new tool varies across the US. More comprehensive data on sewer connectivity are needed, and in combination with assessments of mobility and population parameters, these data can help with the design of wastewater sampling schemes and the interpretation of epidemic trends in sampled and neighboring unsampled communities.
Data Availability
All code and data to reproduce the analyses and simulations are available at https://github.com/gradlab/wastewater_equity.
Supporting information
S1 Appendix. Supplementary methods.
S2 Appendix. Supplementary analyses. Compilation of data, assessment of data completeness, sewer connectivity in U.S. Island Areas, supplementary figures, and supplementary tables.
S1 Appendix. Supplementary methods
Dataset descriptions and data cleaning
U.S. Census Bureau American Housing Survey 2019 and 2021
The American Housing Survey (AHS) National and Metropolitan Public Use Files were downloaded from Refs. [11,15,16]. The sample design, weighting, and error estimation of the survey are described in Refs. [17–19]. Briefly, the AHS is a biennial survey that assesses housing characteristics. Two samples of households are chosen every survey cycle: the Integrated National Sample and the Independent Metropolitan Sample. The Integrated National Sample consists of nationally representative households and includes an oversample of each of the 15 largest metropolitan areas. The Independent Metropolitan Sample consists of representative households in 20 large metropolitan areas that are representative of the next largest 50 metropolitan areas. Ten of the 20 large metropolitan areas are sampled in each survey cycle, so that each area is sampled every other cycle (10 metropolitan areas surveyed every 2 years). The Integrated National Sample in 2019 included 86,257 selected representative housing units, of which about 63,500 met the AHS definition of a housing unit and were able to be interviewed, and in 2021 included 95,295 selected representative housing units, of which about 64,100 were interviewed. The Independent Metropolitan Samples in 2019 and 2021 included around 3000 selected representative housing units in each metropolitan area, of which around 1000-3000 were eligible and were able to be interviewed (same criteria as in the Integrated National Sample).
Geographic data for each household available in the Public Use File included the census division and, for the 15 largest metropolitan areas and the 20 large metropolitan areas in the metropolitan data, the core based statistical area (CBSA). All other households are classified as in “all other metropolitan areas” or “not in a metropolitan area” (all other locations).
The race, ethnicity, and age of the householder (the person or one of the persons who is an owner or renter of the unit) were used to categorize the race, ethnicity, and age of the household. Due to the low numbers of households where the householder reported more than one race, we focused on households where householders identified as having a single race.
U.S. Census Bureau American Housing Survey National Sample 2013
The American Housing Survey National (2013) Public Use File was downloaded from Ref. [32]. The sample design, weighting, and error estimation of the survey are described in Ref. [43]. The National Sample included 84,400 selected housing units representative nationally and including a supplemental sample of 15,553 housing units in the Chicago, Detroit, New York City, Northern New Jersey, and Philadelphia metropolitan statistical areas, of which about 71,600 met the definition of a housing unit and were able to be interviewed.
Geographic data of each household available in the Public Use File included the census division; census region; central city, urban, or rural status; and select metropolitan area codes. All other housing units are classified into “not in a metropolitan area” or had their code suppressed for confidentiality [27].
Florida Department of Health onsite sewage treatment and disposal system inspections
The locations of onsite sewage treatment and disposal systems inspected by the Florida Department of Health, reported in June 2012, were downloaded from Ref. [22]. As of 2022, an inspection permit is required for construction [44] and inspection is recommended every 3 to 5 years thereafter [45]. The number of inspected systems that were active at the latest inspection (after removing repeat inspections) is 601,430. The years of inspection are primarily 1998-2011, with some outliers as early as 1900, which may be old septic systems or typos, and as late as 2200, which are likely typos.
Utah Municipal Wastewater Planning Program (MWPP) Survey 2021
Data from the Utah Municipal Wastewater Planning Survey was provided by the Utah Department of Environmental Quality, Division of Water Quality. The dataset includes an estimate of the population receiving collection by each utility that owns or operates a sanitary sewerage system and its latitude and longitude. The data are gathered from treatment plant operators, and participation is reported as mandatory [46], although only 71% of contacts responded [47]. We filtered the data to include any facility that has collection (this included collection only, treatment and collection, or small lagoons, which are often collection and treatment). Manual correction of facility longitude and latitude is described below.
Manual editing of facility longitude and latitude was done for 5 facilities that did not fall within the state of Utah when combined with the 2021 TIGER shapefile (BRIAN HEAD TOWN, CANYONG LAND IMPROVEMENT DISTRICT, HILL AFB (AMERICAN WATER), JORDANELLE SSD, PLAIN CITY) and 1 facility that did not report a latitude and longitude (PANGUITCH LAKE S. S. D.). For these facilities, the latitude and longitude were manually edited to be that of the location found in a Google Maps search of the facility name. If the Google Maps search found a city, then an arbitrarily chosen location in that city was used to obtain a latitude and longitude.
Minnesota Wastewater Infrastructure Needs Survey (WINS) 2021
Data for the Minnesota Wastewater Infrastructure Needs Survey [48] was provided by the Minnesota Pollution Control Agency. Communities (places and townships) responded to the question “Does your community have a collection system?” with “yes”, “no”, or no response. The survey did not receive a response from every community in Minnesota.
The Census Designated Place (CDP) name was obtained from the reported community name with manual changes reported in Table S 2.
Minnesota Subsurface Sewage Treatment Systems 2017
The Minnesota Subsurface Sewage Treatment Systems (SSTS) dataset reported the number of SSTS by local government units with a known SSTS program. Data was taken from Ref. [49].
Two hundred and eleven out of 218 known SSTS programs submitted an annual report. Not all SSTS programs were able to be identified and contacts were not always provided. Fourteen reported having zero SSTS within their jurisdiction despite permitting SSTS in 2017. Data were reported to sometimes be estimates.
U.S. Environmental Protection Agency Clean Watersheds Needs Survey 2012
Data for the most recent Environmental Protection Agency (EPA) Clean Watersheds Needs Survey (CWNS) (2012) [24] was downloaded from [50] and extracted from a Microsoft Access database file to csv files using the mdbtools package (https://github.com/mdbtools/mdbtools). The survey is administered by the EPA and the States to assess funding needs for unfunded capital costs of treatment works projects that (1) address a water quality or related public health problem existing as of January 1, 2012, or expected to occur within the next 20 years and (2) meet the CWNS documentation criteria. CWNS documentation criteria include (1) description and location of the problem, (2) site-specific solution to the problem, and (3) detailed cost information for implementing the solution. Data is collected about (1) publicly owned wastewater collection and treatment facilities, (2) combined sewer overflow control facilities, (3) stormwater management activities, and (4) decentralized wastewater treatment facilities. Data is collected on (a) estimated needs, costs and technical information, (b) facility location and contact information, and (c) facility population served, flow, effluent, and unit process information. Supporting documentation was required for the entered data with some exceptions (for instance, small communities could use a simpler form), and quality control was performed by reviewers to compare documentation with data entered in the system to ensure consistency of technical and cost data [23]. Data entry occurred from January to December 2012.
There was large variability in the level of effort and resources that each state put into the survey. New York, California, Florida, New Jersey, Maryland, Iowa, Minnesota, and Michigan likely invested heavily in participating in the survey or already had similar state-level data collection systems in place for getting comprehensive responses [22,48,49].
Facilities not included in the dataset are those in South Carolina, the Northern Mariana Islands, and American Samoa as they did not participate in the survey; facilities whose projects did not have documented solutions or cost estimates; privately owned wastewater facilities that serve privately owned industrial facilities, military installations, national parks, or other federal facilities; facilities on tribal lands and in Alaskan Native Villages, which were separately surveyed by the Indian Health Service; and facilities with projects that had existing funding prior to January 1, 2012. Additionally, larger facilities were more likely to be captured by the survey due to the resource requirements for obtaining the right documentation. To address this bias, the survey had decreased documentation needs for small communities (population <=10,000); despite this, the small community facilities were still underrepresented (pg. 2 of Ref. [24]).
We filtered the facilities to the current facilities of type “collection: combined sewers” or “collection: separate sewers”. The population receiving collection is taken from the reported total number of residents who are connected to a sewer system which empties into a treatment plant. This does not include non-resident populations, populations served by acceptable decentralized wastewater treatment systems, or populations connected to sewers that do not discharge to a treatment plant.
U.S. Census Bureau Island Areas Decennial Survey 2020
The Island Areas Decennial Survey surveys housing, social, and economic information for all housing units in Guam, the Northern Mariana Islands, the Virgin Islands, and American Samoa [26]. Data of general demographic characteristics (DP01), selected economic characteristics (DP03), and selected housing characteristics (DP04) were downloaded from https://data.census.gov/. Housing characteristics included data for both occupied and vacant housing units.
U.S. Census Bureau American Community Survey
The American Community Survey (ACS) is a monthly survey of a representative subsample of addresses in the US [20] on characteristics of populations and households. The ACS 5-year estimates (average over 5 years, less subject to fluctuations) for selected social characteristics (DP02), selected economic characteristics (DP03), and demographic and housing estimates (DP05), stratified by county or county subdivision for select years, were downloaded from https://data.census.gov/.
Data analysis and visualization
Florida Department of Health onsite sewage treatment and disposal system inspections dataset analysis
The number of septic tanks in each county subdivision was calculated as the number of septic permits whose latitude and longitude fell inside the boundaries of each county subdivision according to the 2012 U.S. Census Bureau county subdivision TIGER shapefile. To calculate the fraction of a county subdivision connected to septic tanks, the number of septic tanks in each county subdivision was divided by the number of households in the county subdivision as reported by the 2012 American Community Survey 5-year estimate. Only county subdivisions with at least 5 households and a population of at least 20 were used.
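The counting step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it swaps the TIGER shapefile lookup for a pure-Python ray-casting test on simple polygons, and every name and number in it (`septic_fraction`, the toy squares, the household counts) is hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def septic_fraction(permits, subdivisions, households, population,
                    min_households=5, min_population=20):
    """Septic systems per household for subdivisions passing the size filter."""
    counts = {name: 0 for name in subdivisions}
    for lon, lat in permits:
        for name, polygon in subdivisions.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break  # assign each permit to at most one subdivision
    return {
        name: counts[name] / households[name]
        for name in subdivisions
        if households[name] >= min_households
        and population[name] >= min_population
    }


# Hypothetical toy data: two square "county subdivisions" and four permits.
subdivisions = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
permits = [(0.2, 0.5), (0.7, 0.1), (1.5, 0.5), (3.0, 3.0)]  # last is outside both
households = {"west": 10, "east": 4}   # "east" fails the 5-household minimum
population = {"west": 25, "east": 9}
fractions = septic_fraction(permits, subdivisions, households, population)
```

With real boundaries, a spatial join in a GIS library would replace the hand-written polygon test; the filter and the division by ACS household counts would stay the same.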
Utah Municipal Wastewater Planning Program Survey analysis
The population in each county receiving collection was calculated by summing the population receiving collection across all reported facilities in that county. The population in each county subdivision receiving collection was calculated by summing the population receiving collection across all reported facilities whose latitude and longitude fell within that county subdivision’s boundaries from the 2021 TIGER shapefile. To calculate the fraction of a county or county subdivision receiving sewer collection, the population receiving collection was divided by the population size of each county or county subdivision reported by the 2021 American Community Survey 5-year estimate. Only county subdivisions with at least 5 households and a population of at least 20 were used.
U.S. Environmental Protection Agency Clean Watersheds Needs Survey analysis
To compare the EPA CWNS and AHS data in Core Based Statistical Areas (CBSA), counties were categorized into CBSAs using the February 2013 U.S. Census Bureau file. To calculate the fraction of a county receiving collection, the population size receiving collection in that county was divided by the population size of each county reported by the 2012 American Community Survey 5-year estimate. We flagged counties that had no facilities reported, which could either be because there were no facilities in that county or because of a lack of reporting, and counties that had >100% sewer connectivity by population (potentially due to misreporting or collection occurring across multiple counties) were set to 100% connectivity.
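A minimal sketch of the county-level calculation just described, assuming the facility table has already been reduced to (county, population receiving collection) pairs. The function name, flag strings, and toy numbers are illustrative and not taken from the analysis code.

```python
def county_connectivity(facility_rows, county_population):
    """Fraction of each county receiving collection, with flags.

    facility_rows: iterable of (county, population receiving collection).
    county_population: {county: ACS population estimate}.
    """
    served = {}
    for county, pop in facility_rows:
        served[county] = served.get(county, 0) + pop

    result = {}
    for county, total in county_population.items():
        if county not in served:
            # No facilities: either none exist or none were reported.
            result[county] = {"fraction": None, "flag": "no facilities reported"}
            continue
        raw = served[county] / total
        result[county] = {
            "fraction": min(raw, 1.0),  # values above 100% are set to 100%
            "flag": "capped at 100%" if raw > 1.0 else None,
        }
    return result


# Hypothetical toy input: county "B" reports more people served than residents.
rows = [("A", 600), ("A", 300), ("B", 1200)]
acs_population = {"A": 1000, "B": 1000, "C": 500}
connectivity = county_connectivity(rows, acs_population)
```

Keeping the flag alongside the capped value makes it possible to exclude or inspect the affected counties later rather than treating them as fully connected.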
U.S. Census Bureau Island Areas Decennial Survey analysis
To calculate the fraction of a Census Designated Place (CDP) connected to sewers, the number of households connected to sewers was divided by the total number of households in a CDP. Only CDPs with at least 5 households and a population of at least 20 were used.
Generation of maps
TIGER shapefiles from the U.S. Census Bureau in the closest matching year to the dataset were used for generating maps. Maps were generated using the geopandas package (version 0.12.2) in Python (version 3.11.0) and were displayed in the Mercator projection.
Simulations of interacting populations and wastewater sampling
Two interacting populations are modeled in a compartmental model with susceptible, infected, and recovered (SIR) compartments for each population. The numbers of susceptible, infected, and recovered individuals in population A (SA, IA, and RA, respectively) and in population B (SB, IB, and RB, respectively) are described by a system of coupled ordinary differential equations, in which NA and NB are the numbers of individuals in populations A and B, respectively; βA and βB are the overall contact rates of populations A and B, respectively, times the probability of infection given contact; and γI is the rate of recovery. ε is the fraction of the total contacts made by individuals in a population that occur with individuals in the other population, and it describes the interaction strength between the two populations. ε can take on values between 0 and 1, where ε=0 indicates that all interactions occur within each population and none between populations, and ε=1 indicates that all interactions occur between the populations and none within populations. The base parameters are set at NA=NB=5000, γI=0.18 inverse days, , which represent an outbreak similar to COVID-19, but the parameters are varied in sensitivity analyses. The initial condition was a single infected individual in population A (IA=1), with all other individuals susceptible (SA=NA-1, SB=NB, IB=0, RA=0, RB=0). The differential equations are solved deterministically using the integrate.odeint function in the scipy package (version 1.6.2) in Python using a timestep of 0.1 days.
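One system consistent with these definitions is sketched below. The form of the cross-population terms is an assumption: it takes ε to split each population's force of infection between within-population and between-population prevalence, and it should be checked against the code in the repository listed under Data Availability.

```latex
\begin{aligned}
\frac{dS_A}{dt} &= -\beta_A S_A\left[(1-\varepsilon)\frac{I_A}{N_A} + \varepsilon\frac{I_B}{N_B}\right], &
\frac{dS_B}{dt} &= -\beta_B S_B\left[(1-\varepsilon)\frac{I_B}{N_B} + \varepsilon\frac{I_A}{N_A}\right], \\
\frac{dI_A}{dt} &= \beta_A S_A\left[(1-\varepsilon)\frac{I_A}{N_A} + \varepsilon\frac{I_B}{N_B}\right] - \gamma_I I_A, &
\frac{dI_B}{dt} &= \beta_B S_B\left[(1-\varepsilon)\frac{I_B}{N_B} + \varepsilon\frac{I_A}{N_A}\right] - \gamma_I I_B, \\
\frac{dR_A}{dt} &= \gamma_I I_A, &
\frac{dR_B}{dt} &= \gamma_I I_B.
\end{aligned}
```

Under this form, ε=0 reduces to two independent SIR models and ε=1 makes each population's infections depend only on prevalence in the other, matching the two limiting cases described in the text.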
The concentration C of the pathogen in wastewater is determined by nA and nB, the numbers of copies of pathogen genetic material (DNA or RNA) shed per day by populations A and B, respectively, into the sampled wastewater, and by vA and vB, the volumes of wastewater produced per day by populations A and B, respectively, into the sampled wastewater.
The numbers of copies of pathogen genetic material, nA and nB, shed per day by populations A and B, respectively, into the sampled wastewater depend on n, the number of copies of pathogen genetic material shed per infected person per day (assumed to be the same for all infected individuals), and on fA and fB, the fractions of total shed pathogen genetic material from populations A and B, respectively, that are sampled by wastewater (determined by the sewer connectivity of each of the populations).
The volumes of wastewater, vA and vB, produced per day by populations A and B, respectively, into the sampled wastewater depend on v, the volume of wastewater produced per person per day (assumed to be the same for all individuals). The value used for n was 5 × 10^6.7 gene copies/person/day (estimate taken from Ref. [51] by multiplying 5 × 10^4.7 gene copies/mL feces by 500 mL feces/day/person) and for v was 20 liters wastewater/person/day (per person daily wastewater usage approximated from per household daily wastewater usage from Ref. [52] assuming 4 people per household).
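A reading of the wastewater quantities that is consistent with the definitions above is sketched here. Two points are assumptions: that the concentration is the ratio of total copies to total volume entering the sampled wastewater, and that the same connectivity fractions fA and fB scale both the shed material and the wastewater volume.

```latex
C = \frac{n_A + n_B}{v_A + v_B},
\qquad
n_A = n\, f_A\, I_A, \quad n_B = n\, f_B\, I_B,
\qquad
v_A = v\, f_A\, N_A, \quad v_B = v\, f_B\, N_B .
```

With n in gene copies/person/day and v in liters/person/day, C has units of gene copies per liter. If population B is entirely unconnected (fB=0), the expression reduces to C = n IA / (v NA), so the wastewater signal tracks prevalence in population A alone.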
In addition to the assumptions mentioned above, the model assumes that (1) the probability of an infected individual shedding pathogen genetic material to wastewater is 1; (2) given that pathogen genetic material is present in wastewater, the probability of detecting it is 1; (3) shedding of pathogen genetic material occurs only while an individual is infected; and (4) there are no stormwater contributions to wastewater to dilute the pathogen concentration.
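The model can be explored with a short, dependency-free sketch. Several choices here are assumptions rather than the authors' implementation: the cross-population term uses the (1-ε)/ε split of prevalence, the transmission rate is set through a placeholder R0 of 2.5 because the base value is not recoverable from the text, the wastewater concentration is taken as total sampled copies over total sampled volume, and a fixed-step Runge-Kutta integrator stands in for `scipy.integrate.odeint`.

```python
def simulate(n_a=5000, n_b=5000, r0_a=2.5, r0_b=2.5, gamma=0.18,
             eps=0.01, f_a=1.0, f_b=0.0, n_shed=5 * 10**6.7, v=20.0,
             dt=0.1, days=300):
    """Two coupled SIR populations plus a wastewater concentration series.

    r0_a and r0_b are placeholder values (assumption); f_a and f_b are the
    sampled fractions, and at least one of them must be positive.
    """
    beta_a, beta_b = r0_a * gamma, r0_b * gamma

    def deriv(y):
        s_a, i_a, r_a, s_b, i_b, r_b = y
        # Assumed force of infection: within- and between-population parts.
        lam_a = beta_a * ((1 - eps) * i_a / n_a + eps * i_b / n_b)
        lam_b = beta_b * ((1 - eps) * i_b / n_b + eps * i_a / n_a)
        return [-lam_a * s_a, lam_a * s_a - gamma * i_a, gamma * i_a,
                -lam_b * s_b, lam_b * s_b - gamma * i_b, gamma * i_b]

    def rk4_step(y):
        k1 = deriv(y)
        k2 = deriv([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = deriv([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = deriv([a + dt * b for a, b in zip(y, k3)])
        return [a + dt / 6 * (p + 2 * q + 2 * r + s)
                for a, p, q, r, s in zip(y, k1, k2, k3, k4)]

    # One infected individual in A; everyone else susceptible.
    y = [n_a - 1.0, 1.0, 0.0, float(n_b), 0.0, 0.0]
    volume = v * (f_a * n_a + f_b * n_b)  # liters/day entering the sample
    times, inf_a, inf_b, conc = [], [], [], []
    for step in range(int(round(days / dt)) + 1):
        copies = n_shed * (f_a * y[1] + f_b * y[4])  # gene copies/day
        times.append(step * dt)
        inf_a.append(y[1])
        inf_b.append(y[4])
        conc.append(copies / volume)
        y = rk4_step(y)
    return times, inf_a, inf_b, conc


# Population A is sampled (f_a=1); population B is unconnected (f_b=0).
times, inf_a, inf_b, conc = simulate()
```

Varying `eps`, `n_b`, or `r0_b` while comparing the peak of `conc` with the peak of `inf_b` reproduces the kind of sensitivity analysis described in the text, subject to the assumptions listed above.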
To calculate the time of the maximum number of infections in a population and the time of the maximum concentration of pathogen gene content in wastewater data, we numerically calculated the time derivative at the midpoints of the time points as the difference between consecutive datapoints divided by the time interval and took the first time point at which the derivative crossed 0.
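The peak-finding rule can be written compactly. The sketch below assumes that "crossed 0" means the derivative passing from positive to non-positive, which identifies a maximum; the function name and the toy series are illustrative.

```python
def first_peak_time(times, values):
    """Midpoint time at which the finite-difference derivative first
    goes from positive to non-positive; None if that never happens."""
    previous_slope = None
    for k in range(len(values) - 1):
        slope = (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        midpoint = 0.5 * (times[k] + times[k + 1])
        if previous_slope is not None and previous_slope > 0 and slope <= 0:
            return midpoint
        previous_slope = slope
    return None


# Toy series that rises to a maximum at t=2 and then falls.
toy_times = [0, 1, 2, 3, 4]
toy_values = [0, 3, 4, 3, 0]
peak = first_peak_time(toy_times, toy_values)
```

Because the derivative is evaluated at midpoints, the reported time is offset by half a timestep from the sample at which the maximum occurs; with a timestep of 0.1 days that offset is 0.05 days.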
S2 Appendix. Supplementary analyses
Compilation of data and assessment of data completeness
As the AHS does not provide finer scale geographic information than the census division for households outside of the 35 selected large metropolitan areas, we used the American Community Survey, which reports the percentage of occupied housing units lacking complete plumbing facilities by county and which allowed us to define the lower bound on the percentage of the population not connected to sewers. Additionally, we identified several state-level datasets, including locations of permitted septic tanks in Florida, sewer locations and populations served in Utah and Minnesota, and the 2012 EPA Clean Watersheds Needs Survey of sewer locations and populations served for 49 states (South Carolina did not participate). However, many of these state-level datasets are incomplete or potentially biased and thus only allow qualitative analyses. Finally, the 2020 US Census’ Island Areas Decennial Survey provided information on the percentage of households connected to sewers by Census Designated Place in the island areas of American Samoa, Guam, the Northern Mariana Islands, and the Virgin Islands (see Table S 3 for a summary of datasets).
We checked the consistency of the Florida Department of Health onsite sewage treatment and disposal system inspections data reported in 2012 with other sources. In these data, an estimated 7% of households across the state were connected to septic tanks, substantially less than the Florida Department of Environmental Protection’s estimate that approximately one third of Florida’s population uses septic tanks [21]. Potential correlations of septic usage with household size do not account for the gap: an approximate 10% increase in household size for a 50% increase in septic connectivity (Fig S 5) cannot explain the almost 5-fold difference in septic connectivity between the two sources. This suggests that more households may be on septic tanks, and thus fewer on sewer systems, than the inspections data indicate. This discrepancy may exist because not every septic tank had a permit recorded in the state system, because there were changes since 2012, or because of uncertainty in the Florida Department of Environmental Protection’s estimate. Thus, we focused our analysis on relative comparisons across county subdivisions (Fig 4).
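The size of this gap can be checked with a short calculation. The first ratio compares the two estimates directly, treating the household-based 7% and the population-based one third as comparable. The second asks what share of the population 7% of households would represent if septic households were 10% larger than other households, a figure chosen here for illustration rather than taken from the data.

```latex
\frac{1/3}{0.07} \approx 4.8,
\qquad
\frac{0.07 \times 1.1}{0.07 \times 1.1 + 0.93 \times 1} \approx 0.076 .
```

Even under that illustrative household-size adjustment, the inspections data would imply under 8% of the population on septic systems, far short of one third.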
The 2012 EPA Clean Watersheds Needs Survey dataset includes voluntary submissions of the locations of publicly owned wastewater collection and treatment facilities and their estimated population served. The following states had among the highest reported needs and survey participation level (pg. 6 and Table A-1 of Ref. [24]) and were the focus of our analyses: New York, California, Florida, New Jersey, Maryland, Iowa, Minnesota, and Michigan. Comparison of Florida in the EPA and Florida Department of Health datasets showed consistent spatial trends (less connection to sewers and more connection to septic systems in the panhandle and inland) (Fig 4 and Fig S 9). Additionally, in both datasets, counties with more connection to sewers have a higher percentage of Asian residents (EPA: Pearson r = 0.53 (p = 4.2×10^-4), Florida Department of Health: Pearson r = 0.31 (p = 3.2×10^-7)). Comparison of Minnesota in the EPA dataset, Minnesota Wastewater Infrastructure Needs Survey, and Minnesota SSTS datasets also showed consistent spatial trends (more connection to sewers and less connection to subsurface sewage treatment systems around Minneapolis) (Fig S 11). We found a significant correlation between the AHS and EPA datasets for the largest 35 metropolitan areas in the states with good data collection (Fig S 12), despite deviations from a one-to-one correspondence suggesting overall systematic biases in the data.
The 2021 Utah Municipal Wastewater Planning Survey estimated the populations receiving collection by municipal utilities that owned or operated a sanitary sewerage system. The purpose of the survey is to aid in overseeing and communicating with the wastewater industry in Utah. We noticed that these data may not be quantitatively accurate but are likely qualitatively informative. First, large swaths of the state appeared not to receive wastewater collection (Fig S 13), but this may partly reflect that only 71% of contacts responded to the survey [47]. Second, in some county subdivisions, >100% of the population appeared connected to sewers, likely due to errors in estimation, differences in how combined treatment and collection facilities reported population estimates, or collection populations residing in neighboring county subdivisions. While the 5 county subdivisions with the highest percentages of American Indian and Alaska Natives (~20-100%) were reported to not be connected to sewers at all in this dataset, we believe this is because Indian reservations are not included in the dataset (as they are under federal, not state, jurisdiction). In fact, all 5 of these county subdivisions (Uintah and Ouray, Casa del Eco Mesa-White Mesa, West Juab, Oljato, Blanding) include Indian reservations. Additionally, the county subdivisions with the highest percentages of Black or African American, Native Hawaiian and Other Pacific Islander, or Hispanic residents have nobody receiving collection (Fig S 14); however, we cannot rule out that this is due to underreporting from these county subdivisions.
The 2021 Minnesota Wastewater Infrastructure Needs Survey reported whether communities have a collection system. The data from this survey was geographically sparse, as communities (the level of aggregation of the dataset) often directly mapped to a census designated place (CDP), and CDPs do not cover the entire state (Fig S 11a). Consequently, the sewer connectivity of the rural areas between census designated places could not be determined from this dataset. Additionally, not all communities responded to the survey. Of the 523 communities that participated in the survey, all but 11 had a collection system. The geographic distribution of sewer connectivity qualitatively matched that of the 2017 Subsurface Sewage Treatment Systems (SSTS) Annual Report of the number of reported subsurface sewage treatment systems (septic tanks) by county in Minnesota (Fig S 11b); however, 7 out of the 218 contacted local government units did not respond, 14 reported having zero SSTS within their jurisdiction despite permitting SSTS, and not all SSTS programs could be identified and contacted.
Given the data incompleteness and biases for the Utah and Minnesota datasets, we excluded them from the analyses in the main text.
Sewer connectivity in the U.S. Island Areas
From the 2020 US Census Island Areas Decennial Survey of household characteristics, the overall levels of household sewer connection in the island areas were lower than in the states (Fig S 15). In Guam, the Northern Mariana Islands, the Virgin Islands, and American Samoa, the overall connection to sewer systems across each island was 65%, 51%, 66%, and 52% of households, respectively (compared to an overall 83% connection across mainland US). To ask whether there was strong spatial variability across each island, we looked at the median household connection by Census Designated Place (CDP). A median of 64%, 32%, 70%, and 13% households by CDP in Guam, the Northern Mariana Islands, the Virgin Islands, and American Samoa, respectively, were connected to sewers. This suggests that sewer connectivity is particularly spatially concentrated in the Northern Mariana Islands and American Samoa (difference in median household connection by CDP and overall island connection), although all island areas showed some spatial variability in connectivity when visually assessing maps (Fig S 16). The only significant correlation of sewer connectivity with demographic or economic characteristics was in American Samoa, where CDPs with a higher percentage of Black or African Americans were more connected to sewers (Pearson r = 0.38, q-value = 0.01).
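The contrast between the overall connection rate and the median across CDPs can be illustrated with a toy example. The counts below are hypothetical and chosen only to show how one large, well-connected place can raise the overall rate while most places remain poorly connected.

```python
from statistics import median


def overall_and_median(cdps):
    """cdps: list of (households connected to sewers, total households)."""
    overall = sum(connected for connected, _ in cdps) / sum(total for _, total in cdps)
    per_cdp_median = median(connected / total for connected, total in cdps)
    return overall, per_cdp_median


# Hypothetical island: one large connected CDP and four small unconnected ones.
toy_cdps = [(900, 1000), (10, 100), (5, 100), (20, 100), (15, 100)]
overall, per_cdp_median = overall_and_median(toy_cdps)
```

Here the overall rate is about 68% while the median CDP is 15% connected, the same qualitative pattern as the gap between the overall and median values reported for the Northern Mariana Islands and American Samoa.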
Supplementary figures
Supplementary tables
Acknowledgements
We thank Cara Omana from the Minnesota Pollution Control Agency for generously sharing the Minnesota WINS data and Harry Campbell from the Utah Department of Environmental Quality, Division of Water Quality for generously sharing the Utah MWPP survey data, and both for answering questions about the datasets. We thank colleagues in the Grad lab and the Harvard T. H. Chan School of Public Health Center for Communicable Disease Dynamics for helpful discussions, particularly Stephen Kissler.