Combining genomic data and infection estimates to characterize the complex dynamics of SARS-CoV-2 Omicron variants in the United States
=======================================================================================================================================

* Rafael Lopes
* Kien Pham
* Fayette Klaassen
* Melanie H. Chitwood
* Anne M. Hahn
* Seth Redmond
* Nicole A. Swartwood
* Joshua A. Salomon
* Nicolas A. Menzies
* Ted Cohen
* Nathan D. Grubaugh

## Abstract

SARS-CoV-2 Omicron surged as a variant of concern in late 2021. Subsequently, several distinct Omicron variants have appeared and overtaken each other. We combined variant frequencies and infection estimates from a nowcasting model for each US state to estimate variant-specific infections, attack rates, and effective reproduction numbers (Rt). BA.1 rapidly emerged, and we estimate that it infected 47.7% of the US population between late 2021 and early 2022 before it was replaced by BA.2. We estimate that BA.5, despite a slower takeoff than BA.1, infected 35.7% of the US population, persisting in circulation for nearly 6 months. Other Omicron variants - BA.2, BA.4, and XBB - together infected 30.7% of the US population. We found a positive correlation between the state-level BA.1 attack rate and social vulnerability and a negative correlation between the BA.1 and BA.2 attack rates. Our findings illustrate the complex interplay between viral evolution, population susceptibility, and social factors during the Omicron emergence in the US.

## Introduction

Nearly four years after the World Health Organization declared the COVID-19 outbreak a pandemic, SARS-CoV-2 has caused more than 778 million confirmed cases and more than 6.9 million deaths globally (1). The emergence of genetically distinct SARS-CoV-2 variants of concern (VOC) posed a major challenge for control programs and greatly extended the length and health impact of the pandemic.
Following the emergence of the first major VOC, Alpha, in late 2020 (2), new VOCs have arisen and resulted in successive waves of infection (3,4). Alpha co-circulated with both Beta and Gamma variants (first detected contemporaneously in late 2020 in South Africa and Brazil (5,6), respectively); these variants were subsequently replaced after the emergence and spread of the Delta variant (7) in mid-2021. The emergence of the Omicron variant, first detected in South Africa and Botswana in November 2021 (8,9), was followed by rapid global spread and the replacement of the Delta variant.

Large-scale genomic sequencing of SARS-CoV-2 isolates collected from individuals with detected COVID-19 disease has been instrumental in documenting the evolution of successive VOCs in many settings (5,7,9–11). However, a considerable fraction of SARS-CoV-2 infections do not result in documented disease (12–15), especially after the introduction of vaccines and the development of partial immunity associated with previous infection (16–18). Understanding the dynamics of transmission and strain replacement requires methods to infer time trends in variant-specific infections. Here, we combine nationwide SARS-CoV-2 sequencing data from GISAID with infection estimates from a Bayesian nowcasting model to better characterize the rise and fall of Omicron variants in the United States (US) between late 2021 and March 2023.

## Results

### Quantifying variant-specific infections by combining variant frequency and infection estimates

The emergence and spread of multiple SARS-CoV-2 variants has been a hallmark of the COVID-19 pandemic. Combining 3,103,250 SARS-CoV-2 genomic sequences (**Figs. S1-S3**) and infection estimates from a nowcasting model (*covidestim* (19); **Fig. 1A**), we estimated daily infections by each major variant of Omicron (BA.1*, BA.2*, BA.4*, BA.5*, and XBB*) from each US state and the District of Columbia from December 2021 to March 2023 (**Fig. 1**).

[Fig. 1.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F1) Time series of daily Omicron variant infections across the entire United States. The left y-axis, in black, is the state-level scale, and the right y-axis, in red, is the national scale. Note that the scale of the y-axis differs between the time series for each variant. The red shading is the 95% Credible Interval (CrI) for the national estimate. The state-level 95% CrIs for each of the variant infection estimates are provided in **Table S3.** **A)** Time series of infection estimates for all variants. The gray lines are infection estimates per state and the red line is the mean of the infection estimates per day for the whole US. **B)** Time series of infection estimates for each variant. The gray lines are infection estimates per state and the red lines are the mean of the infection estimates per day for the whole US. The scales differ between variant subplots, as each variant had a different number of total infections per day.

Reported cases, hospitalizations, and deaths provide an incomplete picture of the status of the COVID-19 pandemic since the majority of infections are asymptomatic. We address this by using infection estimates from *covidestim* (19), a nowcasting model that generates daily infection estimates while correcting for under-reporting and notification delays (**Fig. 1A**). We then sorted the SARS-CoV-2 sequences for all 50 states and the District of Columbia and binned the lineages into variant categories - BA.1*, BA.2*, BA.4*, BA.5*, and XBB* (**Table S1**). Combining these two sets of analytic outputs, we calculated the daily frequencies of each Omicron variant (**Fig. S1**). We used this information to estimate the number of daily variant-specific infections via a spline interpolation (**Fig. 1B**). For more details see Materials and Methods.

[Fig. S1.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F5) Flowchart of the process of joining genomic and epidemiological data streams. From the GISAID genomic sequence metadata and the infection estimates from *covidestim*, we produced variant-specific infection estimates by multiplying the frequencies of each Omicron variant by the infection estimates. Those estimates are then input to *EpiEstim* functions to produce variant-specific effective reproduction numbers, Rt, and state attack rates.

We identified three peaks of infections in 2022 associated with the prevalence of distinct variants: one period in the winter, one in spring to early summer, and one in the late fall (**Fig. 1A**). The first Omicron period (BA.1*, December 2021 - January 2022) caused an estimated 4.2 million (95% credible interval [CrI] = 2.6-6.0 million) infections per day at its peak (about 1.25% of the US population being infected per day) (**Fig. 1B, Tables 1** and **S2**). In total, we estimate that BA.1* caused approximately 169 million infections (95% CrI = 97-249 million) in the US during this wave (**Table 1**). The second Omicron period started in April 2022 (>2% frequency) with the emergence of Omicron BA.2* and lasted until November 2022 (<2% frequency) with the initial emergence of BA.4* and BA.5*. These variant-specific surges peaked at ∼625,000 (BA.2*), ∼140,000 (BA.4*), and ∼800,000 (BA.5*) infections per day in the US. Finally, the third Omicron period, from November 2022 to March 2023, was driven by a resurgence of BA.5* and the emergence of the recombinant variant, XBB*, which peaked at ∼500,000 and ∼300,000 infections per day in the US, respectively.

[Table 1.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T1) Variant-specific Attack Rate, Peak, and Total Infections for the United States

At the state level, we estimated that the daily BA.1* infections peaked at ∼548,000, ∼422,000, ∼318,000, and ∼281,000 for California, Texas, Florida, and New York, respectively. Similar to our national estimates, at these peaks over 1% of the state population was being infected per day. We summarize the total and peak daily infections for each Omicron variant for all 50 states, the District of Columbia, and the whole country (**Tables 1, S2,** and **S3**).

[Fig. S2.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F6) Number of genomic sequences per variant category per week from December 1st, 2021 to May 1st, 2023, for the whole country. From the GISAID metadata, we calculated the number of sequences deposited to the database per week during the analyzed period. Each bar is a week of the period and the filling of the bar is the frequency of each variant during that week. The succession of variants over the year is visible.

[Fig. S3.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F7) Number of genomic sequences per variant category per week from December 1st, 2021 to May 1st, 2023, for each individual state. From the GISAID metadata, we calculated the number of sequences deposited to the database per week per state during the analyzed period. Each bar is a week of the period and the filling of the bar is the frequency of each variant during that week. The succession of variants over the year is visible.
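The combination step described in this section, and the peak and total summaries reported in the tables, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which is written in R): the function names `variant_infections` and `summarize_wave`, the sequence counts, the infection total, and the population size are all hypothetical. The sketch also assumes that frequencies are not renormalized after the 2% filter, which the text does not specify.

```python
def variant_infections(seq_counts, total_infections, min_freq=0.02):
    """Split one week's total infections across variants by sequence frequency.

    seq_counts       : dict mapping variant category -> number of sequences that week
    total_infections : estimated total infections for the same state and week
    min_freq         : categories at or below this frequency are dropped
    """
    n_sequences = sum(seq_counts.values())
    if n_sequences == 0:
        return {}
    estimates = {}
    for variant, count in seq_counts.items():
        freq = count / n_sequences
        if freq > min_freq:
            # Infections attributed to a variant = total infections x frequency,
            # rounded to an integer number of infections.
            estimates[variant] = round(total_infections * freq)
    return estimates


def summarize_wave(daily_infections, population):
    """Peak, total, and population shares for one variant's infection series."""
    total = sum(daily_infections)
    peak = max(daily_infections)
    return {
        "peak": peak,
        "total": total,
        "peak_pct_per_day": 100.0 * peak / population,
        "attack_rate_pct": 100.0 * total / population,
    }


# Hypothetical inputs, for illustration only.
week_counts = {"BA.1*": 900, "BA.2*": 90, "Other": 10}
weekly = variant_infections(week_counts, total_infections=1_000_000)
summary = summarize_wave([1000, 4000, 2500], population=100_000)
print(weekly)
print(summary)
```

In this toy week the "Other" category falls below the 2% threshold and receives no infections, while the two remaining categories split the total in proportion to their sequence counts.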
### Omicron variant attack rates for each state

We used the daily infections to calculate the percent of the population estimated to have been ever infected during each variant wave (variant-specific attack rates) for each US state (**Figs. 2** and **S4**). During the BA.1* wave, states with the highest attack rates - Kentucky (57.2%), Alabama (56.5%), and Louisiana (56.3%) - were concentrated in the southeast, while we estimate the lowest attack rates for Iowa (38.3%), South Dakota (38.0%), and Idaho (42.1%). The highest and lowest state attack rates for the other Omicron variants were as follows: BA.2* highest in Hawaii (30%), lowest in South Dakota (6%); BA.4* highest in North Carolina (6.8%), lowest in Vermont (2.3%); BA.5* highest in Kentucky (48%), lowest in Vermont (24%); XBB* highest in Rhode Island (15.6%), lowest in Arkansas (3.4%; **Fig. 2B, S4**). While Kentucky often had high attack rates and Vermont and South Dakota generally had lower attack rates, we did not detect consistent geographical patterns for each Omicron variant.

[Fig. S4.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F8) Frequency from the raw number of genomic sequences per variant category from December 1st, 2021 to May 1st, 2023, for each individual state. From the GISAID metadata, we calculated the number of sequences deposited to the database per week during the analyzed period. Each bar is a week of the period and the filling of the bar is the frequency of each variant during that week. The succession of variants over the year is visible.

[Fig. 2.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F2) Distribution of attack rate estimates across the United States for each Omicron variant. **A)** Attack rate distribution and state-level attack rate estimates. Each dot is a state attack rate estimate, and the boxplots show the distribution of attack rate values across all states. **B)** Maps of the attack rate estimates. For all the Omicron variants we show the US map, with Alaska and Hawaii placed below. Color on the state map indicates the state-level attack rate value of each variant.

### Variant-specific effective reproduction numbers estimated from across the US

We estimated Omicron variant-specific effective reproductive numbers (Rt) for each state to gain insight into variant transmission (**Fig. 3**). We produced variant-specific estimates of Rt across all states by applying the *EpiEstim* R package (20,21) functions to our variant-specific daily infection estimates (**Fig. 1B**). For Omicron BA.1*, the median Rt across all states started as high as 3 (1.5, 3) (**Table S5**), while the Rt estimates for the other variants were smaller. We found similar longitudinal Rt estimates for BA.4* and BA.5*, indicating that they were generating similar numbers of secondary cases in the US and thus able to co-exist for several months. This observation suggests that there are variant-specific factors that can impact their relative transmissibility (e.g. immune escape, infectivity), but there are important population factors that also impact infection incidence.

[Fig. 3.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F3) **Time series of variant-specific effective reproductive numbers across all states.** Each facet shows the Rt time series for all US states together with the confidence interval of the Rt estimate. The red line is the national average over all states. To aid visualization, we applied a locally estimated scatterplot smoothing function (LOESS) to each state Rt time series. The y-axes showing the Rt values are independently scaled for each variant to highlight changes over time.

### Variant-specific associations between attack rates and social vulnerability

To investigate whether SARS-CoV-2 transmission is associated with population-level social vulnerability, we examined correlations between our estimated outcomes and the CDC social vulnerability index (SVI) metric (22). Comparing the state SVI (**Fig. 4A**) to the attack rates for each variant, weighted by the state population sizes (**Fig. 4B**), we found that the Omicron BA.1* (correlation coefficient *R* = 0.56), BA.4* (*R* = 0.3), and BA.5* (*R* = 0.31) attack rates positively correlate with the SVI (**Fig. 4B**). The BA.2* and XBB* emergences occurred immediately following the two largest Omicron waves, BA.1* and BA.5*, respectively. We therefore hypothesized that while individuals living in states with higher SVIs have higher exposure rates, they are less susceptible to infection during variant emergence immediately following exposure to a previous novel variant wave.

[Fig. 4.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F4) Correlation between variant attack rates and the social vulnerability index. **A)** Map of the SVI for all states; colors correspond to the SVI scores. **B)** Scatterplot between attack rates by variant category and the SVI. Sizes are proportional to the size of the state population and colors correspond to the variant categories as in Panel B of Fig. 1. **C)** Scatterplot between the attack rate of Omicron BA.1* and Omicron BA.2*; colors correspond to the SVI quartile, and size is proportional to the state population size. Correlation between the attack rates. **D)** Scatterplot between the attack rate of Omicron BA.1* and Omicron BA.5*; colors correspond to the SVI quartile, and size is proportional to the state population size. Correlation between the attack rates.

To test the hypothesis that states with higher SVI had higher exposure rates, we compared the Omicron BA.1* attack rates to those for BA.2* (peaked ∼4 months after BA.1) and BA.5* (peaked ∼6 months after BA.1). We calculated a negative correlation between the BA.1* and BA.2* attack rates (*R* = −0.31, 95% CI [-0.54, −0.04]; **Fig. 4C**) and a positive correlation between BA.1* and BA.5* (*R* = 0.39, 95% CI [0.13, 0.6]; **Fig. 4D**). States like Kentucky, Louisiana, and Alabama, which are on the higher end of the SVI scale, had attack rates that were relatively high for BA.1*, low for BA.2*, and high for BA.5*. Four states that did not fit the negative BA.1\*-BA.2\* correlation were South Dakota, Iowa, Idaho, and Nebraska, all of which had low SVI values and relatively low attack rates for both variants. Thus our analysis supports our hypothesis that variant waves are driven by opposing forces of social vulnerability that govern exposure rates and population susceptibility following previous outbreaks.

## Discussion

We investigated the Omicron variant-specific infection dynamics across all US states, estimating daily infections, attack rates, and effective reproduction numbers. By combining sequencing data with infection estimates, we aimed to disambiguate infection dynamics during periods of strain replacement and when variants were co-circulating, revealing features of the epidemic that could not be inferred from the reported epidemiological data alone. We found that Omicron variants were responsible for approximately 404 million (95% CrI = 221-617 million) infections across the US from December 2021 to March 2023, including approximately 169 million during the BA.1* wave.
The transmission dynamics of variants differed markedly: BA.1* emerged as a genetically distinct (3,23–25) variant which caused large rapid epidemics, especially in states with a higher degree of social vulnerability. Subsequent Omicron variants, while able to both co-circulate and eventually outcompete extant strains, spread at lower levels and often for longer durations, exhibiting much weaker association with social vulnerability measures than the BA.1* variant. These findings reveal the complex interplay between viral evolution, population susceptibility (driven by previous infections and population-level immunity), and social factors that affect the risk of exposure and infection. The validity of our estimated variant-specific infections, attack rates, and effective reproduction numbers depends on several assumptions (20,21,26,27). The state-level estimates of total infections (i.e. not stratified by variant) were obtained from a published Bayesian nowcasting model which used publicly available time series of COVID-19 case notifications, hospitalizations, and deaths, accounting for effective population immunity. These estimates are calibrated to hospitalization and death data, accounting for delays associated with disease progression and estimates of infection hospitalization and infection fatality ratios (17). The model maintains two sets of assumptions, before and after the introduction of the Omicron variants. The pre-Omicron model does not allow for reinfection or waning of immunity, while the Omicron-era model allows for waning of immunity after infection. Because the underlying mathematical model uses a weekly spline function to model the transmission rates, no explicit assumptions were made about the transmissibility of each variant. Rather, the model allows the transmissibility of circulating variants to vary over time, while the infection-hospitalization ratio remains fixed. 
By using publicly available SARS-CoV-2 sequencing data from GISAID to estimate variant frequencies at the state level, we were able to disaggregate the total number of infections into variant-specific incidence in the current analysis. This required us to assume that sequencing was done at random within states. We also note that our analysis of the association between state-level attack rates and state-level SVI has the potential for ecological fallacy and should thus be interpreted with caution.

Our findings align with data from blood donors (28) and another modeling study in China (29). The prevalence of anti-spike and anti-nucleocapsid antibodies (infection-induced and hybrid-induced) in the blood donor sample rose from 20.9% in April - June 2021 (pre-Omicron), to 54.6% in January - March 2022, and then to 70.3% in July - September 2022. The latter two periods align with our estimates of the Omicron BA.1* and BA.5* waves. After the Omicron BA.5* wave, we estimate a cumulative attack rate of 83.4% of the US population. The China study estimates that, after the introduction of BA.5* into a naive population, 97% of the population had been infected. The total number of infections we estimate exceeds the size of the US population, which is explained by reinfections across the Omicron variant waves.

Our findings provide evidence that the dynamic evolution of SARS-CoV-2 variants is a result of the interplay between exposure and immunity to the virus (3,18,30,31). The pandemic’s history has been marked by the initial emergence of highly transmissible variants (3,7,27,32), and the Omicron era is marked by immune escape characteristics (3,30,33), necessitating ongoing adaptations in public health responses. By quantifying infection rates, attack rates, and effective reproduction numbers for different variants across all states, we provide valuable insights that can guide preparedness and resource allocation.
## Materials and Methods

First, we describe the processing of lineage information and how the lineages were summarized into categories. Second, we describe how the variant-specific infection estimates are produced by joining the infection estimate time series and variant frequency time series. Third, we describe the use of a modified version of *EpiEstim* tools to estimate the Rt for each of the variants. Lastly, we describe the joint analysis of the attack rate estimates and social vulnerability index scores.

### Data Sources

The GISAID database contains more than 16 million genomes, of which approximately one-third come from US genomic surveillance efforts (11). We processed the metadata and generated counts and frequencies of each variant lineage. Frequencies of variants have been used as a surveillance tool by the Centers for Disease Control and Prevention (CDC) and can give information on new invading variants. The GISAID metadata contains the Pango lineage nomenclature system classification of the genome. We can further distribute lineage information into variant categories by aggregating the major parental lineages and their sublineages into the same category. We categorized those lineages into major lineage categories (which we refer to as “Omicron variants”), such as Omicron BA.1* to incorporate Omicron BA.1 and its sublineages, Omicron BA.2* to incorporate Omicron BA.2 and its sublineages, and so on (**Table S1**).

We used the published *covidestim* model data to render weekly variant-specific estimates of infection from December 1, 2021 until May 1, 2023. This model back-calculates infections from the observed case, death, vaccination and hospitalization reports, using assumptions on reporting and progression delays, and a variable probability of case reporting over time. From the beginning of the pandemic until December 1, 2021, reinfections were not assumed to occur and the model was based on case and death reports.
After December 1, 2021, infections are back-calculated from case and hospitalization reports, to accommodate the reduced mortality rate under Omicron variants. Furthermore, vaccination reports and assumed waning of immunity are included in the assumptions, and the serial interval and infection mortality rate assumptions are adjusted to match the new disease dynamics (17). The model output is a median of the infection estimates and its 95% credible interval (CrI).

### Lineages collapsing into major lineage categories

We pre-processed the metadata downloaded from GISAID and categorized the Pango lineages into 8 major categories: ‘Omicron BA.1*’, ‘Omicron BA.2*’, ‘Omicron BA.3*’, ‘Omicron BA.4*’, ‘Omicron BA.5*’, ‘Omicron XBB*’, ‘Other Recombinant’ and ‘Other’; see **Table S1** for details on each lineage and its sublineage aliases. The categories ‘Omicron BA.3*’, ‘Other Recombinant’, and ‘Other’ had frequencies below 2%, and we excluded them from the main analysis. We collapsed all the sublineages into major categories, as summarized in **Table S1**. From the categorization, we count and calculate the frequency of each of these categories in every state and week. See **Fig. S2** in the supplementary material for the counts of genomic sequences for the whole US during the studied period, Dec 2021 to May 2023, in the 8 previously mentioned categories.

### Variant-specific estimates by joining genomic frequencies and infection estimate

We summarized the genomic sequence data to align with the *covidestim* weekly infection estimates. From weekly counts, we calculate the frequency of each of the major variant categories described above. This process guarantees compatibility between the dates of the metadata and the infection estimates. We filter out variant frequencies below 2% in a given week.
By multiplying the frequencies of each category in each state by the total number of infections estimated for that state each week, we produce estimates of the infections per variant in each state per week. We round the estimated number of infections to an integer. In a formula, we have:

$$I_{v,s}(t) = I_s(t) \times f_{v,s}(t)$$

where the infection estimate time series for variant $v$ in state $s$, $I_{v,s}(t)$, is given by the time series of total infections in state $s$, $I_s(t)$, times the frequency time series of variant $v$ within state $s$, $f_{v,s}(t)$, with $f_{v,s}(t) > 0.02$ in every retained week.

We interpolate the weekly time series using a b-spline function to produce a daily time series of infections. We repeated the same procedure for the 2.5th and 97.5th quantiles of the infection estimates generated by *covidestim*. We report the 2.5th and 97.5th quantiles of the posterior distribution trajectories as the lower and upper bound, respectively, for the 95% credible interval (CrI) of the infection estimates.

To compare the incidence estimates by each variant, we calculated the cumulative incidence over the epidemic of each variant for all states. The incidence is given as the percent of the population ever infected with the variant in the state.

### Effective reproduction number estimates

The daily time series of each variant in each state was then passed to the ‘estimate_R()’ function from the R package *EpiEstim*. To avoid convergence problems with the model employed by *EpiEstim*, we only pass time series with more than ten days of continuous infection estimates. The model is parametrized using an uncertain Serial Interval (SI) setting, estimating the serial intervals of SARS-CoV-2 (Omicron variant specific) by drawing from two (truncated) normal distributions for the mean and standard deviation of the SI. The truncated normal distribution of the SI is then parametrized with a mean of 3.5 (1–6 days).
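To make the Rt step more concrete, the following Python sketch implements a simplified sliding-window, renewal-equation estimator in the spirit of the method behind *EpiEstim*. It is an illustration only, not the `estimate_R()` implementation: it uses one fixed serial interval distribution instead of the uncertain SI setting described above, it returns only the posterior mean, and the function name `estimate_rt`, the serial interval weights, the 7-day window, and the Gamma prior (shape 1, scale 5) are assumptions chosen for the example.

```python
def estimate_rt(incidence, si_weights, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior-mean Rt over sliding windows (simplified renewal-equation sketch).

    incidence  : list of daily infection counts
    si_weights : si_weights[k-1] = probability that the serial interval is k days
    Returns a list of (last_day_of_window, rt_mean) tuples.
    """
    n = len(incidence)
    # Total infectiousness on day t: past incidence weighted by the serial interval.
    infectiousness = [
        sum(incidence[t - k] * w
            for k, w in enumerate(si_weights, start=1) if t - k >= 0)
        for t in range(n)
    ]
    results = []
    # Skip early windows, where the infection history is still incomplete.
    first_end = len(si_weights) + window - 1
    for end in range(first_end, n):
        days = range(end - window + 1, end + 1)
        cases = sum(incidence[t] for t in days)
        pressure = sum(infectiousness[t] for t in days)
        # Gamma prior + Poisson renewal likelihood gives a Gamma posterior;
        # its mean is (shape + cases) / (1/scale + total infectiousness).
        rt_mean = (prior_shape + cases) / (1.0 / prior_scale + pressure)
        results.append((end, rt_mean))
    return results


# Hypothetical discretized serial interval with mean 3.5 days (support 1-6 days).
SI_WEIGHTS = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

flat = estimate_rt([100] * 30, SI_WEIGHTS)                            # stable epidemic
growing = estimate_rt([100 * 1.1 ** t for t in range(30)], SI_WEIGHTS)  # 10% daily growth
print(flat[-1], growing[-1])
```

With constant incidence the estimate sits close to 1, and with steadily growing incidence it sits above 1, which is the qualitative behavior expected of an effective reproduction number.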
### Variant-specific attack rate

From the variant-specific infection estimates we can calculate the variant-specific attack rate (AR), *i.e.* the proportion of the state population ever infected during the wave of each variant *v*. We calculate the AR by summing the incidence of a specific variant *v* in state *s* over the period of the study:

$$\mathrm{AR}_{v,s} = \sum_{t} \mathrm{Inc}_{v,s}(t)$$

where $\mathrm{Inc}_{v,s}(t)$ is the incidence of variant *v* in state *s*, given by:

$$\mathrm{Inc}_{v,s}(t) = \frac{I_{v,s}(t)}{N_s}$$

with $I_{v,s}(t)$ the variant-specific infection estimate and $N_s$ denoting the population of state *s*.

### Rt ratios per variant for each state

We calculated the Rt ratio for pairs of variants to compare the Rt values between each variant across states (**Fig. S6**). We created pairs based on temporal succession; for time points with two or more variants co-circulating, we divided the succeeding variant time series by the preceding variant time series (**Fig. S6A**). In all pairs of succession, we found the average Rt ratio was greater than 1 as expected, consistent with the observation that succeeding variants were capable of invasion (**Fig. S6B** and **Fig. S5**).

[Fig. S5.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F9) Attack rate for each variant category for all individual states. Bar charts of the variant-specific attack rate estimates, arranged in the layout of the US states. Each chart shows the attack rates of the variants in the corresponding color. The two-letter state abbreviation is displayed on the right side of each subchart.

We estimate the advantage of one variant over another by taking the Rt ratios during their period of coexistence. From the Rt ratios, we can distinguish two kinds of period in the succession of Omicron variants. Periods of complete clearance of the previous variants are marked by higher Rt ratios, as for the ratios BA.2\*/BA.1\* and XBB\*/BA.5\* (**Fig. 3**).
Conversely, periods of coexistence of more than one variant have smaller Rt ratios, e.g., the ratios BA.4\*/BA.2\*, BA.5\*/BA.2\* and BA.5\*/BA.4\*. The median Rt values of BA.2* across the US were almost 20% higher than the Rt values of BA.1*, and the Rt values of XBB* were more than 20% higher than the Rt values of BA.5*. In summary, variants with a markedly higher Rt than their predecessor (BA.2* and XBB*) can completely replace the dominant variant. See **Fig. S6** for the Rt ratios for all states.

[Fig. S6.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F10) Effective reproduction number (Rt) ratios for each pair of succeeding variants, by state. The Rt ratio is calculated by dividing the average Rt of the successor variant by that of the predecessor variant. A rising slope means that the entering variant has a larger average Rt than its predecessor.

[Fig. S7.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/F11) Ratios between variant-specific Rt values, shown as boxplots with dots for the state-specific ratios. Dots are the state-level Rt ratios and the boxplot is the distribution over all the states. The pairs of variants follow the succession of variants throughout 2022. After the BA.1* wave and before the introduction of XBB* to the US, the Rt ratios are similar, reflecting a period of coexistence of variants. The BA.2\*/BA.1\* and XBB\*/BA.5\* ratios are significantly higher and mark the complete clearance of the previous variants, BA.1* and BA.5*, respectively.
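The Rt ratio for a pair of succeeding variants can be sketched in a few lines of Python. This is an illustrative reading of the procedure rather than the authors' code: the function name `rt_ratio` and the Rt values are hypothetical, and the sketch assumes that the ratio is taken day by day over the dates on which both variants have an estimate and then averaged (averaging each series first and dividing afterwards would be an equally plausible reading).

```python
from statistics import mean


def rt_ratio(rt_successor, rt_predecessor):
    """Mean successor/predecessor Rt ratio over the dates both variants share.

    Both arguments are dicts mapping a date string to an Rt estimate.
    Returns None when the two variants never co-circulate.
    """
    shared_dates = sorted(set(rt_successor) & set(rt_predecessor))
    ratios = [
        rt_successor[day] / rt_predecessor[day]
        for day in shared_dates
        if rt_predecessor[day] > 0  # guard against division by zero
    ]
    return mean(ratios) if ratios else None


# Hypothetical Rt estimates for one state during a period of co-circulation.
rt_ba1 = {"2022-03-01": 0.80, "2022-03-02": 0.80, "2022-03-03": 0.75}
rt_ba2 = {"2022-03-02": 1.00, "2022-03-03": 0.90, "2022-03-04": 0.90}

ratio = rt_ratio(rt_ba2, rt_ba1)
print(ratio)
```

A ratio above 1 means the incoming variant has a higher Rt than the variant it follows over their period of coexistence, which is the pattern the text associates with successful invasion.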
### Assessing the association between state-wide Social Vulnerability Index and variant-specific AR

The social vulnerability index (SVI) is a metric compiled by the CDC summarizing the social conditions that may affect outcomes in the face of disasters, such as infectious disease outbreaks (22) (**Fig. 4**). The SVI is a summary metric, incorporating 4 main domains: socioeconomic status; household characteristics; racial and ethnic minority status; and housing type and transportation. States that are high on the SVI scale tend to have larger populations and are primarily concentrated in the southern half of the US (**Fig. 4A**). Originally the index was compiled at the census tract and county level; we aggregated it by state to be able to use it with the state-level estimates of infections by variant. We calculated the correlation between the SVI and the state-level variant-specific AR using Pearson correlation. For each of the variant-specific correlations between the SVI and the AR, we calculated statistical significance with Bonferroni correction for multiple testing.

## Data availability

The findings of this study are based on metadata associated with 3,103,250 sequences available on GISAID from September 1st, 2021 up to April 22, 2023, and accessible at [https://doi.org/10.55876/gis8.231023hd](https://doi.org/10.55876/gis8.231023hd) (GISAID Identifier: EPI_SET_231023hd). All genome sequences and associated metadata in this dataset are published in GISAID’s EpiCoV database. To view the contributors of each sequence with details such as accession number, Virus name, Collection date, Originating Lab and Submitting Lab, and the list of Authors, visit [https://doi.org/10.55876/gis8.231023hd](https://doi.org/10.55876/gis8.231023hd)

The *covidestim* model uses publicly available data on case and death reports from Johns Hopkins University and the CDC, vaccination data from the CDC and hospitalization data from [Healthdata.gov](http://Healthdata.gov) (34–36).
The script to join these data sources is available on GitHub ([https://www.github.com/covidestim/covidestim-sources](https://www.github.com/covidestim/covidestim-sources)), and a full description of how the data are modeled is available in the linked publications (16,17,19). Both the model's input data and the resulting estimates used in this analysis are available on GitHub ([https://www.github.com/covidestim/data-archive](https://www.github.com/covidestim/data-archive)).

## Code availability

The pipeline used to calculate the variant-specific infections, attack rates, Rt, Rt ratio, and SVI comparison is available in the following GitHub repository: [https://github.com/rafalopespx/Variant\_infections\_rate](https://github.com/rafalopespx/Variant_infections_rate)

## Data Availability

All data produced are available online at: [https://github.com/rafalopespx/Variant\_infections\_rate](https://github.com/rafalopespx/Variant_infections_rate)

## Author contribution

* Conceptualization: RL, TC, NDG
* Methodology: RL, KP, FK, TC, NDG
* Investigation: RL, SR, AH
* Visualization: RL
* Funding acquisition: TC, NDG
* Supervision: JAS, NAM, TC, NDG
* Writing – original draft: RL, TC, NDG
* Writing – review & editing: all authors

## Declaration of Interests

NDG is a paid consultant for BioNTech.

[Table S1.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T2) Categorization of Pango lineages and sublineage aliases

[Table S2.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T3) Peak of infections by variant categories

[Table S3.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T4) Total of infections by variant categories

[Table S4.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T5)
Attack rates by variant categories

[Table S5.](http://medrxiv.org/content/early/2024/01/22/2023.11.07.23298178/T6) Interval values for the Rt estimates by variant categories

## Acknowledgments

We gratefully acknowledge T. Thornhill for discussion and helpful insights on the Social Vulnerability Index; P. Jack and S. Taylor for technical support; and the authors from the originating laboratories responsible for obtaining the specimens, as well as the submitting laboratories where the genomic data were generated and shared via GISAID, on which this research is based. To view the contributors of each sequence with details such as accession number, Virus name, Collection date, Originating Lab and Submitting Lab, and the list of Authors, visit [https://doi.org/10.55876/gis8.231023hd](https://doi.org/10.55876/gis8.231023hd). All plots use color palettes from the ‘MetBrewer’ R package [https://github.com/BlakeRMills/MetBrewer](https://github.com/BlakeRMills/MetBrewer).

This project is supported by Cooperative Agreement NU38OT000297 from the Centers for Disease Control and Prevention (CDC) and the Council of State and Territorial Epidemiologists (CSTE), SHEPheRD Contract 200-2016-91779 from the CDC, and the CDC Broad Agency Announcement Contract 75D30122C14697. This work does not necessarily represent the views of the CDC or CSTE.

## Footnotes

* † Co-senior authors
* Figure 1 updated; Table 1 updated; supplementary tables updated
* Received November 7, 2023.
* Revision received January 18, 2024.
* Accepted January 22, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. WHO Coronavirus (COVID-19) Dashboard [Internet]. [cited 2023 Oct 22].
Available from: [https://covid19.who.int](https://covid19.who.int)
2. Davies NG, Abbott S, Barnard RC, Jarvis CI, Kucharski AJ, Munday JD, et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science [Internet]. 2021 Apr 9 [cited 2023 Oct 16];372(6538):eabg3055. Available from: [https://www.science.org/doi/full/10.1126/science.abg3055](https://www.science.org/doi/full/10.1126/science.abg3055)
3. Roemer C, Sheward DJ, Hisner R, Gueli F, Sakaguchi H, Frohberg N, et al. SARS-CoV-2 evolution in the Omicron era. Nat Microbiol [Internet]. 2023 Oct 16 [cited 2023 Oct 17];1–8. Available from: [https://www.nature.com/articles/s41564-023-01504-w](https://www.nature.com/articles/s41564-023-01504-w)
4. Martin DP, Weaver S, Tegally H, San JE, Shank SD, Wilkinson E, et al. The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages. Cell [Internet]. 2021 Sep 30 [cited 2023 Oct 17];184(20):5189–5200.e7. Available from: [https://www.sciencedirect.com/science/article/pii/S0092867421010503](https://www.sciencedirect.com/science/article/pii/S0092867421010503)
5. Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J, et al. Detection of a SARS-CoV-2 variant of concern in South Africa. Nature [Internet]. 2021 Apr [cited 2023 Oct 22];592(7854):438–43. Available from: [https://www.nature.com/articles/s41586-021-03402-9](https://www.nature.com/articles/s41586-021-03402-9)
6. Faria NR, Mellan TA, Whittaker C, Claro IM, Candido D da S, Mishra S, et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science [Internet]. 2021 May 21 [cited 2023 Oct 22];372(6544):815–21. Available from: [https://www.science.org/doi/10.1126/science.abh2644](https://www.science.org/doi/10.1126/science.abh2644)
7. Earnest R, Uddin R, Matluk N, Renzette N, Turbett SE, Siddle KJ, et al. Comparative transmissibility of SARS-CoV-2 variants Delta and Alpha in New England, USA. Cell Rep Med [Internet]. 2022 Apr 19 [cited 2023 Jun 14];3(4). Available from: [https://www.cell.com/cell-reports-medicine/abstract/S2666-3791(22)00090-8](https://www.cell.com/cell-reports-medicine/abstract/S2666-3791(22)00090-8)
8. Viana R, Moyo S, Amoako DG, Tegally H, Scheepers C, Althaus CL, et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature [Internet]. 2022 Mar [cited 2023 Oct 22];603(7902):679–86. Available from: [https://www.nature.com/articles/s41586-022-04411-y](https://www.nature.com/articles/s41586-022-04411-y)
9. Tegally H, Moir M, Everatt J, Giovanetti M, Scheepers C, Wilkinson E, et al. Emergence of SARS-CoV-2 Omicron lineages BA.4 and BA.5 in South Africa. Nat Med [Internet]. 2022 Sep [cited 2023 Oct 22];28(9):1785–90. Available from: [https://www.nature.com/articles/s41591-022-01911-2](https://www.nature.com/articles/s41591-022-01911-2)
10. Volz E, Mishra S, Chand M, Barrett JC, Johnson R, Geidelberg L, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature [Internet]. 2021 May [cited 2023 Oct 24];593(7858):266–9. Available from: [https://www.nature.com/articles/s41586-021-03470-x](https://www.nature.com/articles/s41586-021-03470-x)
11. Brito AF, Semenova E, Dudas G, Hassler GW, Kalinich CC, Kraemer MUG, et al. Global disparities in SARS-CoV-2 genomic surveillance. Nat Commun [Internet]. 2022 Nov 16 [cited 2023 Oct 22];13(1):7003. Available from: [https://www.nature.com/articles/s41467-022-33713-y](https://www.nature.com/articles/s41467-022-33713-y)
12. Bajema KL, Wiegand RE, Cuffe K, Patel SV, Iachan R, Lim T, et al. Estimated SARS-CoV-2 Seroprevalence in the US as of September 2020. JAMA Intern Med [Internet]. 2021 Apr 1 [cited 2023 Oct 16];181(4):450–60. Available from: doi:10.1001/jamainternmed.2020.7976
13. Oran DP, Topol EJ. Prevalence of Asymptomatic SARS-CoV-2 Infection. Ann Intern Med [Internet]. 2020 Sep [cited 2023 Oct 26];173(5):362–7. Available from: [https://www.acpjournals.org/doi/10.7326/M20-3012](https://www.acpjournals.org/doi/10.7326/M20-3012)
14. Hitchings MDT, Dean NE, García-Carreras B, Hladish TJ, Huang AT, Yang B, et al. The Usefulness of the Test-Positive Proportion of Severe Acute Respiratory Syndrome Coronavirus 2 as a Surveillance Tool. Am J Epidemiol [Internet]. 2021 Jul 1 [cited 2023 Oct 26];190(7):1396–405. Available from: doi:10.1093/aje/kwab023
15. Rader B. Use of At-Home COVID-19 Tests — United States, August 23, 2021–March 12, 2022. MMWR Morb Mortal Wkly Rep [Internet]. 2022 [cited 2023 Oct 19];71. Available from: [https://www.cdc.gov/mmwr/volumes/71/wr/mm7113e1.htm](https://www.cdc.gov/mmwr/volumes/71/wr/mm7113e1.htm)
16. Klaassen F, Chitwood MH, Cohen T, Pitzer VE, Russi M, Swartwood NA, et al. Population Immunity to Pre-Omicron and Omicron Severe Acute Respiratory Syndrome Coronavirus 2 Variants in US States and Counties Through 1 December 2021. Clin Infect Dis [Internet]. 2023 Feb 1 [cited 2023 Oct 16];76(3):e350–9. Available from: doi:10.1093/cid/ciac438
17. Klaassen F, Chitwood MH, Cohen T, Pitzer VE, Russi M, Swartwood NA, et al. Changes in Population Immunity Against Infection and Severe Disease From Severe Acute Respiratory Syndrome Coronavirus 2 Omicron Variants in the United States Between December 2021 and November 2022. Clin Infect Dis [Internet]. 2023 Aug 1 [cited 2023 Oct 16];77(3):355–61. Available from: doi:10.1093/cid/ciad210
18. Ankomah PO, Siedner MJ, Bhattacharyya RP. Pre-Existing Population Immunity and severe acute respiratory syndrome coronavirus 2 Variant Establishment and Dominance Dynamics in the United States: An Ecological Study. Open Forum Infect Dis [Internet]. 2022 Dec 2 [cited 2023 May 11];9(12):ofac621. Available from: [https://academic.oup.com/ofid/article/doi/10.1093/ofid/ofac621/6916970](https://academic.oup.com/ofid/article/doi/10.1093/ofid/ofac621/6916970)
19. Chitwood MH, Russi M, Gunasekera K, Havumaki J, Klaassen F, Pitzer VE, et al. Reconstructing the course of the COVID-19 epidemic over 2020 for US states and counties: Results of a Bayesian evidence synthesis model. PLOS Comput Biol [Internet]. 2022 Aug 30 [cited 2023 Oct 6];18(8):e1010465. Available from: [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010465](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010465)
20. Nouvellet P, Cori A, Garske T, Blake IM, Dorigatti I, Hinsley W, et al. A simple approach to measure transmissibility and forecast incidence. Epidemics [Internet]. 2018 Mar [cited 2023 Apr 26];22:29–35. Available from: [https://linkinghub.elsevier.com/retrieve/pii/S1755436517300245](https://linkinghub.elsevier.com/retrieve/pii/S1755436517300245)
21. Nash RK, Nouvellet P, Cori A. Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. Tizzoni M, editor. PLOS Digit Health [Internet]. 2022 Jun 27 [cited 2023 Apr 26];1(6):e0000052. Available from: [https://dx.plos.org/10.1371/journal.pdig.0000052](https://dx.plos.org/10.1371/journal.pdig.0000052)
22. CDC/ATSDR SVI Data and Documentation Download | Place and Health | ATSDR [Internet]. 2022 [cited 2023 Oct 6]. Available from: [https://www.atsdr.cdc.gov/placeandhealth/svi/data\_documentation\_download.html](https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html)
23. Kandeel M, Mohamed MEM, Abd El-Lateef HM, Venugopala KN, El-Beltagi HS. Omicron variant genome evolution and phylogenetics. J Med Virol [Internet]. 2022 [cited 2023 Oct 16];94(4):1627–32. Available from: [https://onlinelibrary.wiley.com/doi/abs/10.1002/jmv.27515](https://onlinelibrary.wiley.com/doi/abs/10.1002/jmv.27515)
24. Lentini A, Pereira A, Winqvist O, Reinius B. Monitoring of the SARS-CoV-2 Omicron BA.1/BA.2 lineage transition in the Swedish population reveals increased viral RNA levels in BA.2 cases. Med [Internet]. 2022 Sep [cited 2023 Jul 25];3(9):636–643.e4. Available from: [https://linkinghub.elsevier.com/retrieve/pii/S2666634022003178](https://linkinghub.elsevier.com/retrieve/pii/S2666634022003178)
25. van Dorp C, Goldberg E, Ke R, Hengartner N, Romero-Severson E. Global estimates of the fitness advantage of SARS-CoV-2 variant Omicron. Virus Evol [Internet]. 2022 Jul 1 [cited 2023 Oct 24];8(2):veac089. Available from: doi:10.1093/ve/veac089
26. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. J R Soc Interface [Internet]. 2019 Jan [cited 2023 Apr 26];16(150):20180670. Available from: [https://royalsocietypublishing.org/doi/10.1098/rsif.2018.0670](https://royalsocietypublishing.org/doi/10.1098/rsif.2018.0670)
27. Volz E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet [Internet]. 2023 Jun 16 [cited 2023 Jun 21];1–11. Available from: [https://www.nature.com/articles/s41576-023-00610-z](https://www.nature.com/articles/s41576-023-00610-z)
28. Jones JM. Estimates of SARS-CoV-2 Seroprevalence and Incidence of Primary SARS-CoV-2 Infections Among Blood Donors, by COVID-19 Vaccination Status — United States, April 2021–September 2022. MMWR Morb Mortal Wkly Rep [Internet]. 2023 [cited 2023 Oct 10];72. Available from: [https://www.cdc.gov/mmwr/volumes/72/wr/mm7222a3.htm](https://www.cdc.gov/mmwr/volumes/72/wr/mm7222a3.htm)
29. Goldberg EE, Lin Q, Romero-Severson EO, Ke R. Swift and extensive Omicron outbreak in China after sudden exit from ‘zero-COVID’ policy. Nat Commun [Internet]. 2023 Jul 1 [cited 2023 Oct 10];14(1):3888. Available from: [https://www.nature.com/articles/s41467-023-39638-4](https://www.nature.com/articles/s41467-023-39638-4)
30. Zhang X, Wu S, Wu B, Yang Q, Chen A, Li Y, et al. SARS-CoV-2 Omicron strain exhibits potent capabilities for immune evasion and viral entrance. Signal Transduct Target Ther [Internet]. 2021 Dec 17 [cited 2023 Oct 17];6(1):1–3. Available from: [https://www.nature.com/articles/s41392-021-00852-5](https://www.nature.com/articles/s41392-021-00852-5)
31. Hirabara SM, Serdan TDA, Gorjao R, Masi LN, Pithon-Curi TC, Covas DT, et al. SARS-COV-2 Variants: Differences and Potential of Immune Evasion. Front Cell Infect Microbiol [Internet]. 2022 [cited 2023 Oct 16];11. Available from: [https://www.frontiersin.org/articles/10.3389/fcimb.2021.781429](https://www.frontiersin.org/articles/10.3389/fcimb.2021.781429)
32. Petros BA, Turcinovic J, Welch NL, White LF, Kolaczyk ED, Bauer MR, et al. Early Introduction and Rise of the Omicron Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Variant in Highly Vaccinated University Populations. Clin Infect Dis [Internet]. 2023 Feb 1 [cited 2023 Oct 16];76(3):e400–8. Available from: doi:10.1093/cid/ciac413
33. Carabelli AM, Peacock TP, Thorne LG, Harvey WT, Hughes J, de Silva TI, et al. SARS-CoV-2 variant biology: immune escape, transmission and fitness. Nat Rev Microbiol [Internet]. 2023 Mar [cited 2023 Oct 17];21(3):162–77. Available from: [https://www.nature.com/articles/s41579-022-00841-7](https://www.nature.com/articles/s41579-022-00841-7)
34. COVID-19 Reported Patient Impact and Hospital Capacity by Facility | HealthData.gov [Internet]. [cited 2024 Jan 15]. Available from: [https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u/about\_data](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u/about_data)
35. The COVID Tracking Project [Internet]. [cited 2024 Jan 15]. The COVID Tracking Project. Available from: [https://covidtracking.com/](https://covidtracking.com/)
36. CDC. Centers for Disease Control and Prevention. 2020 [cited 2024 Jan 15]. COVID Data Tracker. Available from: [https://covid.cdc.gov/covid-data-tracker](https://covid.cdc.gov/covid-data-tracker)

[1]: /embed/graphic-6.gif
[2]: /embed/graphic-7.gif
[3]: /embed/graphic-8.gif