Abstract
We describe regional variation in the reproduction number of SARS-CoV-2 infections in the UK, estimated from publicly reported data, with a view to understanding whether there are clear hot spots of viral spread in the country, or other spatial patterns. Based on case data up to 9th April, we estimate that the reproduction number remains above 1 overall in the UK but that it is decreasing. This suggests that the peak of the first wave of COVID-19 cases may be imminent. We find significant regional variation in the UK, and this variation is changing over time. Within England, the reproduction number is currently lowest in the Midlands (1.11; 95% CI 1.07-1.14) and highest in the North East of England (1.38; 95% CI 1.33-1.42). There are long and variable delays between infection and detection of cases, so it remains unclear whether the reduction in the reproduction number is a result of social distancing measures. If we are to prevent further outbreaks, it is critical that we both reduce the time taken to detect cases and improve our ability to predict the regional spread of outbreaks.
Background
In late 2019 an outbreak of a novel infectious disease was detected. It manifested principally as severe acute respiratory distress and pneumonia [1], although many cases followed a mild course [2]. The pathogen was rapidly identified as a new species of coronavirus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2), and the disease was named COVID-19 [3]. Global transmission of the virus followed, and major outbreaks have been observed in Europe, beginning with Italy [4]. On 31st January 2020 the first cases were identified in the UK [5]. These were initially managed by testing suspected individuals in the community, tracing contacts and isolating affected cases. However, this succeeded only in delaying the spread of the disease, and on 13th March 2020 the UK government moved to a mitigation strategy, reserving testing for hospital inpatients only [6]. Following this, the government introduced social distancing measures in stages, including voluntary self-isolation of anyone with symptoms and of vulnerable people [7], a ban on non-essential travel worldwide [8] and school closures [9]. Finally, on 23rd March 2020 the government mandated that everyone apart from essential workers should stay at home and away from others [10].
Epidemiological studies conducted during the outbreak in China have provided a number of estimates of the parameters describing the spread of the virus through the population, including a reproduction number between 2.24 and 3.58 [11] and a median incubation period of 5.1 days (credible interval 4.5 to 5.8) [12]. It is estimated that fewer than 2.5% of infected people who develop symptoms will do so within 2.2 days of exposure, and that 97.5% will have done so by 11.2 days after exposure [12].
We investigated the reproduction number of SARS-CoV-2 in the UK in order to determine whether there are any obvious spatial or temporal patterns beyond those resulting from the social distancing measures imposed. To identify and understand regional variation, we collected a time series of confirmed COVID-19 cases in the UK at regional level, which we present here.
Methods
Data
From 7th March 2020 to the present day we have collected all the daily incidence statistics officially released on the Public Health England (PHE) COVID-19 website [13]. These include a regional breakdown of cases in England by Unitary Authority and by NHS region, and a country-level summary of cases in England, Scotland, Wales and Northern Ireland. Regional breakdowns for Wales and Scotland are available from Public Health Wales [14] and Public Health Scotland [15] respectively. A regional breakdown for Northern Ireland is published on weekdays, but at present a full time series is not available [16].
The time series data (available at https://bit.ly/39qPSs0) are based on daily statistics that contain various inconsistencies. The local breakdowns do not tally with the country-wide figures, because the locations of some cases had not been confirmed at the time of publication. Cases of unknown location are necessarily left unresolved in our collated data. To adjust for them, as their number varies from day to day, we assumed that they are distributed across locales (unitary authorities or NHS regions as appropriate) in proportion to each locale's daily incidence.
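The proportional allocation of cases of unknown location can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R implementation in uk-covid-datatools: the function name `allocate_unknown` is hypothetical, it handles a single day at a time, and it assumes that nothing is reallocated when the locale counts already equal or exceed the national total.

```python
# Hypothetical helper (not part of uk-covid-datatools, which is written in R):
# spread one day's cases of unknown location across locales in proportion
# to each locale's reported incidence on that day.


def allocate_unknown(local_counts, national_total):
    """Return adjusted daily counts per locale as floats."""
    known = sum(local_counts.values())
    unknown = national_total - known
    if known == 0 or unknown <= 0:
        # No basis for proportional allocation, or nothing to allocate:
        # return the reported counts unchanged (an assumption of this sketch).
        return {locale: float(n) for locale, n in local_counts.items()}
    # Each locale receives a share of the unknown cases equal to its share
    # of the cases whose location is known.
    return {locale: n + unknown * n / known for locale, n in local_counts.items()}


# Illustrative numbers only: 100 cases with a known location, 120 nationally.
day = {"London": 60, "Midlands": 30, "North West": 10}
adjusted = allocate_unknown(day, national_total=120)
```

Allocating in proportion to same-day incidence keeps each locale's share of the day's cases unchanged while making the locale totals add up to the national figure.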
Less frequently, we find further data quality issues in the time series, presumably where cases have been reassigned from one locale to another between daily publications. This can produce an apparent negative incidence in a given locale on a given day; in this event we set the incidence for that locale to zero. Both data cleansing steps are performed by a publicly available R library (available at https://github.com/terminological/uk-covid-datatools).
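A minimal sketch of the second cleansing step follows. It assumes, as the reference to reassignment between daily publications suggests, that each publication gives cumulative counts per locale, so that daily incidence is the day-to-day difference. The function name `daily_incidence` is hypothetical.

```python
def daily_incidence(cumulative):
    """Convert one locale's cumulative case counts (one value per daily
    publication) into daily incidence, setting negative differences to zero."""
    incidence = []
    previous = 0
    for total in cumulative:
        # A fall in the cumulative count (e.g. cases reassigned to another
        # locale) would give a negative incidence; treat it as zero.
        incidence.append(max(total - previous, 0))
        previous = total
    return incidence


# Illustrative series with a reassignment between the 2nd and 3rd publications.
series = daily_incidence([5, 9, 8, 15])
```

Because negative differences are set to zero rather than carried forward, the cleaned incidence can sum to slightly more than the final cumulative count (16 rather than 15 in this example).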
Serial interval estimates
Estimating the reproduction number of SARS-CoV-2 requires an estimate of the expected time interval between the onset of symptoms in successive cases in a chain of transmission (the serial interval, SI). Case-by-case transmission data are not published in the UK; however, there are numerous estimates of the serial interval in the scientific literature from other geographical regions with established outbreaks [17–22]. Where specified, these articles generally describe the serial interval as following a gamma distribution, with parameters as summarised in table 1.
From these data we calculated a weighted mean of the published parameters and inferred a serial interval described by a gamma distribution with a mean of 4.56 days (credible interval 2.54-7.36) and a standard deviation of 4.53 days (credible interval 4.17-5.05). Because the serial interval is shorter than the estimated incubation period, we expect some transmission to occur before the index case develops symptoms.
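For use in a renewal-type estimator, the gamma parameters need to be expressed as a shape and scale, and the distribution discretised to whole days. The sketch below uses the method-of-moments conversion (shape = mean²/SD², scale = SD²/mean) and a crude discretisation that evaluates the density at each whole day from 1 to 30 and renormalises. Both the discretisation scheme and the 30-day cut-off are assumptions for illustration and are not necessarily what EpiEstim does; the function names are hypothetical.

```python
import math


def gamma_shape_scale(mean, sd):
    """Method of moments for a gamma distribution: mean = k*theta, var = k*theta**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def gamma_pdf(x, shape, scale):
    """Gamma probability density at x > 0."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def discrete_si(mean, sd, max_days=30):
    """Crude daily serial interval weights: w[0] = 0 and w[d] proportional to
    the gamma density at day d for d = 1..max_days, renormalised to sum to 1."""
    shape, scale = gamma_shape_scale(mean, sd)
    raw = [gamma_pdf(d, shape, scale) for d in range(1, max_days + 1)]
    total = sum(raw)
    return [0.0] + [value / total for value in raw]


# Serial interval parameters inferred in the text: mean 4.56 days, SD 4.53 days.
shape, scale = gamma_shape_scale(4.56, 4.53)
w = discrete_si(4.56, 4.53)
```

With these parameters the shape is close to 1, so the serial interval is nearly exponential and places substantial weight on short intervals.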
Statistical methods
Using the inferred serial interval distribution, we analysed the time series data using R [23] and the EpiEstim library [24–26] to estimate time-varying reproduction numbers from 11th March to 9th April, under the assumption that there is negligible mixing of populations between geographical areas. The validity of this assumption decreases as we consider smaller geographies, particularly in the early stages of the outbreak, before unnecessary travel was restricted.
Essentially, R(t) is estimated by relating the number of new cases on each day to the number of cases on preceding days, weighted by the assumed serial interval distribution. The EpiEstim package implements the method of Cori et al. (2013) [25], which builds on an earlier method by Wallinga and Teunis (2004) [26]. The method uses a sliding time window within which the instantaneous reproduction number is assumed to be constant. We chose a 5-day sliding window empirically: it approximates the median incubation period of the virus and offered a good trade-off between noise and loss of detail. R(t) was visualised using the ggplot2 and r-spatial libraries, with administrative maps from the UK Office for National Statistics [27]. R(t) was calculated for each NHS region in England and compared with the overall time series for England, using a two-sided t-test to assess the significance of regional differences.
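The core of the Cori et al. estimator can be sketched in a few lines. Under a gamma prior on R, the posterior for the window ending on day t is again gamma, with shape a + ΣI_s and scale 1/(1/b + ΣΛ_s), where Λ_s = Σ_k I_{s-k} w_k is the total infectiousness of earlier cases weighted by the discretised serial interval w. This is a simplified Python illustration, not EpiEstim's code. The function name `cori_r` is hypothetical, the prior (mean 5, SD 5, i.e. shape 1 and scale 5) is an assumption of this sketch, and only the posterior mean is returned, because credible intervals would require gamma quantiles.

```python
def cori_r(incidence, si_weights, window=5, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R for sliding windows, after Cori et al. (2013).

    incidence[t]  -- cases on day t (t = 0, 1, ...)
    si_weights[s] -- probability that the serial interval is s days;
                     si_weights[0] should be 0.
    Returns {t: posterior mean of R over days t-window+1 .. t}.
    """
    n = len(incidence)
    max_lag = len(si_weights) - 1
    # Total infectiousness on day t: Lambda_t = sum over s >= 1 of I[t-s] * w[s].
    lam = [
        sum(incidence[t - s] * si_weights[s] for s in range(1, min(t, max_lag) + 1))
        for t in range(n)
    ]
    estimates = {}
    # Start at t = window so that every window begins on day 1 or later and
    # therefore has at least one earlier day contributing to Lambda.
    for t in range(window, n):
        cases = sum(incidence[t - window + 1 : t + 1])
        infectiousness = sum(lam[t - window + 1 : t + 1])
        # Gamma(prior_shape, prior_scale) prior gives a gamma posterior with
        # shape a + sum(I) and scale 1 / (1/b + sum(Lambda)).
        shape = prior_shape + cases
        scale = 1.0 / (1.0 / prior_scale + infectiousness)
        estimates[t] = shape * scale
    return estimates


# Illustrative input: a hypothetical serial interval of exactly 1 day and
# incidence that doubles daily, so R should come out close to 2.
r = cori_r([100 * 2 ** t for t in range(10)], si_weights=[0.0, 1.0])
```

In practice `si_weights` would be the discretised serial interval. The 5-day `window` matches the choice described above: a longer window smooths noise at the cost of detail.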
In the final part of our analysis we examined the rate of change of R(t), estimated by simple linear regression. For each region we estimated the current rate of change of R(t) by fitting a straight line to the 10 most recent daily estimates and taking its slope as the derivative of R(t), together with its confidence interval and measures of goodness of fit.
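The slope and its confidence interval can be obtained with ordinary least squares. The sketch below assumes that the 10 most recent daily R(t) estimates are regressed on day index. It uses a fixed critical value of 2.306 (the two-sided 95% t quantile with 8 degrees of freedom), so `t_crit` must be changed if `n_points` is changed. The function name `rate_of_change` is hypothetical.

```python
import math
import statistics


def rate_of_change(r_values, n_points=10, t_crit=2.306):
    """OLS fit of R(t) against day index over the last n_points estimates.

    Returns (slope per day, (lower, upper) approximate 95% CI, R squared).
    t_crit = 2.306 is the 97.5% t quantile for n_points - 2 = 8 degrees of freedom.
    """
    y = r_values[-n_points:]
    x = list(range(len(y)))
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    std_err = math.sqrt(sse / (len(y) - 2) / sxx)
    r_squared = 1 - sse / sst if sst > 0 else 1.0
    return slope, (slope - t_crit * std_err, slope + t_crit * std_err), r_squared


# Illustrative series: flat for 5 days, then falling by 0.05 per day.
example = [1.9] * 5 + [1.9 - 0.05 * day for day in range(10)]
slope, ci, r2 = rate_of_change(example)
```

A negative slope whose confidence interval excludes zero corresponds to the deceleration described in the Results.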
Results
Figure 1 shows how the reproduction number of SARS-CoV-2 varies over time in different regions of the UK. The early values of R(t) lie within the range of estimates reported in other countries (2.24 to 3.58) [11], and R(t) decreases overall in all regions. R(t) remains at or above 1 throughout, so COVID-19 case numbers have not yet begun to decline. It is notable that from 19th to 25th March, whilst R(t) was decreasing in England, it rose in Northern Ireland, Scotland and Wales before falling again. On 8th April R(t) was close to 1 in all regions apart from England. Two significant dates are marked on the time series: the recommendation on 16th March that vulnerable people self-isolate, and the order on 23rd March for the general population to remain at home. These dates mark the beginning and end of the staged implementation of social restrictions. The date one serial interval after the lockdown is also shown.
Table 2 shows the current estimated values of R(t) for each administrative region of NHS England. The lowest values of R(t) are observed in the Midlands and London, whilst higher values are currently seen in the North East and the South West. These point estimates change considerably over time.
In Figure 2 we plot, for each of the 7 NHS England administrative regions, the difference between the regional R(t) and the R(t) for England overall as a baseline. This demonstrates the volatility described above and highlights regional differences in R(t) over time. Before social distancing was imposed, the patterns observed are dominated by noise, as case numbers in individual regions were small in the early phase of the outbreak. From 27th March onwards, however, a clearer trend emerges, with the East of England, Midlands, South East and South West regions approximately tracking the England baseline. London is consistently below this baseline, while the North West and, to a lesser extent, the North East & Yorkshire are consistently above it. The differences between 27th March and 9th April were further analysed with a pairwise t-test (table 3), which shows that the differences in R(t) observed in London, the East, the North East and the North West of England over this period are statistically significant. The mean difference in R(t) between London and the North West is approximately 0.35, which could have important consequences for the progression of the outbreak.
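One plausible reading of the pairwise test is a paired t-test on the daily differences between a region's R(t) and the England baseline over the 14 days from 27th March to 9th April. The sketch below computes that statistic with the Python standard library. The input series are invented for illustration, and the function name `paired_t` is hypothetical. Because R(t) on consecutive days is estimated from overlapping 5-day windows, the daily differences are not independent, so nominal p-values from such a test are likely to be optimistic.

```python
import math
import statistics


def paired_t(region, baseline):
    """Paired t statistic for the mean daily difference region - baseline.

    Returns (mean difference, t statistic, degrees of freedom)."""
    if len(region) != len(baseline):
        raise ValueError("series must cover the same days")
    diffs = [r - b for r, b in zip(region, baseline)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1


# Hypothetical R(t) values for the 14 days from 27th March to 9th April:
# a region running about 0.2 below the England baseline.
england = [1.6 - 0.03 * day for day in range(14)]
region = [e - 0.2 + (0.01 if day % 2 else -0.01) for day, e in enumerate(england)]
mean_diff, t_stat, df = paired_t(region, england)
# The two-sided 5% critical value of t with 13 degrees of freedom is about 2.160.
significant = abs(t_stat) > 2.160
```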
In figures 3 and 4 we present a detailed regional breakdown of the reproduction number in the 149 unitary authorities in England, at time points corresponding to the start of the two social distancing measures implemented by the UK government, as described above and marked on figures 1 and 2. Case numbers in individual unitary authorities may be small, so the estimates of R(t) may have wide confidence intervals, which are not shown. The full regional breakdown including confidence intervals is available in the supplementary materials. In the vast majority of regions and time points the reproduction number is greater than 1, but the same decreasing trend is present.
In figure 5 we present the regional analysis of the current rate of change of R(t). This shows the magnitude of change in R(t) over time and hence gives a sense of whether social distancing measures are currently having the desired effect of slowing the spread of infection. A positive rate of change of R(t) indicates that the infection is accelerating; a negative rate indicates deceleration. In England, the viral infection has been decelerating since before the full social distancing policies of 23rd March. In the other regions of the UK, by contrast, viral spread continued to accelerate until approximately one serial interval after 23rd March, after which it began to decelerate.
In figures 6 and 7 we present a more detailed regional analysis of the rate of change of R(t) by Unitary Authority in England and by Local Health Board in Scotland and Wales. Data for this regional breakdown in Wales are incomplete. However, the far right panel, representing the most recent time point, shows that the rate of change is negative (cyan) in the vast majority of areas in England, representing a continued day-on-day fall in R(t). There was nevertheless regional variation in the preceding days and weeks, and the widespread negative rate of change of R(t) is a relatively recent feature.
Discussion
The results presented above demonstrate regional and temporal variation in the reproduction number of SARS-CoV-2. R(t) is currently close to 1 in most regions and its rate of change is negative, suggesting that infections will peak shortly.
This analysis includes several significant dates, relating both to social distancing measures and to methodological changes. In interpreting the relationship between these dates and changes in the reproduction number, we must remember that changes in the reproduction number will lag the introduction of a social measure by at least the incubation period of SARS-CoV-2 (median 5.1 days; 95% range 2.2-11.2 days) [12]. Under the testing strategy currently in place, COVID-19 cases are not identified until a patient has been admitted to hospital with symptoms and tested, and the result has been obtained. This introduces further delays, which vary from individual to individual. The delay between onset of symptoms and admission has been estimated at 5 to 9 days [28]; the delay due to test processing is unknown. If we assume a 2-day delay for testing and add together the incubation period, the admission delay and the testing delay, we estimate that interventions begin to have an effect 9 days after they are introduced, have half of their impact by 14 days, and have their full impact after 22 days.
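The 9, 14 and 22 day figures appear to follow from adding the early, central and late values of each delay. The sketch below makes that arithmetic explicit. Pairing the 2.5th percentile, median and 97.5th percentile of the incubation period with the lower bound, an assumed midpoint (7 days) and the upper bound of the admission delay is an interpretation, not a documented method. The function name `intervention_lags` is hypothetical.

```python
def intervention_lags(testing_delay=2.0):
    """Days from an intervention to its first, half and full effect on case counts."""
    # Incubation period: 2.5th percentile, median and 97.5th percentile [12].
    incubation = {"first": 2.2, "half": 5.1, "full": 11.2}
    # Symptom onset to admission: 5 to 9 days [28]; 7 days is an assumed midpoint.
    admission = {"first": 5.0, "half": 7.0, "full": 9.0}
    return {k: incubation[k] + admission[k] + testing_delay for k in incubation}


lags = intervention_lags()
rounded = {k: round(v) for k, v in lags.items()}
```

The unrounded totals are 9.2, 14.1 and 22.2 days, which round to the 9, 14 and 22 days quoted above.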
In the accompanying supplementary materials we present animations of R(t) and of its rate of change over time. These raise the possibility of waves of increasing and decreasing R(t) spreading throughout the UK. If such waves can be identified and predicted, we may be able to intercept them in future through targeted community testing and more localised social distancing interventions. For this to be an option, however, we will need much more granular data on the location of confirmed and suspected cases, and will need to reduce the delay between infection and detection as far as possible.
Globally, COVID-19 case growth rates have been observed to pass through different phases [29], with an initial day-on-day growth rate of 35% typically falling to 22%. This is presumably due both to the introduction of social distancing measures and to exponential growth in resolved cases reducing the size of the infective cohort. However, because of changes in testing strategy, the stepwise implementation of social distancing measures and the variable lag introduced by the incubation period, it is difficult to establish the causal influence of social distancing measures on the fall in R(t) over time [30]. It is, however, notable that significant changes in both R(t) and its rate of change predate the implementation of social distancing measures in the UK.
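Day-on-day growth rates can be translated into more familiar quantities. The sketch below computes doubling times and, as an illustration only, an implied reproduction number using the Wallinga and Lipsitch (2007) relation R = (1 + rθ)^k for a gamma-distributed generation interval with shape k and scale θ, where r = ln(1 + daily growth). This relation is not used in the analysis above, and treating the serial interval (mean 4.56 days, SD 4.53 days) as the generation interval is an assumption. The function names are hypothetical.

```python
import math


def doubling_time(daily_growth):
    """Days to double at a constant day-on-day growth (0.35 means 35%)."""
    return math.log(2) / math.log(1 + daily_growth)


def r_from_growth(daily_growth, si_mean=4.56, si_sd=4.53):
    """Wallinga and Lipsitch (2007): for a gamma generation interval with
    shape k and scale theta, R = (1 + r * theta) ** k, where r is the
    exponential growth rate per day. Here the serial interval stands in
    for the generation interval (an assumption)."""
    r = math.log(1 + daily_growth)
    shape = (si_mean / si_sd) ** 2
    scale = si_sd ** 2 / si_mean
    return (1 + r * scale) ** shape


for growth in (0.35, 0.22):
    print(f"{growth:.0%}: doubling time {doubling_time(growth):.2f} days, "
          f"implied R {r_from_growth(growth):.2f}")
```

Under these assumptions, 35% and 22% daily growth correspond to doubling times of roughly 2.3 and 3.5 days and implied R of roughly 2.4 and 1.9 respectively.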
Our analysis demonstrates that significant regional variability exists, most notably at country level. However, it does not identify any single location as a “hot spot” of viral infectivity within the UK; rather, areas of increased viral replication migrate around the country over time. This migration does not appear entirely random and is a focus of future investigation.
Limitations
Because reporting of the outbreak is fragmented across the 4 countries of the UK, insufficient time series data are available to perform a UK-wide regional breakdown of R(t). Time series data are particularly scarce for Northern Ireland and Wales, where we have only a very limited data set.
The time series we collected has the data quality issues identified in the Methods section, which required some data cleansing. Furthermore, we believe that changes in the methodology and clinical criteria for testing patients created an artefactual decrease in the apparent R(t) in the days after 13th March, which limits our overall ability to draw conclusions.
Testing strategy changed around 13th March. Before this date there were attempts at community contact tracing; afterwards, testing was performed only on hospital admissions with suspected COVID-19. One possible source of regional variation that would influence the results of this analysis is differing implementation of testing strategies, or differing processing times for test results, which anecdotally have been long in some centres. If either were present, comparisons between regions would be difficult. However, we have no evidence that this is the case, as until recently testing for SARS-CoV-2 has been centrally controlled through a network of public health laboratories.
Our analysis inherently assumes that little travel occurred during the period of estimation. We would, however, expect that some people moved from denser centres such as London to more rural regions, which could inflate rural figures and may explain the surge in R(t) in Wales and Scotland following 23rd March.
Conclusions
The analysis presented above reflects the state of the ongoing COVID-19 crisis in the UK, but suggests that the effective reproduction number is slowly coming under control. Interpreting the time series data is particularly challenging because of the long delay in identifying positive cases. As such, it is impossible to conclude whether social distancing is the cause of this improvement.
The current improving trends are encouraging and we hope to see larger changes in the near future. However, this improvement is not uniformly distributed across the UK, and policy makers will have to be cautious until we can be certain that COVID-19 is under control in all regions. The difference in apparent R(t) between London and the North West region is currently approximately 0.35. Until that gap has closed, it will be difficult to ease the current country-wide restrictions.
Once the first peak of SARS-CoV-2 infections has passed, we will enter a new phase in which early detection of emerging clusters of SARS-CoV-2 infection, and prevention of their spread, will be critical to preventing large-scale outbreaks. This will be challenging, as the long incubation period and high rate of asymptomatic infection make rapid undetected spread easy. If it is possible to predict, at a more local level, where infections will spread in the community, then community testing and more targeted social interventions can be focused on high-risk areas in the future.
Data Availability
All data are in the public domain and are accessible through GitHub (https://github.com/terminological/uk-covid-datatools) and the following spreadsheet:
https://docs.google.com/spreadsheets/d/1snb-vYuH7fVpTuyoQrM8zWiABYoXbSrnn44w-zlhM90/edit?usp=sharing
Competing interests
Support for RC and KTA’s research is provided by the EPSRC via grant EP/N014391/1; RC is also funded by TSFT as part of the NHS Global Digital Exemplar (GDE) programme. The authors declare no financial relationships with any organisations that might have an interest in the submitted work in the previous three years, and no other relationships or activities that could appear to have influenced the submitted work.
Funding
RCh and KTA gratefully acknowledge the financial support of the EPSRC via grant EP/N014391/1 and of the NHS England Global Digital Exemplar programme. LD and KTA gratefully acknowledge the financial support of The Alan Turing Institute under EPSRC grant EP/N510129/1. LL acknowledges the financial support of the EPSRC via Early Career Fellowship EP/P01660X/1.
Supplementary materials
Time series spreadsheet:
https://docs.google.com/spreadsheets/d/1snb-vYuH7fVpTuyoQrM8zWiABYoXbSrnn44w-zlhM90/edit?usp=sharing
Github repo:
https://github.com/terminological/uk-covid-datatools
Full R(t) time series by unitary authority:
Animations:
https://github.com/terminological/uk-covid-datatools/blob/master/vignettes/UK_Rt_over_time.gif
Supplementary Figure 1 - Panel A: R(t) over time for unitary authorities in England and local health boards in Scotland and Wales; Panel B: rate of change of R(t) over time; Panel C: R(t) over time in London; Panel D: rate of change of R(t) in London. The animation covers the period from 7th March to 9th April. Significant dates are marked: on 16th March (purple) the government advised all vulnerable people to self-isolate; on 23rd March (cyan) this was extended to everyone apart from essential workers; 27th March (green) marks one serial interval after that full “lock down”; and 9th April (red) is the latest time point available.
This supplementary material is hosted by Eurosurveillance as supporting information alongside the article “Estimates of regional infectivity of COVID-19 following imposition of social distancing measures”, on behalf of the authors, who remain responsible for the accuracy and appropriateness of the content. The same standards for ethics, copyright, attributions and permissions as for the article apply. Supplements are not edited by Eurosurveillance and the journal is not responsible for the maintenance of any links or email addresses provided therein