Abstract
There has been an outbreak of coronavirus disease (COVID-19) in Wuhan city, Hubei province, China since December 2019. Cases have been exported to other parts of China and more than 20 countries. We provide estimates of the daily trend in the size of the epidemic in Wuhan based on detailed information of 10,940 confirmed cases outside Hubei province.
Background
As of February 13, 2020, the National Health Commission (NHC) of China has confirmed a total of 63,851 cases of COVID-19 in mainland China, including 10,204 severe cases, 1,380 deaths, and 6,723 recoveries. An additional total of 10,109 suspected cases were reported. Wuhan, the epicenter of the COVID-19 outbreak, has 35,991 confirmed cases. The NHC has also received 53 confirmed reports in Hong Kong Special Administrative Region, China, 10 in Macau Special Administrative Region, China, and 18 in Taiwan, China. [1] More than 500 cases have been detected outside China.
Despite the considerable medical resources and personnel that have been dispatched to combat COVID-19 in Hubei province, hospital capacity continues to be overburdened, and there is still a shortage of hospital beds for the rising number of COVID-19 patients. In response to this growing crisis, Wuhan plans to transform hotels, venues, training centers and college dorms into quarantine and treatment centers for COVID-19 patients. Further, 13 mobile cabin hospitals will be built to provide over 10,000 beds. [2] Therefore, a careful and precise understanding of the potential number of cases in Wuhan is crucial for the prevention and control of the COVID-19 outbreak. Wu et al. (2020) provided an estimate of the total number of cases of COVID-19 in Wuhan, using the number of cases exported from Wuhan to cities outside mainland China. [3] However, since the number of cases exported from Wuhan to cities outside mainland China is small, their estimate of the size of the epidemic in Wuhan may be imprecise, with large variability. Using the number of cases exported from Wuhan to all cities outside Hubei province, including cities in China, You et al. (2020) proposed a new method to estimate the total number of cases of COVID-19 in Wuhan. [4] However, their method can only give an estimate of the cumulative number of cases until a certain date.
In this article, we propose a new statistical method to estimate the daily number of cases in Wuhan under a dynamic equation model similar to the one in [3]. Unlike the method in [3], ours can also handle missing information on whether a case was exported from Wuhan.
Results
We estimate that the number of cases that should have been reported in Wuhan is 4,094 (95% confidence interval [CI]: 3,980 – 4,211) by January 11, 2020, and 58,153 (95% CI: 56,532 – 59,811) by February 13, 2020. Figure 1 shows how the estimated number of cases in Wuhan increases over time, together with the 95% confidence bands. As shown in Figure 2, the reporting rate has grown rapidly from 1.41% (95% CI: 1.37% – 1.45%) on January 20, 2020, to 61.89% (95% CI: 60.17% – 63.66%) on February 13, 2020. The date of first infection is estimated as November 30, 2019.
Data Description
Data retrieved from publicly available records from provincial and municipal health commissions in China and ministries of health in other countries included detailed information for 10,940 confirmed cases outside Hubei province, including region, gender, age, date of symptom onset, date of confirmation, history of travel or residency in Wuhan, and date of departure from Wuhan. Among the 7,500 patients with gender data, 3,509 (46.79%) are female. The mean age of patients is 44.48 years and the median age is 44 years. The youngest confirmed patient outside Hubei province was only five days old, while the oldest was 97 years old.
We display the epidemiological data categorized by the date of confirmation in Table 2. An imported case is a patient who had been to Wuhan and was detected outside Hubei province. A local case is a confirmed case that had not been to Wuhan. Among the total of 10,940 cases, 6,903 (63.10%) have such epidemiological information. The number of imported cases reached its peak on January 29, 2020, and the fourth column of Table 2 shows that the proportion of imported cases declines over time. This might reflect the effect of containment measures taken in Hubei province to control the COVID-19 outbreak. [5] Meanwhile, the daily counts of local cases exceeded 300 from February 2, 2020, to February 7, 2020, which indicates that infections among local residents should be a major concern for authorities outside Hubei province.
Table 1. Demographic characteristics of patients with COVID-19 outside Hubei province.
Table 2. Patient data categorized by the date of confirmation.
The last column of Table 2 lists the mean time from symptom onset to confirmation for patients confirmed on each day. Across all cases, the median time is 5 days and the mean is 5.54 days. In general, the detection period decreased in the first week after January 20, 2020, but has increased since then. Improvements in detection speed and capacity may have caused the initial decline, and the rise may be due to more thorough screening, leading to the detection of patients with mild symptoms who would otherwise not have gone to hospital. [6]
Assumptions
The proposed method relies on the following assumptions:
1. Between January 10, 2020, and January 23, 2020, the average daily proportion of the population departing from Wuhan is p.
2. There is a d = d1 + d2-day window from infection to detection, consisting of a d1-day incubation period and a d2-day delay from symptom onset to detection.
3. Trip durations are long enough that a traveling patient infected in Wuhan will develop symptoms and be detected in other places rather than after returning to Wuhan.
4. All travelers leaving Wuhan, including transfer passengers, have the same risk of infection as local residents.
5. Traveling is independent of the exposure risk to COVID-19 and of infection status.
6. Patients are not able to travel d days after infection.
7. Recoveries are not considered in this method.
8. The proportion of imported cases among the patients with no information is the same as the observed proportion on each day.
We next make some remarks about our assumptions.
January 10, 2020, is the start of the Chinese New Year travel rush, and January 23, 2020, is the date of the Wuhan lockdown. [5] Of the total of 10,940 cases, only 131 (1.2%) have a date of departure from Wuhan outside this period. They are excluded from our analysis.
If the true average daily proportion of leaving Wuhan is larger than the assumed p, this violation of Assumption 1 could lead to overestimation of the number of cases in Wuhan.
If the average time from infection to detection is longer than the assumed d days, this violation of Assumption 2 would lead to an overestimation.
If travelers have a lower risk of infection than residents in Wuhan, this violation of Assumption 4 would cause an underestimation.
If infected individuals are less likely to travel due to the health conditions, this violation of Assumption 5 would cause an underestimation.
Given that the number of recoveries in the early days of the outbreak is relatively small compared to the number of COVID-19 patients, Assumption 7 should not significantly influence the result.
We perform a sensitivity analysis of the effect of some of these violations on our results.
Methods
The spread of COVID-19 outside Hubei province is relatively controlled given the adequate medical resources. We use the reported number outside Hubei as it is a fairly accurate representation of the actual epidemic situation. In this modelling study, we first estimate the epidemic size in Wuhan from January 11, 2020, to February 13, 2020, based on the confirmed cases outside Hubei province that left Wuhan by January 23, 2020. Since some confirmed cases have no information on whether they visited Wuhan before, we adjust the number of imported cases after taking these missing values into account. We then calculate the reporting rate in Wuhan from January 20, 2020, to February 13, 2020. Finally, we estimate the date when the first patient was infected.
Notations
Let Day t0 denote the date of infection for the very first case. Let Nt be the cumulative number of cases that should be confirmed in Wuhan by Day t. Other notations of our model are defined in Table 3.
Table 3. Notations for our model.
Model
The growth trend of the size Nt of the infected population is determined by the following ordinary differential equation:
$$\frac{dN_t}{dt} = \frac{r}{K} N_t (K - N_t), \tag{1}$$
where K is the size of the population that is susceptible to COVID-19 in Wuhan, and r is a constant that controls the growth rate of Nt. This is a simplified version of the well-known SIR model [3, 7] in epidemiology. It is a good model at the early stage of the epidemic, when the number of recoveries is still relatively small compared to the number of infected cases. The growth rate of Nt is proportional to the product of Nt and the number K − Nt of people that are susceptible but not yet infected. Equation (1) has the analytical solution
$$N_t = \frac{K}{1 + e^{-r(t - t_c)}}, \tag{2}$$
where $t_c$ is the time at which $N_t = K/2$, and the derivative
$$\frac{dN_t}{dt} = \frac{r K e^{-r(t - t_c)}}{\left(1 + e^{-r(t - t_c)}\right)^2}$$
is maximized at $t = t_c$. Here $r/2$ is the growth rate of $\log N_t$ at time $t_c$, and K is a parameter to be estimated.
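The behavior of the model can be checked numerically. The following is a minimal sketch, assuming the differential equation takes the standard logistic form dNt/dt = (r/K) Nt (K − Nt) with solution Nt = K / (1 + exp(−r(t − tc))). The function names and the values of K, r, and tc are illustrative assumptions, not the estimates of this article.

```python
import math

def logistic_cases(t, K, r, t_c):
    """Cumulative cases N_t = K / (1 + exp(-r * (t - t_c)))."""
    return K / (1.0 + math.exp(-r * (t - t_c)))

def growth_rate(t, K, r, t_c):
    """Right-hand side of the assumed ODE: (r / K) * N_t * (K - N_t)."""
    n = logistic_cases(t, K, r, t_c)
    return r / K * n * (K - n)

# Illustrative values only (assumptions, not the article's estimates).
K, r, t_c = 50_000.0, 0.2, 0.0

# The closed-form solution should satisfy the ODE: compare a central
# finite difference of N_t with the right-hand side at a few time points.
h = 1e-5
for t in (-10.0, 0.0, 7.5):
    numeric = (logistic_cases(t + h, K, r, t_c)
               - logistic_cases(t - h, K, r, t_c)) / (2 * h)
    assert abs(numeric - growth_rate(t, K, r, t_c)) < 1e-3

# The daily increase peaks at t_c, where N_t = K / 2.
peak_day = max(range(-30, 31), key=lambda t: growth_rate(t, K, r, t_c))

# The growth rate of log N_t at t_c equals r / 2.
log_growth_at_peak = growth_rate(t_c, K, r, t_c) / logistic_cases(t_c, K, r, t_c)

print(peak_day, logistic_cases(t_c, K, r, t_c), log_growth_at_peak)
```

Under this form, the inflection point tc is both the day of fastest growth and the day on which half of the susceptible population K has been infected.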
Estimation
We use data on the confirmed cases who left Wuhan between January 10, 2020, and January 23, 2020, to estimate K. Under Assumption 2, cases infected on Day t will be detected on Day t + d, so the number of infected cases in Wuhan on Day t is Nt+d. If t0 ≤ t ≤ t0 + d, there should be no confirmed cases. If t0 + d < t ≤ t0 + 2d, imported cases detected on Day t were infected in Wuhan on Day t − d, when there were Nt infected cases in Wuhan, so the number of imported cases xt on Day t follows a binomial(Nt, p) distribution, where p is the assumed average daily proportion of leaving Wuhan between January 10, 2020, and January 23, 2020. If t > t0 + 2d, under Assumption 6, Nt−d of these patients are no longer able to travel, so xt follows a binomial(Nt − Nt−d, p) distribution. Let Xt be the cumulative number of imported cases by Day t, then
From equations (2) and (3), . The parameter estimate
$\hat{K}$ is derived by maximizing the likelihood function of the observed number of imported cases. The lower and upper bounds of the 95% confidence interval are the values of K at which the cumulative distribution function of the number of imported cases, evaluated at the observed count, equals 0.975 and 0.025, respectively. The reporting rate is the reported cumulative number of cases in Wuhan on Day t divided by our estimated number $\hat{N}_t$. The estimate of the date t0 of first infection is obtained by solving the equation $N_{t_0} = 1$.
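The maximum likelihood step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for this article: it treats the daily imported counts xt as independent binomial(Nt − Nt−d, p) variables with Nt given by a logistic curve, extends the binomial coefficient to non-integer Nt with the log-gamma function, and maximizes over K by golden-section search. The data are synthetic, the confidence interval step is omitted, and every name and value (`K_true`, `r = 0.2`, the day range) is hypothetical.

```python
import math

def logistic(t, K, r, t_c):
    return K / (1.0 + math.exp(-r * (t - t_c)))

def log_binom_pmf(x, n, p):
    """log P(X = x) for Binomial(n, p); lgamma allows a non-integer n."""
    if x > n:
        return -math.inf
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log1p(-p))

def log_likelihood(K, counts, p, d, r, t_c):
    """counts maps day t to the number of imported cases x_t on that day."""
    total = 0.0
    for t, x in counts.items():
        n = logistic(t, K, r, t_c) - logistic(t - d, K, r, t_c)
        total += log_binom_pmf(x, n, p)
    return total

def mle_K(counts, p, d, r, t_c, lo, hi, tol=1.0):
    """Golden-section search; the log-likelihood is concave in K here."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, e = b - g * (b - a), a + g * (b - a)
        if (log_likelihood(c, counts, p, d, r, t_c)
                < log_likelihood(e, counts, p, d, r, t_c)):
            a = c
        else:
            b = e
    return (a + b) / 2

# Synthetic data: expected counts under hypothetical parameter values.
K_true, p, d, r, t_c = 60_000.0, 0.009, 10, 0.2, 0.0
counts = {
    t: round(p * (logistic(t, K_true, r, t_c) - logistic(t - d, K_true, r, t_c)))
    for t in range(-10, 21)
}

K_hat = mle_K(counts, p, d, r, t_c, lo=10_000.0, hi=200_000.0)
print(f"K_hat is about {K_hat:,.0f}")
```

Because the synthetic counts are the rounded expected values, the search should return a value close to `K_true`; with real counts the estimate would also reflect sampling noise.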
Determining the number of imported cases xt plays a crucial role in the modeling procedure. Because not all cases have clear records of the history of travel or residency in Wuhan, we need to impute the missing values. Under Assumption 8, the proportion of imported cases among the Ut patients with no information on Day t is the same as the proportion observed among patients with such information on that day. Therefore, xt is the observed number of imported cases on Day t plus Ut multiplied by this observed proportion.
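The imputation under Assumption 8 amounts to a proportional allocation of the cases with unknown travel history. A minimal sketch follows; the function name, the argument names, the zero fallback, and the example counts are hypothetical and are not taken from Table 2.

```python
def impute_imported(imported, local, unknown):
    """Estimate the number of imported cases on one day.

    imported: confirmed cases known to have been to Wuhan
    local:    confirmed cases known not to have been to Wuhan
    unknown:  confirmed cases with no travel information (U_t)
    """
    known = imported + local
    if known == 0:
        # No observed proportion to borrow from; returning 0 is a
        # hypothetical fallback, not a rule stated in the text.
        return 0.0
    observed_proportion = imported / known
    return imported + unknown * observed_proportion

# Hypothetical day: 100 imported, 100 local, 40 with no information.
x_t = impute_imported(imported=100, local=100, unknown=40)
print(x_t)
```

In this example half of the cases with known history are imported, so half of the 40 cases without information are counted as imported as well.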
The average daily proportion of leaving Wuhan between January 10, 2020, and January 23, 2020, is estimated as the ratio of the daily volume of travelers to the population of Wuhan (14 million). More than 5 million people left Wuhan because of the Spring Festival and the epidemic. [8] The Chinese New Year travel rush started on January 10, 2020, and the lockdown of Wuhan city began on January 23, 2020, a period of 14 days. During the travel rush, 34% of the passengers traveled more than 300 km. [9] Major cities outside Hubei province are generally over 300 km from Wuhan. This implies that, on average, the daily probability p of traveling from Wuhan to places outside Hubei province is 5 × 0.34/14/14 ≈ 0.009, that is, 5 million travelers times the 34% long-distance share, divided by 14 days and by 14 million residents. Li et al. estimated that the mean incubation period of 425 patients with COVID-19 was 5.2 days (95% CI: 4.1 – 7.0). [10] The mean time from symptom onset to detection calculated from our data is 5.54 days, so we choose d1 = d2 = 5 days. January 29, 2020, has the maximum count of imported cases. Since xt has a binomial(Nt − Nt−d, p) distribution with constant p, Nt − Nt−d also reaches its maximum at t = January 29, 2020. From the logistic function (2), tc is the midpoint of t and t − d, that is, tc = January 24, 2020, which is shortly after the lockdown of Wuhan city. [5] Wu et al. estimated the epidemic doubling time as 6.4 days (95% CI: 5.8 – 7.1) as of January 25, 2020. [3] From this result, we estimate that
. Using these values for parameters p, d, t, and r, we can derive the maximum likelihood estimate
, with 95% confidence interval [57567, 60906].
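The parameter choices above rest on simple arithmetic, which the following sketch reproduces from the figures quoted in the text. The variable names are illustrative. The last two quantities are only the exponential growth rate implied by a 6.4-day doubling time and the ratio of the reported to the estimated case count on February 13, 2020; how the doubling time is mapped to r is not reproduced here.

```python
import math
from datetime import date, timedelta

# Inputs quoted in the text.
travelers_millions = 5.0      # people who left Wuhan
long_distance_share = 0.34    # share of passengers traveling more than 300 km
population_millions = 14.0    # population of Wuhan
rush_start, lockdown = date(2020, 1, 10), date(2020, 1, 23)
travel_days = (lockdown - rush_start).days + 1   # both end dates included

# Average daily probability of leaving Wuhan for places outside Hubei.
p = travelers_millions * long_distance_share / travel_days / population_millions

# Window from infection to detection and the logistic midpoint t_c.
d1, d2 = 5, 5
d = d1 + d2
peak_import_day = date(2020, 1, 29)
t_c = peak_import_day - timedelta(days=d // 2)

# Exponential growth rate implied by a 6.4-day doubling time.
doubling_growth_rate = math.log(2) / 6.4

# Reporting rate on February 13, 2020: reported cases / estimated cases.
reporting_rate = 35_991 / 58_153

print(f"travel days    = {travel_days}")
print(f"p              = {p:.4f}")
print(f"t_c            = {t_c.isoformat()}")
print(f"growth rate    = {doubling_growth_rate:.4f} per day")
print(f"reporting rate = {reporting_rate:.2%}")
```

The unrounded value of p is slightly below the 0.009 used in the analysis, which is one reason the sensitivity of the results to p is worth examining.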
Sensitivity Analysis
We explore the sensitivity of the estimate of total cases in Wuhan to our assumptions and choices of parameters p, d, and r. Note that .
Compared to the baseline, the parameters are increased or decreased by about 30% to reflect the possible uncertainty. Table 4 summarizes the estimated number of cases that should have been reported by January 11, 2020, and February 13, 2020, under baseline assumptions and alternative scenarios. Confidence intervals are omitted. The currently reported number of 35,991 cases on February 13, 2020, is substantially smaller than the estimate under our most conservative scenario.
Table 4. Estimated case numbers on January 11 and February 13, 2020, based on different choices of parameters.
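The direction of these effects can be explored with a rough sketch. It does not reproduce Table 4: instead of the maximum likelihood estimate it uses a simpler method-of-moments estimate, solving E[sum of xt] = p K sum(ft − ft−d) for K, where f is the unit logistic curve and tc is set to the peak day of imported cases minus d/2. The total count, the day range, the peak day, and r = 0.2 are hypothetical values chosen for illustration.

```python
import math

def window_share(t, d, r, t_c):
    """f(t) - f(t - d) for the unit logistic curve f."""
    def f(s):
        return 1.0 / (1.0 + math.exp(-r * (s - t_c)))
    return f(t) - f(t - d)

def moment_estimate_K(total_imported, days, peak_day, p, d, r):
    """Method-of-moments estimate of K (a stand-in for the MLE)."""
    t_c = peak_day - d / 2.0
    expected_per_unit_K = p * sum(window_share(t, d, r, t_c) for t in days)
    return total_imported / expected_per_unit_K

# Hypothetical inputs: 35 observation days, peak of imported cases on
# day 18, and 5,000 imported cases in total.
days = range(0, 35)
peak_day = 18
total_imported = 5000
baseline = {"p": 0.009, "d": 10, "r": 0.2}

scenarios = {
    "baseline": {},
    "p x 0.7": {"p": 0.009 * 0.7},
    "p x 1.3": {"p": 0.009 * 1.3},
    "d = 7": {"d": 7},
    "d = 13": {"d": 13},
    "r x 0.7": {"r": 0.2 * 0.7},
    "r x 1.3": {"r": 0.2 * 1.3},
}

estimates = {}
for name, change in scenarios.items():
    params = {**baseline, **change}
    estimates[name] = moment_estimate_K(total_imported, days, peak_day, **params)
    print(f"{name:>9}: K is about {estimates[name]:,.0f}")
```

In this sketch the estimate is inversely proportional to p, so a larger assumed travel probability gives a smaller estimated epidemic size, while d and r act through the shape and midpoint of the logistic curve.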
Conclusions
The estimated reporting rate has increased rapidly, reaching over 30% by February 11, 2020. It almost doubled in the following two days, mainly due to the inclusion of 14,031 clinically diagnosed cases in the case reports of Wuhan. This might indicate that the testing capacity of Wuhan is insufficient. Clinical diagnosis could be a good complement to the current method of confirmation. The currently reported number of 35,991 cases as of February 13, 2020, is still far below our estimate of 58,153. There may still be many unreported cases. More thorough screening of all patients with mild or moderate symptoms of respiratory disease should be conducted to better control the spread of COVID-19.