1 Summary
Background The novel coronavirus (SARS-CoV-2) is causing concern in the medical, epidemiological and mathematical communities as the virus spreads rapidly around the world. As of April 6, more than 1 200 000 cases had been detected and confirmed worldwide. Asymptomatic and mildly symptomatic cases are crucial to understanding what drives the rapid transmission of this epidemic. Combining a mathematical model of SARS-CoV-2 transmission with data from China, South Korea, Italy, France, Germany and the United Kingdom, we predict the number of reported and unreported cases for the SARS-CoV-2 epidemic in each country and evaluate the effectiveness of its control measures.
Methods We combined a mathematical model with data on cumulative confirmed cases from China, South Korea, Italy, France, Germany and the United Kingdom to provide epidemic predictions and evaluate the effectiveness of control measures. We divide infectious individuals into asymptomatic and symptomatic infectious individuals. The symptomatic infectious phase is further divided into reported (severe symptoms) and unreported (mild symptoms) cases. In the early phase of virus transmission, around the time the national prevention and control measures are implemented, there is a period in which the cumulative number of reported cases grows approximately exponentially. We first combine the date of implementation of the measures with the daily and cumulative reported confirmed case data to find the most consistent period over which the cumulative number of reported cases grows approximately exponentially according to the formula χ1 exp(χ2 t) − χ3. This determines the parameters χ1, χ2, χ3 in the formula. Using this formula and plausible biological parameters for SARS-CoV-2 based on current evidence, we then determine the parameters and initial conditions of our model. Finally, we provide epidemic predictions and evaluate the effectiveness of control measures by simulating our model.
Findings We ran simulations using multiple groups of parameters (d1, d2, N), where [d1, d2] is the consistent period over which the cumulative number of reported cases grows approximately exponentially according to the formula χ1 exp(χ2 t) − χ3, and N is the date at which public intervention measures became effective. The resulting ranges for the turning point, the final size of reported cases and the final size of unreported cases are, respectively: Feb 6–7, 67 000–69 000 and 45 000–46 000 for China; Feb 29–Mar 1, 9 000–9 400 and 2 250–2 350 for South Korea; Mar 24–26, 156 000–177 000 and 234 000–265 000 for Italy; Mar 30–Apr 9, 104 000–212 000 and 177 000–318 000 for France; Mar 30–Apr 20, 141 000–912 000 and 197 000–1 369 000 for Germany; Apr 1–May 12, 140 000–473 000 and 210 000–709 000 for the United Kingdom. Our prediction relies on the cumulative reported confirmed case data. As more data become available, the ranges narrow and the prediction improves. Our estimates and simulations correspond well with the available cumulative reported confirmed case data for each country. In particular, the curves plotted using different parameter groups (d1, d2, N) for reported and unreported cases tend to coincide for China and South Korea (see (e) in Figures 2-3). For Italy, France, Germany and the United Kingdom, the prediction can be updated to higher accuracy as daily reported case data accumulate (see Figures 4-7).
Interpretation We used plausible biological parameters f, ν, η for SARS-CoV-2 based on current evidence, which may be refined as more comprehensive data become available. Our prediction also relies on the cumulative reported confirmed case data. Using multiple groups of parameters (d1, d2, N), we have attempted to make the best possible prediction from the available data. We found that as more cumulative data become available, the curves plotted using different parameter groups (d1, d2, N) for reported and unreported cases move closer together and eventually coincide. This shows that when not enough cumulative data are available, we need to use all possible parameter groups to predict the range of the turning point and of the final size of reported and unreported cases. When enough cumulative data are available, for example once we have data after the turning point, any one of these parameter groups suffices to obtain a prediction with high accuracy.
Funding NSFC (Grant No. 11871007), NSFC and CNRS (Grant No. 11811530272) and the Fundamental Research Funds for the Central Universities.
2 Introduction
As coronavirus outbreaks surge worldwide, more and more evidence [8] shows that many new patients who are asymptomatic or have only mild symptoms can transmit the virus. Research [9] traced COVID-19 infections that resulted from a business meeting in Germany attended by someone who was infected but showed no symptoms at the time, and found that four people were ultimately infected from that single contact. Asymptomatic transmission has also been confirmed in [1], echoing the report in [9]. A German research team [11] showed that some COVID-19 patients had higher viral levels in throat swabs during the early stage of the disease, that is, when the symptoms were mild. It is reported in [7] that 13 evacuees from Wuhan, China on chartered flights were infected, of whom 4, or 31%, never developed symptoms. The asymptomatic proportion estimated in [6] is 17.9%, which overlaps with the derived estimate of 31% from data on Japanese citizens evacuated from Wuhan [7]. A team in China [10] suggests that by February 18 there were 37,400 people with the virus in Wuhan of whom the authorities were unaware. Asymptomatic and mildly symptomatic cases are missed because authorities are not doing enough testing. In addition, 'preclinical cases', in which people are incubating the virus but are not ill enough to seek medical help, would probably slip past screening methods such as temperature checks. Asymptomatic and mildly symptomatic cases are therefore crucial to understanding the rapid transmission.
In previous works [3, 4, 5], our team developed differential equation models of COVID-19 epidemics. Our goal was to predict the future number of cases from early reported case data in regions throughout the world. Our models incorporate the following important elements of COVID-19 epidemics: (1) the number of asymptomatic infectious individuals (with no or very mild symptoms), (2) the number of symptomatic reported infectious individuals (with severe symptoms) and (3) the number of symptomatic unreported infectious individuals (with mild symptoms). With our model, we can predict the final size of the asymptomatic infectious, reported (with severe symptoms) and unreported (with mild symptoms) cases, an important epidemiological problem that research teams around the world are trying to solve.
In the early phase of the epidemic, the reported case data grow exponentially, which corresponds to a constant transmission rate. We assume that government measures and public awareness cause this early constant transmission rate to change to a time-dependent, exponentially decreasing rate. We identify the early constant transmission rate using a method developed in [3]. We then identify the time-dependent, exponentially decreasing transmission rate from reported case data, and project forward the timeline of the epidemic. With this time-dependent transmission rate, the effectiveness of control measures in each country can be evaluated.
Our model is applicable to COVID-19 epidemics in any region with reported case data, and its predictions can be updated to higher accuracy as daily reported case data accumulate.
3 Research in context
Evidence before this study
We searched PubMed, BioRxiv, and MedRxiv for articles published in English, Chinese, or French, using the search terms “2019-nCoV”, “novel coronavirus”, “COVID-19”, “SARS-CoV-2”, “asymptomatic”, “mild symptoms”, AND “unreported case”, with no time restrictions. We found several estimates of asymptomatic, mildly symptomatic, and unreported cases. However, we found no estimates of the turning point, the number of asymptomatic cases, the number of mildly symptomatic cases, or the number of unreported cases obtained by combining a mathematical model with a phenomenological model and data.
Added value of this study
Our study estimates the turning point, the number of asymptomatic cases, the number of mildly symptomatic cases, and the number of unreported cases by using the mathematical model we present (which includes susceptible, asymptomatic infectious, reported symptomatic infectious, and unreported symptomatic infectious individuals), together with a phenomenological model and data. Our model also incorporates government and social distancing measures through the time-dependent transmission rate. The values of f, µ, N in Table 2 indicate that strong government measures such as isolation, quarantine, and public closings should start as early as possible and be as strong as possible. After a first outbreak of SARS-CoV-2, South Korea has now returned to a linear growth phase, which suggests that this epidemic is not easy to eradicate. If strong government measures are reduced too early or too extensively, the epidemic may return to a new exponential growth phase and break out again.
Implications of all the available evidence
We see that estimating the number of asymptomatic infectious and unreported cases is of major importance in understanding the severity of this epidemic. Strong measures are needed to curb the mild and asymptomatic cases that are fueling the pandemic. Based on this study, for China and South Korea, the major distancing measures cannot be reduced too early or too extensively, otherwise the epidemic may return to a new exponential growth phase. For Italy, France, Germany and the United Kingdom, very strong testing and isolation measures should start as early as possible and be as strong as possible.
4 Methods
Model
To provide epidemic predictions and evaluate the effectiveness of control measures, we fit the following transmission dynamic model to the cumulative confirmed case data in China, South Korea, Italy, France, Germany and the United Kingdom: with the initial data
Here t ≥ t0 is time in days, t0 is the beginning date of the epidemic in the model, S(t) is the number of individuals susceptible to infection at time t, I(t) is the number of asymptomatic infectious individuals at time t, R(t) is the number of reported severely symptomatic infectious individuals at time t, and U(t) is the number of unreported mildly symptomatic infectious individuals at time t.
The transmission rate at time t is τ(t). Asymptomatic infectious individuals I(t) are infectious for an average period of 1/ν days. Reported symptomatic individuals R(t) are infectious for an average period of 1/η days, as are unreported symptomatic individuals U(t). We assume that reported symptomatic infectious individuals R(t) are reported and isolated immediately, and cause no further infections. The asymptomatic individuals I(t) can also be viewed as having a low-level symptomatic state. All infections are acquired from either I(t) or U(t) individuals. The parameter f is the fraction of asymptomatic infectious individuals who become reported symptomatic infectious individuals. Asymptomatic infectious individuals become reported symptomatic at rate ν1 = f ν and unreported symptomatic at rate ν2 = (1 − f) ν, where ν1 + ν2 = ν. The cumulative number of reported cases at time t is given by the formula and the cumulative number of unreported cases at time t is given by the formula
The interpretation of the parameters and initial conditions of the model are given in Table 1 and a flow diagram of the model is given in Figure 1.
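For reference, the verbal description above corresponds to a compartmental system of the following shape. This is a sketch assembled from the definitions of S, I, R, U, τ(t), ν, η and f given in this section. It assumes a mass-action infection term τ(t) S(t) [I(t) + U(t)], and the names CR(t) and CU(t) for the cumulative reported and unreported cases. The initial value of R is left unspecified here because R does not feed back into the other compartments.

```latex
\begin{align*}
S'(t) &= -\tau(t)\, S(t)\, [I(t) + U(t)], \\
I'(t) &= \tau(t)\, S(t)\, [I(t) + U(t)] - \nu\, I(t), \\
R'(t) &= \nu_1\, I(t) - \eta\, R(t), \\
U'(t) &= \nu_2\, I(t) - \eta\, U(t), \\[4pt]
S(t_0) &= S_0, \qquad I(t_0) = I_0, \qquad U(t_0) = U_0, \\[4pt]
CR(t) &= \nu_1 \int_{t_0}^{t} I(s)\, ds, \qquad
CU(t) = \nu_2 \int_{t_0}^{t} I(s)\, ds,
\end{align*}
% with \nu_1 = f \nu and \nu_2 = (1 - f) \nu, so that \nu_1 + \nu_2 = \nu.
```

Under this form, CR(t) and CU(t) always stand in the fixed ratio f : (1 − f), which is why the choice of f directly scales the predicted number of unreported cases.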
Method to estimate the parameters
We assume that (100 f) % of symptomatic infectious cases are reported. The actual value of f is unknown and varies from country to country. We assume η = 1/7, which means that the average period of infectiousness of both reported and unreported symptomatic infectious individuals is 7 days. We assume ν = 1/7, which means that the average period of infectiousness of asymptomatic infectious individuals is 7 days. These values can be modified as further epidemiological information becomes known.
After a period of linear growth, the COVID-19 epidemic enters a second phase in which the cumulative number of reported cases CR(t) grows approximately exponentially. We assume that during this phase the cumulative number of reported cases is described by the following phenomenological model: CR(t) = χ1 exp(χ2 t) − χ3. (4.5)
We fix the value χ3 = 1. The values of χ1 and χ2 are fitted to the cumulative reported case data in the second phase, once CR(t) is recognized to be growing exponentially (i.e. we use an exponential fit χ1 exp(χ2 t) to fit the data CR(t) + 1). We assume that the initial value S0 corresponds to the population of the region of the reported case data. The susceptible population S(t) is assumed to change only slightly through the removal of infected individuals at the beginning of the second phase. The following formulas for I0, U0, t0, τ0, and ℛ0 were derived in [3]. Their numerical values are identified by using (4.5) from the exponential growth phase of the epidemic. The other initial conditions are
The transmission rate τ(t) is assumed to be constant while the number of reported infectious symptomatic cases grows exponentially:
The model starting time of the epidemic is
The value of the basic reproductive number is
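A short calculation shows how quantities of this kind follow from the exponential growth phase. The sketch below is not the derivation of [3] itself and may differ from it in detail. It assumes a mass-action infection term τ0 S (I + U), S(t) ≈ S0 during the exponential phase, CR(t) = χ1 exp(χ2 t) − χ3 with CR(t0) = 0 and CR'(t) = f ν I(t), and I and U both growing like exp(χ2 t).

```latex
\begin{align*}
CR(t_0) = 0
  \;&\Longrightarrow\; \chi_1 e^{\chi_2 t_0} = \chi_3
  \;\Longrightarrow\; t_0 = \frac{\ln \chi_3 - \ln \chi_1}{\chi_2}, \\
CR'(t) = f \nu\, I(t)
  \;&\Longrightarrow\; I_0 = \frac{\chi_1 \chi_2\, e^{\chi_2 t_0}}{f \nu}
  = \frac{\chi_3 \chi_2}{f \nu}, \\
\chi_2 U_0 = (1 - f)\nu\, I_0 - \eta\, U_0
  \;&\Longrightarrow\; U_0 = \frac{(1 - f)\nu}{\eta + \chi_2}\, I_0, \\
\chi_2 I_0 = \tau_0 S_0 (I_0 + U_0) - \nu\, I_0
  \;&\Longrightarrow\; \tau_0 = \frac{\chi_2 + \nu}{S_0} \cdot
  \frac{\eta + \chi_2}{(1 - f)\nu + \eta + \chi_2}, \\
\mathcal{R}_0 &= \frac{\tau_0 S_0}{\nu}
  \left( 1 + \frac{(1 - f)\nu}{\eta} \right).
\end{align*}
```

The last line reads as follows under these assumptions: an infected individual transmits at rate τ0 S0 for an average of 1/ν days while asymptomatic, and then, with probability 1 − f, for a further 1/η days as an unreported symptomatic case.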
When strong government measures such as isolation, quarantine, and public closings are implemented, the third and last phase of the epidemic begins. The actual effects of these measures are complex, and we incorporate them through a time-dependent transmission rate τ(t) that decreases exponentially in the third phase. The formula for τ(t) is
The date N and the value µ are chosen so that the cumulative reported cases in the numerical simulation of the epidemic align with the cumulative reported case data after day N, when the public measures take effect. In this way we are able to project forward the time-path of the epidemic after the government-imposed public restrictions take effect.
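The role of N and µ can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a mass-action infection term τ(t) S (I + U), a transmission rate equal to τ0 up to day N and τ0 exp(−µ (t − N)) afterwards, R(t0) = 0, and closed-form initial values chosen so that I and U start on the exponential growth mode with rate χ2. The solver (fixed-step RK4) and all numerical values (S0, χ2, N, µ) are hypothetical choices for illustration only.

```python
import math

def tau(t, tau0, N, mu):
    """Transmission rate: constant up to day N, exponentially decreasing after."""
    if t <= N:
        return tau0
    return tau0 * math.exp(-mu * (t - N))

def rhs(t, y, p):
    """Right-hand side for the state (S, I, R, U, CR, CU)."""
    S, I, R, U, CR, CU = y
    nu1 = p["f"] * p["nu"]          # rate I -> reported
    nu2 = (1.0 - p["f"]) * p["nu"]  # rate I -> unreported
    new_infections = tau(t, p["tau0"], p["N"], p["mu"]) * S * (I + U)
    return [
        -new_infections,
        new_infections - p["nu"] * I,
        nu1 * I - p["eta"] * R,
        nu2 * I - p["eta"] * U,
        nu1 * I,   # cumulative reported cases
        nu2 * I,   # cumulative unreported cases
    ]

def simulate(y0, t0, t_end, p, h=0.05):
    """Classical fixed-step RK4; returns a list of (t, state) pairs."""
    steps = int(round((t_end - t0) / h))
    y = list(y0)
    out = [(t0, list(y))]
    for i in range(steps):
        t = t0 + i * h
        k1 = rhs(t, y, p)
        k2 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(y, k1)], p)
        k3 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(y, k2)], p)
        k4 = rhs(t + h, [a + h * k for a, k in zip(y, k3)], p)
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        out.append((t0 + (i + 1) * h, list(y)))
    return out

# Hypothetical illustration values (not taken from the paper's tables).
f, nu, eta = 0.8, 1.0 / 7.0, 1.0 / 7.0
chi2, chi3 = 0.3, 1.0
S0 = 1.0e7
# Assumed initial values on the exponential growth mode with rate chi2.
I0 = chi3 * chi2 / (f * nu)
U0 = (1.0 - f) * nu * I0 / (eta + chi2)
tau0 = (chi2 + nu) / S0 * (eta + chi2) / ((1.0 - f) * nu + eta + chi2)

p = {"f": f, "nu": nu, "eta": eta, "tau0": tau0, "N": 20.0, "mu": 0.15}
traj = simulate([S0, I0, 0.0, U0, 0.0, 0.0], 0.0, 120.0, p)

cr_at_N = traj[400][1][4]   # step 400 corresponds to t = N = 20 with h = 0.05
final = traj[-1][1]
print("CR at day N:", cr_at_N)
print("final reported:", final[4], "final unreported:", final[5])
```

In this sketch, a larger µ makes the transmission rate fall faster after day N and so lowers the final size, while a later N lets the exponential phase run longer before any reduction begins.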
The daily number of reported cases from the model can be obtained by computing the solution of the following equation:
A major challenge in making predictions from the reported case data is to determine the date interval of the second phase, when the number of reported infectious symptomatic cases grows exponentially, and the date N at which public intervention measures became effective. Both are key elements of our model, and they depend strongly on how social distancing measures are implemented. If these measures are implemented gradually, the difficulty increases. The measures implemented by governments usually show their effect on daily reported cases only after some days.
Another difficulty in applying our model is how to fix the value of the parameter f. A lower value of f corresponds to a greater final size of the epidemic. The value of f is unknown, but information about the level of testing relates to it: increased testing can increase the value of f. Mortality can also be used as a reference to estimate the value of f, since high mortality indicates a high unreported ratio. In fact, from the values of f, N and µ, we can also obtain some information about the actual effects of the testing, quarantine and isolation measures implemented by the governments of these countries.
The principle of our method is the following. By using an exponential best fit method we obtain a best fit of (4.5) to the data over a time interval [d1, d2] and we derive the parameters χ1 and χ2. We fix f = 0.4, 0.6 or 0.8, ν = 1/7 and η = 1/7. The values of I0, U0, τ0, and t0 are obtained by using (4.6)-(4.8). Next we fix N to some values and we obtain µ by seeking the best fit to the data.
The uncertainty in our prediction is due to the fact that several sets of parameters (d1, d2, N) may give a good fit to the data. As a consequence, at the early stage of the epidemic (in particular before the turning point) the outcome of our method can be very different from one set of parameters (d1, d2, N) to another. We address this uncertainty by using several choices of the period [d1, d2] over which an exponential growth is fitted to the data to determine χ1 and χ2, and several choices of N (the parameter χ3 = 1 being fixed). In the simulations below we vary the first day d1, the last day d2, and N (the date at which public intervention measures became effective) so that all possible sets of parameters (d1, d2, N) are considered. For each (d1, d2, N) we evaluate µ to obtain the best fit of the model to the data. We use the mean absolute deviation as the distance to the data to evaluate the best fit. We obtain a large number of best fits depending on (d1, d2, N, f) and we plot the one with the smallest mean absolute deviation, MADmin. Then we plot all the best fits with mean absolute deviation between MADmin and MADmin + 5.
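The first step of this procedure, fitting χ1 exp(χ2 t) to CR(t) + χ3 over a window [d1, d2], can be sketched as follows. The text does not specify which exponential fitting method is used, so this sketch assumes an ordinary least-squares fit on the logarithm of the shifted data. The function name `fit_exponential` and the synthetic data are hypothetical.

```python
import math

def fit_exponential(days, cum_reported, d1, d2, chi3=1.0):
    """Fit ln(CR + chi3) = ln(chi1) + chi2 * t by least squares on [d1, d2]."""
    window = [(t, c) for t, c in zip(days, cum_reported) if d1 <= t <= d2]
    if len(window) < 2:
        raise ValueError("need at least two data points in [d1, d2]")
    ts = [t for t, _ in window]
    ys = [math.log(c + chi3) for _, c in window]
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    chi2 = sxy / sxx                          # slope of the log-linear fit
    chi1 = math.exp(y_mean - chi2 * t_mean)   # intercept, back-transformed
    return chi1, chi2

# Synthetic data that follow chi1 * exp(chi2 * t) - 1 exactly (illustration only).
days = list(range(0, 30))
cum_reported = [0.5 * math.exp(0.3 * t) - 1.0 for t in days]

chi1, chi2 = fit_exponential(days, cum_reported, d1=10, d2=20)
print(chi1, chi2)
```

With real data the fitted values depend on the chosen window, which is why several windows [d1, d2] are tried rather than a single one.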
Remark 4.1 The choice of 5 in MADmin + 5 is open to question. We use this value for all the simulations because it yields sufficiently many runs that fit the data very well while still giving a sufficiently large spread in the later predictions.
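The selection rule can be sketched in a few lines. The text does not spell out the formula for the mean absolute deviation, so this sketch assumes it is the mean of the absolute differences between simulated and observed cumulative reported cases on the same days. The function names, the parameter tuples used as keys, and the toy numbers are hypothetical.

```python
def mean_absolute_deviation(model_values, data_values):
    """Mean of |model - data| over matching days."""
    if len(model_values) != len(data_values) or not data_values:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(m - d) for m, d in zip(model_values, data_values))
    return total / len(data_values)

def select_runs(runs, data, tolerance=5.0):
    """Keep every run whose MAD lies in [MADmin, MADmin + tolerance].

    runs maps a parameter tuple (d1, d2, N) to simulated cumulative cases.
    """
    scores = {key: mean_absolute_deviation(values, data)
              for key, values in runs.items()}
    mad_min = min(scores.values())
    kept = {key: s for key, s in scores.items() if s <= mad_min + tolerance}
    return mad_min, kept

# Toy example: three hypothetical runs compared with four data points.
data = [10, 20, 40, 80]
runs = {
    (1, 10, 15): [11, 19, 42, 78],    # MAD = 1.5
    (2, 12, 16): [10, 25, 45, 90],    # MAD = 5.0
    (3, 14, 18): [20, 40, 60, 100],   # MAD = 17.5
}
mad_min, kept = select_runs(runs, data)
print(mad_min, sorted(kept))
```

In the toy example the first two runs are retained and the third is discarded, since its deviation exceeds MADmin + 5.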
5 Results
Combining a mathematical model with multiple groups of parameters (d1, d2, N), we project the future number of cases, both reported and unreported, for China, South Korea, Italy, France, Germany and the United Kingdom. Here [d1, d2] is the period over which an exponential growth is fitted to the data to determine χ1 and χ2, and N is the date at which public intervention measures became effective. For each (d1, d2, N) we evaluate µ to obtain the best fit of the model to the data. We use the mean absolute deviation as the distance to the data to evaluate the best fit. We obtain a large number of best fits depending on (d1, d2, N), and we plot the one with the smallest mean absolute deviation, MADmin, together with all the best fits with mean absolute deviation between MADmin and MADmin + 5. We summarize the parameters giving the best fit to the data so far in Table 2, and show the range of the turning point, the final size of both reported and unreported cases, and the maximum number of daily reported cases for each country in Table 3.
6 Discussion
In the case of China and South Korea, according to our model, the peak of the epidemic occurred approximately on February 6 and February 29, respectively (see Figures 2-3), and the daily number of cases reached a maximum of approximately 3500 and 700 cases, respectively, near the turning point. We see that our model agrees very well with the data for China and South Korea. Compared to China and South Korea, the public interventions in Italy, France, Germany and the United Kingdom were relatively late. The peak of the epidemic occurs in Italy around March 24, and the maximum daily number of cases in our simulation is approximately 5 500, which agrees well with the daily reported case data for Italy. For France, Germany and the United Kingdom, the number of daily reported cases is still rising. Our simulations capture these increasing values, but projecting the later course for these three countries requires more data.
Based on our estimated results in Table 3 and all the Figures, we found that the curves plotted using all possible groups of parameters (d1, d2, N) for cumulative reported and unreported cases and for daily reported cases in China and South Korea move closer together and eventually coincide as more cumulative reported case data are used (see Figures 2-3). For Italy, we can see from Figure 4 that these curves are very close to each other, since we have data after the turning point. For France, Germany and the United Kingdom, however, these curves do not yet coincide with the data available now. This shows that when enough cumulative data are available, for example once we have data after the turning point, any one of these parameter groups suffices to obtain a prediction with high accuracy. When only a few cumulative data are available, we need to use all possible parameter groups to predict a range for the turning point, the final size of cumulative reported and unreported cases, and the daily reported cases. We used plausible biological parameters f, ν, η for SARS-CoV-2 based on current evidence, which may be refined as more comprehensive data become available. Our prediction also relies on the cumulative reported data, and it improves as more data become available. Using multiple groups of parameters (d1, d2, N), we have attempted to make the best possible prediction from the available data. As more data become available, particularly for Italy, France, Germany and the United Kingdom, it will be possible to refine these estimates.
Our model incorporates social distancing measures through the time-dependent transmission rate τ(t). Our results indicate that these measures should start as early as possible and be as strong as possible. Late public interventions may have severe consequences for the epidemic outcome. The example of South Korea shows that a background level of daily cases may persist for an extended time, which also means that South Korea has returned to a linear growth phase. If the strong measures are reduced too early or too extensively, the epidemic may return to a new exponential growth phase.
There are many crucial epidemiological problems that research teams are racing to understand, for example how to estimate the last day of the COVID-19 outbreak and the proportion of people with mild or no symptoms who could be spreading the pathogen. Recently we estimated the last day of the COVID-19 outbreak in mainland China and presented the probability distribution of the extinction date of the epidemic by combining our model (4.1) with stochastic simulations. With our model (4.1), we could also predict the proportion of asymptomatic or mildly symptomatic infectious individuals, which we will focus on in future work.
Data Availability
Data are available from WHO or Wikipedia.
7 Supplementary
Declaration of interests
We declare no competing interests.