Abstract
In this article, we use testing data as an input of a new epidemic model. We obtain a good concordance between the best fit of the model and the reported cases data for New-York State. We also obtain a good concordance between the testing dynamics and the epidemic dynamics for the cumulative cases. Finally, we investigate the effect of multiplying the number of tests by 2, 5, 10, and 100 on the number of reported cases.
1 Introduction
The epidemic of novel coronavirus (COVID-19) infections began in China in December 2019 and rapidly spread worldwide in 2020. Since the very beginning of the epidemic, mathematicians and epidemiologists have developed models to analyze the data, characterize the spread of the virus, and attempt to project the future evolution of the epidemic. Many of those models are based on the SIR or SEIR model, which is classical in the context of epidemics. We refer to [26, 28] for the earliest articles devoted to such questions and to [1, 3–7, 10, 12, 13, 20, 25] for a rather complete overview of SIR and SEIR models in general. In the course of the COVID-19 outbreak, it became clear to the scientific community that covert cases (asymptomatic or unreported infectious cases) play an important role. An early description of asymptomatic transmission in Germany was reported by Rothe et al. [24]. It was also observed on the Diamond Princess cruise ship in Yokohama, Japan, by Mizumoto et al. [19] that many of the passengers tested positive for the virus but never presented any symptoms. We also refer to Qiu [21] for more information about this problem. At the early stage of the COVID-19 outbreak, a new class of epidemic models was proposed in Liu et al. [14] to take into account the contamination of susceptible individuals by contact with unreported infectious individuals. Actually, this class of models was presented earlier in Arino et al. [2]. In [14], a new method to use the number of reported cases in SIR models was also proposed. This method and model were extended in several directions by the same group in [15–17] to include non-constant transmission rates and a period of exposure. More recently, the method was extended and successfully applied to a Japanese age-structured dataset in [11]. The method was also extended in [18] to investigate the predictability of the outbreak in several countries: China, South Korea, Italy, France, Germany and the United Kingdom.
The application of the Bayesian method was also considered in [9].
In parallel with these modeling ideas, Bayesian methods have been widely used to identify the parameters of the models used for the COVID-19 pandemic (see e.g. Roques et al. [22,23], where an estimate of the fatality ratio has been developed). A remarkable feature of those methods is that they provide mechanisms to correct some of the known biases in the observation of cases, such as the daily number of tests. Here we will embed the data for the daily number of tests into an epidemic model, and we will compare the number of reported cases produced by the model with the data. Our goal is to understand the relationship between the data for the daily number of tests (which will be an input of our model) and the data for the daily number of reported cases (which will be an output of our model).
The plan of the paper is the following. In Section 2, we will present a model involving the daily number of tests. In Section 3, we apply the method presented in [14] to our new model. In Section 4, we present some numerical simulations, and we compare the model with the data. The last section is devoted to the discussion.
2 Epidemic with testing data
Let n(t) be the number of tests per unit of time. Throughout this paper, we use one day as the unit of time. Therefore n(t) can be regarded as the daily number of tests at time t. The function n(t) comes from a database for New-York State [29]. Let N(t) be the cumulative number of tests from the beginning of the epidemic. Then N(t)′ = n(t).
Section 4 is devoted to numerical simulations. There, we will use n(t) as a piecewise constant function that varies day by day. Each day, n(t) will be equal to the number of tests that were performed that day. So n(t) should be understood as the black curve in Figure 4.
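The piecewise constant convention described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `make_daily_tests` and `cumulative_tests` are hypothetical, the counts are made up (they are not the New-York State data), and setting n(t) = 0 outside the recorded window is an assumption.

```python
import math

def make_daily_tests(daily_counts, t1=0.0):
    """Return n(t), piecewise constant: the count of the day that contains t."""
    def n(t):
        k = math.floor(t - t1)
        if k < 0 or k >= len(daily_counts):
            return 0.0  # assumption: no tests outside the recorded window
        return float(daily_counts[k])
    return n

def cumulative_tests(daily_counts, t, t1=0.0):
    """Integrate the piecewise constant n(s) exactly from t1 to t."""
    elapsed = min(max(t - t1, 0.0), float(len(daily_counts)))
    full_days = int(math.floor(elapsed))
    total = float(sum(daily_counts[:full_days]))
    if full_days < len(daily_counts):
        # partial contribution of the day in progress
        total += (elapsed - full_days) * daily_counts[full_days]
    return total

# Made-up daily counts, for illustration only.
counts = [100, 250, 400]
n = make_daily_tests(counts)
```

With these made-up counts, the cumulative number of tests after a day and a half is the first full day plus half of the second one.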
The model consists of the following system of ordinary differential equations. This system is supplemented by non-negative initial data, thereby assuming that the disease was introduced by an individual incubating the disease at some time before t1. The time t1 corresponds to the time at which the tests started to be used constantly. Therefore the epidemic started before t1.
Here t ≥ t1 is the time in days. S(t) is the number of individuals susceptible to infection. E(t) is the number of exposed individuals (i.e. who are incubating the disease but are not infectious). I(t) is the number of individuals incubating the disease, but already infectious. U(t) is the number of undetected infectious individuals, that is, individuals who express mild or no symptoms, together with infectious individuals who received a false negative test result and are therefore no longer candidates for testing. D(t) is the number of individuals who express severe symptoms and are candidates for testing. R(t) is the number of individuals who have tested positive for the disease. The flow diagram of our model is presented in Figure 1.
Susceptible individuals S(t) become infected by contact with an infectious individual in I(t), U(t) or D(t). When they get infected, susceptible individuals are first classified as exposed individuals E(t), that is to say, they are incubating the disease but are not yet infectious. The average length of this exposed period (or noninfectious incubation period) is 1/α days.
After the exposure period, individuals become asymptomatic infectious I(t). The average length of the asymptomatic infectious period is 1/ν days. After this period, individuals become either mildly symptomatic individuals U(t) or individuals with severe symptoms D(t). The average length of this infectious period is 1/η days. Some of the U-individuals may show no symptoms at all.
In our model, transmission can occur between an S-individual and an I-, U- or D-individual. Transmissions of SARS-CoV-2 are described in the model by the term τ S(t)[I(t) + U(t) + D(t)], where τ is the transmission rate. Even though a transmission from an R-individual to an S-individual is possible in theory (e.g. if a tested patient infects their medical doctor), we consider such a case to be rare and we neglect it.
The last part of the model is devoted to the testing. The parameter σ is the fraction of true positive tests and (1 − σ) is the fraction of false negative tests. The quantity σ has been estimated at σ = 0.7 in the case of nasal or pharyngeal swabs for SARS-CoV-2 [27].
Among the detectable infectious individuals, we assume that only a fraction g are tested per unit of time. This fraction corresponds to individuals with symptoms suggesting a potential infection with SARS-CoV-2. The fraction g is the frequency of testable individuals in the population of New-York State. We can rewrite g as g = 1/(κ × P), where P is the total number of individuals in the population of the state of New-York and 0 ≤ κ ≤ 1 is the fraction of the total population with mild or severe symptoms that may induce a test.
Individuals who tested positive, R(t), remain infectious on average for a period of 1/η days. But we assume that they are immediately isolated and no longer contribute to the epidemic. In this model we focus on the testing of the D-individuals. The quantity n(t) σ g D is the flux of successfully tested D-individuals, which become R-individuals. The flux of tested D-individuals that are false negatives is n(t) (1 − σ) g D; these individuals go from the class of D-individuals to the class of U-individuals. The parameters of the model and the initial conditions of the model are listed in Table 1.
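The compartments and fluxes described above can be assembled into a right-hand side along the following lines. This is a sketch under stated assumptions rather than the paper's system: the fraction `f` of I-individuals that become D-individuals is a hypothetical parameter that does not appear in the text above, and all numerical values are placeholders except σ = 0.7, which is quoted above.

```python
def model_rhs(state, n_t, p):
    """Time derivatives of (S, E, I, U, D, R) for a given daily number of tests n_t."""
    S, E, I, U, D, R = state
    infection = p["tau"] * S * (I + U + D)   # new infections per day
    tested = n_t * p["g"] * D                # D-individuals tested per day
    dS = -infection
    dE = infection - p["alpha"] * E
    dI = p["alpha"] * E - p["nu"] * I
    # assumption: a fraction f of I goes to D, the rest (1 - f) to U
    dU = p["nu"] * (1.0 - p["f"]) * I + (1.0 - p["sigma"]) * tested - p["eta"] * U
    dD = p["nu"] * p["f"] * I - tested - p["eta"] * D
    dR = p["sigma"] * tested - p["eta"] * R
    return (dS, dE, dI, dU, dD, dR)

def euler_step(state, n_t, p, dt):
    """One explicit Euler step; a real simulation would use a finer scheme."""
    return tuple(x + dt * dx for x, dx in zip(state, model_rhs(state, n_t, p)))

# Placeholder values for illustration only (not fitted parameters).
params = {"tau": 1e-4, "alpha": 1.0, "nu": 0.5, "eta": 0.2,
          "sigma": 0.7, "g": 1e-5, "f": 0.5}
state = (1000.0, 10.0, 5.0, 3.0, 2.0, 1.0)
rhs = model_rhs(state, 20000.0, params)
next_state = euler_step(state, 20000.0, params, 0.1)
```

A quick sanity check on such a sketch is that the derivatives sum to −η (U + D + R): individuals only leave the system through the end of the infectious period.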
Before describing our method we need to introduce a few useful identities. The cumulative number of reported cases is obtained by using the following equation The daily number of reported cases DR′(t) is given by The cumulative number of detectable cases is given by and the cumulative number of undetectable cases is given by
3 Method to fit the cumulative number of reported cases
In order to deal with data, we need to understand how to set the parameters as well as some components of the initial conditions. To do so, we extend the method first presented in [14]. The main novelty here concerns the cumulative number of tests, which is assumed to grow linearly at the beginning. This property is satisfied by the New-York State data, as we can see in Figure 3. The black curve in this figure is close to a line from March 15 to April 15. Figure 4 shows the day-by-day fluctuations of the number of tests, while in Figure 3 these fluctuations are not visible and the cumulative data make the growth trend of the number of tests apparent.
Phenomenological models for the tests
We fit a line to the cumulative number of tests in a suitable interval of days [t1, t2]. This means that we can find a pair of numbers a and b such that where a is the daily number of tests and N1 is the cumulative number of tests on day t1.
By using the fact that N(t)′ = n(t) we deduce that n(t) = a for t ∈ [t1, t2].
In the simulations we will fit a line to the cumulative number of tests from mid-March to mid-April. Figure 3 shows that the linear growth assumption is reasonable for the New-York State cumulative testing data.
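The line fit described above can be sketched with an ordinary least-squares regression. This is only an illustration: the function name `fit_line` is hypothetical, the data are made up rather than taken from [29], and writing the line as N(t) ≈ a t + b, with N1 recovered as the value of the line on day t1, is an assumption about the intended parametrization.

```python
def fit_line(days, cumulative):
    """Least-squares fit of cumulative ~ a * day + b; returns (a, b)."""
    m = len(days)
    mean_t = sum(days) / m
    mean_N = sum(cumulative) / m
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (N - mean_N) for t, N in zip(days, cumulative))
    a = sxy / sxx            # slope: fitted daily number of tests
    b = mean_N - a * mean_t  # intercept
    return a, b

# Made-up cumulative counts over five days, for illustration only.
days = [0, 1, 2, 3, 4]
cumulative = [1000, 3000, 5000, 7000, 9000]
a, b = fit_line(days, cumulative)
N1 = a * days[0] + b  # fitted cumulative number of tests on day t1 = days[0]
```

The slope `a` then plays the role of the constant daily number of tests on the fitting window.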
Phenomenological models for the reported cases
At the early stage of the epidemic, we assume that all the infected components of the system grow exponentially while the number of susceptibles remains unchanged during a relatively short period of time t ∈ [t1, t2]. Therefore, we will assume that We deduce that the cumulative number of reported cases satisfies hence by replacing D(t) by the exponential formula (3.3) and it makes sense to assume that CR(t) − CR(t1) has the following form By identifying (3.5) and (3.6) we deduce that Moreover by using (3.2) and the fact that the number of susceptibles S(t) remains constant, equal to S1, on the time interval t ∈ [t1, t2], the E-equation, I-equation, U-equation and D-equation of the model (2.2) become By using (3.3) we obtain Computing further, we get Finally by using (3.7) and by using (3.8) we obtain
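The first step of this computation can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's numbered formulas: the symbols χ1, χ2 and D1 are assumed names, the test rate is taken constant and equal to a on [t1, t2], and the identity CR′(t) = n(t) σ g D(t) is read off from the description of the flux of successfully tested D-individuals.

```latex
\begin{align*}
D(t) &= D_1 \, e^{\chi_2 (t - t_1)}, \qquad n(t) = a, \qquad t \in [t_1, t_2], \\
CR'(t) &= n(t)\,\sigma\, g\, D(t) = a\,\sigma\, g\, D_1\, e^{\chi_2 (t - t_1)}, \\
CR(t) - CR(t_1) &= \frac{a\,\sigma\, g\, D_1}{\chi_2}\left(e^{\chi_2 (t - t_1)} - 1\right).
\end{align*}
% Matching with an assumed fitted form
% CR(t) - CR(t_1) = \chi_1 \left(e^{\chi_2 (t - t_1)} - 1\right) gives
\[
D_1 = \frac{\chi_1\,\chi_2}{a\,\sigma\, g}.
\]
```

Under these assumptions, the initial number of detectable cases is determined by the two fitted constants and the testing parameters.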
4 Numerical simulations
We assume that the transmission coefficient takes the form where τ0 > 0 is the initial transmission coefficient, Tm > 0 is the time at which social distancing starts in the population, and µ > 0 modulates the speed at which this social distancing takes place.
To take into account the effect of social distancing and public measures, we assume that the transmission coefficient τ(t) can be modulated by γ. Indeed, by closing schools and non-essential shops and by imposing social distancing on the population of New-York State, the number of contacts per day is reduced. This effect was visible in the news during the first wave of the COVID-19 epidemic in New-York City, since the streets were almost empty at some point. The parameter γ > 0 is the fraction of the number of transmissions that remains after a transition period (depending on µ), compared to a normal situation. A similar non-constant transmission rate was considered by Chowell et al. [8].
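One functional form consistent with this description is sketched below. The exact formula is an assumption: it is chosen only so that τ(t) equals τ0 until Tm, relaxes at speed µ afterwards, tends to γ τ0 for large times, and reduces to the constant τ0 when γ = 1. The function name and the numerical values are hypothetical.

```python
import math

def transmission_rate(t, tau0, Tm, mu, gamma):
    """Assumed form: tau0 before Tm, then exponential relaxation towards gamma * tau0."""
    if t <= Tm:
        return tau0
    return tau0 * (gamma + (1.0 - gamma) * math.exp(-mu * (t - Tm)))

# Placeholder values for illustration only.
tau_before = transmission_rate(5.0, 0.5, 10.0, 0.1, 0.2)     # before distancing starts
tau_late = transmission_rate(10000.0, 0.5, 10.0, 0.1, 0.2)   # long after Tm
tau_const = transmission_rate(50.0, 0.5, 10.0, 0.1, 1.0)     # gamma = 1: no reduction
```

With γ = 0.2, the late-time value is 20% of the initial transmission coefficient.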
In Figure 5 we consider a constant transmission rate τ(t) ≡ τ0, which corresponds to γ = 1 in (4.1). To evaluate the distance between the model and the data, we compare the cumulative number of cases CR produced by the model with the data (see the orange dots and orange curve in Figure 5-(a)). In Figure 5-(c) we can observe that the cumulative number of cases increases up to more than 14 million people, which is not realistic. Nevertheless, with the parameters chosen in Figure 5-(d), the orange dots and the blue curve match very well.
In the rest of this section, we focus on the model with confinement (or social distancing) measures. We assume that such social distancing measures have a strong impact on the transmission rate by assuming that γ = 0.2 < 1. It means that only 20% of the transmissions will remain after a transition period.
In Figure 6-(c) we can observe that the cumulative number of cases increases up to 800 000 (blue curve) while the cumulative number of reported cases goes up to 350 000. In Figure 6-(d) we can see that the orange dots and the blue curve match very well again. To obtain this fit, we fix the parameter g = 10^−5.
In Figure 7 (a) and (b), we aim at understanding the connection between the daily fluctuations of the number of reported cases (epidemic dynamic) and the daily number of tests (testing dynamics). The combination of both the testing dynamics and the infection dynamics gives indeed a very complex curve parametrized by the time. It seems that the only reasonable comparison that we can make is between the cumulative number of reported cases and the cumulative number of tests. In Figure 7 (c) and (d), the comparison of the model and the data gives a very decent fit.
In Figure 7, all the curves are parametrized by time. The abscissa is the number of tests (horizontal axis) and the ordinate is the number of reported cases (vertical axis). With our notations, the curves correspond to the parametric functions t → (ndata(t), DR(t)) in figures (a) and (b) and to their cumulative equivalents t → (Ndata(t), CR(t)) in figures (c) and (d). In figures (a) and (c) we use only the data, that is to say that we plot t → (ndata(t), DRdata(t)) and t → (Ndata(t), CRdata(t)). In figures (b) and (d) we use the model for the number of reported cases, that is to say that we plot t → (ndata(t), DRmodel(t)) and t → (Ndata(t), CRmodel(t)).
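The construction of these parametric curves from daily series can be sketched as follows. The function name `parametric_curves` is hypothetical and the numbers are made up for illustration; the same function applies whether the case series comes from the data or from the model.

```python
from itertools import accumulate

def parametric_curves(daily_tests, daily_cases):
    """Return the daily curve t -> (n(t), DR(t)) and the cumulative curve t -> (N(t), CR(t))."""
    daily = list(zip(daily_tests, daily_cases))
    cumulative = list(zip(accumulate(daily_tests), accumulate(daily_cases)))
    return daily, cumulative

# Made-up daily series, for illustration only.
tests = [100, 250, 400]
cases = [10, 30, 20]
daily_curve, cumulative_curve = parametric_curves(tests, cases)
```

Each list holds one point per day, so plotting consecutive points traces the curve in time order. Because both cumulative coordinates are non-decreasing, the cumulative curve is much smoother than the daily one.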
In Figure 8, our goal is to investigate the effect of a change in the testing policy in the New-York State. We are particularly interested in estimating the effect of an increase of the number of tests on the epidemic. Indeed people commonly say that increasing the number of tests will be beneficial to reduce the number of cases. So here, we try to quantify this idea by using our model.
In Figure 8, we replace the daily number of tests ndata(t) (coming from the data for New-York State) in the model by either 2 × ndata(t), 5 × ndata(t), 10 × ndata(t) or 100 × ndata(t).
As expected, increasing the number of tests helps to reduce the number of cases. However, once the number of tests has been multiplied by 10, there is no significant difference (in the number of reported cases) between 10 times and 100 times more tests. Therefore there must be an optimum between increasing the number of tests (which costs money and other limited resources) and efficiently slowing down the epidemic.
5 Discussion
In this article, we propose a new epidemic model involving the daily number of tests as an input of the model. The model itself extends our previous models presented in [11, 14–18]. We propose a new method to use the data in such a context, based on the fact that the cumulative number of tests grows linearly at the early stage of the epidemic. Figure 3 shows that this is a reasonable assumption for the New-York State data from mid-March to mid-April.
Our numerical simulations show a very good concordance between the number of reported cases produced by the model and the data in two very different situations. Indeed, Figures 5 and 6 correspond respectively to an epidemic without and with public intervention to limit the number of transmissions. Both scenarios fit the reported cases well while predicting very different cumulative numbers of cases (more than 14 million in Figure 5 against about 800 000 in Figure 6). This is an important observation, since it shows that testing data and reported cases are not sufficient to evaluate the real amplitude of the epidemic. To solve this problem, the only solution seems to be to include a different kind of data in the models. This could be done by studying statistically representative samples of the population. Otherwise, biases can always be suspected. Such a question is of particular interest in order to evaluate the fraction of the population that has been infected by the virus and their possible immunity.
In Figure 7, we compare the testing dynamics (day-by-day variation in the number of tests) and the reported cases dynamics (day-by-day variation in the number of reported cases). The daily curve is extremely complex, but we obtain a relatively robust curve for the cumulative numbers. Our model gives a good fit for these cumulative cases.
In Figure 8, we compare multiple testing strategies. By multiplying the number of tests by 2, 5, 10 and 100, we observe that increasing the number of tests is efficient up to a factor of about 10, but that going to 100 times more tests does not make a big difference. Therefore, it is useless to test too many people, and there must be an optimum between the cost of the tests and the efficiency in the evaluation of the number of cases.
Data Availability
The data used are public. See the references in the paper.
Conflict of Interest
None declared.
Funding
Q.G. and P.M. acknowledge the support of ANR flash COVID-19 MPCUII.
Acknowledgements
Data from [29].
Footnotes
* ANR flash COVID-19: MPCUII