Abstract
We review epidemiological models for the propagation of the COVID-19 pandemic during the early months of the outbreak: from February to May 2020. The aim is to propose a methodological review that highlights the following characteristics: (i) the epidemic propagation models, (ii) the modeling of intervention strategies, (iii) the models and estimation procedures of the epidemic parameters and (iv) the characteristics of the data used. We finally selected 80 articles from open access databases based on criteria such as the theoretical background, the reproducibility, the incorporation of intervention strategies, etc. The selection mainly resulted in phenomenological, compartmental and individual-level models. A digital companion including an online sheet, a Kibana interface and a markdown document is proposed. Finally, this work provides an opportunity to witness how the scientific community reacted to this unique situation.
1 Introduction
In the early months of the COVID-19 pandemic, tens of thousands of research articles were produced (source: https://www.semanticscholar.org/cord19 with 59,888 articles referenced by May 1, 2020). The present review focuses on the subset of articles developing and/or using mathematical models of the COVID-19 transmission during this period. In a broader context, many review articles dedicated to the modeling of disease propagation have been published. They include, for instance, mathematical formulations of the different models to estimate epidemic parameters, forecast the epidemic or assess the impact of intervention strategies [1, 2]. Moreover, several of these reviews offer a categorization of the different models with a precise terminology [3, 4, 1] or a mapping of the articles with their key features [5, 3]. Since November 2020, we can mention many projects of the latter type focusing on the COVID-19 epidemic, including published reviews [6, 7, 8] and continuously updated works of scientific watch [9, 10].
In the present manuscript, we review the early endeavour of mathematical modeling of the epidemic propagation. This review effort seems crucial given the quantity and the diversity of research works produced within a short period of time. In particular, the aim is to valorize this prolific production of works and to facilitate the identification of models by offering a mapping of the approaches proposed from February to mid-April 2020. It is completed with supplementary external contributions until the 3rd of May 2020 through our online repository. This incidentally offers the opportunity to observe the way the scientific community responded to the crisis. We emphasize criteria such as conceptual innovations, transparency and reproducibility of both methods and results, as well as the availability of online open-source material such as code or web demonstrations. Indeed, the ability to audit, challenge and reproduce quickly has proven to be key in order to bring the state of knowledge closer to the settings in which policy-makers operate. Lastly, we do not provide a comparison of either the numerical results or the conclusions of each article.
The remainder of the document is organized as follows. Section 2 provides generalities on the three main types of epidemic models (phenomenological, compartmental and individual-level) and on the key epidemic parameters. Section 3 presents the screening methodology used to select the articles included in the review and defines the attributes used to describe each article. Section 4 proposes a synthetic summary of the epidemic models and of the methods used to estimate the parameters, and discusses the various approaches to account for possible intervention strategies, with some insights on data-related aspects. Finally, Section 4.7 gathers a set of tabular views with a complete mapping of the reviewed articles, followed by a conclusion in Section 5. A digital companion of the present review including an online sheet, a Kibana interface and a markdown document is accessible on the GitHub page: https://github.com/MyrtoLimnios/covid19-biblio.
2 Generalities on epidemic spreading models
In this section we present a synthetic view of the main characteristics and attributes of epidemic models, trying to adopt a nomenclature as faithful as possible to the one used in epidemic reviews [1, 5, 3, 11, 4, 2]. In particular, Section 2.1 recalls the main model types used to characterize an epidemic, and Section 2.2 defines a list of the key epidemic parameters.
2.1 Modeling the spread of diseases
Different models for epidemic propagation exist, which differ according to the scale of analysis, the complexity of the parameterization, and the practical implications of the results.
More precisely, models can be classified into two main categories: phenomenological and transmission models (see the review in [2]).
Phenomenological models
In phenomenological models, the curve of a time series representing the epidemic propagation (e.g. time series of confirmed cases or deaths) is assumed to have a specific shape, based on empirical data. The transmission mechanisms that give rise to the observed pattern are not explicitly modeled. For fitting and simulation, these models are usually discretized through numerical schemes. For instance, patterns can be estimated using a (generalized) regression formulation, offering a direct statistical framework to learn from empirical data. One classical model of this type is the logistic growth model [12], where the evolution of the cumulative number of cases C(t) at time t is given by:
$$\frac{dC(t)}{dt} = r\,C(t)\left(1 - \frac{C(t)}{K}\right), \tag{1}$$
where r and K are positive parameters. This equation characterizes a first period where the number of new cases at each step strictly increases, followed by a strictly decreasing regime until the cumulative number of cases converges to its maximum value K. This maximum can represent for example the total size of the population. When C(t) is small with respect to K, the rate of growth is essentially rC(t), which corresponds to exponential growth. The growth rate starts to decrease once the cumulative number of cases reaches the threshold K/2.
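As an illustration, the logistic growth dynamics dC/dt = rC(1 - C/K) admit a closed-form solution that can be evaluated directly. The following minimal sketch assumes hypothetical values for r, K and the initial count c0; the function names are ours and do not come from any reviewed article.

```python
import math

def logistic_cumulative(t, r, K, c0):
    """Closed-form solution of dC/dt = r*C*(1 - C/K) with C(0) = c0."""
    return K / (1.0 + (K - c0) / c0 * math.exp(-r * t))

def logistic_growth_rate(c, r, K):
    """Instantaneous growth dC/dt for a cumulative count c."""
    return r * c * (1.0 - c / K)

# Hypothetical parameter values, chosen only for illustration.
r, K, c0 = 0.2, 10_000.0, 10.0
for day in (0, 20, 40, 60, 80):
    c = logistic_cumulative(day, r, K, c0)
    # Print the day, the cumulative count and the current growth rate.
    print(day, round(c), round(logistic_growth_rate(c, r, K), 1))
```

The growth rate returned by `logistic_growth_rate` is largest when the cumulative count equals K/2, which is the inflection point of the curve.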
Transmission models
Transmission or mechanistic models make explicit the process of transmission involved in the spreading of the disease in a given population. Models of this class can be further divided into two main categories, according to their scale of analysis: compartmental models and individual-level models.
Compartmental models express the transmission dynamics at the population level. The population is aggregated into compartments corresponding to particular health states (usually Susceptible, Infected and Removed compartments, as in the classical SIR model introduced in [13]). The temporal evolution of the size of each compartment is given by a system of differential equations. In the classical SIR model for example, at time t and given a population of n individuals, S(t) represents the number of susceptible individuals, I(t) the number of infected individuals and R(t) the number of removed individuals. The following equations describe the evolution of the system:
$$\frac{dS(t)}{dt} = -\frac{\beta\,S(t)\,I(t)}{n}, \qquad \frac{dI(t)}{dt} = \frac{\beta\,S(t)\,I(t)}{n} - \gamma\,I(t), \qquad \frac{dR(t)}{dt} = \gamma\,I(t), \tag{2}$$
where β ≥ 0 is the transmission rate and γ > 0 the recovery rate.
Individuals’ heterogeneity and inter-individuals’ heterogeneous interactions may be modeled through the partitioning into subpopulations. Usually, numerical schemes are used to fit and/or simulate the model.
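To make the role of numerical schemes concrete, the sketch below integrates the classical SIR dynamics with a forward Euler scheme. It assumes a transmission rate beta, a recovery rate gamma and a force of infection normalized by the population size n; the step size and the parameter values are illustrative assumptions.

```python
def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Integrate the SIR equations with a forward Euler scheme.

    Returns one (day, S, I, R) tuple per simulated day.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        trajectory.append((float(day), s, i, r))
    return trajectory

# Hypothetical values, for which beta / gamma = 3.
trajectory = simulate_sir(beta=0.3, gamma=0.1, n=1_000_000, i0=10, days=200)
peak_day, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected individuals on day {peak_day:.0f}")
```

A forward Euler scheme is the simplest choice and is used here only for readability; higher-order schemes are generally preferable when accuracy matters.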
Individual-level models explicitly define a state for each individual in the population. Therefore such models can incorporate more refined heterogeneity and stochasticity than population-level models, possibly at the expense of higher computational complexity, the need to collect appropriate datasets, and an increased number of parameters. As a result, parameter estimation may be difficult [14] and simulation running times may be long compared to other models [15]. Given their high flexibility, understanding and describing these models through a unified taxonomy is still an important open research area [16, 17]. These models are sometimes called agent-based or individual-based models. One example of an individual-level model, which is a simplified version of the model in [18], is the following. Individuals I1, …, In are each assigned a household in {h1, …, hk}, inside which they can be infected (to model transmission during lockdown for instance), and an infectiousness coefficient ρi. The probability that individual Ii is infected between t and t + dt is then defined in terms of τk, the time at which individual Ik becomes infectious, and of f, which gives the infectiousness as a function of the time elapsed since the end of the latent period.
2.2 Classical epidemic parameters
Some classical parameters are used to analyze the propagation of a disease by characterizing the features of the virus or its spread in a given population. They are usually estimated from empirical data and plugged into propagation models. We refer to the following definitions.
Basic reproduction numbers R0
Nonnegative real number that quantifies the number of secondary cases infected by one individual (during their infectious period), when the whole population is considered susceptible. The infection will spread and may become an epidemic if R0 > 1 and will decline if R0 < 1. It can broadly be quantified through the formula:
$$R_0 = \beta\,\tau,$$
where β ≥ 0 is the average number of individuals contaminated by an infectious individual per unit of time and τ > 0 the average period of infection. Note that in the aforementioned SIR model (2), τ = 1/γ.
Effective reproduction numbers Re
Nonnegative real number that quantifies the number of secondary cases infected by one person, when only the currently susceptible part of the population can be infected. It is time-dependent and can therefore be estimated by multiplying R0 by the proportion, denoted by s(t), of the population that is susceptible at a given time, i.e. Re(t) = s(t)R0.
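The two reproduction numbers can be computed in a few lines. The sketch below assumes the relations R0 = beta * tau and Re(t) = s(t) * R0, with hypothetical numerical values; the function names are ours.

```python
def basic_reproduction_number(beta, tau):
    """R0 as the product of the transmission rate and the mean infectious period."""
    return beta * tau

def effective_reproduction_number(r0, susceptible, population):
    """Re(t) = s(t) * R0, where s(t) is the susceptible proportion."""
    return r0 * susceptible / population

# Hypothetical values: 0.3 transmissions per day over a 10-day infectious period.
r0 = basic_reproduction_number(beta=0.3, tau=10.0)
for susceptible in (1_000_000, 500_000, 300_000):
    re_t = effective_reproduction_number(r0, susceptible, 1_000_000)
    status = "growing" if re_t > 1 else "declining"
    print(susceptible, round(re_t, 2), status)
```

Under these assumptions, Re(t) falls below 1 as soon as the susceptible proportion s(t) drops below 1/R0.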
Key time-to-events intervals
Set of periods of time between clinical states of the disease, referred to as events. In particular: the incubation period is the time delay between exposure and the onset of clinical symptoms; the infectious period is the time delay between the beginning and the end of infectivity; the latent period is the time delay between infection and infectivity; the generation time is the time delay between the symptom onsets of an infector-infectee pair; and the introduction date is the date of the first infection in a given population.
Key time-to-events rates
Set of rates relating one clinical state of the disease to another. They are especially used in compartmental models to quantify the transitions between compartments (see Subsection 2.1) in a given population through the following rates: the transmission rate from the susceptible population to the infected one; the recovery rate from the infected population to the recovered one; the mortality rate from any given state to death.
3 Review methods
3.1 Search strategy
The initial step relied on a search over article databases using specific and predefined keywords. We decided to include articles regardless of their submission type and/or status to provide a historical perspective on the proposed models. We intentionally refer to the first available version of the articles even though updated releases could exist and/or have been published.
The search strategy encompasses two main source types. First, to establish the main corpus of the review, the process was based on an extended search using a set of keywords applied to the three main online open access archives: arXiv, bioRxiv and medRxiv. More precisely, since our interest lies in mathematically grounded models and since neither bioRxiv nor medRxiv provides such filters, we focused mainly on arXiv. Articles from the other two archives were nonetheless identified naturally through citations in related articles.
The search methodology on arXiv is described as follows. We looked into the special directory “COVID-19 SARS-CoV-2 preprints” until the 11th of April 2020, with the following filters:
(“include cross list: True”) AND
(“terms:” “title=COVID-19” OR “abstract=COVID-19” OR “abstract=SARS-CoV-2” OR “title=SARS-CoV-2” OR “title=coronavirus” OR “abstract=coronavirus”) AND
(“classification: Computer Science (cs)” OR “classification: Mathematics (math)” OR “classification: Statistics (stat)”)
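The boolean structure of these filters can be expressed as a simple predicate. The sketch below is not the arXiv search interface: it assumes hypothetical records stored as dictionaries with `title`, `abstract` and `categories` keys, and a case-insensitive match, only to make the logic of the query explicit.

```python
TERMS = ("covid-19", "sars-cov-2", "coronavirus")
CLASSIFICATIONS = ("cs", "math", "stat")

def matches_search(record):
    """Return True if a (hypothetical) record satisfies the search filters."""
    text = (record["title"] + " " + record["abstract"]).lower()
    has_term = any(term in text for term in TERMS)
    # Cross-lists are included, so any listed category may match.
    has_classification = any(
        category.split(".")[0] in CLASSIFICATIONS
        for category in record["categories"]
    )
    return has_term and has_classification

# Hypothetical record used only to exercise the predicate.
example = {
    "title": "A SEIR model for COVID-19",
    "abstract": "We study the spread of the epidemic.",
    "categories": ["q-bio.PE", "stat.AP"],
}
print(matches_search(example))
```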
Secondly, since our reviewing effort is publicly available on GitHub ([19]), some other articles were identified thanks to external contributors, which led, for instance, to a heuristic search among centers of excellence. In particular, major centers collecting references and information about the epidemic were listed, such as: the MRC Centre for Global Infectious Disease Analysis leading the COVID-19 Response Team at Imperial College in the UK, the Institute for Health Metrics and Evaluation (IHME) at the University of Washington in the US and the Research and Action Team (REACTing) at Inserm in France. The selected articles were added up to the 3rd of May 2020.
3.2 Eligibility criteria
Subsequently, the screening process with a more in-depth selection followed. The first batch of articles was hence reviewed based on the criteria: (i) the depth of theoretical background, (ii) the standards of reproducibility, (iii) the originality of method and parameters introduced, (iv) the quality of exposition of the methodology, (v) the capacity to encompass intervention strategies and (vi) the availability of data and code to test the model.
3.3 Data extraction
Once the eligible articles were selected, a data extraction protocol was used. We defined the list of categories below so that it encompasses the diversity of the contributions while highlighting the similarities among them. Note incidentally that these categories are not necessarily mutually exclusive and that possible conflicts were resolved through discussion.
Global approach. General overview of the article through the following characteristics: (i) estimation of epidemic parameters: based on computation or inference from data, (ii) evolution forecast: prediction of future values of key indexes, e.g. the number of infected and/or deaths, (iii) modeling of various intervention strategies after governmental decisions, (iv) reference to economic indicators to measure the impact of the epidemic and/or intervention strategies, (v) optimization of intervention strategies (stochastic control).
Data used. Information about the nature and the source of the numerical data used.
Model nature. Whether the model is deterministic or stochastic.
Model category. Type of modeling: statistical estimation if the model is purely statistical and not a spreading model, else one of the categories described in Section 2.1: either phenomenological, compartmental or individual-level model. Additional attributes of the models are also reported and are specific to each category, for instance the different types of compartments in compartmental models.
Modeling of intervention strategies. How the intervention strategies are mathematically incorporated in the model: (i) addition of compartments, (ii) modification of the contact matrix, (iii) addition of predictive variables, (iv) modification of model parameters, (v) integration of strategies in the structure of the network.
Epidemic parameters. Epidemic-related input parameters introduced in the formulation of the problem, e.g. the transmission rate, the incubation period.
Estimation method for the input parameters. Whether parameters are inferred from a statistical framework or from the literature.
Code availability. Whether the source algorithmic code is available.
In the present article we chose to describe the articles with respect to the four main attributes: the model category, the type of modeling of the intervention strategies, the estimation method for the input parameters and the data used. Nevertheless, all information is gathered at length through these categories in the online companion tools ([20], [21]), see subsection 4.6.
4 Results
Subsection 4.1 reports the number of articles that were found during the search procedure. We then sequentially present a comprehensive analysis of the reviewed articles as follows. Subsection 4.2 presents the propagation models proposed in the selected papers, categorized as either phenomenological, compartmental or individual-level models. Subsection 4.3 summarizes the intervention strategies, when they are introduced in the aforementioned models. The estimation methods for the set of epidemic parameters defined in Subsection 2.2 are then discussed in Subsection 4.4. Finally, Subsection 4.5 gathers the characteristics of the real datasets used.
4.1 Search results
The selection procedure resulted in a total of 41 articles from the online open source archives, whereas 39 were obtained thanks to external contributions. Regarding arXiv, 150 articles were selected from the classification ‘Computer Science (cs)’, 60 from ‘Statistics (stat)’ and 25 from ‘Mathematics (math)’. From both medRxiv and bioRxiv, approximately 35 were selected. Finally, we selected about half of the external contributions. The diagram in Fig. 1 illustrates the search results.
4.2 Epidemic propagation models
A variety of different models have been used to model the spread of the virus, from phenomenological models to more explicit transmission models. We synthetically describe below the main designs found in the reviewed articles.
4.2.1 Phenomenological models
Phenomenological approaches were used in 10 articles. Mainly, the models are based on generalized regressions of the time dependent curve of interest (e.g. time series of confirmed cases or deaths) on time. Also, two of the articles derive auto-regressive models of the daily number of infections over time [22, 23].
Authors use S-shaped curves such as logistic or Gompertz curves [24, 25, 26, 27, 28, 29] as well as exponential curves for the early data [30, 26] to fit a cumulative count of cases over time, and bell-shaped curves such as the ERF function [29] to fit the daily count over time. In some models, a Poisson or negative binomial distribution is used to model the stochasticity and the uncertainty of the prediction [31, 22, 23, 30].
In the simplest design, parameters are commonly shared by all the population and are constant over time [24, 26]. In more complex designs, heterogeneity between states or regions is modeled through the use of mixed-effects models [28, 31, 23]. We characterize such models as spatially-structured. Time-varying covariates have also been introduced to account for the non-stationarity associated with the time-varying availability of tests [23] or changes in behavior due to the implementation of intervention strategies [28, 31].
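As a simple instance of regressing an epidemic curve on time, the sketch below fits an exponential curve to early cumulative counts by ordinary least squares on the logarithm of the counts. The data are synthetic and the function name is ours; the reviewed articles rely on richer (generalized or mixed-effects) regression frameworks.

```python
import math

def fit_exponential_growth(counts):
    """Least-squares fit of log(count) = log(c0) + r * t; returns (c0, r)."""
    times = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    variance = sum((t - t_mean) ** 2 for t in times)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), slope

# Synthetic early-phase counts (not real data): 10 initial cases, 20% daily growth rate.
counts = [10.0 * math.exp(0.2 * t) for t in range(10)]
c0, r = fit_exponential_growth(counts)
print(f"c0 = {c0:.2f}, r = {r:.3f}, doubling time = {math.log(2) / r:.2f} days")
```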
4.2.2 Compartmental models
In the vast majority of the included studies (53 articles), the classical SIR, SEIR (E: exposed) and SEIRD (D: deceased) models are adopted to analyze the spread of the virus, as well as a wide variety of extensions in which additional health states are added. The Infected state is divided into different disease stages, e.g. pre-symptomatic (before symptoms) and symptomatic (after symptom onset) [32, 33, 34, 35], and refinements related to symptoms and clinical conditions are specified, e.g. asymptomatic, mild or severe states [36, 37, 38, 39].
Moreover, hospitalization and admission to ICU compartments are introduced to predict resources needs of the healthcare system [32, 34, 40]. In order to account for the difficulty of measuring the exact size of the contaminated population, some articles divide the infected compartment into reported and unreported cases [41, 42, 43]. Lastly, to model the control measures introduced to mitigate the propagation, non-working, confined or quarantine states have been introduced [44, 37, 45].
Time-delayed and non-stationary dynamics
For most of the models, the dynamics of the population at a given time only depend on the previous time step. However, some articles introduce a dependence on multiple past time steps to account for realistic delays induced by key time-to-event variables such as the incubation period or the generation time [46, 45, 47]; see Section 2.2 for more details on these parameters. Also, non-constant parameters are introduced to capture time-varying aspects, see [48, 49, 50].
Stochastic models
A large proportion of the compartmental models, 17 articles, introduce stochasticity through stochastic transitions between compartments [46, 34, 51, 52, 53, 54, 55, 56, 48, 57], using for instance Bernoulli distributions [34], Poisson processes [51], or diffusion terms in the system of differential equations to account for volatility in the propagation [54].
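One common way to obtain stochastic transitions is a chain-binomial scheme, in which each individual changes compartment according to an independent Bernoulli trial. The sketch below applies this idea to an SIR model; it is a generic illustration under assumed parameter values, not the scheme of any specific reviewed article.

```python
import math
import random

def binomial(rng, n, p):
    """Number of successes among n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def stochastic_sir_step(rng, s, i, r, beta, gamma, n, dt=1.0):
    """One chain-binomial step of a stochastic SIR model."""
    p_infection = 1.0 - math.exp(-beta * i / n * dt)
    p_removal = 1.0 - math.exp(-gamma * dt)
    new_infections = binomial(rng, s, p_infection)
    new_removals = binomial(rng, i, p_removal)
    return s - new_infections, i + new_infections - new_removals, r + new_removals

rng = random.Random(2020)  # fixed seed so that runs are repeatable
state = (990, 10, 0)       # hypothetical initial (S, I, R)
for day in range(60):
    state = stochastic_sir_step(rng, *state, beta=0.4, gamma=0.1, n=1000)
print(state)
```

Repeating the simulation with different seeds gives a distribution of trajectories rather than a single curve, which is what allows such models to quantify variability in the propagation.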
Age-structured models
In some models, the population is divided into age-stratified subpopulations with specific transition dynamics (10 articles). These distinct dynamics may be due to different disease-related characteristics, such as infectivity, susceptibility or vulnerability, see [32, 40]. In most of the models, these age groups are not considered equally likely to interact. This heterogeneous mixing is modeled using contact matrices gathering the average frequencies of inter-group contacts [32, 40, 34, 58, 51, 39, 38, 55, 59, 33].
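To illustrate how a contact matrix enters such a model, the sketch below computes an age-specific force of infection. It assumes the common form lambda_a = q * sum_b C[a][b] * I_b / N_b, where q is a hypothetical per-contact transmission probability and C[a][b] the average number of daily contacts of an individual of group a with group b; the numerical values are illustrative.

```python
def force_of_infection(contact_matrix, infected, group_sizes, transmission_prob):
    """Per-group infection rate: q * sum_b C[a][b] * I_b / N_b."""
    return [
        transmission_prob
        * sum(c_ab * i_b / n_b for c_ab, i_b, n_b in zip(row, infected, group_sizes))
        for row in contact_matrix
    ]

# Hypothetical two-group example (e.g. younger and older individuals).
contacts = [[10.0, 2.0],
            [2.0, 4.0]]
rates = force_of_infection(contacts, infected=[10, 0],
                           group_sizes=[100, 100], transmission_prob=0.05)
print(rates)
```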
Spatially-structured models
Multi-level models have been developed to gather several cities, regions or even countries in a common evolution model (9 articles). Inter-region disparities are modeled through region-level epidemic parameters [32, 49, 33]. Some models consider independent mixing between regions [49, 33], while others model spatial interactions. These interactions may depend on the size of the populations and the distance between cities [53] or be directly measured as population flows between cities [60, 61, 38, 29].
Disease-related heterogeneity
Other subpopulations are considered to account for disease-related characteristics of individuals. Some models separate symptomatic from asymptomatic subpopulations upon infection, or distinguish between infected individuals with different degrees of disease severity (10 articles) [32, 40, 34, 37, 62, 63, 36, 39, 38, 64]. Such models are characterized as symptoms/severity-structured. To account for the possibly significant level of non-reported infected individuals due to the absence of testing or of substantial symptoms, the population can be divided into reported/unreported groups (6 articles) [41, 42, 43, 61, 65, 66].
4.2.3 Individual-level models
The individual-level models analyze the transmission dynamics at the scale of each individual, considered as an entity in itself (6 articles). This modeling can incorporate heterogeneity and stochasticity in the individual’s temporal evolution. The following three types of this class are identified and ordered according to the degree of individuality in the model: branching processes, network-based models and individualized models.
Branching processes
One article in the review [67] introduces a two-type branching process to model the growth of the epidemic. Infected individuals evolve separately and may independently transmit the disease according to a certain distribution. In this design, the infected population is divided into two types of infected individuals: discovered (type 1) and non-discovered (type 2). Discovered individuals do not take part in the further evolution of the process because they are isolated. Non-discovered individuals can either give rise to other non-discovered cases, be discovered (i.e. be transformed into type 1) or leave the reproduction process.
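The mechanics of such a process can be sketched with a deliberately simplified offspring rule. In the code below, which is an assumption of ours and not the offspring distribution of [67], each non-discovered individual does exactly one of three things per step: it remains and infects one new non-discovered case, it is discovered and isolated, or it leaves the process.

```python
import random

def simulate_two_type_branching(rng, steps, initial_undiscovered, p_infect, p_discover):
    """Return the (undiscovered, cumulative discovered) counts after each step."""
    undiscovered = initial_undiscovered
    discovered_total = 0
    history = [(undiscovered, discovered_total)]
    for _ in range(steps):
        next_undiscovered = 0
        for _ in range(undiscovered):
            u = rng.random()
            if u < p_infect:
                next_undiscovered += 2   # the individual plus one new undiscovered case
            elif u < p_infect + p_discover:
                discovered_total += 1    # isolated: produces no further cases
            # otherwise the individual leaves the reproduction process
        undiscovered = next_undiscovered
        history.append((undiscovered, discovered_total))
    return history

# Hypothetical probabilities: the mean number of undiscovered offspring is 2 * 0.55 = 1.1.
history = simulate_two_type_branching(random.Random(7), steps=20,
                                      initial_undiscovered=10,
                                      p_infect=0.55, p_discover=0.25)
print(history[-1])
```

With this rule the process is supercritical when 2 * p_infect > 1, so raising the discovery probability at the expense of p_infect is what drives the epidemic toward extinction.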
Network-based models
Two articles use random networks to simulate the propagation of the virus [68, 69]. In both models, a population is represented as a network drawn at random, where nodes represent individuals and are connected by edges corresponding to social connections. In [68], individual characteristics such as the location, the gender or the age are modeled through the nodes, and connections are sampled from a contact kernel depending on the inter-type similarity. In both articles, random heterogeneity in the host characteristics is modeled, for example through different initial immunity buffers [68], or different levels of infectiousness drawn from a long-tailed distribution to account for super-spreader individuals [69]. At each time step, nodes are categorized according to health states similar to those of the compartmental models. Then, node states are updated with respect to transition rates, depending on the current characteristics of the node and its connections. In [68], for example, each node experiences an accumulation of viral load and becomes infected when the viral load exceeds its immunity buffer. To model the COVID-19 outbreak, these studies propose distinct approaches: in [69] a population of a given size is simulated, whereas in [68] a network represents a seniors’ residential centre inside a town, in order to simulate the vulnerability of the centre to contagion imported from the outer population.
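A minimal network-based simulation can be sketched as follows. The random graph, the three health states and the Pareto-distributed infectiousness are assumptions chosen for illustration; they do not reproduce the specific constructions of [68] or [69].

```python
import random

def random_network(rng, n_nodes, p_edge):
    """Undirected random graph: each pair of nodes is linked with probability p_edge."""
    neighbors = {v: set() for v in range(n_nodes)}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if rng.random() < p_edge:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors

def step(rng, neighbors, state, infectiousness, p_recover):
    """Synchronous update of node states ("S", "I" or "R")."""
    new_state = dict(state)
    for node, status in state.items():
        if status != "I":
            continue
        for other in neighbors[node]:
            if state[other] == "S" and rng.random() < infectiousness[node]:
                new_state[other] = "I"
        if rng.random() < p_recover:
            new_state[node] = "R"
    return new_state

rng = random.Random(42)
network = random_network(rng, n_nodes=200, p_edge=0.03)
# Long-tailed infectiousness: a few nodes transmit much more than the others.
infectiousness = {v: min(1.0, 0.02 * rng.paretovariate(1.5)) for v in network}
state = {v: "S" for v in network}
state[0] = "I"
for _ in range(50):
    state = step(rng, network, state, infectiousness, p_recover=0.1)
print({label: sum(1 for s in state.values() if s == label) for label in "SIR"})
```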
Individualized models
Three articles introduce models where individuals are completely unique and identifiable, in particular through their geographic localization [63, 70, 71]. In this design, a population is simulated to reproduce for instance: a realistic geographical location of each individual; some realistic attributes proper to each individual, such as age or gender; some realistic contact patterns via the population distribution through households, workplaces and schools. In particular, the characterization of a global population is possible, for example, the population of the UK [70], the US [70, 63] and the entire world [71]. Spatial interactions are represented by kernels defined between locations or by a network representing travel flows between subpopulations, centered around major transportation hubs [71]. These configurations require very rich data and parameterization to draw individuals, places and connections from realistic distributions. Similar to network-based models, each individual is in a particular health state at each time. At any time-step, individuals have a probability of transition between states depending on their characteristics and social interactions. Lastly, other realistic elements are modeled, such as a variable infectiousness in time for an infected individual [70], or the introduction of different viral strains [63].
4.3 Modeling of intervention strategies
In the wake of the emergence of COVID-19, many countries responded through public measures to limit its spread. Broadly, the aim was both to contain the number of people affected by the disease and to reduce the risk of exceeding the healthcare system threshold capacity (i.e. to flatten the curve). To this end, various intervention strategies can be deployed: social-distancing recommendations, isolation of infected or susceptible individuals, school/university closures, global lockdown, encouragement of telework, mask-wearing requirements, business closures, restriction of group gatherings, random testing campaigns, air traffic suspensions, etc. As a result, behavioural changes within the population led to a modification of the disease propagation. In this section, we review the approaches carried out to embed these strategies in the models.
In particular, some articles focus on assessing the impact of intervention strategies. Among others, in [72], the effects are quantified through a counterfactual model. In [73], the authors consider the difference between the predictions of a model trained before lockdown and the actual data (with lockdown) to evaluate its effectiveness. The articles pertaining to this perspective do not include intervention strategies in their model. As for the agent-based models (4 articles), social interaction patterns are inherent to the model. The degree of complexity handled by these designs therefore enables the modeling of interventions without the need to modify the model. For this reason, we do not go into more detail on either of these two cases.
4.3.1 Incorporation in compartmental models
The intuitive approach considered by the vast majority of articles (43 of them) consists in a modification of the epidemic parameters. The parameter most likely to be impacted by social distancing is the transmission rate. The basic reproduction number, R0, depends on the transmission rate and the infectious period (often proportionally). The latter parameter is intrinsic to the disease and is hence less subject to variation. No model considers a change in the infectious period, therefore a modification of the transmission rate is equivalent to a modification of the R0 (or of the Re). This is why we can refer to one or the other in an equivalent way.
Scaling of the transmission rate
The transmission rate can simply be multiplied by a constant to reflect the decrease in contacts induced by social distancing (13 articles). In [74], this constant can be either estimated or defined as a function of three parameters indicating the degree of isolation. This enables the simulation of various intervention strategies. Recall that whether the emergence of the disease develops into an outbreak depends on the sign of R0 - 1. Hence, the scaling of the transmission rate directly reflects the epidemic changeover. A frequently adopted procedure is to set the transmission rate to two values, before and after the lockdown, in order to capture the behavioural response of the population to the strategies. A factor of isolation strength per age group is introduced in [40, 51]. In contrast, in [69] the authors propose an alternation of cycles applied to the whole population. The cycles are composed of both a working period (regular transmission rate) and a lockdown/self-isolation period (modified transmission rate). The objective of this article is to find the optimal durations of these two phases. A similar approach is also proposed by [54].
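Both the two-value and the cyclic strategies amount to a few lines of code. The sketch below assumes a hypothetical baseline rate beta0 and a scaling constant between 0 and 1; the function names and values are illustrative.

```python
def lockdown_transmission_rate(t, beta0, lockdown_start, scaling):
    """Two-value transmission rate: beta0 before lockdown, scaling * beta0 after."""
    return beta0 * scaling if t >= lockdown_start else beta0

def cyclic_transmission_rate(t, beta0, scaling, work_days, lockdown_days):
    """Alternate working periods (beta0) and lockdown periods (scaling * beta0)."""
    position = t % (work_days + lockdown_days)
    return beta0 if position < work_days else beta0 * scaling

# Hypothetical values: the lockdown divides the transmission rate by four.
beta0, gamma = 0.3, 0.1
for day in (0, 29, 30, 60):
    beta_t = lockdown_transmission_rate(day, beta0, lockdown_start=30, scaling=0.25)
    print(day, beta_t, "ratio beta/gamma:", round(beta_t / gamma, 2))
```

Either function can be plugged into a compartmental simulation in place of a constant transmission rate.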
Piecewise time-dependent transmission rate
Articles of the previous category only allow the reproduction number to alternate between two fixed values (i.e. regular/lockdown). The articles based on the present approach (18 articles) tend to integrate more flexibility and complexity. In [46], the time-varying reproduction number is defined as the true R0 (without interventions) multiplied by a function of six indicators representing non-pharmaceutical interventions. These indicators are activated when such a measure is put in place in a country. With a similar approach, in [49], the transmission rate is a time-varying function of the time span of non-pharmaceutical interventions. Another example is provided in [64], where both the transmission rate and the recovery rate are functions of time. Additionally, in [75], R0 is multiplied by a piecewise function, depending on both the intensity and the duration of policies. Testing policies are modeled by a time-dependent parameter of the same shape. This parameter represents the portion of the tested population from which infected individuals can be isolated. The purpose of the article is the search for an optimal combination of quarantine and testing policies.
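A piecewise time-dependent reproduction number can be sketched as R0 multiplied by a factor that changes whenever an intervention is active. The multiplicative combination of reductions used below is an assumption made for simplicity; it is not the functional form of [46], [49] or [75].

```python
def reproduction_number(t, r0, interventions):
    """R0 scaled by the interventions active at time t.

    interventions: list of (start_day, end_day, relative_reduction) tuples.
    """
    factor = 1.0
    for start, end, reduction in interventions:
        if start <= t < end:
            factor *= 1.0 - reduction
    return r0 * factor

# Hypothetical schedule: school closure from day 10, lockdown from day 20 to day 60.
schedule = [(10, 90, 0.2), (20, 60, 0.6)]
for day in (0, 15, 30, 70):
    print(day, round(reproduction_number(day, 3.0, schedule), 2))
```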
Modification of the contact matrix
In models in which interactions between sub-populations are modeled by a contact matrix (7 articles), social distancing measures can directly be implemented. We refer to [39] for an estimation of age-specific and location-specific (home, work, school, other) contact matrices built under various physical distancing scenarios. Additionally, a reconstruction of the contact matrix is implemented by a combination of social distancing interventions in [34]. Finally, five different contact matrices are considered in [40].
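When contacts are split by location, an intervention can be represented by rescaling each location-specific matrix before summing them. The sketch below assumes hypothetical 2 x 2 matrices and scaling factors; the location names follow the home/work/school/other split mentioned above.

```python
def combine_contact_matrices(matrices, scaling):
    """Sum location-specific contact matrices, each multiplied by its scaling factor.

    matrices: dict mapping a location name to a square matrix (list of lists).
    scaling: dict mapping a location name to a factor (1.0 when absent).
    """
    size = len(next(iter(matrices.values())))
    total = [[0.0] * size for _ in range(size)]
    for location, matrix in matrices.items():
        factor = scaling.get(location, 1.0)
        for a in range(size):
            for b in range(size):
                total[a][b] += factor * matrix[a][b]
    return total

# Hypothetical two-age-group matrices.
matrices = {
    "home": [[2.0, 1.0], [1.0, 1.5]],
    "school": [[6.0, 0.5], [0.5, 0.0]],
    "work": [[0.0, 0.5], [0.5, 4.0]],
}
# Scenario: schools closed and workplace contacts halved.
print(combine_contact_matrices(matrices, {"school": 0.0, "work": 0.5}))
```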
Addition of compartments
One way to account for the impact of behaviour patterns on the disease dynamics is the addition of compartments (7 articles). In this sense, two articles divide the susceptible population into two sub-compartments: working and confined populations in [44], and populations subject to low and high isolation recommendations in [74]. Furthermore, isolation of individuals recorded as infected is one of the most widespread interventions. Some articles have therefore directly added compartments to the model. See [76], where the infected population is either quarantined or not. Also, [77] includes a home-quarantined compartment and distinct compartments for symptomatic, asymptomatic and reported infected individuals.
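As an example of an added compartment, the sketch below extends an SIR model with a quarantined compartment Q: infected individuals are isolated at a hypothetical rate delta and no longer transmit the disease. This generic SIQR structure is an assumption for illustration and does not reproduce the compartments of [76] or [77].

```python
def siqr_step(s, i, q, r, beta, gamma, delta, n, dt):
    """One forward Euler step of an SIR model with a quarantined compartment."""
    new_infections = beta * s * i / n * dt   # only non-quarantined infected transmit
    new_quarantined = delta * i * dt
    removed_from_i = gamma * i * dt
    removed_from_q = gamma * q * dt
    return (
        s - new_infections,
        i + new_infections - new_quarantined - removed_from_i,
        q + new_quarantined - removed_from_q,
        r + removed_from_i + removed_from_q,
    )

# Hypothetical parameters: quarantine removes infected individuals at rate 0.15 per day.
state = (9990.0, 10.0, 0.0, 0.0)
for _ in range(1000):   # 100 days with dt = 0.1
    state = siqr_step(*state, beta=0.3, gamma=0.1, delta=0.15, n=10_000, dt=0.1)
print([round(x) for x in state])
```

Under these assumptions the growth of I is governed by beta / (gamma + delta) instead of beta / gamma, which shows how isolation lowers the reproduction number without changing the transmission rate itself.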
Spatially network-based models
For spatially structured models (4 articles), intervention strategies directly modify the structure itself. In [29], the authors introduce a multimodel ODEs neural network: each node of the network is a compartmental model, and links between layers simulate the inter-provincial disease transmission using mobility data. Finally, in both [71, 78], the world map is divided using the Voronoi method, centered on the major transportation hubs. The transmission dynamics are modeled through an agent-based epidemic model for the mobility layers. In this framework, the authors implement travel restrictions as a decrease of the mobility flow.
4.3.2 Incorporation in phenomenological models
Considering the phenomenological models, only three articles derive a framework for mobility reduction. Two articles are based on regressions of the temporal curve of infections or deaths and include covariates defined as weighted averages of social-distancing metrics [28, 31]. In [31], time-varying metrics capture the variations in visits to public spaces and in the time spent at home versus at work. The last article incorporates social distancing through a time-varying scaling factor of the growth curve parameter [30].
4.3.3 Optimization of intervention strategies
A significant part of the models aims to inform decision making. Usually, the method consists of learning retrospectively about the effects of the strategies or predicting the future propagation under different scenarios. The latter leads to a comparison of scenarios with respect to the health cost, measured by the number of deaths or the hospital saturation [32, 40], or to the economic cost induced by the lockdown [44]. In order to automatically predict the best strategy to implement, a few articles introduced optimization frameworks. All these models are built on propagation models, either compartmental [79, 75, 53, 36, 80, 63] or individual-based [63].
In [80], the government directly controls the transmission rate and the threshold of confirmed cases by implementing its strategy, which directly minimizes the infection peak. Three articles proposed deterministic optimal control methods to analyse how to reach the optimal trade-off between the direct economic costs and the ones implied by the healthcare system [79, 75, 36]. The optimal strategy, represented by the lockdown percentage over time [75, 79, 36] and/or the level of testing over time [75], optimizes an objective function which integrates all the future costs.
In the network-based framework of [53], where nodes represent districts, the optimal strategy outputs the nodes that must be locked down each week, i.e. the nodes for which edges should be modified. For this purpose, each node is assigned a set of features (e.g. the proportion of symptomatic individuals within the district) that quantifies its current state. A cost function is also defined and integrates the future health and economic costs. A reinforcement learning algorithm is finally used to predict each week the best decision to take for each node, using a deep Q-network trained to predict the reward of each action given the current state of the node.
In [63], the strategy is optimized in the three following independent frameworks: a compartmental model, a stochastic compartmental model and an individual-based model. The result of a given strategy is binary (e.g. whether the proportion of infected individuals is below a certain threshold at time t). For the compartmental models, the strategy is defined by a controlled parameter which reduces the transmission. For the individual-based model, a strategy is defined by a set of more refined parameters (e.g. the isolation rate or the length of time a social distancing policy must be in place). The posterior probability of the controlled parameter conditionally on the success is estimated through Bayesian inference. This estimation is repeated at each time step to select the best policy, conditioned on the new information.
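The idea of conditioning a control parameter on a binary success can be sketched with plain rejection sampling, a deliberately simple substitute for whatever inference engine is used in [63]. Everything below is an assumption made for the example: the uniform prior, the deterministic SIR simulator, the success criterion (infection peak below 5% of the population) and all names.

```python
import numpy as np

def peak_infected(control, r0=2.5, gamma=0.2, n_days=200, i0=1e-3):
    """Peak infected fraction of a forward-Euler SIR whose transmission is scaled by (1 - control)."""
    beta = r0 * gamma * (1.0 - control)
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(n_days):
        new_infections = beta * s * i
        s, i = s - new_infections, i + new_infections - gamma * i
        peak = max(peak, i)
    return peak

rng = np.random.default_rng(0)
prior_samples = rng.uniform(0.0, 1.0, size=2000)   # assumed prior on the control

# Keep only the draws for which the simulated outcome is a success.
success = np.array([peak_infected(u) < 0.05 for u in prior_samples])
posterior_samples = prior_samples[success]          # draws from p(control | success)
```

A histogram of `posterior_samples` then shows which control intensities are compatible with the target; with a stochastic simulator, the same loop would apply unchanged, each draw being accepted with its own success probability.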
4.4 Models and estimation methods of the epidemic parameters
A vast majority of the referenced articles compute or learn the epidemic parameters involved and, if not, point to already published studies on the topic. Indeed, 47 of the articles are based on some parameters arbitrarily fixed or derived from the literature, whereas 63 estimate a part, or the totality, of the parameters involved in the model. Also, among the 80 articles, a large part (53 articles) estimate at least one epidemic parameter. Hence, this section is first an attempt to classify the main computation and estimation methods regarding the epidemic parameters and, second, a focus on the key parameters introduced in Subsection 2.2.
Following Section 4.2, the development of advanced structured models and the availability of data enable the derivation of the parameters for particular subpopulations. Precisely, specifications w.r.t. categorised groups are refined through: the age, the geographical location/community (household, school, etc.) and position (region, state, etc.), the possible hospitalization and the healthcare capacity (e.g. [75]), the documented/undocumented (i.e. unconfirmed and/or unreported), the ability of transmission (e.g. to model super-spreader individuals) and reciprocally the susceptibility of being infected and lastly, the scenario of intervention. It is therefore interesting to introduce covariate matrices, also known for a particular case as contact matrices, to quantify the interrelations/interactions between each subpopulation. Readers can also refer to [81] where the Distance Correlation method is used to measure dependence between countries.
4.4.1 Main data-driven estimation methods
Most of the articles indexed in this review are based on simulations learned from clinical datasets and, when the approach is clearly mentioned, three main categories of methods emerge: (exact) deterministic methods (13 articles), estimators obtained by descriptive statistics (5 articles), and estimators obtained by inferential methods (42 articles) combined with sampling techniques. The following paragraphs briefly detail each of these categories. Note that there are at least as many methodologies as papers, since multiple techniques are often employed.
First, the deterministic methods include the derivation of closed-form systems of equations, e.g. the next-generation method [76, 77] or the Euler-Lotka equation [26, 82] for the computation of R0. Also, six articles use deterministic optimization algorithms, mainly gradient-based ones such as gradient descent [62, 56] and Levenberg-Marquardt [45], as well as a so-called iterative Nelder-Mead algorithm [50], which is derivative-free.
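Both closed-form routes to R0 mentioned above can be illustrated on textbook cases. The sketch below is not taken from the cited articles: it assumes a basic SEIR model for the next-generation matrix and a gamma-distributed generation time for the Euler-Lotka equation, and the function names and numerical values are hypothetical.

```python
import numpy as np

def r0_next_generation(beta, sigma, gamma):
    """R0 of a basic SEIR model as the spectral radius of F V^{-1}."""
    # Infected compartments ordered as (E, I).
    new_infections = np.array([[0.0, beta], [0.0, 0.0]])      # F: new infections enter E
    transitions = np.array([[sigma, 0.0], [-sigma, gamma]])   # V: E -> I at rate sigma, I leaves at rate gamma
    next_gen = new_infections @ np.linalg.inv(transitions)
    return float(np.max(np.abs(np.linalg.eigvals(next_gen))))

def r0_euler_lotka_gamma(growth_rate, mean_gt, sd_gt):
    """Euler-Lotka: 1/R0 = E[exp(-r * T)] for a gamma generation time T."""
    shape = (mean_gt / sd_gt) ** 2
    scale = sd_gt ** 2 / mean_gt
    # Laplace transform of the gamma density, valid for growth_rate > -1/scale.
    return (1.0 + growth_rate * scale) ** shape

r0_seir = r0_next_generation(beta=0.5, sigma=0.2, gamma=0.25)
r0_growth = r0_euler_lotka_gamma(growth_rate=0.1, mean_gt=5.0, sd_gt=5.0)
```

For the SEIR case, the next-generation matrix recovers the familiar ratio beta/gamma; for an exponential generation time (standard deviation equal to the mean), the Euler-Lotka formula reduces to 1 + r times the mean generation time.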
Regarding the descriptive methods, usually when no (Bayesian) sensitivity analysis is derived, articles often summarize some of the parameters by their empirical mean, without necessarily giving details on the estimation. Nonetheless, at least 5 articles detail the descriptive techniques; refer for instance to the deterministic compartmental model of [40] for a detailed approach.
Along with these procedures, a majority of papers (42 articles) use inferential methods to estimate some parameters and build a sensitivity analysis. The main methods are inherited from regression analysis (12 articles), either frequentist or Bayesian, and from likelihood-based formulations (12 articles). More precisely, regression models are derived in (log-)linear or non-linear settings, and likelihood-based formulations mostly rely on maximum likelihood estimation (10 articles). Additionally, Bayesian statistical models, hierarchical or sequential, as well as information criteria for model selection can be found.
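A log-linear regression of the kind mentioned above can be sketched on synthetic data. The case counts below are simulated for illustration only and all values are assumptions. The least-squares fit on the log scale coincides with maximum likelihood estimation if the noise on the log counts is assumed independent and Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily case counts (illustrative only, not real data).
true_rate, n_days = 0.15, 20
days = np.arange(n_days)
cases = 10.0 * np.exp(true_rate * days) * np.exp(rng.normal(0.0, 0.05, n_days))

# Log-linear least squares: log(cases) = intercept + rate * day.
rate_hat, intercept_hat = np.polyfit(days, np.log(cases), deg=1)

# Doubling time implied by the estimated exponential growth rate.
doubling_time = np.log(2.0) / rate_hat
```

The estimated growth rate can then be converted into a reproduction number once a generation time distribution is specified.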
Lastly, parametric estimation of the epidemic parameters via curve fitting is broadly explored (6 articles); there, the assumptions on the distributions are explicit, but the optimization criterion generally is not. Nevertheless, articles [83, 84] present different types of distribution patterns for some typical durations or rates. Finally, two articles propose stochastic optimization algorithms [85, 86].
4.4.2 Key epidemic parameters
Basic and effective reproduction numbers
The reproduction number, either R0 or Re, is a major epidemic parameter, which is used to model not only the spread of the virus but also the effectiveness of an intervention strategy (refer to Section 4.3 for details). Hence, the parameter is mainly fixed from the beginning. When it is estimated, the techniques rely either on direct computation or on the estimation of the transmission rate, and the estimation can be carried out prior or posterior to the modeling of the spread of the virus. In particular, the articles [26, 82], from the same authors, describe several possible methods that range from deterministic optimization to advanced Bayesian sampling algorithms. More generally, readers can find deterministic methods computed ahead of the propagation model in [26, 45, 82, 87], using for example the classic next-generation matrix method in [77, 34, 76]. Stochastic estimations are also derived; consider for instance the regression and likelihood-based estimators in [88, 45, 82]. Regarding estimation once the spread model is developed, [67] proposes to compute the reproduction number with three specific statistical estimators. Sampling methods are also used to estimate the posterior distribution of the dynamics of Re, mainly through Monte Carlo-based algorithms, e.g. [89, 82]; refer also to [71, 78], which assume a uniform prior distribution on R0, and to [46], which assumes a normal prior distribution.
Key time-to-event intervals
The set of characteristic periods of time related to the virus or its propagation is mainly estimated through inferential methods. These techniques are of major interest in this context: simulating the distribution profile of the intervals (or of the rates, see the following paragraph) from one state to another is key to an in-depth comprehension of the phenomenon. It also leads to sensitivity analysis. The main finding regarding the priors chosen for the specific epidemic time periods is that almost all of them are special cases of the gamma distribution, if not the gamma distribution itself. This holds in particular for the incubation time [23, 55, 90, 83], the generation time [89, 30, 83], and also for some of the transition times between states, depending on the article [25, 55, 91, 83]. Additionally, Weibull and lognormal distributions are considered in [83] for the incubation, generation and onset-to-hospital times, as well as in [90] for the incubation time. Also, the following time variables are assumed to follow an exponential distribution: the symptoms-to-report time [91, 48] and the susceptible-to-infection time [69]. In this sense, refer to [25] for an account of various priors combined with different estimation methods for the age-structured population; the authors also provide a sensitivity analysis of some parameters.
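For reference, the relation between the gamma distribution and its special cases mentioned above can be made explicit. The shape-scale parametrisation (k, \theta) used here is a notational choice for this note and is not necessarily the one adopted in the cited articles.

```latex
f(t; k, \theta) = \frac{t^{k-1} e^{-t/\theta}}{\Gamma(k)\, \theta^{k}}, \quad t > 0,
\qquad \mathbb{E}[T] = k\theta, \quad \operatorname{Var}(T) = k\theta^{2}.

\text{Exponential (rate } 1/\theta\text{): } k = 1, \quad f(t) = \frac{1}{\theta} e^{-t/\theta}.

\text{Erlang: } k \in \{1, 2, \dots\}, \quad \Gamma(k) = (k-1)!,
\quad \text{the sum of } k \text{ independent exponential durations of mean } \theta.
```

The Erlang case explains why chaining k identical compartments with exponential sojourn times in a compartmental model yields a gamma-distributed total duration.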
Key time-to-event rates
Following the last paragraph, various approaches are used for the key time-to-event rates. Indeed, the models broadly range from (i) constant parameters to (ii) time-dependent parameters, and (iii) the parameters can also be randomised to perform sampling methods. Note that these assumptions are made independently from one parameter to another; refer for example to [50] for an account of different models. First, under the assumption (i) of constant parameters for the basic rates inherent to the models, deterministic approaches can be found for example in [36, 50, 87]. Of course, empirical mean estimators are useful and often employed, e.g. [40]. Also, the work in [61] presents the impact on the epidemic spread of the undocumented infected population, quantified by advanced likelihood-based estimators. Under the assumption (ii), both compartmental models introduced in [34, 84] fit the transmission rate dynamics to a decreasing exponential function; the latter [84] also considers piecewise constant and rational functions. Lastly, consider the framework (iii). Note that the rates are less often randomised than the corresponding key periods. In [69], the prior distributions of both the exposure and the infection rates are Erlang distributions. Refer also to [25] for an account of various priors combined with different estimation methods for the age-structured population; the authors also provide a sensitivity analysis of some parameters. Lastly, for more complex propagation models, refer to the individual-based method introduced in [69], where a neural network is defined such that the total infectivity of a node has a long-tailed distribution to account for possible super-spreaders.
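The three families of time-dependent transmission rates named above (exponential, piecewise constant, rational) can be written as simple functions. The parametrisations below are hypothetical choices made for illustration; the exact forms used in [34] and [84] may differ.

```python
import numpy as np

def beta_exponential(t, beta0, decay):
    """Exponentially decreasing rate: beta0 * exp(-decay * t)."""
    return beta0 * np.exp(-decay * np.asarray(t, dtype=float))

def beta_piecewise(t, breakpoints, values):
    """Piecewise constant rate; values[k] applies from breakpoints[k-1] (included) onwards."""
    index = np.searchsorted(breakpoints, t, side="right")
    return np.asarray(values, dtype=float)[index]

def beta_rational(t, beta0, scale):
    """Rational decrease: beta0 / (1 + scale * t)."""
    return beta0 / (1.0 + scale * np.asarray(t, dtype=float))

days = np.array([0, 10, 25])
exp_rates = beta_exponential(days, beta0=0.6, decay=0.05)
step_rates = beta_piecewise(days, breakpoints=[10, 20], values=[0.5, 0.3, 0.1])
rat_rates = beta_rational(days, beta0=0.6, scale=0.1)
```

Any of these functions can be plugged into a compartmental model in place of a constant transmission rate, and its parameters fitted to the observed curves.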
4.5 Characteristics of the data used
In this section, we identify the types of data used for each article. We only consider real datasets and do not report simulated data. The variety of data types in the categories used reflects the diversity of the approaches.
Clinical data
Most of the articles use clinical data such as the number of reported infected or death cases, and some models are validated on simulated data. In many cases, clinical data are used to evaluate the epidemic parameters involved in the model or to tune the key dates of the spreading dynamics (e.g. arrival, peak, end). Typical clinical datasets are the daily recorded numbers of infections, recoveries, deaths, admissions in hospital and transfers to intensive care units. Those records can be considered at different scales (region, country, world) and at different levels of detail (age, gender). These data often come from the World Health Organization, the Johns Hopkins University or the Centers for Disease Control. More specific disease-related datasets, often gathered in hospitals, are sometimes used, such as temporal recordings of viral shedding [92], time from symptom onset to reporting, and time from symptom onset to death. Finally, to forecast health service needs, data relating to hospital resources are used [28], e.g. ICU beds capacity [32, 74].
Mobility data
In many articles, various datasets related to geographical mobility are exploited to model individual/population flows. This is of interest to quantify the possible impact of behavioural change on the spread of the disease. A wide spectrum of sources was employed, including information on and timing of intervention strategies [49], time spent at home versus at work, changes in the attendance of public places [31], day-night locations [41] and estimated reduction in mobility from GPS data [28]. These data often come from SafeGraph or Baidu. Additionally, some articles deal with the effect of airline suspensions or the role played by exported cases. They involve data containing the travel history of infected people [23] and, more generally, a large volume of information related to air traffic [78] (e.g. from the International Air Transport Association). Moreover, in [77], meteorological data are processed to assess their impact on migration. Lastly, individual-based models require high-resolution population density data, the distribution of workplace sizes [70] or the locations of major transportation hubs [78].
4.6 Companion tools
The results are displayed in a synthetic mapping for fast access to the key articles of the literature of interest. A tabular form appeared to be a suitable support for the description of the articles (see [21]). In this tabular view, each row gathers the information related to one article, and the article attributes are categorized through the different columns. A Kibana interface (see [20]) was created to enable fast search, using various filters on the features of the Google sheet to select the desired attributes. We aim to facilitate the identification of articles matching certain specific criteria such as the type of model, the estimation of epidemic parameters, the integration of intervention strategies, etc. Finally, the sheet was converted into a markdown document accessible on the GitHub page of the project (see [19]).
4.7 Tabular view
In this section, we present a synthetic view of the global index table of the articles. Only a subset of characteristics is displayed for the sake of clarity. The complete indexing is divided into five subtables: (1) Phenomenological models, (2) Compartmental deterministic models, (3) Compartmental stochastic models, (4) Individual-level models, (5) Statistical estimation models. The subtables (1), (2), (3) and (4) are gathered following the nomenclature adopted in Section 2. The subtable (5) contains the articles that propose methods of statistical estimation for epidemic parameters linked to COVID-19, without developing a model of propagation.
4.7.1 Phenomenological models
4.7.2 Compartmental deterministic models
4.7.3 Compartmental stochastic models
4.7.4 Individual-level models
4.7.5 Statistical estimation models
5 Conclusion
This manuscript reports the modeling choices made by several international teams to respond to the urgency of the first months of the outbreak. It reflects the remarkable ability of the scientific community, across different fields, to react to such a globalized and unprecedented event with so many diverse approaches. Indeed, we highlight the tremendous amount of new publications in such a short time period (2-3 months) as well as the capacity of the scientists to integrate issues of various origins (e.g. political, societal, economic). Also, some papers that were published online kept their results updated with respect to the latest data or knowledge of the virus. In particular, both models and forecasts were so up to date that specificities of the virus were immediately taken into account, such as the important proportion of asymptomatic cases or the heterogeneity between regions. The same holds for the particularities of the outbreak according to the resources and political decisions, e.g. the possible difficulty to test the population or monitor the diffusion networks, and the resulting strong uncertainty regarding the number of contaminations. Finally, all the debates held in society that were crucial to the management of the epidemic were examined in detail with many realistic elements, for instance the efficiency of wearing masks and the effects of various intervention strategies. In many countries, we saw the importance of these publications for the policies that were adopted, as many governments were advised, at a certain level, by scientific communities.
Of course, we need to temper this explosion of articles by noting a weakened quality regarding the following aspects. Briefly, some standard article components were sometimes poorly developed or even missing, e.g. the related-work section, the explanation of the modeling choices and their interpretation, the simulations and associated statistical analysis such as the sensitivity analysis, a discussion section contrasting the results, etc. We note that some authors acknowledged these possible shortcomings through a disclaimer at the beginning of their article. This questions the compatibility between the urgency of political action and the hindsight inherent to scientific research.
Finally, we would like to address the global issue related to the accessibility, the aggregation and the comparison of the data in different countries. Indeed, data acquisition was and still is a region/country-dependent process that evolves in time. Although the geographical provenance of the data or the a priori knowledge on the parameters were clearly mentioned, the implications of deriving results for a different region were rarely discussed, which could entail misinterpretations or inappropriate uses of the numerical forecasts. To conclude on a positive note, we would like to highlight the prolific production of open source articles, blogs and codes that were provided and helped collective and reproducible projects in such a critical time.
Data Availability
A digital companion of the present review, including an online sheet, a Kibana interface and a markdown document, is accessible on the GitHub page of the project.
Acknowledgment
This work was supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, and the Région Ile-de-France.
Footnotes
Contact: {marie.garin{at}ens-paris-saclay.fr,myrto.limnios{at}ens-paris-saclay.fr,alice.nicolai{at}ens-paris-saclay.fr}
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].