Abstract
Vaccination strategy is crucial in fighting the COVID-19 pandemic. Since the supply is still limited in many countries, contact network-based interventions can be most powerful in setting an efficient strategy by identifying high-risk individuals or communities. However, due to the high dimension, only partial and noisy network information is available in practice, especially for dynamic systems where contact networks are highly time-variant. Furthermore, the numerous mutations of SARS-CoV-2 have a significant impact on the infectious probability, requiring real-time network updating algorithms. In this study, we propose a sequential network updating approach based on data assimilation techniques to combine different sources of temporal information. We then prioritise the individuals with high degree or high centrality, obtained from assimilated networks, for vaccination. The assimilation-based approach is compared with the standard method (based on partially observed networks) and a random selection strategy in terms of vaccination effectiveness in a SIR model. The numerical comparison is first carried out using real-world face-to-face dynamic networks collected in a high school, followed by sequential multi-layer networks generated with the Barabasi-Albert model, emulating large-scale social networks with several communities.
1 Introduction
The world is still in the midst of the COVID-19 pandemic. The World Health Organization (WHO) and partners are working together on the response, tracking the pandemic, providing recommendations on critical steps, delivering necessary medical supplies to those in need and, finally, racing for the development and introduction of safe and reliable vaccines. Every year, vaccines save millions of lives. Vaccines work by training and preparing the body’s natural defences, the immune system, to identify and fend off the viruses and bacteria they target. If the body is eventually exposed to such disease-causing germs, it is ready to kill them instantly, avoiding illness. By the end of July 2021, nearly 300 vaccine candidates for COVID-19 were in trials, and several of them, such as AstraZeneca, Pfizer, Moderna and Gamaleya, have already been distributed in all countries to protect individuals. No other vaccine in human history has been so eagerly anticipated, especially given that, until now, no drugs have been demonstrated to treat COVID-19. By 30 July 2021, around or above 50% of the population had been fully vaccinated in North American and European countries, including the USA (50.2%), the UK (57.2%) and Canada (59.6%). However, in some less developed nations the vaccination rate is worryingly low, such as India (7.6%) and Peru (14.7%), both having recently experienced a major COVID crisis. Since the vaccination capacity in these countries remains limited, people who are most at risk, such as healthcare workers and the older population [47], are given priority [39]. The effectiveness of the current vaccinations in addressing newly developed virus variants (e.g., B.1.617.2 (Delta) and C.37 (Lambda)) has also been challenged [43], leading to the possibility of requiring new vaccinations or doses. Vaccination strategies play an essential role in preventing the rapid diffusion of COVID-19.
Clustering analysis has investigated transmission cascades in local social communities. Among all connecting clusters, particular attention has been given to educational settings, including high schools and universities [34]. Much effort has been devoted to maintaining the possibility of face-to-face teaching during the pandemic. However, thousands of clusters and outbreaks of COVID-19 have been reported in educational establishments. As mentioned in [39], the Delta variant has become the dominant strain in the UK, spreading rapidly in schools since May 2021. Hence, finding an optimal vaccination strategy for students and staff has become vital to protecting children and young people, since many countries, including India and the UK, plan to reopen colleges and schools, either in full or in part, from September 2021.
Continuous effort has been made for several decades to develop the simulation of infectious diseases based on observed social networks [9], including, for instance, H1N1 influenza (face-to-face contact network) [11] and HIV (sexual contact network) [35]. Social network-based analysis for disease spread modelling has been widely implemented since the outbreak of COVID-19 [44, 24], with the help of SIR (Susceptible-Infected-Recovered) or SEIR (Susceptible-Exposed-Infected-Recovered) models. When the network structure of contacts is (at least) partially observable, network-based interventions are most helpful in determining an optimal vaccination strategy under a limited capacity, as has been shown for a variety of infectious diseases [46], [56]. These strategies are usually based on some individual-level measures, such as node degree or graph centrality, which require knowledge of the full network. Furthermore, significant variance of the COVID infection probability is also observed [17] according to ages and activities. Meanwhile, many connecting clusters of COVID-19 have been identified in schools and workplaces [64], where individuals share similar characteristics. Thus the infectious probability of intra-connections inside these clusters could be considered homogeneous. This fact leads to the idea of multi-layer network modelling where the infectious probability may vary from layer to layer.
Much effort has been given to using network-based information for formulating optimal policy responses to COVID-19 [41, 20], including social distancing and countrywide lockdown [7]. However, the observation of social networks is often noisy (with either missing connections or mistaken edge weights), and, most of the time, incomplete [56]. Obtaining precise knowledge is particularly challenging since face-to-face contact networks are strongly time-variant. The noise level could be up to 74% (missing edges) for observed connection networks, as mentioned by [37]. On the other hand, as pointed out by [3], contact tracing applications can significantly reduce the rate of infection in the studied population when the participation rate is above 60%. In other words, it is critical to maintain an error level below 40%. Therefore, a considerable gap can be found between the required precision and the available data on the temporal networks. Real-time updating of prior network knowledge is thus essential to improving vaccine efficiency.
In this paper, by investigating how the accuracy of network data could impact vaccination effectiveness, we propose a real-time network updating approach based on sequential data assimilation (DA) techniques. Originally developed in the field of meteorological and environmental science, DA has been applied to a wide variety of industrial domains, including geophysical modelling [10], hydrology [14] and economics [49]. Recently, sequential DA algorithms have also been used for real-time parameter identification in the SIR model for COVID spread simulation [61, 50, 22]. An important advantage of using DA, compared to other statistical models for network reconstruction (e.g. [54]), is that DA is widely used for high-dimensional problems with noisy and limited prior data. As an example, Graph Neural Networks (GNN) [62] have been demonstrated to have high accuracy in network reconstructions with missing data [65]. However, this approach requires retraining for each temporal graph, leading to difficulties in real-time predictions. DA and dynamic network data have been combined in [15], where the authors propose a graph clustering approach for the efficient localization of error covariances within an ensemble-variational DA framework. [33] presents a relationship between statistical inference using graphical models and optimal sequential estimation algorithms such as Kalman filtering. In this work, DA is employed for real-time updating of the network, including novel information from dynamic observations. This contributes to leveraging the information embedded in different noisy/incomplete observations using an optimisation process to reconstruct the current network. This is computationally feasible for large-scale problems thanks to the sparsity of the contact networks. Here, we propose two DA models for different parametrizations:
The first consists of reconstructing the complete contact network structures by observing the edges in temporal sub-networks (as described in Sect. 4);
The second adjusts inhomogeneous infectious probabilities in a multi-layer network modelling (as described in Sect. 5).
These two models are respectively applied to
A real-world dynamic network dataset describing the contacts of French high school students in a week [29], collected using wearable sensors;
Generated scale-free multi-layer networks, where each layer represents a social community/cluster, determined by individual characteristics such as age or activity.
Preliminary analysis is performed to understand the data structure (clustering, classes, grades) of the high school contact networks and to demonstrate the time-variance. The same data set, collected in a high school in Lyon, has been used to simulate a COVID outbreak and estimate the reproductive ratio R0 in [44]. It is also shown in their work that the study of contact networks in schools or workplaces could lead to more optimal contact-limiting strategies, such as self-isolation or countrywide lockdown. In this work, we make similar assumptions to [44] in terms of infection rate (slightly higher regarding new SARS-CoV-2 variants) in the contact network. However, since the availability of the temporal network data is limited, we set a small value for the average recovery period (5 days) to simulate the highest number of infected in the SIR model. With regard to multi-layer systems, the dynamic networks are generated using the Barabasi-Albert model [2], with a power law degree distribution. The latter exists widely in real social networks. Since mutations of SARS-CoV-2 have continuously arisen, the infection probability in each network layer is supposed to be time-variant, following an additive stochastic process. In both cases, the SIR simulation is carried out with realistic assumptions of COVID-19 to simulate the SARS-CoV-2 propagation, while real-time observations are generated synthetically based on preliminary network analysis. The DA models proposed in this paper are general, and could be applied to various scenarios with different types of real-world dynamic networks and observation data.
In summary, in this work we
simulate the COVID-19 propagation and vaccination impact using real or generated multilayer networks with the SIR model.
propose a DA framework, with two different network parametrizations, to sequentially update the network structure based on noisy prior information and real-time observations.
compare different graph measures, such as node degree and betweenness centrality, as vaccination prioritization criteria on prior and assimilated networks.
The paper is organized as follows. Sect. 2 introduces the graph-based diffusion modelling and vaccination strategies. Data assimilation principle and adaptation of graph data are presented in Sect. 3. Sect. 4 shows numerical experiments in real-world social contact networks, and Sect. 5 shows experiments with multi-layer networks. Sect. 6 closes the paper with conclusions and future work.
2 Graph-based diffusion modelling and vaccination strategies
2.1 SIR model
The analysis of the diffusion is conducted using a standard SIR model [4] with an additional state describing the number of vaccinated people, as shown in Fig. 1. For each individual, S, I, R denote the susceptible, the infected and the recovered (patients who are not infectious anymore). The SIR assumption has been widely adopted to simulate COVID-19 propagation [16, 61, 60] since reported COVID reinfection cases (e.g. [59]) are still rare compared to the total number of reported cases thus far. The SIR model has also been broadly used in network-based disease simulations via random-walk-based simulations [35]. Each node symbolizes an individual in the social network, whose status can change from susceptible to infected (S-I), or infected to recovered (I-R), according to the random walk through temporal edges [21]. The transition from susceptible (S) to vaccinated (L) only takes place when required according to the chosen vaccination strategies. In contrast to classical disease modelling, since recent research [43] shows that current COVID vaccinations can be significantly less effective when facing new variants (e.g., B.1.617.2 (Delta)), the L-S and L-I transitions can be activated as shown in Fig. 1. More details about the transition probabilities are given in Sect. 2.2. Since the infection probability after vaccination is still unclear to date, L-S and L-I transitions are not considered in this study. Nevertheless, the developed model can easily incorporate these types of transitions when required.
2.2 Graph-based vaccination strategy
Both disease spread simulation and optimal vaccination modelling based on social networks have been receiving increasing interest for different types of infectious diseases [51]. We consider an undirected graph $\mathcal{G}$ defined as a pair of sets $\mathcal{G} = (V, E)$, where $V = \{v_1, v_2, \dots, v_n\}$ represents the set of individuals (graph nodes) and the set $E$ contains the edges, each connecting a pair of individuals. Each graph edge $e \in E$ is represented by a triple $e = (v_i, v_j, w_{i,j})$ where $v_i, v_j$ are the two endpoints and $w_{i,j} \in \mathbb{R}$ is the edge weight. For unweighted graphs $w_{i,j} \in \{0, 1\}$, while for weighted graphs $w_{i,j}$ could represent the frequency or the intimacy of the contact. In epidemic spread modelling, the infectious probability $p_{i,j}$ from the individual $i$ to $j$ (and vice versa) is often a function of $w_{i,j}$, i.e. $p_{i,j} = \mathcal{IP}(w_{i,j})$. We also note that $p_{i,j}$ may depend on individual-level characteristics of $v_i$ and $v_j$, such as age or activities. The connecting graph can be fully represented by the associated adjacency matrix $A = \{A_{i,j}\}_{i,j=1,\dots,n}$. We use three Boolean vectors $\{I_t, L_t, R_t\} \in (\{0, 1\}^n)^3$ to indicate the status of each individual, either infected, vaccinated or recovered in the SIR model, at time $t$. The recovery period $T_\gamma \in \mathbb{N}$ is a uniformly distributed random variable drawn separately for each individual.
If we adopt the edge-wise function ℐ𝒫(·) over the whole network, the infectious probability vector at time t in this SIR model is expressed using the all-ones vector 1n = [1, 1, …, 1]T and the vector-wise Hadamard product ⊙.
Following a uniform probability distribution, the vector of infections It is simulated using this probability vector and It−1. The only controllable variable in Eq. 2 is the vaccination vector Lt.
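The update described above can be illustrated with a minimal sketch. It is not the exact formulation of Eq. 1 and Eq. 2: it assumes that every infectious contact transmits independently with the same probability `p`, and the function name `sir_step`, the toy path graph and the recovery periods are hypothetical choices made for illustration.

```python
import numpy as np


def sir_step(adj, infected, recovered, vaccinated, steps_left, p, rng):
    """One stochastic SIR update on a contact network (illustrative sketch)."""
    n = adj.shape[0]
    # Number (or total weight) of infectious contacts of each node.
    exposure = adj @ infected.astype(float)
    # Assumption: every contact transmits independently with probability p.
    prob = 1.0 - (1.0 - p) ** exposure
    # Infected, recovered and vaccinated nodes cannot be (re)infected.
    susceptible = ~(infected | recovered | vaccinated)
    # Compare the infection probabilities with uniform random draws.
    newly_infected = susceptible & (rng.uniform(size=n) < prob)
    # Infected nodes count down their individual recovery period.
    steps_left = np.where(infected, steps_left - 1, steps_left)
    newly_recovered = infected & (steps_left <= 0)
    infected = (infected & ~newly_recovered) | newly_infected
    recovered = recovered | newly_recovered
    return infected, recovered, steps_left


# Toy usage on a path graph 0 - 1 - 2 with node 0 initially infected.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
infected = np.array([True, False, False])
recovered = np.zeros(3, dtype=bool)
vaccinated = np.array([False, False, True])  # node 2 is vaccinated
steps_left = rng.integers(55, 66, size=3)    # hypothetical recovery periods
infected, recovered, steps_left = sir_step(
    adj, infected, recovered, vaccinated, steps_left, p=0.02, rng=rng)
```

In this sketch the vaccination vector enters only through the `susceptible` mask, which reflects the remark that it is the only controllable variable of the update.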
Different graph-based vaccination strategies can be employed to enhance the immunization impact with a limited vaccination capacity. State-of-the-art approaches are usually determined by observed individual- or community-level social connections, often involving classical graph measures, for instance, graph degree, betweenness centrality [26] or community links [12]. Much effort has also been made to use these strategies in practical settings where significant positive impacts have been observed [31]. Since the available graph data often include non-negligible uncertainties (missing vertices or edges), statistical models are commonly employed to provide an optimal estimation of these graph measures. Practical approaches involve, for example, fixed choice designs (FCD) [45] and the nomination strategy [23], both based on an estimation of the graph degree. Even with partially observed dynamic networks, the vaccination strategy could be significantly improved in terms of reducing the maximum infected number and delaying the disease propagation, compared to a random choice [63]. Nevertheless, precise knowledge of the network structure is crucial to determining an efficient vaccination strategy. This is especially essential for community-based approaches (e.g. [30], [12]), since graph clustering algorithms can be sensitive to noise. However, the data collection of dynamic social networks remains cumbersome, especially for high-dimensional problems. In this paper, we conduct our analysis based on three classical strategies, considered less sensitive to data noise than community-based approaches.
Random
The individuals to be vaccinated are chosen at random, up to the limited number of doses; no network knowledge is used.
Highest degree
For each temporal network, we choose to vaccinate people with the most contacts based on prior knowledge. Only observable individuals are taken into account. The degree d(v) of node v in a network is simply defined as the sum of the column (or the row for undirected graphs) of the adjacency matrix, $d(v) = \sum_{k=1}^{n} A_{k,v}$.
Highest Centrality
The betweenness centrality [26] g(v) of node v is defined from the shortest paths, over all pairs of nodes in the graph, that pass through the node v, $g(v) = \sum_{u \neq v \neq q} \sigma_{uq}(v) / \sigma_{uq}$, where $\sigma_{uq}$ represents the total number of shortest paths from node u to node q and $\sigma_{uq}(v)$ is the number of those paths that pass through v.
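The three strategies above can be sketched with the `networkx` library. This is only an illustration under stated assumptions: the function name `select_for_vaccination`, its arguments and the `observed` filter (standing for the observable individuals) are hypothetical, and ties between nodes are broken by node order.

```python
import random

import networkx as nx


def select_for_vaccination(graph, n_doses, strategy, observed=None, seed=0):
    """Return the nodes to vaccinate under one of the three strategies."""
    # Only observable individuals can be selected (all nodes by default).
    nodes = [v for v in graph.nodes if observed is None or v in observed]
    if strategy == "random":
        # No network knowledge is used.
        return random.Random(seed).sample(nodes, min(n_doses, len(nodes)))
    if strategy == "degree":
        # Weighted degree; unweighted edges count as weight 1.
        score = dict(graph.degree(weight="weight"))
    elif strategy == "betweenness":
        # Unnormalised betweenness centrality of every node.
        score = nx.betweenness_centrality(graph, normalized=False)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(nodes, key=lambda v: score[v], reverse=True)[:n_doses]


# Toy usage: the hub of a star graph is selected by both graph measures.
star = nx.star_graph(4)
top_degree = select_for_vaccination(star, 1, "degree")
top_betweenness = select_for_vaccination(star, 1, "betweenness")
```

The same function can be applied to a prior (background) network or to an updated one, so that only the input graph changes between the compared strategies.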
Other graph measures relying on a detailed understanding of the network (e.g. [12]) could also be used to establish a vaccine strategy. However, in real applications precise knowledge of the network is often out of reach. Here, our criteria for choosing graph-based vaccination strategies are twofold: computational efficiency and insensitivity to observation noise. The latter ensures the “validity” of the methodology even when working with incomplete networks. To enhance our estimation of dynamic contact networks, we make use of data assimilation algorithms.
3 Data assimilation principle and adaptation of graph data
In this section we introduce the variational data assimilation concept and the resolution using a linear estimator. We also introduce the novel approach which combines DA techniques with dynamic network data.
3.1 Variational assimilation and BLUE
DA algorithms aim to combine different sources of noisy information in order to provide a more reliable estimation of the current system. The state variables could be either a physical field or a sequence of parameters. The true state, denoted by $x_{true}$, stands for the theoretical value of the state at some given coordinates/time, often out of reach in real-world applications. The objective of the assimilation is to gain an optimal approximation $x_a$ of the true state $x_{true}$, based on two sources of prior information: an initial state estimation $x_b$ (the so-called background state) and an observation vector $y$. The former is often issued from prior numerical simulations/predictions while the latter can be obtained via physical measures of some control variables. Their tolerances, regarding theoretical values, are quantified by $\epsilon_b$ and $\epsilon_y$,

$$\epsilon_b = x_b - x_{true}, \qquad \epsilon_y = \mathcal{H}(x_{true}) - y,$$

where the observation operator $\mathcal{H}$ from the state space to the observable space is supposed to be known. The probability distributions of the prior errors are supposed to be centred Gaussian, characterized respectively by the covariance matrices $B$ and $O$.
The key idea in variational methods is to find a balance between the background and the observations using the maximum a posteriori (MAP) method [13]. This leads to the loss function weighted by the inverse of $B$ and $O$,

$$J(x) = \frac{1}{2}(x - x_b)^T B^{-1} (x - x_b) + \frac{1}{2}(y - \mathcal{H}(x))^T O^{-1} (y - \mathcal{H}(x)). \tag{7}$$

The optimisation problem defined by the objective function of Eq. (7) is called the three-dimensional variational method (3D-VAR), which can also be considered as the general equation of variational methods without considering the transition model error. The output of Eq. 7 is denoted as $x_a$, i.e.

$$x_a = \underset{x}{\operatorname{argmin}} \; J(x). \tag{8}$$

If $\mathcal{H}$ can be approximated by some linear operator $H$, Eq. 8 can be solved via the BLUE (Best Linear Unbiased Estimator) formulation,

$$x_a = x_b + K (y - H x_b), \qquad P_A = (I - K H) B,$$

where $P_A = \mathrm{Cov}(x_a - x_{true})$ is the analyzed error covariance and the $K$ matrix, given by

$$K = B H^T (H B H^T + O)^{-1},$$

is the so-called Kalman gain matrix. In the rest of this paper, we denote $H$ as the linearized transformation operator. The case when $\mathcal{H}$ is non-linear is more challenging for finding the minimum of Eq. 7, especially for high-dimensional problems. The resolution often involves gradient descent algorithms (such as “L-BFGS-B”) or adjoint-based numerical techniques.
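For the linear case, the BLUE analysis can be written in a few lines. The sketch below uses the standard textbook expressions $K = B H^T (H B H^T + O)^{-1}$, $x_a = x_b + K(y - H x_b)$ and $P_A = (I - KH)B$; the function name `blue_update` and the toy numbers are hypothetical.

```python
import numpy as np


def blue_update(x_b, y, H, B, O):
    """BLUE analysis for a linear observation operator H."""
    # Innovation covariance S = H B H^T + O.
    S = H @ B @ H.T + O
    # Kalman gain K = B H^T S^{-1}, obtained with a linear solve
    # (valid because B and S are symmetric).
    K = np.linalg.solve(S, H @ B).T
    # Analysed state and analysed error covariance.
    x_a = x_b + K @ (y - H @ x_b)
    P_a = (np.eye(len(x_b)) - K @ H) @ B
    return x_a, P_a, K


# Toy usage: two state variables, only the first one is observed.
x_b = np.array([0.0, 0.0])
y = np.array([2.0])
H = np.array([[1.0, 0.0]])
x_a, P_a, K = blue_update(x_b, y, H, np.eye(2), np.eye(1))
```

With equal background and observation variances, the observed component moves halfway towards the observation, while the unobserved component keeps its background value because no background correlation links the two.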
3.2 Online assimilation with graph data
The essential idea is to perform real-time updating of the partially observed dynamic networks based on other available information, such as sub-graph structures or the current number of those infected. To this end, the prior observed network at time t is considered as the background state, while other information is embedded in the observation vector yt.
Once the current contact network is updated based on Eq. 7, vaccination strategies can be implemented on the analyzed network (i.e., step 1 → step 2 in Fig. 2), which is a more accurate approximation of the true state. The degree and the betweenness centrality of the assimilated network are given by

$$d_a(v) = \sum_{k=1}^{n} (A_a)_{k,v}, \qquad g_a(v) = \sum_{u \neq v \neq q} \frac{\sigma^a_{uq}(v)}{\sigma^a_{uq}},$$

where $(A_a)_{k,v}$ denotes the element (k, v) of the assimilated adjacency matrix $A_a$, and $\sigma^a_{uq}$, $\sigma^a_{uq}(v)$ count the shortest paths in the assimilated network. Similar expressions of $d_b(v)$ and $g_b(v)$ on the background state can be given using $A_b$ and $\sigma^b_{uq}$. The principle of real-time assimilation with graph data is illustrated in Fig. 2, where the virus propagation is simulated using the SIR model, as described in Sect. 2.2, between two vaccination steps. Compared to the overlapped graph, the advantage of working with temporal networks is that the temporal correlation can be considered. In fact, an individual can be active for a relatively short period of time only, as shown below in Sect. 4.1. Therefore, instead of using an overlapped graph (if available), analysing temporal networks can result in an efficient real-time vaccination strategy.
A major challenge of implementing DA algorithms with graph data is the computational cost, since the adjacency matrix At, considered as the state variable, is a two-dimensional array with n² entries. We can rely on the assumption of graph sparsity and appropriate parameterization to reduce the computational burden. In this work, we propose two DA frameworks for dynamic network updating, respectively introduced in Sect. 4 and 5. The former aims to reconstruct the full network with observations of sub-graphs, while the latter attempts to adjust the parameterized community-wise infectious probability, relying on multi-layer modelling. These two models, respectively at the local and global scale, also show the flexibility of this data assimilation framework.
4 Numerical experiments in real-world social contact networks
4.1 Assumptions and preliminary analysis
This study is based on face-to-face contact data recently collected (before the COVID outbreak) from a French high school [29], which has been used to simulate a COVID outbreak [44]. The connection networks of 329 students (coverage of 86% of the students) in a high school in Lyon are available for 7374 time steps in a week. For the sake of simplicity, we condense the dynamic graph to 78 time steps by overlapping every 100 consecutive networks. Each condensed time step represents 30 to 60 minutes. The temporal networks remain sparse since the average graph density (i.e. number of non-zero edges divided by the number of node pairs) is equal to 0.76%. All contact networks are assumed to be undirected, which means the associated adjacency matrices are all symmetric (i.e., $A_t = A_t^T$) and the virus could spread in both directions of an edge. According to [44], the infectious probability (of a 20-second contact) in this network can be estimated as p ≈ 0.1% to 1%. However, this estimated probability might be contested for the newly discovered SARS-CoV-2 variants [32]. In this paper, in order to adequately investigate the optimality of different vaccination strategies, we fix the infectious probability to p = 2%. The average recovery period in the SIR model is set to 60 time steps (around 4 to 5 days), following a uniform probability distribution, i.e. Tγ ∼ unif(55, 65).
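The condensation and density computations described above can be sketched as follows. The sketch assumes that the snapshots are stored as a dense array of shape (T, n, n) and that overlapping means summing consecutive adjacency matrices, so that condensed weights count contacts; the function names and the toy data are hypothetical.

```python
import numpy as np


def condense(snapshots, window):
    """Overlap every `window` consecutive snapshots of a (T, n, n) array."""
    n_steps = snapshots.shape[0]
    # Each condensed weight counts the contacts observed in its window.
    return np.stack([snapshots[s:s + window].sum(axis=0)
                     for s in range(0, n_steps, window)])


def density(adj):
    """Number of non-zero edges divided by the number of node pairs."""
    n = adj.shape[0]
    n_edges = np.count_nonzero(np.triu(adj, k=1))
    return n_edges / (n * (n - 1) / 2)


# Toy usage: 5 snapshots of a 3-node network condensed with a window of 2.
snapshots = np.zeros((5, 3, 3), dtype=int)
snapshots[0, 0, 1] = snapshots[0, 1, 0] = 1
snapshots[1, 0, 1] = snapshots[1, 1, 0] = 1
snapshots[4, 1, 2] = snapshots[4, 2, 1] = 1
condensed = condense(snapshots, window=2)
first_density = density(condensed[0])
```

For networks as sparse as the ones considered here, a sparse matrix format would be preferable to the dense array used in this sketch.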
We begin by performing some preliminary analysis of the network data in order to better understand the underlying graph structures. The overlapped network (i.e. $\sum_t A_t$) of all the time steps is shown in Fig. 3(a), where a clear community structure can be observed. Identifying these communities is crucial to simulating the disease spread [36], especially for a highly infectious virus like SARS-CoV-2 [55], and to determining optimal vaccination strategies. Much effort has been given to developing community-detection algorithms in social networks [1, 53]. In this work, we make use of the Fluid community detection algorithm proposed by [53], which is advantageous for sparse graphs since the algorithm complexity is linear in the number of non-zero edges in the network, i.e. 𝒪(|E|).
In real applications, specifying the number of communities is usually difficult. Here, we apply the community detection algorithm several times with different assumed community numbers kc, before evaluating the performance rate pr(𝒞) [25] of the obtained partition 𝒞. The latter is defined from the numbers of intra- and inter-cluster edges. The performance rate is commonly used as an indicator for finding the optimal community number, which is a standard approach for graph clustering problems. According to the result presented in Fig. 3(b), where we clearly observe a stationary performance rate starting from kc = 4, we choose to proceed with three clusters. The final clustering result is displayed in Fig. 3(a), where clusters/communities are shown in red, green and blue. The three detected communities are of comparable size, as shown by the reordered adjacency matrix (Fig. 3(c)), with 106, 110 and 111 nodes respectively. From a practical perspective, these communities could be considered as different grades or classes in the high school, with a similar structure to the graph data presented in [27].
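The scan over candidate community numbers can be sketched with the fluid communities implementation of `networkx`. Note the assumptions: `asyn_fluidc` requires a connected graph, the `performance` value returned by `partition_quality` follows the `networkx` definition, which may differ in detail from the performance rate used here, and the function name `scan_community_numbers` and the toy graph are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc, partition_quality


def scan_community_numbers(graph, k_values, seed=0):
    """Run fluid community detection for several candidate values of k."""
    results = {}
    for k in k_values:
        # asyn_fluidc yields the communities as sets of nodes.
        partition = [set(c) for c in asyn_fluidc(graph, k, seed=seed)]
        # partition_quality returns the pair (coverage, performance).
        _, performance = partition_quality(graph, partition)
        results[k] = (partition, performance)
    return results


# Toy usage: two cliques of 6 nodes joined by a single edge.
toy_graph = nx.barbell_graph(6, 0)
scan = scan_community_numbers(toy_graph, [2, 3, 4])
performance_by_k = {k: perf for k, (_, perf) in scan.items()}
```

Plotting `performance_by_k` against k gives the kind of curve from which a stationary regime, and hence a community number, can be read off.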
4.2 DA modelling and numerical results
Since it is infeasible to collect contact networks via wireless equipment in all educational settings post lockdown, the objective of this study is to enhance the vaccination strategy when only partial/noisy information is available, for instance, via tracing applications. For this reason, the full contact networks are supposed to be out of reach. In terms of background states and observations, we suppose that the temporal network is only partially observable a priori, with 50% to 70% of nodes missing in the background estimation of the network. The missing nodes are selected randomly and kept invariant at all time steps. In reality, the missing nodes could refer to, for example, people who have not installed the tracing application on their smartphone. We also use an observation vector yt, which contains the sub-networks for each of the three detected clusters. Thus, we suppose that the intra-community contacts of students in each class/grade are fully observable with yt. The objective is to perform DA algorithms sequentially to correct the knowledge of the background network relying on the observed sub-networks. The transformation operator H is thus linear (sub-Identity matrix) and the DA problem is solved via BLUE, as shown in Eq. 7. The background network and yt are vectorized with Identity error covariances B and O, as demonstrated in Fig. 4.
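Under these assumptions the analysis takes a particularly simple form, which the short derivation below makes explicit. It assumes the standard BLUE gain $K = B H^T (H B H^T + O)^{-1}$ and a sub-identity operator $H$ that selects each of the $m$ observed entries exactly once, so that $H H^T = I_m$.

```latex
\begin{align*}
K   &= B H^T \left( H B H^T + O \right)^{-1}
     = H^T \left( H H^T + I_m \right)^{-1}
     = H^T \left( 2 I_m \right)^{-1}
     = \tfrac{1}{2} H^T, \\
x_a &= x_b + K \left( y_t - H x_b \right)
     = x_b + \tfrac{1}{2} H^T \left( y_t - H x_b \right).
\end{align*}
```

In other words, each observed edge entry of the analysed network is the average of its background and observed values, while unobserved entries keep their background value. The gain depends only on $H$, $B$ and $O$, not on the current data.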
After each vaccination, the SIR model is applied to simulate the virus propagation until the next time step, as summarized in Eq. 2. An essential advantage of the BLUE-type formulation with invariant prior covariances is that the Kalman gain matrix can be computed offline a priori, since it does not depend on the current xb and y. The computational cost of DA can thus be considerably reduced. The vaccination capacity is fixed at 2% of all students (≈ 6 individuals) for all strategies (random, highest degree, highest centrality) presented in Sect. 2.2, based on prior or assimilated graphs.
The evolution of the number of infected |It|, according to different vaccination strategies, is displayed in Fig. 5, where the percentage of missing nodes in the background state is fixed as 50%, 60% and 70% respectively. To acquire robust numerical results, each type of simulation with or without vaccinations is repeated 10 times and the average values are drawn in solid or dashed curves in Fig. 5. Standard deviations of the simulations (except dashed lines) are also displayed in transparent shades to ensure the robustness of the comparison. The averaged maximum number of infected for each strategy is shown in Table 1. We note that vaccinations take place at every time step for 6 selected students (≈ 2% of the population) after the simulation of virus propagation with an infectious probability of 2% for each temporal edge. The initial infected vector It=0, common to all simulations, is randomly simulated with a probability of P((It=0)k) = 10% for k = 1, …, 329.
From Fig. 5, we observe that almost all averaged curves rise to a high point and peak around t = 50 to 60, when all individuals are either infected or vaccinated. Since the vaccination process takes place in a relatively short period (a week), we suppose that the infected individuals are not detected in real-time. As a consequence, a student can be vaccinated after being infected by the virus, leading to vaccine failure. This fact emphasizes the importance of the vaccination strategy chosen. What can be clearly observed from Fig. 5 is the decreasing infected number according to the vaccination strategy, in the order of free (no vaccination) → random → background → assimilated (DA). This order is globally consistent regardless of time. First, all vaccination strategies manage to significantly reduce the number of infected and delay virus propagation compared to the free simulation (green curve). In terms of maximum infected number, for all three cases, the peak value is reduced on average by 26%, 34%, 34%, 40% and 37%, respectively for random, background with highest degree, background with highest centrality, assimilated with highest degree and assimilated with highest centrality. All other strategies are dominated by the assimilated curves, especially when proceeding with the highest degree strategy. The difference, in particular between background and assimilated curves, is more significant when working with large-scale networks. On the other hand, for background-network-based strategies, a growth of the maximum infected number against the prior error level is noticed in Table 1, while the results based on assimilated networks remain robust. This fact promotes the use of data assimilation on network data when the prior error level cannot be precisely specified. We note that the missing nodes at each time step are generated independently with no temporal correlation, explaining why reasonably good results can be obtained with 70% missing nodes.
In summary, numerical results show that the DA-based real-time updating of networks considerably improves the impact of vaccination, thereby reducing virus spread.
In these experiments, the use of node degree (solid curves) and betweenness centrality, for both background (red) and assimilated (blue) cases, exhibits a similar performance. This suggests a high level of (non-negligible) inter-cluster connections; a contrary case can be found in Sect. 5.
5 Experiments with multi-layer networks
5.1 Multi-layer modelling of scale-free networks
As stated in recent research [42], the infectious probability of COVID-19 can differ significantly for different populations, based on their age, gender, activities and so on. For example, both the transmissibility and the mortality rate are reported to be higher for aged people, necessitating appropriate strategies to protect this fraction of the population. SARS-CoV-2 variants may also vary geographically [5], leading to inhomogeneous transition probabilities. Since the outbreak of the COVID-19 pandemic, continuous effort has been made to understand the behaviour of the virus infection with respect to individual-level (e.g. aged people [48]) and community-level (e.g. healthcare workers [58]) characteristics. These phenomena have led to the idea of using multi-layer networks, where different types of connections exist between graph nodes (see Fig. 6(a)), to simulate the virus spread in social networks. In general, multi-layer (also known as “multiplex”) networks [18] are widely used to study graph diffusion problems [28] and define generalized versions of PageRank [19]. Recently, multi-layer modelling has also been applied to COVID-19 spread simulation [57], where each layer refers to a potential contamination community, such as school, workplace or transport. Appropriate use of the information on these layers can optimise vaccination strategies, as mentioned in [8], by prioritising the populations with high risk and high transmissibility.
Since the collection of large-scale face-to-face multi-layer dynamic contact networks is extremely complicated, we rely on conceptual modelling in this work to further examine the performance of the novel approach. Dynamic contact networks of 1000 individuals and 5 layers (each of 200 nodes) are synthetically generated, where each layer represents a specific group in the population, defined by age or activity (e.g. students, healthcare workers). Assuming that all edges in the temporal networks are fully observable, our objective is to calibrate the time-variant infection probabilities $\{p_{i,t}\}_{i=1,\dots,5}$ based on the observed number of infected individuals in each layer, $\{I_{i,t}\}_{i=1,\dots,5}$. The temporal variation of $\{p_{i,t}\}_{i=1,\dots,5}$ can be a consequence of SARS-CoV-2 mutations. More precisely, the values of $\{p_{i,t}\}_{i=1,\dots,5}$ are updated every 5 time steps following a stochastic process with perturbations $\delta_{p,m} \sim \mathrm{unif}(-0.04\%, 0.04\%)$, and the observation vector consists of the incremental infected numbers $\Delta I_{i,t} = I_{i,t} - I_{i,t-1}$. For inter-layer connections, the infectious probability is determined by the layer of the receiving node, as shown in Fig. 6(a). It is worth mentioning that the associated adjacency matrix $A_t$ is no longer symmetric under this assumption. Nevertheless, the network virus spread modelling in Sect. 2.2 remains valid.
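The set-up above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, and the additive form of the probability update (with clipping to [0, 1]) is an assumption, since only the distribution of the perturbation is stated here.

```python
import random

def perturb_probabilities(p, half_width=0.0004, rng=random):
    """Assumed additive update: p_i <- p_i + delta_i, delta_i ~ unif(-0.04%, 0.04%), clipped to [0, 1]."""
    return [min(1.0, max(0.0, p_i + rng.uniform(-half_width, half_width))) for p_i in p]

def incremental_infected(infected_now, infected_prev):
    """Observation vector: per-layer increments Delta I_{i,t} = I_{i,t} - I_{i,t-1}."""
    return [now - prev for now, prev in zip(infected_now, infected_prev)]

def weighted_adjacency(edges, layer_of, p):
    """W[u][v] is the probability that u infects v, set by the layer of the receiving node v."""
    n = len(layer_of)
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u][v] = p[layer_of[v]]
        W[v][u] = p[layer_of[u]]
    return W
```

For an edge joining a node of layer 1 to a node of layer 2, the two directed weights differ whenever the two layer probabilities differ, which is why the weighted adjacency matrix is no longer symmetric.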
As for the generation of temporal networks, we rely on the concept of scale-free networks [52], where the degree distribution follows a power law,
$$P_{sf}(k) \propto k^{-\gamma},$$
where $P_{sf}(k)$ stands for the probability that a node has $k$ connections and $2 \le \gamma \le 3$ is a chosen parameter. To simulate intra-connections in each layer, we use the Barabasi-Albert (BA) model [2], which is scale-free with $\gamma = 3$ and incorporates two important concepts in graph theory, growth and preferential attachment [38], both of which exist widely in social networks. The BA model is therefore a reference tool for generating real-world-like networks, such as web connections or citation networks. To generate a BA network, nodes are added to the network consecutively, and the probability that the new node is connected to an existing node $v$ reads
$$p_v = \frac{k_v}{\sum_{u} k_u}, \qquad (17)$$
with $k_v$ the current degree of node $v$. The denominator in Eq. 17 represents twice the current number of edges in the network. Individuals with a higher degree have a stronger ability to attract links added to the BA network, which is an adequate assumption for social networks. Moreover, the inter-layer connections are generated randomly with a density of 0.5%, much sparser than the intra-layer edges. An example of a complete temporal network is drawn in Fig. 6(b), where the five layers are shown in different colors.
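The generation procedure can be sketched as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the number of links per new node (`m = 3`) is not specified in the text, and the 0.5% inter-layer density is interpreted as an independent inclusion probability for each inter-layer node pair.

```python
import random

def barabasi_albert_edges(n, m, rng):
    """Edge list of a BA graph with n nodes, each new node attaching to m existing ones."""
    # Start from a star on m + 1 nodes so that every initial node has nonzero degree.
    edges = [(0, v) for v in range(1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects node v with probability k_v / sum_u k_u.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def multilayer_network(n_layers=5, layer_size=200, m=3, inter_density=0.005, seed=0):
    """BA layers with consecutive node indices, plus sparse random inter-layer edges."""
    rng = random.Random(seed)
    edges = []
    for layer in range(n_layers):
        offset = layer * layer_size
        edges += [(u + offset, v + offset)
                  for u, v in barabasi_albert_edges(layer_size, m, rng)]
    n = n_layers * layer_size
    for u in range(n):
        for v in range(u + 1, n):
            # Only node pairs from different layers are candidates here.
            if u // layer_size != v // layer_size and rng.random() < inter_density:
                edges.append((u, v))
    return edges
```

Sampling from the list of edge endpoints is a common way to realise preferential attachment without computing the normalising sum explicitly. The layer of a node is recovered as `node // layer_size`.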
Since temporal edges are assumed to be known in this modelling, we aim to estimate $\{p_{i,t}\}_{i=1,\dots,5}$ from the evolution of the infected number in all five layers. In fact, we can predict $\{\Delta I_{i,t}\}_{i=1,\dots,5}$ via a prior estimation of $\{p_{i,t}\}$, establishing a state-observation mapping $H \in \mathbb{R}^{5\times 5}$ for DA algorithms. The DA problem can thus be posed with the layer probabilities $\{p_{i,t}\}$ as the state vector and the incremental infected numbers $\{\Delta I_{i,t}\}$ as the observation vector. The simulation/vaccination framework is similar to the one in Sect. 4, with a vaccination rate of ≈2% of the population at each time step. This means that all individuals will be vaccinated before $t = 50$. For all assimilations, the error covariances are set to be identity matrices, as in Sect. 4. Our goal is to determine an optimal vaccination order based on available noisy information. In order to cover more possible scenarios, we set various initial probabilities $\{p_{i,0}\}$, as shown in Table 2, denoted as CIa, …, CIf. For the sake of simplicity, $\{p_{i,0}\}$ always follow a decreasing order from layer 1 to layer 5. Typically, the initial probabilities in CIf are more homogeneous than those in CIa or CIe. As an example, CIa could be used to simulate a scenario in the department of computing at Imperial College, where nearly 800 students plus faculty members can be found. The layer with high infectious probability may consist of professors, (senior) researchers and HR officers, while the other four layers can represent graduate or undergraduate students of different grades. The former community has a much higher average age than the latter. Furthermore, each community has dense intra-connections, consistent with our model assumption. The diversity of the initial conditions (CIa, …, CIf) is intended to test the robustness of the proposed approach.
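To make the role of the identity covariances concrete, the analysis step can be written in closed form. The following is a sketch assuming the standard linear variational (BLUE-type) formulation; the symbols $\mathbf{x}_b$ (background state), $\mathbf{x}_a$ (analyzed state), $\mathbf{B}$ and $\mathbf{R}$ (background and observation error covariances) follow usual DA notation and are assumptions here, with the state holding the five layer probabilities and $\mathbf{y}_t$ the five incremental infected numbers.

```latex
\begin{aligned}
J(\mathbf{x}) &= \tfrac{1}{2}\,\lVert \mathbf{x}-\mathbf{x}_b\rVert^2_{\mathbf{B}^{-1}}
              + \tfrac{1}{2}\,\lVert \mathbf{y}_t-\mathbf{H}\mathbf{x}\rVert^2_{\mathbf{R}^{-1}}, \\
\mathbf{x}_a &= \mathbf{x}_b + \mathbf{B}\mathbf{H}^{T}
               \left(\mathbf{H}\mathbf{B}\mathbf{H}^{T}+\mathbf{R}\right)^{-1}
               \left(\mathbf{y}_t-\mathbf{H}\mathbf{x}_b\right), \\
\mathbf{B}=\mathbf{R}=\mathbf{I}_5 \;&\Rightarrow\;
\mathbf{x}_a = \mathbf{x}_b + \mathbf{H}^{T}
               \left(\mathbf{H}\mathbf{H}^{T}+\mathbf{I}_5\right)^{-1}
               \left(\mathbf{y}_t-\mathbf{H}\mathbf{x}_b\right).
\end{aligned}
```

Under this assumption, the correction applied to the background probabilities depends only on $\mathbf{H}$ and on the mismatch between observed and predicted increments. As a side note on the vaccination schedule, a rate of about 2% of 1000 individuals corresponds to about 20 vaccinations per time step, so the whole population is covered in roughly 50 steps.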
The experimental set-up is similar to the one in Sect. 4. When computing the node degree and the betweenness centrality, the graph edges are weighted by either the background or the analyzed layer probabilities. Since the layer information is unattainable a priori, background networks are set to be homogeneous (i.e., the same background probability is assigned to all five layers). The evolution of the infected number, obtained from a Monte Carlo test of 10 simulations, is illustrated in Fig. 7. The standard deviation is represented by colored transparent zones. We also display in yellow the result of using the exact $\{p_{i,t}\}$ for vaccination, instead of the background (red) or analyzed (green) estimates. This curve is thus considered as the optimal target for the assimilation-based strategy. When vaccinating the nodes with the highest degree, a substantial advantage of the DA approach (solid green line) over the background one (solid red line) can be noticed in all 6 sub-figures of Fig. 7. In fact, both the maximum infected number and the average standard deviation have been significantly reduced, as confirmed in Table 3. On the other hand, DA has much less impact when selecting the individuals with the highest centrality, as shown by the dashed lines in Fig. 7. A reasonable explanation for this could be the phenomenon of brokerage [40]. The endpoints of the few inter-layer edges play an essential role in virus spread. These nodes, also known as “brokers”, do not necessarily have a high degree in the graph. However, since many of the shortest paths from one layer to another pass through them, the betweenness centrality may peak at these nodes with or without adjusting $\{p_{i,t}\}$. This shows that when precise knowledge about inhomogeneous infectious probabilities is out of reach, proceeding with the highest centrality might be a robust choice. Nevertheless, both the dashed green line and the dashed red line are dominated by the solid green line (assimilated networks with the highest degree) in all 6 sub-figures.
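The brokerage effect can be illustrated on a toy network. The sketch below is not the code used for the experiments: it builds a hypothetical unweighted graph of two dense groups joined by a single inter-layer edge and computes betweenness centrality with Brandes' algorithm, written out in plain Python so that the example is self-contained.

```python
from collections import deque

def build_adj(edges):
    """Adjacency sets of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalised betweenness centrality (undirected, unweighted), Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # layer A: clique
         (4, 0), (4, 1),                                  # broker 4 (layer A)
         (4, 5),                                          # the only inter-layer edge
         (5, 6), (5, 7),                                  # broker 5 (layer B)
         (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]  # layer B: clique
adj = build_adj(edges)
bc = betweenness(adj)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
top_degree = max(degree.values())
brokers = sorted(v for v in bc if bc[v] > max(bc.values()) - 1e-9)
```

In this toy graph the two endpoints of the inter-layer edge have the highest betweenness although their degree is below the maximum, because every shortest path between the two groups passes through them regardless of how the edges are weighted.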
We also note that for Fig. 7(a,b,d), where the initial probabilities of the five layers exhibit more variance, the assimilated curve is much closer to the optimal one. In fact, optimally vaccinating an inhomogeneous network requires less accurate knowledge of layer probabilities, so long as the most infectious layers can be identified. For example, proceeding with (5%, 1%, 1%, 1%, 1%) and (7%, 0.5%, 0.5%, 0.5%, 0.5%) for vaccine priorities may lead to similar results.
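A small example shows why two different sets of layer probabilities can produce the same priorities. The sketch below uses a hypothetical toy network with three layers instead of five, and it assumes that the weighted degree of a node is the sum, over its neighbours, of the probability of the neighbour's layer (the receiving node); both choices are illustrative assumptions.

```python
def vaccination_order(edges, layer_of, p):
    """Rank nodes by weighted degree: sum over neighbours of the neighbour's layer probability."""
    strength = [0] * len(layer_of)
    for u, v in edges:
        strength[u] += p[layer_of[v]]  # u infects v with the probability of v's layer
        strength[v] += p[layer_of[u]]
    # Highest strength first; ties are broken by node index for reproducibility.
    return sorted(range(len(layer_of)), key=lambda v: (-strength[v], v))

# Toy network: nodes 0-1 in layer 0, nodes 2-4 in layer 1, nodes 5-7 in layer 2.
layer_of = [0, 0, 1, 1, 1, 2, 2, 2]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (3, 6), (3, 7), (4, 5)]

# Probabilities are given in units of 0.1% so that all sums are exact integers.
order_a = vaccination_order(edges, layer_of, [50, 10, 10])            # (5%, 1%, 1%)
order_b = vaccination_order(edges, layer_of, [70, 5, 5])              # (7%, 0.5%, 0.5%)
order_homogeneous = vaccination_order(edges, layer_of, [10, 10, 10])  # no layer information
```

Here both inhomogeneous settings give the same order, starting with node 2, whose neighbours all belong to the most infectious layer, whereas the homogeneous setting starts with node 3, the node of highest plain degree. Identical orders are not guaranteed in general; the point is only that the ranking is driven by which layers are most infectious rather than by the exact values.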
The normalized true layer probabilities can be compared with their posterior (analyzed) estimates over time. The gap between the estimated and the true ratio of probabilities is rapidly reduced as the simulation proceeds, which brings the vaccination strategy closer to the optimal one. Since vaccinating infected individuals is ineffective, the early phase of the outbreak (around the first 20 time steps) is crucial to delaying the COVID spread, because the most active individuals (either in terms of degree or centrality) can be infected very quickly. Therefore, the DA correction at the start of the vaccination process plays an essential role in reducing the propagation speed. On another note, we also observe strong oscillations in the estimated values, which imply high instability of the observation vector $y_t = [\Delta I_{i,t}]_{i=1,\dots,5}$ due to sampling uncertainties.
In summary, the assimilation-based vaccination strategy shows competitive performance in this multi-layer modelling even though the assimilated layer probabilities are just approximations. Using the assimilated temporal networks with “highest degree” dominates other approaches, with a smaller average infected number and lower standard deviation.
6 Conclusion and Future Work
Despite continuous efforts, including vaccination and countrywide lockdowns, it remains unclear how the COVID-19 pandemic will play out. Determining an efficient vaccination strategy is essential for combating COVID-19 in the long term, especially with the rising number of SARS-CoV-2 mutations. New vaccination types can be considered to fight an increasing number of SARS-CoV-2 variants. For the moment, it remains difficult to vaccinate the entire population in many countries. Using temporal contact network information can significantly improve the impact of vaccination on slowing down disease propagation. This is crucial to alleviating the burden on hospitals and emergency clinics. It may also allow for the loosening of some restrictions, which is crucial to saving economies from the current pandemic. In this paper, we propose a data assimilation framework to monitor the evolution of social contact networks based on different information sources. The assimilated networks are used to govern vaccination strategies by prioritising high-risk individuals. An important strength of this framework compared to other network reconstruction methods is its flexibility in dealing with available data and its efficiency for large-scale networks. We have applied the proposed approach to real high school contact networks with synthetic observations and to real-world-like dynamic multi-layer networks generated using the Barabasi-Albert model. The latter is used to simulate virus propagation with inhomogeneous community-level infectious probabilities. In both applications, the proposed method exhibits a significant advantage in terms of effectiveness (smaller infected number) and robustness (lower deviation). The choice of graph measures for identifying high-risk individuals, such as node degree or betweenness centrality, has also been discussed through numerical results in this study.
We note that some recent work focuses on establishing data-driven models to predict individual- or community-level infection probability by learning from personal data, including height, weight and health records. Computational fluid dynamics (CFD) simulations are also being developed to simulate SARS-CoV-2 transmission in schools and offices. Future work can improve individual-level modelling by incorporating these features in the contact networks. Our work opens promising perspectives on governing efficient vaccination strategies, especially for countries with a relatively low vaccination rate or when new vaccinations (e.g., against specific SARS-CoV-2 variants) are disseminated. The current modelling could be extended when more network information (e.g. from tracing applications [6]) becomes available.
Data Availability
All data in this manuscript is available
Code and data availability
Code for the proposed approach and the generated data is available at https://github.com/DL-WG/network_COVID
Conflict of interest statement
We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Acknowledgements
This work is supported by the EP/V036777/1 Risk EvaLuatIon fAst iNtelligent Tool (RELIANT) for COVID19 and the EP/T000414/1 PREdictive Modelling with QuantIfication of UncERtainty for MultiphasE Systems (PREMIERE). This research was partially funded by the Leverhulme Centre for Wildfires, Environment and Society through the Leverhulme Trust, grant number RC-2018-023.