Assessing the Global Tendency of COVID-19 Outbreak ================================================== * Qinghe Liu * Zhicheng Liu * Junkai Zhu * Yuhao Zhu * Deqiang Li * Zefei Gao * Liuling Zhou * Yuanbo Tang * Xiang Zhang * Junyan Yang * Qiao Wang ## ABSTRACT COVID-19 is now widely spreading around the world as a global pandemic. In this report, we estimated the global tendency of COVID-19 and analyzed the associated global epidemic risk, given that the status quo is continued without further measures being taken. Based on official data of confirmed and recovered cases until May 21, 2020, the results showed that the global *R*, excluding China, was estimated to be 2.76 (95% CI: 2.57 – 2.95). The United States, Germany, Italy, and Spain have peak values over 100,000. Using dynamical model and cluster analysis, we partition the globe into four regional epicenters of the outbreak: Southeast Asia extending southward to Oceania, the Middle East, Western Europe, and North America. Among them, Western Europe would become the major center of the outbreak. The peak values in Germany, Italy, and Spain were estimated to be 228,000, 291,000, and 298,000, respectively. Based on the current control measures by May 21, 2020, the peak value in the United States will reach 2,114,000. The cumulative number of 51 mainly researched countries’ patients might finally attain 6,542,000 (95% CI: 4,772,000 – 40,735,000). We also estimated the diagnosis rate, recovery rate, and infection degree of each country or region, and used clustering algorithm to retrieve countries or regions with similar epidemic characteristics. Several suggestions have been proposed for countries or regions in different clusters. Keywords * COVID-19 * SARS-Cov-2 * SEIR model * Global epidemic * t-SNE; BIRCH algorithm ## 1. INTRODUCTION On Dec. 8, 2019, the first case of the novel coronavirus pneumonia (NCP) was confirmed in Wuhan, Hubei Province, China, caused by a new type of coronavirus named ‘2019 new coronavirus [1] (SARS-CoV-19)’ [2]. The disease spread across China during the traditional Chinese Spring Festival. At 24:00 on March 14, China had passed the peak of the epidemic, and started to recover gradually, with a total of 80,844 confirmed cases reported [3]. However, new cases of disease began to appear in other parts of the world and increased rapidly. Because of the rapid spread of the disease, the World Heath Organization (WHO) has assessed COVID-19 as a pandemic [4], which is the first pandemic caused by a coronavirus. As of March 14, 2020, the cases of COVID-19 have been reported in 135 countries and regions worldwide, with a total of 142,539 confirmed cases (61,518 outside of China) and 5,393 deaths (2,199 outside of China) [5]. Among them, a total of 17,660 cases were confirmed in Italy, 11,364 in Iran, 8,086 in the Republic of Korea, 1,678 in the United States. Furthermore, Europe became the epicenter of the pandemic, with more reported cases and deaths than the rest of the world combined, apart from China [6]. With the outbreak of COVID-19, countries or regions have been taking different measures to cope with the spread of the pandemic, but the number of infected people is still increasing. Meanwhile, the spread of the pandemic has also caused a huge impact on the trade flows and economic affairs of the world. These situations raised many urgent problems. How will this epidemics spread in countries or regions around the world? When will the spread of epidemic arrive at the peak or turn to stabilize? How many people will be at risk of infection? Therefore, it is of great significance to analyze the trend of COVID-19 and predict the arrival of peaks for the prevention and control of the pandemic all over the world. Based on global data [7], we perform an analysis of epidemic status in 51 countries, which had the number of confirmed cases over 100 in Mar. 27, 2020. Section 2 of this paper briefly discusses the most up-to-date literature related to the COVID-19 epidemic status. Section 3 describes the transmission dynamics model in detail. Section 4 presents the simulation and analysis of the estimated trend. Section 5 analyzes the four epidemic transmission clusters with different characteristics and give error analysis for confirming the rationality of the results. Finally, the conclusion and discussion are respectively presented in Sections 6 and 7. ## 2. RELATED WORK With the further spread of the epidemic, many scholars analyzed the characteristics of covid-19. On March 1, 2020, Li Y *et al* [8], used the outside-China diagnosis number released by the WHO and built a mathematical model to capture the global trend of epidemics outside China. They found that 34 patients outside China were not found. The worldwide epidemic trend is approximately exponential, and may grow 10 folds every 19 days. Zhuang Z *et al* [9], used a stochastic model to simulate the transmission process of South Korea and Italy under two corresponding assumptions of exponential growth periods. The results indicated that the reproductive number of South Korea and Italy were 2.6 (95% CI: 2.3-2.9) and 3.3 (95% CI: 3.0-3.6), respectively. On March 17, Darwin R *et al* [10], used the progression of the epidemic curve and defined frame work to estimate the instantaneous reproductive number. They divided the epidemic into different serial interval scenarios, and estimated the *R* values, average number of secondary transmissions per patient. The *R* value in Japan, Germany, Spain, Kuwait, and France are over 2; in Italy, Iran, and South Korea are even over 10, however, the R will be low after the social distancing intervention. The correlation analysis of COVID-19’s various factors is also an important topic. Cody Carroll, Satarupa Bhattacharjee, Yaqing Chen, *et al* [11] established a framework for quantifying and comparing cases and deaths across countries longitudinally. They found that decreased workplace mobility is associated with lower doubling rates with a roughly two week delay, and case fatality rates exhibit a positive feedback pattern. Alessio Notari, Giorgio Torrieri [12] analyzed risk factors correlated with the initial transmission growth rate of the recent COVID-19 pandemic in different countries on May 8, 2020 and found moderate evidence for positive correlation with: CO2 (and SO) emissions (p-value 0.015), type-1 diabetes in children (p-value 0.023), vaccination coverage for Tuberculosis (BCG) (p-value 0.028). In addition, many scholars have also discussed different models. Andrey Gerasimov, Georgy Lebedev, Mikhail Lebedev and Irina Semenycheva proposed a heterogeneous model [13] on May 7, 2020 to assess the effects of different measures for infection risk control. They found that after this heterogeneity is incorporated in the model, several characteristics of the epidemic are estimated more accurately. For higher accuracy, Sina F. Ardabili, Amir Mosavi, Pedram Ghamisi, *et al* [14] used multi-layered perceptron and adaptive network-based fuzzy inference system to analyze COVID-19, which got better results than traditional methods. On May 1, 2020, Kevin L *et al* [15] proposed a dynamic SEIR epidemiology model with a time-varying reproduction number, which they identify using machine learning and uncertainty quantification. This new dynamic SEIR model provides the flexibility to simulate various outbreak control and exit strategies to inform political decision making and identify safe solutions in the benefit of global health. The prediction of pandemic was also a hot spot. Zhang Z *et al* [16] estimated that the peaks of infectious cases in South Korea, Italy, and Iran are expected to occur at the end of March, and the percentages of population infections will less than 0.01%, 0.05%, and 0.02%, respectively. On March 10, 2020, Zhang Z *et al* [17] estimated that the inflection point arrival time will be March 6-12 for South Korea, March 10-24 for Italy and March 10-24 for Iran, and the cumulative number of cases will reach 20k in South Korea, 209k in Italy and 226k in Iran, respectively. Li L *et al* [18] thought that the epidemic in South Korea will be basically under control at the end of March, and the inflection day is January 7, before the control the reproductive number is 4.2, and 0.1 after the control. Epidemic size in Italy and Iran will reach 200,000 and 20,000, respectively at the end of March. On May 8, 2020, Athanasios S *et al* [19], found that the rate of deaths reaches a maximum, their models provide estimates for the time that a plateau will be reached signifying that the epidemic is approaching its end, as well as for the cumulative number of deaths at that time. The plateau is defined to occur when the rate of deaths is 5% of the maximum rate. In our previous work [20], we predicted the time and value of the inflection point and peak point in both Hubei and outside Hubei of China. Inspired by it, we conducted a worldwide prediction of the value of the peak point and analyzed the main characteristics and appearances of COVID-19 in the world. The predicted results are based on the current medical situation, if there is specific medicine for COVID-19 in the future, the time and value of peak point of all countries will be advanced and shrinking compared with non-medicine situations. ## 3. COVID-19 TRANSMISSION MODEL: TRANSMISSION DYNAMICS ### 3.1. Transmission Dynamics Model We constructed a transmission dynamics model to infer the epidemiological characteristics and the peak size and trend of COVID-19 based on the existing infectious data and recovered data about various countries or regions. The population of this paper are divided into four main categories based on the SEIR model, i.e. *S, E, I*, and *R* referring to [21]: the Susceptibles, Exposed, Infectious, and Recovered, respectively [22][23]. In the study of China’s epidemic situation, many factors such as Spring Festival and control policies and so on are taken into account, but in contrast, this paper will analyze the heterogeneity of each country or region on the basis of the common spread of infectious diseases. Although these countries have different policies and customs, their basic spread pattern of the epidemic still meets the principle of dynamics [24]. S, the susceptibles, is equal to the total population (N in equation (3.1)) in the research of COVID-19 due to the general susceptibility of the population. The main assumptions of this model are as follows: 1. Because the difference in infectivity between the exposed and the infectious population is unknown, both groups of *E* and *I* are set to follow the same coefficient *β* to represent the average infection level of COVID-19. 2. In the predicted time scale, the existing influences of policies or culture of the target object will be unchanged. 3. During the transmission of virus, the probability of infection in contact with each person is equal in the same target group. ![Formula][1] where *t* means time, and *S, E, I, R* represent four different state variables, respectively. Class *S* represents a healthy population. Once transferred to class *E*, it means the population has been infected until transferred to class *R*, recovered. People in class *I* have been diagnosed due to symptom detection. Formula (3.1) can be represented by the state transition diagram in Figure 1. The direction of the arrow in Figure 1 represents the transition relationships between different states with time. All parameters and physical interpretations of the model are shown in Table 1. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F1) Figure 1. SEIR system state transition diagram. View this table: [Table 1.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/T1) Table 1. List of Symbols ### 3.2 Practical Consideration We divided the spread of the COVID-19 into three phases according to the evolution of the size of the confirmed cases, and the schematic diagram of each stage for the number of patients is shown in Figure 2. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F2) Figure 2. Phase diagram of confirmed cases changing with time in a certain region. Phase one: the time range of this phase starts from the beginning of COVID-19 transmission and ends when the number of patients begins to increase significantly. COVID-19 spreads during the incubation period at this phase, but the confirmed cases officially counted remain at a low level, and the public’s awareness of defense is light. Phase two: the main feature is that a large number of patients have been diagnosed, and the statistical data of confirmed cases increase significantly. But there still has no policy intervention and large-scale quarantine measures. Phase three: As the intensity of policy intervention increases, the spread of COVID-19 is effectively suppressed, and growth rate of confirmed cases is gradually controlled until the end of the epidemic. The main feature of this phase is strong human intervention. It is crucial to get as much information as possible from the data in three stages. Because this paper aims at the analysis of the peak size and its arrival time in each country or region, two latter transmission phases will be our focus in this paper. In phase two, SARS-CoV-2 spread rapidly. The infection rate remains at a high level, and the recovery level is low because the government had not carried out obviously effective policy control. From phase three, the infection rate decreases and the recovery rate begins to increase. Through the study of reported data of COVID-19 in China, we used the power law to simulate the trend of infection rate and recovery rate in phase three, as shown in equation (3.2). Since the coefficient c plays a role in inhibiting the growth of the number of confirmed cases, we define c as the incremental inhibition ratio. ![Formula][2] where *β* and *γ* are the initial values of β and γ, and t is time from the beginning of phase three. ### 3.3. Estimation of *R* The basic reproduction number, *R*, refers to the average value of how many people an infected person can transmit the virus through natural transmission without external intervention [25]. From the second generator approach in [26], the equation (3.3) is derived as follows: ![Formula][3] where *α*, *β*, and *γ*1 are the transfer variables of the SEIR model. (1*/α + γ*1) represents duration of virus transmission. Considering that the calculation of *R* is in the case of natural transmission, so we use the parameters of the starting time of epidemic spread in each country or region to estimate *R* to avoid the impact of human intervention. ### 3.4. Data Source In this study, we used open dataset from Johns Hopkins University ([https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19)). The details of the data source are shown in Appendix 1. *Johns Hopkins University*. Johns Hopkins University shared their data on GitHub for academic and scientific research. We have compiled their open data into a chronology of confirmed data for prediction and visualization use. ## 4. MAIN RESULTS OF THE SEIR MODEL ### 4.1. Estimate of *R* While estimating the *R* of the epidemic globally, we ignore the countries whose ultimate cumulative number of confirmed cases are less than 5000 by May 21, 2020 and focus on the remaining 51 countries (please refer to appendix). Parameters were estimated by minimizing the least square loss function of the estimated number of confirmed cases and the observed number of confirmed cases. The average basic reproduction number of these countries was 2.76 (95% CI: 2.57 – 2.95). As we can notice, COVID-19 is lashing a great part of the world, and the sufferings are not likely to end soon. By estimating the average value of *R*, we obtain the objective information of current situations as well as the importance of epidemic prevention. Among these 51 mainly researched countries, the *R* of Switzerland, Austria, China and Brazil are higher than 3.5, suggesting the great potential of a destructive outburst. France is one of these countries with a basic reproduction number of 3.61, much higher than the global average(See Appendix 2 for more details). The number of the United States is 2.59, which is slightly below the global average, revealing the urgency of the current outbreak in the United States. The lowest value of *R* is 1.77 in Korea, still greater than 1. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F3) Figure 3. *R* in 51 countries or regions over the world, which has the number of confirmed cases exceeding 100 by May 21, 2020. Blue means the higher the value of *R*, while white means the lower the the value of *R*. ### 4.2. Estimate of the Peak of Confirmed Cases in Country-Level The peak value is defined as the cumulative number of confirmed cases. After this peak point, the temporary number of cases gradually decrease or slightly fluctuate, and the epidemic subsides [20]. The estimated trends of 51 countries are all listed in Appendix 2 for details. Thirteen countries will have a peak value over 100,000, including the United States, Russian Federation, Brazil, the United Kingdom, Spain, Italy, France, Germany, Turkey, Iran, India, Peru and Canada. And 34 of the 51 countries including Belgium, Saudi Arabia, Mexico will have peak values between 10,000 and 100,000. We also compared peak arrival time in countries whose peak values are higher than 10,000, and find that most of the countries will reach their peak value in June. The United States is the country with the highest exposure to the COVID-19 risk of the coronavirus. According to our estimation, the peak number of real-time confirmed cases of the United States is predicted to reach 2,114,000 after June 18, 2020, provided the present coping strategy is not changed. It is the only country in our prediction that the number of confirmed cases will exceed 2,000,000. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F4) Figure 4. Simulated trends of the cumulative number of confirmed cases in 51 countries or regions, which has the number of confirmed cases exceeding 5000 by May 21, 2020. The horizontal axis is the date, and the vertical axis is the number of confirmed cases. View this table: [Table 2.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/T2) Table 2. Sum of peak values of different continents Spain, Italy, and Germany monopolize one category, out of all 51 countries, due to its high risk of coronavirus. The peak number of real-time confirmed cases of Italy, Spain are respectively predicted to reach 291,000, and 298,000. The trend of COVID-19 in Italy is expanding, though efforts had been taken by the government of Italy. The other countries whose peak values are estimated to reach over 50,000, such as the Bangladesh, Pakistan, Netherlands, Chile, etc., are all in severe situation at present. ### 4.3. Geographical Distribution Analysis of the COVID-19 The estimated final epidemic scale is visualized on the world map to show the geographical characteristics of the predicted epidemic peak of confirmed cases. The cumulative number of 51 countries’ patients might finally attain 6,542,000 (95% CI: 4,772,000 – 40,735,000). According to [27], geographic proximity transmission is very prominent in epidemic transmission, which means that the regional transmission is popular. This opinion can also be confirmed on the map shown in Figure 5. The outbreak shows multiple epicenters. It can be found that North Korea is geographically close to South Korea, Japan, and China, which has the worst situation. ![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F5.medium.gif) [Figure 5.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F5) Figure 5. The peak value of confirmed cases in 51 countries or regions over the world, which has the number of confirmed cases exceeding 5000 by May 21, 2020. Red means the higher the diagnosis cases peak, while white means the lower the diagnosis cases peak. The choropleth map indicates that the spread of COVID-19 will form four major regional clusters. The first epicenter ranges from East Asia to Oceania, including China, South Korea, Japan, Indonesia, the Philippines, Australia, etc. Norway, Australia, Malaysia, Finland will have peak values over 7,000. The second outbreak epicenter occurs in Western Europe, including Spain, Italy, The United Kingdom, France, Germany, and many other neighboring countries, as well as Turkey. Belgium, Switzerland, Netherlands, Ireland are all expected to have peak values of over 30,000. Considering the number and density of European countries, there would be a serious cross-border phenomenon. Under the current transmission environment and control measures as of May 21, 2020, the estimated number of patients is likely to be at least 2,984,000, with 740 million population in Europe. The third epicenter was in the Middle East. According to the current trend, the peak number of confirmed cases in Iran will reach to 104,000. According to the current trend, the epidemic situation in Iran has gradually slowed down. In addition, the peak value of Saudi Arabia is estimated to be over 25,000. Estimated by the current situation, the peak of diagnosis in the Middle East will come in early May this year. The last epicenter occurred in North America. Given that the epidemic has spread in the United States (1,634,000 cases in total, as of May 21, 2020), if the current situation is maintained, the peak number of cases in the United States will reach 2,114,000, which may have a huge impact on American people’s livelihood and social economy. If the United States follows the existing policy as of May. 21, 2020, the peak arrival time of COVID-19 in the United States will be after Jun. 18, 2020. In addition, Canada is expected to have a peak value of 106,000, making it the second most affected country in North America. ### 4.4. Error Analysis of Proposed Model In this section, the normalized root mean square error (NRMSE) was used to test the accuracy of prediction level of the model. We used the confirmed data as of April 25, 2020 to estimate the data for the following week. The error between predicted value *ŷ* and real value *y* is calculated according to the equation (4.1): ![Formula][4] Figure 6 shows the visual display of the error. From a quantitative perspective, the nrmse of the target country or region is 12.85% on average when t = 7. The minimum is 0.42%, while the maximum is 36.97%. And the average error will increase with the increase of prediction time, which is mainly because the longer the time is, the less relevant information is needed for prediction, the more difficult it is, and the higher the cumulative error is. ![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F6.medium.gif) [Figure 6.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F6) Figure 6. The comparison between the predicted cumulative diagnosis data of 6 countries, including Arabia, Russia and Qatar, and the data reported by the government. The data before April 25, 2020 is used for model training, and the data in the next seven days are predicted. ## 5. CLUSTER ANALYSIS OF EPIDEMIC CHRACTERISTICS To further analyze the characteristics of epidemic transmission among different countries or regions, we took the five parameters, i.e. *α*, *β, γ*1*, γ*2*, c*, of the SEIR model obtained in the simulation as feature vectors. Then, the BIRCH (Balanced Iterative Reduction and Clustering Using Hierarchies) algorithm was used to perform clustering based on these feature vectors to retrieve countries or regions with similar epidemic patterns, see Figure 8, and Appendix 3. The average of parameters in each cluster is shown in Table 4. To visualize the proximity relationship of the epidemic characteristics of these countries or regions, t-SNE (t-distributed stochastic neighbor embedding) algorithm is used to reduce the dimension of data to a 2-dimensional plannar, while preserving sample proximity in the 5-dimentional parameter space, see Figure 8. ![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F7.medium.gif) [Figure 7.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F7) Figure 7. Clustering of the 51 countries or regions with the five SEIR parameters: *α*, *β, γ*1*, γ*2*, c*,. In this world map, white represents those countries not participating in the clustering. ![Figure 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F8.medium.gif) [Figure 8.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F8) Figure 8. Illustration of SEIR parameter-proximity, i.e. *α*, *β, γ*1*, γ*2*, c*, of the 51 countries using t-SNE dimension reduction. Owing to the proximity preservation property of t-SNE, the proximity relationship in the 5-dimensional parameter space can be visualized in the above 2-dimensional plannar. Colors indicate the clusters resulted from the clustering algorithm. View this table: [Table 4.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/T3) Table 4. Attributes list of cluster centers The number of clustering countries or regions indicates the universality or particularity of the epidemic pattern reflected by the cluster. Cluster one is represented by the United States, Italy, Spain, Germany, etc. Their ![Graphic][5] are of the highest level, 1.59. However, ![Graphic][6] was the lowest at 0.65%, which shows that the epidemic situation in such countries or regions is the most serious, but has not yet got an effective recovery level. It is necessary to remind them to take timely prevention and control measures for the flow of people and invest in medical treatment, otherwise, the epidemic will be likely to further spread. Cluster two contains only three countries: China, Austria, and Switzerland. Because of the high ![Graphic][7], their epidemic spread quickly. But the three countries are excellent at diagnosing and treating patients. ![Graphic][8] and ![Graphic][9] are the highest at 1.14% and 1.16% respectively, which makes their epidemic situation be better suppressed. For example, intervention measures such as traffic blockade and quarantine taken in time at the beginning of the outbreak in China are very effective and have successfully restrained the further growth of confirmed cases. This is also recommended for other countries to learn from. Cluster three accounts for 59% of the total number of studies, and had the lowest risk among the four clusters. ![Graphic][10] indicates the lowest level of virus transmission, while ![Graphic][11] and ![Graphic][12] are both higher than the average. Their epidemic spread is very light, and is more easily suppressed. The diagnosis rate, ![Graphic][13], is the most close to the overall average. This includes Australia, Japan, and Belgium, etc. Cluster four has only one country, South Korea. The diagnosis rate, ![Graphic][14], is the highest, which indicates the high demand for diagnosis level. Other parameters were basically very close to the overall average, although the transmission degree is on average, in order to prevent the further spread of COVID-19, the policy of intervention and medical treatment for the confirmed cases still needs to be improved. It should be noted that clustering is only used to help us find out similar epidemic patterns. It is not that any country or region is fixed as the description of the above four clusters. Figure 8 shows that some countries are at the edge of clustering. The cluster assignments of these countries might be sensitive to the estimated SEIR parameters. We have reason to believe that Peru located at the edge of Cluster Two and Cluster Three also have good epidemic control status. Canada, Ireland, Netherlands, and other countries in cluster three are very close to the first cluster, which indicates the possibility of further outbreaks of COVID-19. The high risk areas of the outbreak are concentrated in the Americas, Europe, and the Middle East (cluster one). The severity of the epidemic in South Africa is very prominent on the geographical map. On the one hand, we once again call for timely treatment and isolation measures, on the other hand, it is necessary to remind Southern African countries of the prevention of COVID-19 geographic proximity transmission. ![Figure 9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/26/2020.03.18.20038224/F9.medium.gif) [Figure 9.](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/F9) Figure 9. Frequency histograms and scatters of *α*, *β, γ*1*, γ*2*, c*, which show that there is a linear relationship between *α* and *β*, and there is a linear relationship between *γ*1 and *γ*2. This shows that countries with rapid COVID-19 transmission are also more efficient in diagnosis, so as to reduce the burden of increasing patients. China not only keep the diagnosis rate at a high level, but also effectively reduce the spreading speed of the epidemic. Most of them have adopted a more timely travel restriction and contact precaution in face with COVID-19. ## 6. CONCLUSION We performed a simulation analysis of COVID-19 using SEIR model assuming that the current policies by May 21, 2020 remain. From the analysis, we draw the following conclusions: 1. We simulated the transmission of the SARS-CoV-2 in 51 countries. The average basic reproductive number *R* of these countries was estimated to be 2.76 (95% CI: 2.57 – 2.95). 2. Thirteen countries will have a peak value over 100,000, including the United States, Russian Federation, Brazil, the United Kingdom, Spain, Italy, France, Germany, Turkey, Iran, India, Peru and Canada. And 34 of the 51 countries including Belgium, Saudi Arabia, Mexico will have peak values between 10,000 and 100,000. 3. Western Europe, Southeast Asia to Oceania, the Middle East, and North America will be the four epicenter with the most severe situation. The epidemic situation in North America and Western Europe is estimated to far exceed that in China. In addition, North America will be the worst-hit epicenter. 4. The 51 countries or regions were divided into four clusters by using clustering algorithm based on epidemiology related parameters *α*, *β, γ*1*, γ*2*, c*. The discovered four clusters represent low risk, medium risk, high risk, and effective control respectively. Based on the similarity of epidemic characteristics, we gave early warnings to many countries, including Canada, Ireland, etc. ## 7. DISCUSSION In this paper, we mainly estimated and analyzed the transmission trend of COVID-19 based on the daily updated data of confirmed cases and recovered cases. In fact, the trend of COVID-19 is related to multiple factors. China’s prior data were used in the third phase of the epidemic (see Figure 2); however, the testing policies and standards in each country and region are different, which is highly related to the governance system and safety consciousness. In addition, in the second phase of epidemic transmission, community transmission plays an important role. The intensity of people flow directly affects the infection degree of the virus. Some factors, such as subway traffic, catering facilities, and customs or habits of each region remains to be studied. All the analyses above are based on the assumption that the official statistics of confirmed cases and of recovered cases reflect the real status quo of epedemics. As a matter of fact, there may be deviations in the norms of statistical data in various countries, which may also lead to bias in our results. ## Data Availability The datasets and code used for the current study are available from the corresponding author on reasonable request. Example code used for the analysis is available on GitHub. [https://github.com/epimath/cm-dag](https://github.com/epimath/cm-dag) Abbreviations : COVID-19 : Coronavirus Disease 2019 SARS-CoV-2 : Severe Acute Respiratory Syndrome Coronavirus 2 CI : Confidence Interval SEIR : Susceptible-Exposed-Infectious-Recovered. ## Declarations ### Ethics approval and consent to participate Not applicable. ### Consent for publication Not applicable. ### Availability of data The datasets used and analyzed during the current study is available from open resources. ### Competing Interests The authors declare that they have no conflicts of interest. ### Funding This work was not supported by any funding. ### Authors’ Contributions Conceived and designed the experiments: Qinghe Liu, Junkai Zhu, Junyan Yang, and Qiao Wang. Performed the mathematical modelling: Qinghe Liu, Junkai Zhu, and Qiao Wang. Analyzed the data: Qinghe Liu, Junkai Zhu, Zhicheng Liu, and Yuhao Zhu. Collect the data: Zefei Gao, and Deqiang Li. Performed the computations: Yuanbo Tang, and Xiang Zhang. Wrote the paper: Junkai Zhu, Qinghe Liu, Deqiang Li, Liuling Zhou, Zefei Gao, Zhicheng Liu, and Yuhao Zhu. All authors read and approved the final manuscript. ## Acknowledgments Not applicable. ## Appendix 1 Data Source (All data used in the paper is public.) [https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19) ## Appendix 2 Simulation Results in 51 Main Countries or Regions Data used as of May 21, 2020. All countries or regions are in alphabetical order. View this table: [Table4](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/T4) ## Appendix 3 Clustering Results and the Estimated SEIR Parameters in 51 Main Countries or Regions Data used as of May 21, 2020. All countries or regions are in alphabetical order. View this table: [Table5](http://medrxiv.org/content/early/2020/05/26/2020.03.18.20038224/T5) * Received March 18, 2020. * Revision received May 25, 2020. * Accepted May 26, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/) ## Reference 1. [1].‘Definition of suspected cases of unexplained pneumonia’, the National Health Commission of the People’s Republic of China (in Chinese) /[[http://www.nhc.gov.cn/](http://www.nhc.gov.cn/)] 2. [2].Li Q, Guan X, Wu P, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia[J]. The New England Journal of Medicine, NEJM. 3. [3].‘Update on COVID-19 as of 24:00 on March 14’ (in Chinese)/[[http://www.nhc.gov.cn/](http://www.nhc.gov.cn/)] 4. [4].‘WHO Director-General’s opening remarks at the media briefing on COVID-19 - 11 March 2020’/[[https://www.who.int/](https://www.who.int/)] 5. [5].WHO, Novel coronavirus (2019-nCoV) situation report 54. World Health Organization, 2020. /[[https://www.who.int/](https://www.who.int/)] 6. [6].‘WHO Director-General’s opening remarks at the media briefing on COVID-19 - 13 March 2020’/[[https://www.who.int/](https://www.who.int/)] 7. [7].WHO, Novel coronavirus(2019-nCoV) situation report. World Health Organization, 2020. /[[https://www.who.int/](https://www.who.int/)] 8. [8].Li Y, Liang M, Yin X, et al. COVID-19 Epidemic Outside China: 34 Founders and Exponential Growth. medRxiv 2020. 9. [9].Zhuang Z, Zhao S, Lin Q, et al. Preliminary estimating the reproduction number of the coronavirus disease (COVID-19) outbreak in Republic of Korea and Italy by 5 March 2020. medRxiv 2020. 10. [10].Bandoy DJDR, Weimer BC. Pandemic dynamics of COVID-19 using epidemic stage, instantaneous reproductive number and pathogen genome identity (GENI) score: modeling molecular epidemiology. medRxiv 2020. 11. [11]. Cody Carroll, Satarupa Bhattachaijee, Yaqing Chen, Paromita Dubey, Jianing Fan, Alvaro Gajardo, Xiner Zhou, Hans-Georg Muller, Jane-Ling Wang. Time Dynamics of COVID-19, medRxiv, 2020. 12. [12]. Alessio Notari, Giorgio Torrieri. COVID-19 transmission risk factors. medRxiv, 2020. 13. [13]. Andrey Gerasimov, Georgy Lebedev, Mikhail Lebedev, Irina Semenycheva. COVID-19 DYNAMICS: A HETEROGENEOUS MODEL. medRxiv, 2020. 14. [14].Sina F. Ardabili, Amir Mosavi, Pedram Ghamisi, Filip Ferdinand, Annamaria R. Varkonyi-Koczy, Uwe Reuter, Timon Rabczuk, Peter M. Atkinson. COVID-19 Outbreak Prediction with Machine Learning. medRxiv, 2020. 15. [15].Kevin L, Mathias P, Ellen K. The reproduction number of COVID-19 and its correlation with public health interventions. medRxiv, 2020. 16. [16].Zhan C, Tse CK, Lai Z, et al. Prediction of COVID-19 Spreading Profiles in South Korea, Italy and Iran by Data-Driven Coding. medRxiv 2020. 17. [17].Zheng Z, Wu K, Yao Z, et al. The Prediction for Development of COVID-19 in Global Major Epidemic Areas Through Empirical Trends in China by Utilizing State Transition Matrix Model. medRxiv 2020. 18. [18].Li L, Yang Z, Dang Z, et al. Propagation analysis and prediction of the COVID-19. medRxiv 2020. 19. [19].Athanasios SF, Nikolaos D, George AK. COVID-19: Predictive Mathematical Models for the Number of Deaths in South Korea, Italy, Spain, France, UK, Germany, and USA. medRxiv, 2020. 20. [20]. Qinghe Liu, Zhicheng Liu, Deqiang Li, Zefei Gao, Junkai Zhu, Junyan Yang, Qiao Wang, Assessing the Tendency of 2019-nCoV (COVID-19) Outbreak in China, medRxiv, 2020. 21. [21].Aron JL, Schwartz IB. Seasonality and period-doubling bifurcations in an epidemic model[J]. Journal of theoretical biology, 1984, 110(4): 665–679. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0022-5193(84)80150-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=6521486&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F26%2F2020.03.18.20038224.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1984TT57500010&link_type=ISI) 22. [22].Li MY, Muldowney JS. Global stability for the SEIR model in epidemiology[J]. Mathematical biosciences, 1995, 125(2): 155–164. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0025-5564(95)92756-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7881192&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F26%2F2020.03.18.20038224.atom) 23. [23].Li MY, Smith HL, Wang L. Global dynamics of an SEIR epidemic model with vertical transmission[J]. SIAM Journal on Applied Mathematics, 2001,62(1):58–69. 24. [24].Hethcote HW. The Mathematics of Infectious Diseases[J]. SIAM Review, 2000, 42(4):599–653. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1137/s0036144500371907&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00016567&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F26%2F2020.03.18.20038224.atom) 25. [25].Dietz, K. The estimation of the basic reproduction number for infectious diseases[J]. Statistical Methods in Medical Research, 1993, 2(1):23–41. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/096228029300200103&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8261248&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F26%2F2020.03.18.20038224.atom) 26. [26]. G. Chowell, P.W. Fenimore, M.A. Castillo-Garsow, C. Castillo-Chavez. SARS outbreaks in Ontario, Hong Kong and Singapore: the role of diagnosis and isolation as a control mechanism [J]. Journal of Theoretical Biology, 2003, 224(1): 1–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0022-5193(03)00228-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12900200&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F26%2F2020.03.18.20038224.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000184849200001&link_type=ISI) 27. [27].Riley S, Fraser C, Ca. D, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: Impact of public health interventions[J]. Science, 2003, 300(5627):p. 1961-1966. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzMDAvNTYyNy8xOTYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDUvMjYvMjAyMC4wMy4xOC4yMDAzODIyNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) [1]: /embed/graphic-1.gif [2]: /embed/graphic-5.gif [3]: /embed/graphic-6.gif [4]: /embed/graphic-11.gif [5]: /embed/inline-graphic-1.gif [6]: /embed/inline-graphic-2.gif [7]: /embed/inline-graphic-3.gif [8]: /embed/inline-graphic-4.gif [9]: /embed/inline-graphic-5.gif [10]: /embed/inline-graphic-6.gif [11]: /embed/inline-graphic-7.gif [12]: /embed/inline-graphic-8.gif [13]: /embed/inline-graphic-9.gif [14]: /embed/inline-graphic-10.gif