Analysis of the Number of Tests, the Positivity Rate, and Their Dependency Structure during COVID-19 Pandemic
=============================================================================================================

* Babak Jamshidi
* Hakim Bekrizadeh
* Shahriar Jamshidi Zargaran
* Mansour Rezaei

## Abstract

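**Background** The application of recent advances in medical instruments, information technology, and unprecedented data sharing to COVID-19 research has revolutionized the medical sciences and has given rise to unprecedented analyses, discussions, and models. Among these is the analysis of the number of tests, the positivity rate, and the dependency between them.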

**Methods** The dependency between the number of tests and the positivity rate is modeled using four classes of copulas: Clayton, Frank, Gumbel, and FGM. The copula parameters are estimated by the maximum likelihood method, and the goodness of fit of the copulas is evaluated with AIC. All computations were conducted in Matlab R2015b, R 4.0.3, Maple 2018a, and EasyFit 5.6, and the plots were created in Matlab R2015b and R 4.0.3.

**Results** As time passes, the number of tests increases and the positivity rate becomes lower. The epidemic peaks are occasions that violate this general rule, owing to the early growth of the number of tests. If we divide the data of each country into peak and non-peak periods, in both of them a rising number of tests is accompanied by a decreasing positivity rate.

**Conclusion** The positivity rate can be considered an indicator of the level of spread. A positivity rate approaching zero is a good criterion for gauging the success of a health care system in fighting an epidemic. We expect that if the number of tests is large enough, the positivity rate does not depend on the number of tests. Accordingly, the number and accuracy of tests can play a vital role in the quality of epidemic data.

**Key messages**

*   - Within a country, a rising positivity rate is a better warning of an epidemic peak than a rising number of tests.

*   - A positivity rate approaching zero is a good criterion for gauging the success of a health care system in fighting an epidemic.

*   - Except for the first half of the epidemic peaks, a higher number of tests in a country is associated with a lower positivity rate.

*   - In countries with a high number of tests per million, there is no significant dependency between the number of tests and the positivity rate.

Keywords
*   Dependence
*   Number of tests
*   Copula
*   Positivity
*   Peak
*   Correlation

## Introduction

Dr. Li Wenliang, a 34-year-old ophthalmologist, warned his colleagues and raised the alarm about a new infection caused by a type of coronavirus in December 2019 in Wuhan, China [1]. Shortly after his warning, the whole world encountered this epidemic. The WHO declared this fast-spreading infection (COVID-19) a pandemic in March 2020. As of January 27, 2021, over 100 million cases and around 2.2 million deaths involving COVID-19 had been reported around the world.

The COVID-19 pandemic is the most thoroughly documented pandemic in history. These unprecedented recorded data give rise to unprecedented concepts, relationships, analyses, discussions, and models [2]. Modeling the dependence between the number of tests and the proportion of positivity (positivity rate) is one of these new issues.

The proportion of positivity is a critical measure because it gives us an indication of how widespread infection is in the area of interest. The proportion of positivity helps public health officials answer questions such as:

*   - What is the current level of SARS-CoV-2 (coronavirus) transmission in the community?

*   - Are we doing enough testing for the people who are getting infected? [3]

Because it is a ratio, a high proportion of positivity is due either to a high number of positive tests or to a low number of total tests. Under the first possibility, a higher positivity rate suggests higher transmission and that there are likely more people with coronavirus in the community who have not been tested yet. Under the second possibility, a high percentage of positivity means that more testing should probably be done. Accordingly, for policymakers, a high value of this parameter suggests either that it is not a good time to relax restrictions aimed at reducing transmission, or that it is a good time to add restrictions to slow the spread of disease [3]. In this regard, the Office for National Statistics presented an analytic report broken down by region in the UK [4].

This study aims to investigate the time series of positivity rates individually and together with the time series of the number of tests. The investigation is conducted from two analytic perspectives: regional and temporal. The individual analysis is mainly based on the peaks of the spread of the pandemic (Table 3). For the regional aspect, among the 221 countries, we selected twelve: the USA, India, the UK, Italy, Iran, the UAE, Bolivia, Guatemala, Nigeria, Australia, South Korea, and South Africa. The reasons for selecting these twelve countries are:

*   - They are the top countries in the influential indices (Table 1).

[Table 1.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T1)

Table 1. The information on the influential indicators of COVID-19 in the twelve countries of interest

*   - Some of them are widely different from the others in some indices (Table 1).

*   - Their positivity rates are greatly dispersed (Table 2).

*   - The number and timing of their peaks differ (Table 3).

*   - The quality of their health care systems varies.

*   - Their data, especially on the number of tests, are relatively well recorded.

*   - They are selected from all continents: the USA and Guatemala from North America, Bolivia from South America, India, Iran, the UAE, and South Korea from Asia, the UK and Italy from Europe, Nigeria and South Africa from Africa, and Australia from Oceania.

[Table 2.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T2)

Table 2. The properties of the datasets of the countries of interest

Finally, to illustrate the dependency of the number of tests and the positivity rates, we apply copulas.

Sklar introduced the concept of copulas in 1959 [5]. A copula (usually parametric, sometimes semi-parametric, and rarely non-parametric) is a function that completely describes the dependency structure. It contains all the information needed to link the marginal distributions to their joint distribution. Accordingly, to obtain a valid multivariate distribution function, it suffices to combine several marginal distribution functions with any candidate for the copula function. Thus, for the purposes of statistical modeling, it is desirable to have a large collection of copulas at one’s disposal. Copulas are widely applied in diverse fields, including environmental studies [6-7], finance [8-9], hydrology [10], and medical studies [11-16].

### Data

The main data sources of the paper are the website Worldometers [17] and Our World in Data [18]. We summarize and illustrate all the relevant information about the twelve countries in three (twelve-row) tables and three (twelve-partitioned) figures created on Matlab R2015b.

Table 1 includes the key general indicators up to January 25, 2021. Neither the total indicators nor the per-million indicators determine the quality of health care systems, because the statistics of Bolivia, Guatemala, Nigeria, Iran, and even India are observably underreported. Nevertheless, we take the number of tests per one million (the 7th column of Table 1) as a criterion representing the level of facilities, and therefore the quality of the health care system. Based on this criterion, we define the lags (the time between the test and the diagnosis) for the different health care systems.

Table 2 presents the underlying properties of the dataset of each country. As mentioned before, the lag is the difference, in days, between the time of testing and the time of receiving the test results, positive or negative. The more facilities a health care system has, the more tests it can perform, and therefore the lower its positivity rate. Likewise, the more facilities a health care system has, the shorter the time between tests and results. Based on the concept of lag, we pair the number of tests on the *n*th day with the number of results on the (*n* + *lag*)th day to obtain the dependency structure using the copulas. The last column is calculated from the start and end dates of the data recording period (the fourth and fifth columns) and the lag (the sixth column); it displays the number of pairs used to obtain the dependency structure for each country.
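The lag-based pairing can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `pair_with_lag`, the toy series, and the lag value are assumptions, and the positivity rate is taken as the positives reported on day *n* + *lag* divided by the tests conducted on day *n*.

```python
def pair_with_lag(tests, positives, lag):
    """Pair the tests on day n with the positivity rate observed lag days later."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    pairs = []
    for n in range(len(tests) - lag):
        if tests[n] > 0:  # skip days without tests to avoid division by zero
            rate = positives[n + lag] / tests[n]
            pairs.append((tests[n], rate))
    return pairs


# Toy data (hypothetical): daily tests and daily positive results.
tests = [100, 200, 400, 500, 800]
positives = [10, 30, 40, 60, 50]

pairs = pair_with_lag(tests, positives, lag=1)
print(len(pairs))  # 4, the series length minus the lag
print(pairs[0])    # (100, 0.3)
```

With a series of length *T* and no zero-test days, the sketch yields *T* − *lag* pairs, which matches the idea that the number of usable pairs follows from the recording period and the lag.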

Generally, during an epidemic wave, the number of newly infected individuals increases rapidly to an epidemic peak and then falls more gradually until the wave is over and the number of new cases stabilizes. Roughly speaking, the epidemic peaks are the time points (or their neighborhoods) that correspond to local maxima of the number of newly infected cases.

### Change point detection

We define an epidemic peak as a time point (or its time neighborhood) at which $X_t$, the number of new confirmed cases on the *t*th day, exceeds the mean plus three times the standard deviation of the last three weeks for at least a week, that is,

$$X_t > \bar{X}_{t-21:t-1} + 3\,S_{t-21:t-1},$$

where $\bar{X}_{t-21:t-1}$ and $S_{t-21:t-1}$ denote the mean and the standard deviation of the new confirmed cases over the preceding three weeks. These epidemic peaks are local maxima. In addition, the distance between two successive epidemic peaks must be at least one month. This definition is derived from the definition of an outlier in regression analysis. According to this definition, the peaks in Table 3 are obtained for the countries of interest. Remarkably, except for the peaks of Bolivia, which are almost the same, the later waves are more acute than the earlier ones. A more acute peak means a larger number of new confirmed cases, and therefore more intense spreading. Finally, because of the lack of information at the beginning, this definition may miss epidemic peaks in the initial days.
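The threshold rule can be sketched as follows. This is one possible reading of the definition, not the authors' implementation: the function name `detect_peak_starts` and the toy series are hypothetical, the baseline is assumed to be the 21 days immediately before a candidate day, the following 7 days must all exceed that fixed threshold, and the function returns the onset day of each exceedance run rather than the local maximum itself.

```python
from statistics import mean, stdev


def detect_peak_starts(cases, window=21, run=7, min_gap=30):
    """Return days t where cases[t:t+run] all exceed mean + 3*sd of cases[t-window:t]."""
    starts = []
    for t in range(window, len(cases) - run + 1):
        # Enforce the minimum distance of about one month between detections.
        if starts and t - starts[-1] < min_gap:
            continue
        baseline = cases[t - window:t]
        threshold = mean(baseline) + 3 * stdev(baseline)
        if all(x > threshold for x in cases[t:t + run]):
            starts.append(t)
    return starts


# Toy series (hypothetical): a flat baseline, a ten-day surge, then a return to baseline.
cases = [100 + (i % 3) for i in range(30)] + [300] * 10 + [100] * 20
print(detect_peak_starts(cases))  # [30]
```

Because the rule needs three weeks of history, no detection is possible during the first 21 days, which illustrates why the definition can miss peaks at the very start of the recording period.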

[Table 3.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T3)

Table 3. The epidemic peaks of COVID-19 in the countries of interest

[Table 4.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T4)

Table 4. Kendall’s tau of copula function

Mathematically and logically, the number of positive tests (confirmed cases) is affected by the number of tests and positivity rate. The number of cases equals the number of tests multiplied by the positivity rate. Therefore, the increment of the number of cases (as a multiplication) equals the sum of these two:

*   - The number of tests multiplied by the increment of positivity rate, and

*   - The positivity rate multiplied by the increment of the number of tests.

Consequently, intense changes in the count of cases are due to a remarkable change in at least one of these products. For countries with a regular increase in the number of tests, like the USA, the increment of the proportion of positivity plays the principal role in the peaks.
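The decomposition can be written out explicitly. The notation here is assumed for illustration and ignores the lag: $N_t$ is the number of tests, $p_t$ the positivity rate, and $X_t = N_t\,p_t$ the number of cases on day $t$, with $\Delta$ denoting the change from day $t-1$ to day $t$.

$$\Delta X_t = N_t p_t - N_{t-1} p_{t-1} = N_{t-1}\,\Delta p_t + p_{t-1}\,\Delta N_t + \Delta N_t\,\Delta p_t .$$

The last term is a product of two increments, so it is small whenever the day-to-day changes are small relative to the levels. The first two terms are the products listed above.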

Table 3 shows that the proportion of positivity is a significantly better indicator of the peaks of the pandemic than the frequency of tests. The positivity rate is more strongly associated with the number of cases than the number of tests is (90% versus 45%). After applying a moving average, these proportions reach 100% and 50%, respectively.

Countries of the southern and northern hemispheres faced a peak around July and November, respectively, possibly due to falling temperatures.

Figures 1, 3, and 5 each consist of twelve subfigures, one per country. The arrangement of the subfigures is identical in all three figures. The horizontal axes in Figures 1 and 3 represent time in days from the start to the end of the period of study for each country (the fourth and fifth columns of Table 2). The vertical axes of Figures 1 and 3 display the number of new tests conducted on that day and the proportion of positive tests reported that day, respectively. Figure 5 is the plot of the joint distribution of the number of tests on a day and the proportion of positivity *lag* days later.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F1)

Figure 1. The time series of the number of new tests (daily): USA (r1 c1), India (r1 c2), UK (r2 c1), Italy (r2 c2), Iran (r3 c1), UAE (r3 c2), Bolivia (r4 c1), Guatemala (r4 c2), Nigeria (r5 c1), Australia (r5 c2), South Korea (r6 c1), and South Africa (r6 c2); r: row, c: column.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F2)

Figure 2. The time series of the number of new tests (daily)

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F3)

Figure 3. The time series of the positivity rate (daily): USA (r1 c1), India (r1 c2), UK (r2 c1), Italy (r2 c2), Iran (r3 c1), UAE (r3 c2), Bolivia (r4 c1), Guatemala (r4 c2), Nigeria (r5 c1), Australia (r5 c2), South Korea (r6 c1), and South Africa (r6 c2); r: row, c: column.

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F4)

Figure 4. The time series of the positivity rate (daily)

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F5)

Figure 5. Scatterplots of the relationship between the number of tests and the positivity rate: USA (r1 c1), India (r1 c2), UK (r2 c1), Italy (r2 c2), Iran (r3 c1), UAE (r3 c2), Bolivia (r4 c1), Guatemala (r4 c2), Nigeria (r5 c1), Australia (r5 c2), South Korea (r6 c1), and South Africa (r6 c2).

Figure 1 shows that the peaks of the number of tests coincide with the epidemic peaks of COVID-19 in different countries. For example, in the USA, there are two peaks of the number of tests simultaneous with the second and the third epidemic peaks listed in Table 3. Also, Bolivia clearly experienced two peaks in the number of tests around the 150th and 300th days (counted from March 19, 2020), which coincide with the epidemic peaks in Table 3.

The USA, the UK, and the UAE experienced regularly rising time series. Except for some overshoots at epidemic peaks, the patterns of Italy and South Africa are increasing too. The number of tests in Guatemala is increasing, accompanied by increasing fluctuation. Owing to limited testing capacity, Iran and Nigeria followed a stepwise trend. Apart from one peak each, the plots of Australia and South Korea are stationary. In the case of Bolivia, the time series is proportional to the peaks. India is the only country whose time series is initially increasing, then stable, and after that decreasing. Generally, the countries have an increasing trend.

Figure 2 suggests a clustering of the countries by the number of tests: 1. The USA, 2. India, 3. The UK, 4. Italy, Australia, and the UAE, 5. South Korea, South Africa, and Iran, and 6. Nigeria, Guatemala, and Bolivia.

Figure 3 illustrates the time series of the positivity rate of the tests (the ratio of the number of positive tests on a day to the number of tests taken *lag* days earlier). Interestingly, the subfigures of Figure 3 agree with the epidemic peaks more closely than their counterparts in Figure 1. For example, it is clear that the USA has encountered three peaks. It is worth noting that the graph of Iran has three peaks, the first of which is missed in Table 3 because of the lack of information at the beginning. A similar situation (a peak missed when investigating either the number of tests or the number of confirmed cases, but revealed by the analysis of the positivity rate) occurs for the epidemic peak in India in late March, the first and second peaks of the UK, and the epidemic peaks in mid-May and November for the UAE.

Figure 4 illustrates a clustering of the countries based on the positivity rate: 1. Nigeria, Guatemala, and Bolivia, 2. South Africa, and Iran, 3. The USA, India, the UK, and Italy, and 4. Australia, the UAE, and South Korea.

The horizontal and vertical axes of Figure 5 display the number of new tests and their proportion of positivity, respectively. Generally, as the number of new tests increases, the positivity rate falls. Because the epidemic peaks run counter to this general rule, the opposite direction of the changes is not always clearly visible. Guatemala, owing to its lack of an epidemic peak, is a good example of this inverse relationship.

If an increase in the number of cases is caused by a rising number of tests, we expect the series not to return to its previous level in the short term, and the positivity rate not to change markedly. On the other hand, it is natural to assume that entering a peak is accompanied by an increase in the number of negative tests as well. Consequently, a lack of growth in negative test results (a rising positivity rate while the frequency of tests continues its previous trend) is reasonable only if at least one of test accuracy, testing policy, or the attitude of the population changed around that time. Otherwise, a considerable number of cases belonging to the peak go unreported. Notably, the simultaneous rise of both quantities produces the observed acceleration in growth around epidemic peaks.

## Methods

### Copulas

Copulas are functions that connect multivariate distribution functions to their one-dimensional marginal distribution functions; equivalently, they are multivariate distribution functions whose margins are uniform on the interval $[0,1]$. Mathematically speaking, if $H$ is a bivariate distribution function with margins $F(X)$ and $G(Y)$, there exists a copula $C$ such that $H_\theta(X, Y) = C(F(X), G(Y); \theta)$, where $\theta$ is the dependence parameter [5]. Accordingly, a copula is usually defined as a function $C : [0,1]^2 \to [0,1]$ that satisfies the following conditions:

(P1) $C(x,0) = C(0,x) = 0$ and $C(x,1) = C(1,x) = x$, $\forall x \in [0,1]$,

(P2) $\forall (s_1, s_2, t_1, t_2) \in [0,1]^4$ such that $s_1 \le s_2$ and $t_1 \le t_2$,

$$C(s_2,t_2) - C(s_2,t_1) - C(s_1,t_2) + C(s_1,t_1) \ge 0.$$

For a twice differentiable function $C$, the 2-increasing property (P2) can be replaced by the condition

$$c(s,t) = \frac{\partial^2 C(s,t)}{\partial s\,\partial t} \ge 0,$$

where $c(s,t)$ is the so-called copula density. A copula $C$ is *symmetric* if $C(s,t) = C(t,s)$ for every $(s,t) \in [0,1]^2$; otherwise $C$ is asymmetric. The most well-known, powerful, and applicable copulas are:

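*   - FGM copula [19-20];

$$C(s,t) = st\,\left[1 + \theta(1-s)(1-t)\right], \quad \theta \in [-1,1]$$

*   - Clayton copula [21];

$$C(s,t) = \left[\max\left(s^{-\theta} + t^{-\theta} - 1,\ 0\right)\right]^{-1/\theta}, \quad \theta \in [-1,\infty)\setminus\{0\}$$

*   - Frank copula [22];

$$C(s,t) = -\frac{1}{\theta}\ln\left(1 + \frac{\left(e^{-\theta s}-1\right)\left(e^{-\theta t}-1\right)}{e^{-\theta}-1}\right), \quad \theta \neq 0$$

*   - Gumbel copula [23];

$$C(s,t) = \exp\left\{-\left[(-\ln s)^{\theta} + (-\ln t)^{\theta}\right]^{1/\theta}\right\}, \quad \theta \ge 1$$

The parameters of the marginal and copula distributions are estimated using the maximum likelihood method. The computations and illustrations regarding copula theory are conducted in R 4.0.3, Maple 2018a, and EasyFit 5.6.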
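The four families can be written out directly in code. This is a minimal sketch assuming the standard textbook parameterizations of the FGM, Clayton, Frank, and Gumbel copulas; the function names are chosen here for illustration and this is not the authors' R or Maple code.

```python
import math


def fgm(s, t, theta):
    """FGM copula, theta in [-1, 1]."""
    return s * t * (1 + theta * (1 - s) * (1 - t))


def clayton(s, t, theta):
    """Clayton copula, theta in [-1, inf) excluding 0."""
    if s == 0 or t == 0:
        return 0.0
    base = max(s ** -theta + t ** -theta - 1, 0.0)
    return base ** (-1 / theta) if base > 0 else 0.0


def frank(s, t, theta):
    """Frank copula, theta != 0 (expm1/log1p keep small values accurate)."""
    num = math.expm1(-theta * s) * math.expm1(-theta * t)
    return -math.log1p(num / math.expm1(-theta)) / theta


def gumbel(s, t, theta):
    """Gumbel copula, theta >= 1."""
    if s == 0 or t == 0:
        return 0.0
    a = (-math.log(s)) ** theta + (-math.log(t)) ** theta
    return math.exp(-(a ** (1 / theta)))


# Each family satisfies the boundary condition C(s, 1) = s.
for cop, th in [(fgm, 0.5), (clayton, 2.0), (frank, -3.0), (gumbel, 2.0)]:
    print(cop.__name__, round(cop(0.3, 1.0, th), 6))
```

All four families are symmetric, so swapping the arguments leaves the value unchanged. Setting `theta = 0` in `fgm` recovers the independence copula `s * t`.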

### Copula vs Correlation Coefficient

Measures of dependence are common instruments for summarizing a complicated dependence structure in the bivariate case. Pearson’s correlation coefficient, Spearman’s rho, and Kendall’s tau are common statistical measures of dependence [24-26]. Correlation runs into trouble when the random variables are not elliptically distributed, whereas the performance of a copula does not depend on whether the distributions are elliptical. Pearson’s linear correlation (−1 ≤ *ρ* ≤ 1) is the most popular and well-known measure of dependence between pairs of random variables. Despite its simplicity and plain rationale, Embrechts et al. [27] noted that *ρ* is simply a measure of dependency for elliptical distributions, such as the binormal distribution (the marginals are normally distributed and linked by the Gaussian copula). Moreover, *ρ* measures a linear relationship and does not capture a non-linear one on its own, as noted in [28]. These properties constitute obvious limitations for modeling the dependency structure. In addition, copulas are useful for defining nonparametric measures of dependence between random variables. Since the values of Kendall’s tau are easy to calculate, this measure is used for the dependence in the observations. If *F*(*X*) and *G*(*Y*) are continuous, then *C*(*s,t*) is unique; otherwise *C*(*s,t*) is uniquely determined on the range of *F*(*X*) × the range of *G*(*Y*).
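The difference between a linear and a rank-based measure can be seen on a small constructed example. The sketch below uses hypothetical data and plain-Python helpers written for this illustration (`pearson`, `kendall_tau`); the Kendall version is tau-a, with no correction for ties.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return 2 * score / (n * (n - 1))


# Hypothetical data: y is a strictly decreasing but strongly non-linear function of x.
x = list(range(1, 11))
y = [math.exp(-v) for v in x]

print(kendall_tau(x, y))  # -1.0, perfect monotone dependence
print(pearson(x, y))      # negative, but clearly weaker than -1
```

Kendall’s tau depends only on the ordering of the observations, so it is unchanged by any strictly increasing transformation of either variable, which is the same invariance that copulas have.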

One standard non-parametric dependence measure, Kendall’s $\tau_k$, is expressed in copula form as

$$\tau_k = 4\int_0^1\!\!\int_0^1 C(s,t)\,dC(s,t) - 1.$$

The copula parameter is estimated, and the relationship between the copula parameter and $\tau_k$ is given in Table 4. The copula parameter in each case measures the degree of dependence and controls the association between the two variables. For families such as Clayton and Frank, there is no dependence when the parameter approaches 0, and there is perfect dependence when the parameter tends to infinity. Schweizer and Wolff [29] showed that the copula dependence parameter, which characterizes each family of copulas, can be related to Kendall’s $\tau_k$. Therefore, copulas allow modeling both linear and non-linear dependence. Moreover, copulas can model dependence in the extremes regardless of the marginal distributions.
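For the four families used here, the link between the copula parameter and Kendall’s tau has a well-known closed or semi-closed form. The sketch below assumes the standard parameterizations; the function names are illustrative, and the Frank case evaluates the first-order Debye function with a simple midpoint rule rather than a library routine.

```python
import math


def tau_fgm(theta):
    """FGM: tau = 2*theta/9, so |tau| never exceeds 2/9."""
    return 2 * theta / 9


def tau_clayton(theta):
    """Clayton: tau = theta / (theta + 2)."""
    return theta / (theta + 2)


def tau_gumbel(theta):
    """Gumbel: tau = 1 - 1/theta, theta >= 1."""
    return 1 - 1 / theta


def debye1(theta, steps=10000):
    """(1/theta) * integral from 0 to theta of t / (exp(t) - 1) dt, midpoint rule."""
    h = theta / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t / math.expm1(t)
    return total * h / theta


def tau_frank(theta):
    """Frank: tau = 1 - (4/theta) * (1 - D1(theta)), theta != 0."""
    return 1 - 4 / theta * (1 - debye1(theta))


print(tau_clayton(2.0))  # 0.5
print(tau_gumbel(2.0))   # 0.5
print(tau_fgm(-1.0))     # about -0.222
```

The FGM bound of 2/9 shows that this family can only represent weak dependence, while the Frank relationship is odd in the parameter, so negative parameters give negative tau.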

### Copula vs Regression

Regression analysis is a statistical method for investigating the relationships between dependent variables and independent variables. The basic form of regression analysis, ordinary least squares, is not suitable for some applications because the relationships are often nonlinear and the probability distribution of the response variable may be non-Gaussian.

The major advantage of copula regression is that there are no restrictions on the probability distributions that can be used. Copula regression is therefore well suited to fitting non-Gaussian regression models, since it needs no normality assumption. Copula functions, which connect the marginal distributions to their joint distribution, are useful in simulating linear or nonlinear relationships among multivariate data. A copula is a multivariate distribution function whose marginal random variables are uniform on [0, 1] (each being the CDF applied to the corresponding variable). Copula functions have appealing properties: they allow scale-free measures of dependence and are useful in constructing families of joint distributions.

## Results

Applying copula theory to a pair of variables presumes that they have continuous marginal distributions and that they are correlated. Table 5 investigates whether the pair formed by the frequency of tests and the positivity rate meets these presumptions. The marginal distributions were obtained in EasyFit. The generalized Pareto and Weibull distributions performed well in fitting the positivity rates. The correlation in the countries with the highest number of tests is negative, commonly between −0.2 and −0.3. In countries lacking enough tests, the correlation coefficient is significantly greater, possibly due to the low quality of data and under-reporting. Notably, the calculations over the data of Bolivia, Iran, and South Africa even lead to positive correlations.

[Table 5.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T5)

Table 5. The results of fit distribution to data

Based on Table 5, we may look for suitable copula functions connecting the marginal distributions, and thus find the desired joint distributions, for nine of the countries. Notice that the countries without meaningful correlation (Italy, South Korea, and the UAE) were among the countries with the lowest proportion of positive tests. These countries have been engaged in tracing the infected cases.

Table 6 represents the results of comparing the best candidates from the FGM, Clayton, Frank, and Gumbel families.

[Table 6.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T6)

Table 6. The obtained copula to fit the dependency and their performances

According to Table 6, Clayton copulas are suitable candidates for the countries with low tests per million. In addition, Frank copulas can describe a wide variety of countries. Finally, the Gumbel family seems not to be a good option to couple the variables of the frequency of tests and the positivity rate.
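A comparison of this kind rests on maximizing the copula log-likelihood and then computing AIC = 2*k* − 2 ln *L*, where *k* is the number of copula parameters. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the FGM family because its density, 1 + *θ*(1 − 2*u*)(1 − 2*v*), is simple, it replaces the fitted parametric margins with rank-based pseudo-observations, it maximizes over a coarse grid, and the data and function names are hypothetical.

```python
import math


def pseudo_obs(values):
    """Rescaled ranks in (0, 1); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (len(values) + 1) for r in ranks]


def fgm_loglik(theta, u, v):
    """Log-likelihood of the FGM copula density at the pairs (u_i, v_i)."""
    return sum(math.log(1 + theta * (1 - 2 * a) * (1 - 2 * b)) for a, b in zip(u, v))


def fit_fgm(u, v):
    """Grid-search maximum likelihood for theta in [-1, 1], plus AIC with k = 1."""
    grid = [k / 100 for k in range(-100, 101)]
    theta_hat = max(grid, key=lambda th: fgm_loglik(th, u, v))
    loglik = fgm_loglik(theta_hat, u, v)
    aic = 2 * 1 - 2 * loglik
    return theta_hat, loglik, aic


# Hypothetical data: more tests paired with a lower positivity rate.
tests = [120, 150, 180, 210, 260, 300, 340, 400]
rates = [0.30, 0.27, 0.22, 0.20, 0.15, 0.12, 0.10, 0.07]

u, v = pseudo_obs(tests), pseudo_obs(rates)
theta_hat, loglik, aic = fit_fgm(u, v)
print(theta_hat)  # -1.0, the strongest negative dependence FGM allows
print(aic < 0)    # True: better than the independence copula, whose AIC is 0
```

Repeating the same steps with the density of each candidate family and keeping the family with the smallest AIC gives a comparison of the kind summarized in Table 6.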

We now discuss the simulation of data from the obtained copula models and compare the correlations in the simulated data with those in the observed data, based on 1000 simulations. We follow the simulation method proposed by Johnson (1987, Ch. 3) and later by Nelsen (2006, page 41).
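The conditional distribution method draws *u* and an auxiliary uniform *w*, then solves ∂*C*(*u*, *v*)/∂*u* = *w* for *v*. The sketch below applies it to the Clayton family with *θ* > 0, chosen only because its conditional distribution inverts in closed form; the parameter value, seed, and function names are assumptions for illustration and do not correspond to any fitted model in Table 6.

```python
import random


def sample_clayton(n, theta, seed=0):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by conditional inversion."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u = 0
        w = 1.0 - rng.random()  # auxiliary uniform on (0, 1]
        v = (u ** -theta * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Kendall's tau-a of a list of (u, v) pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return 2 * score / (n * (n - 1))


theta = 2.0
sample = sample_clayton(1000, theta)
tau_sim = kendall_tau(sample)
tau_theory = theta / (theta + 2)  # Clayton relationship between theta and tau
print(round(tau_sim, 3), tau_theory)
```

Comparing the rank correlation of the simulated pairs with that of the observed pairs, as done here against the theoretical value, is the consistency check described in the text.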

Figure 6 illustrates the scatter plots of the transformed observed data versus simulated samples of the CDFs of the frequency of tests and the positivity proportion, taken from the fitted copula models in Table 6. The simulated data and the original data show similar dependence patterns. To verify this, Table 6 shows the rank correlations between the frequency of tests and the positivity proportion calculated from the original data and from the simulated data of size 1000 taken from the fitted copula models. Comparing these correlations shows strong consistency between the estimated and the real correlations.

![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/23/2021.04.20.21255796/F6.medium.gif)

[Figure 6.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/F6)

Figure 6. Scatter plots of the transformed observed values (•) versus simulated samples (∗) from subfamilies of the copula model: USA (r1 c1), India (r1 c2), UK (r1 c3), Iran (r2 c1), Bolivia (r2 c2), Guatemala (r2 c3), Nigeria (r3 c1), Australia (r3 c2), and South Africa (r3 c3); r: row, c: column.

Finally, we investigate the overall structure of dependency between the number of tests and the positivity rate. Pooling the data of the twelve countries yields 3877 pairs, whose Kendall’s correlation is −0.1434 (P-value: $2.8464 \times 10^{-19}$). In addition, we split the data into two parts: peaks and non-peaks. This split prevented us from fitting marginal distributions, and hence copulas, because it creates gaps in the number of tests. Table 7 presents the Kendall’s correlations for the countries of interest. The correlation coefficient between the number of tests and the positivity rate is negative in both the peak and the non-peak parts.

[Table 7.](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T7)

Table 7. The correlation between the number of tests and the positivity rate regarding all countries separated based on the peaks

## Discussion

Generally, at the beginning of an epidemic, the number of tests is low and the proportion of positivity is high. As time passes, the number of tests rises. Also, as the number of new tests increases, the positivity rate falls. The correlation in countries with a high number of tests, and hence higher quality of data, is negative, commonly between −0.2 and −0.3. Considering all the data as one set, Kendall’s coefficients are −0.1434, −0.2132, and −0.1381 for the total, the peaks, and the total after removing peaks, respectively. The positivity rate of the tests is a significantly better indicator of the peaks of the pandemic than the frequency of tests. The positivity rate is more strongly associated with the number of cases than the number of tests is (90% versus 45%).

The proportion of positivity is more closely proportional to the number of infected cases than the number of tests is. A positivity rate approaching zero is a good criterion for gauging the success of a health care system in fighting an epidemic. The number and accuracy of tests can play a vital role in the quality of epidemic data. Policymakers can consider the factors affecting the positivity rate, such as the testing policy, restricted facilities, peaks, and fluctuations, and make decisions that avoid being misled by them.

The first limitation is the low quality of data for some countries because of restricted facilities, the low number of tests, and unorganized data collection programs. Also, some interpolation and moving average methods were applied to fill in missing data for the countries of interest and to calculate the correlation for the countries with poor data. Out of the twelve countries, Iran, South Africa, Nigeria, Bolivia, and Guatemala have been restricted by the number of tests. The data of Italy, the UAE, and South Korea showed no significant correlation. The lack of dependency is a good criterion to show that there is no shortage of facilities. The highest quality and most significant correlations belong to the USA, India, the UK, and Australia.

The present approach using copulas is promising since it can take into account a wide range of correlations, frequently observed in medical data. In fact, the classical multivariate models cannot reproduce all types of correlation. Moreover, the standard models are limited, especially because the choice of the marginal distributions is restricted. The crucial step in the modeling process is the choice of the copula function that best fits the data. Further work is needed to choose the best copulas able to reproduce the dependence structure of bivariate medical variables. In clinical trials or medical studies, sample size is often an important consideration and is relatively small. The copula-based methodology overcomes this limitation as well, because the algorithm can be used to replicate data for any number of patients. The copula-based methodology presented in this paper is simple and easy to implement.

## Data Availability

The main data can be found on https://www.worldometers.info/coronavirus/countries and https://ourworldindata.org/coronavirus-testing

## Ethics declarations

### Declaration of interest statement

The authors declare that they have no conflict of interest.

### Ethical statement

The methodology for this study was approved by the Human Research Ethics committee of the Kermanshah University of Medical Sciences.

### Informed consent

It is not applicable. This study did not deal with human and animal subjects.

### Consent on publication

This is not applicable. The manuscript includes no case study.

## Funding

There was no specific funding for this study.

## Contributorship

**BJ:** Idea, Literature, Data, Methods, Programming, Interpretation, First draft. **HB:** Literature, Methods, Interpretation, Revision. **SJZ:** Data, Literature, Programming. **MR:** Design, Final manuscript.

## Data availability

[Table8](http://medrxiv.org/content/early/2021/04/23/2021.04.20.21255796/T8)

## Acknowledgment

We are grateful to Azad Sheikhi for helping us to write better in English.

## Footnotes

*   bekrizadeh{at}pnu.ac.ir, shahriarjamshidy{at}gmail.com, mrezaei{at}kums.ac.ir

*   Received April 20, 2021.
*   Revision received April 20, 2021.
*   Accepted April 23, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1.  1.Jamshidi B, Rezaei M, Jamshidi Zargaran S et al. Mathematical modeling the epicenters of coronavirus disease-2019 (COVID-19) pandemic. Epidemiologic Methods 2020; 9(1), 20200009. [https://doi.org/10.1515/em-2020-0009](https://doi.org/10.1515/em-2020-0009)
    
    
2.  2.Jamshidi B, Bekrizadeh H, Jamshidi Zargaran S et al. Comparing Length of Hospital Stay during COVID-19 Pandemic in the USA, Italy, and Germany, International Journal for Quality in Health Care 2021
    
    
3.  3.Dowdy D, D’souza G. Covid-19 testing: understanding the “percent positive”, august 10, 2020, [https://www.jhsph.edu/covid-19/articles/covid-19-testing-understanding-the-percent-positive.html](https://www.jhsph.edu/covid-19/articles/covid-19-testing-understanding-the-percent-positive.html)
    
    
4.  4.Coronavirus (COVID-19) Infection Survey, UK: 8 January 2021, [https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/8january2021](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/8january2021)
    
    
5.  5.Sklar A. Fonctions de répartition à n dimensions et leurs marges, Publications de L’Institute Statistical University Paris 1959; 8:229–231.
    
    
6.  6.Corbella S, Stretch DD. Simulating a multivariate sea storm using Archimedean copulas, Coastal Engineering 2013; 76: 68–78, [https://doi.org/10.1016/j.coastaleng.2013.01.011](https://doi.org/10.1016/j.coastaleng.2013.01.011)
    
    
7.  7.Zhang L, Singh VP. Bivariate rainfall frequency distributions using Archimedean copulas. Journal of Hydrology 2007; 332: 93–109.
    
    
8.  8.Wang GJ, Xie C, Zhang P, Han F, Chen S. Dynamics of Foreign Exchange Networks: A Time-Varying Copula Approach, Discrete Dynamics in Nature and Society 2014, 170921: [https://doi.org/10.1155/2014/170921](https://doi.org/10.1155/2014/170921)
    
    
9.  9.Boubaker H, Sghaier N. Portfolio optimization in the presence of dependent financial returns with long memory: A copula based approach, Journal of Banking & Finance 2013; 37(2): 361–377.
    
    
10. 10.Bekrizadeh H, Parham GA, Zadkarmi MR. Weighted Clayton Copulas and their Characterizations: Application to Probable Modeling of the Hydrology Data, Journal of Data Science 2013; 11: 293–303.
    
    
11. 11.Wienke A, Frailty models in survival analysis, Chapman & Hall/CRC biostatistics series, 2011,
    
    
12. 12.Roman M, Louzada F, Cancho VG, Leite JG. A new long-term survival distribution for cancer data [Internet]. Journal of Data Science 2012; 10(2): 241–258: [http://www.jds-online.com/volume-10-number-2-april-2012](http://www.jds-online.com/volume-10-number-2-april-2012)
    
    
13. 13.Li X, Fang R. A new family of bivariate copulas generated by univariate distributions. Journal of Data Science 2012; 10: 1–17.
    
    
14. 14.Bekrizadeh H, Jamshidi B. A New Class of Bivariate Copulas: Dependence Measures and Properties. Metron 2017; 75:31–50.
    
    
15. 15.Bekrizadeh H, Parham GA, Zadkarmi MR. A new asymmetric class of bivariate copulas for modeling dependence, Communications in Statistics— Simulation and Computation 2017; 46(7): 5594–5609.
    
    
16. 16.Bekrizadeh H. Generalized Family of Copulas: Definition and Properties, Thailand Statistician 2021; 19(1): 163–178.
    
    
17. 17.Worldometer website: [https://www.worldometers.info/coronavirus/#countries](https://www.worldometers.info/coronavirus/#countries)
    
    
18. 18.Hasell J, Mathieu E, Beltekian D, et al. A cross-country database of COVID-19 testing. Sci Data 2020; 7, 345: [https://ourworldindata.org/coronavirus-testing](https://ourworldindata.org/coronavirus-testing)
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41597-020-00688-8&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F23%2F2021.04.20.21255796.atom) 

19. 19.Farlie DGJ. The performance of some correlation coefficients for a general bivariate distribution, Biometrika 1960; 47: 307–323.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/47.3-4.307&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1960XF49700009&link_type=ISI) 

20. 20.Morgenstern D. Einfache beispiele zweidimensionaler verteilungen. Mitteilungsblatt fürMathematische Statistik 1956; 8: 234–235.
    
    
21. 21.Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65 (1): 141–151. doi:10.1093/biomet/65.1.141
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/65.1.141&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1978ET58800020&link_type=ISI) 

22. 22.Genest C. Frank’s family of bivariate distributions. Biometrika 1987; 74:549–555
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/74.3.549&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1987J985100010&link_type=ISI) 

23. 23.Gumbel EJ. Bivariate exponential distributions, J. Am. Stat. Assoc. 1960; 55: 698–707.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2281591&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1960WX10300006&link_type=ISI) 

24. 24.Pearson K. Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London 1895; 58: 240–242.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rspl.1895.0041&link_type=DOI) 

25. 25.Spearman C. The proof and measurement of association between two things. American Journal of Psychology 1904; 15 (1): 72–101. [https://doi.org/10.2307/1412159](https://doi.org/10.2307/1412159)
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/1412159&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=3322052&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F23%2F2021.04.20.21255796.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000200130600005&link_type=ISI) 

26. 26.Kendall MG. Rank Correlation Methods. London: Griffin, 1970.
    
    
27. 27.Embrechts P, Lindskog F, McNeil A. Modelling dependence with copulas and applications to risk management. Department of Mathematics, ETH Zurich, 2001.
    
    
28. 28.Priest C. Correlations: what they mean and more importantly what they do not mean. The Institute of Actuaries of Australia Biennial Convention, 2003.
    
    
29. 29.Schweizer B, Wolff EF. On nonparametric measures of dependence for random variables, Annals of Statistics 1981; 9: 879–885.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/aos/1176345528&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1981MC70500017&link_type=ISI)

 [1]: /embed/graphic-3.gif
 [2]: /embed/graphic-11.gif
 [3]: /embed/graphic-12.gif
 [4]: /embed/graphic-13.gif
 [5]: /embed/graphic-14.gif
 [6]: /embed/graphic-15.gif
 [7]: /embed/graphic-16.gif
 [8]: /embed/graphic-17.gif