Abstract
Opisthorchiasis is an overlooked danger to Southeast Asia. High-resolution disease risk maps are critical for targeting control interventions to priority areas; however, they haven’t been available for Southeast Asia. A Bayesian spatial-temporal joint model was developed, where both point-referenced and area-level data were assumed to follow a binomial distribution, modeling with a logit regression in combination of potential influencing factors and spatial-temporal random effects. The model-based risk mapping identified area of high prevalence (>25%) of opisthorchiasis in northeastern Thailand, central-southern parts of Laos, and some eastern areas of Cambodia from 2010 onwards. And a total of 13.13 million (95% BCI: 11.43-15.21) people were infected with O. viverrini in 2018 in four major endemic countries (ie, Thailand, Laos, Cambodia, and Vietnam) of Southeast Asia, with a prevalence of 6.96% (95% BCI: 6.06%-8.06%). These model-based high-resolution risk maps can provide important information to guide the spatial targeting of opisthorchiasis control interventions.
Introduction
End the epidemics of Neglected Tropical Diseases (NTDs) by 2030 embodied in the international set of targets for the Sustainable Development Goals (SDGs) endorsed by the United Nations empowers the efforts made by developing countries to combat the NTD epidemics (UN,2015). To date, 20 diseases have been listed as NTDs, and opisthorchiasis is under the umbrella of food-borne trematodiasis (Ogorodova, et al.,2015). Two species of opisthorchiasis are of public health significance, that is Opisthorchis felineus (O. felineus), endemic in eastern Europe and Russia, and Opithorchis viverrini (O. viverrini) endemic in Southeast Asian countries (Petney, et al.,2013). The later species is of our interest in the current manuscript.
According to WHO’s conservative estimation, an overall disease burden due to opisthorchiasis was 188,346 disability-adjusting life years (DALYs) in 2010 (Havelaar, et al.,2015). Fürst and colleagues estimated that more than 99% of the burden worldwide attribute to O. viverrini infection in Southeast Asia (Fürst, et al.,2012). Five countries in Southeast Asia, Cambodia, Lao PDR, Myanmar, Thailand and Vietnam, are endemic for opisthorchiasis, with an estimated 67.3 million people at risk (Keiser and Utzinger,2005). It is well documented that chronic and repeated infection with O. viverrini leads to the development of fatal bile duct cancer (cholangiocarcinoma) (International Agency for Research on Cancer,1994).
The life cycle of O. viverrini involves freshwater snails of the genus Bithynia as the first intermediate host, and freshwater cyprinoid fish as the second intermediate host. Humans and other carnivores (e.g., cats and dogs), the final hosts, become infected by consuming raw or insufficiently cooked infected fish (Andrews, et al.,2008; Saijuntha, et al.,2014). Behavioral, environmental and socio-economic factors affect the transmission of O. viverrini (Grundy-Warr, et al.,2012; Phimpraphai, et al.,2018; Phimpraphai, et al.,2017; Prueksapanich, et al.,2018). Raw or insufficiently cooked fish consumption is the cultural root in endemic countries, shows a strong relationship with the occurrence of the disease (Andrews, et al.,2008; Grundy-Warr, et al.,2012). Poorly hygienic conditions increase the risk of infection, especially in areas practicing raw-fish-eating habit (Grundy-Warr, et al.,2012). In addition, environmental and climatic factors, such as temperature, precipitation and landscape, affecting either snail/fish population or growth of the parasites inside the intermediate hosts, can potentially influence the risk of human infection (Forrer, et al.,2012; Suwannatrai, et al.,2017). Important control strategies of O. viverrini infection include preventive chemotherapy, health education, environmental modification, improving sanitation, as well as comprehensive approaches with combinations of the above (Saijuntha, et al.,2014). For purposes of public health control, WHO recommends implementing preventive chemotherapy once a year or once every two years depending on the levels of prevalence in population, with complementary interventions such as health education and improvement of sanitation (WHO,2009).
Understanding the geographical distribution of O. viverrini infection risk at high spatial resolution is critical to prevent and control the disease cost-effectively in priority areas. Thailand conducted national surveys for O. viverrini prevalence in 1981, 1991, 2001, 2009, and 2014 (Echaubard, et al.,2016; Suwannatrai, et al.,2018), but the results of these surveys were presented at the province-level, which is less informative for precisely targeting control interventions. Suwannatrai and colleagues, based on climatic and O. viverrini presence data, produced climatic suitability maps for O. viverrini in Thailand using the MaxEnt modeling approach (Suwannatrai, et al.,2017). The maps brought insights for identifying areas with a high probability of O. viverrini occurrence; however, they did not provide direct information on prevalence of O. viverrini in population (Elith, et al.,2011). A risk map of O. viverrini infection in Champasack province of Lao PDR were presented by Forrer and colleagues (Forrer, et al.,2012). To our knowledge, high-resolution, model-based risk estimates of O. viverrini infection are unavailable in the whole endemic region of Southeast Asia.
Bayesian geostatistical modeling is one of the most rigorous inferential approaches for high-resolution maps depicting the distribution of the disease risk (Karagiannis-Voules, et al.,2015). Geostatistical modeling relates geo-referenced disease data with potential influencing factors (e.g., socioeconomic and environmental factors) and estimates the infection risk in areas without observed data (Gelfand and Banerjee,2017). Common geostatistical models are usually based on point-referenced survey data (Banerjee, et al.,2014). In practice, disease data collected from various sources often consists of point-referenced and area-aggregated data. Bayesian geostatistical joint modeling approaches provide a flexible framework for combining analysis of both kinds of data (Moraga, et al.,2017; Smith, et al.,2008). In this study, we aimed (1) to collect all available survey data on the prevalence of O. viverrini infection at point- or area-level in Southeast Asia through systematic review; and (2) to estimate the spatial-temporal distribution of the disease risk at a high spatial resolution, with the application of advanced Bayesian geostatistical joint modeling approach.
Results
A total of 2,690 references were identified through systematically reviewing peer-review literatures, and 13 additional references were gathered from other sources. According to the inclusion and exclusion criteria, 170 records were included, resulted in a total of 593 area-level surveys in 174 areas and 524 point-level surveys at 396 locations in five endemic countries (ie, Cambodia, Lao PDR, Myanmar, Thailand and Vietnam) of Southeast Asia (Figure 1). Around 70% and 15% of surveys were conducted in Thailand and Lao PDR, respectively. Only two relevant records were obtained from Myanmar. To avoid large estimated errors, we did not include this data in the final geostatistical analysis. All surveys were conducted after 1970, with more than two third done after 2000. Most surveys (95%) are community-based. Around 40% of surveys used the Kato-Katz technique for diagnosis, while another 41% did not specify diagnostic approaches. Mean prevalence calculated directly from survey data was 16.82% across the study region. A summary of survey data is listed in Table 1, and survey locations and observed prevalence in each period are shown in Figure 2. Area-level data cover all regions in Thailand and Lao PDR, and most regions in Cambodia and Vietnam, whilst point-referenced data are absent in most areas of Vietnam, the western part of Cambodia and southern part of Thailand.
(A) Before 1990, (B) 1990-1999, (C) 2000-2009, and (D) from 2010 onwards.
Five variables were selected for the final model through the Bayesian variable selection process (Table 2). The infection risk was significantly high in the community population in comparison to that of school-aged children. Positive associations were identified between the prevalence of O. viverrini and human influence index as well as travel time to the nearest big city. Annual precipitation was negatively correlated to the infection risk. People living in areas with distance to the nearest open water bodies longer than 1.5 km showed a negative risk with a posterior probability of 97.2%, compared to that with distance smaller than 0.5 km. The AUC was 0.79, indicating the Bayesian joint model had a reasonable capacity of prediction accuracy. The model performance was much better than that of the model only based on point-referenced data (appendix p 10).
The estimated risk maps of O. viverrini infection for the four periods (ie, before 1990, 1990-1999, 2000-2009, and from 2010 onwards) are presented in Figure 3. From 2010 onwards, the high infection risk (with prevalence >25%) was mainly estimated in regions of the southern, the central and the north-central parts of Lao PDR, some areas in the east-central and the southern parts of Cambodia, and some areas of the northeastern and the northern parts Thailand. The southern part of Thailand, the northern part of Lao PDR, and the western part of Cambodia showed low risk estimates (with prevalence <5%) of O. viverrini infection. The central and several southern parts of Vietnam showed low to moderate risk of O. viverrini infection, while there was no evidence of O. viverrini in other parts of Vietnam. High estimation uncertainty was mainly present in the central part of Lao PDR, the northern and the eastern parts of Thailand, and the central part of Vietnam (Figure 4).
Estimated prevalence based on the median of the posterior estimated distribution of infection risk for periods (A) before 1990, (B) 1990-1999, (C) 2000-2009, and (D) from 2010 onwards.
(A) Before 1990, (B) 1990-1999, (C) 2000-2009, and (D) from 2010 onwards.
In addition, the infection risk varies over time (Figure 5). Most areas of northern Thailand showed an obvious increasing trend from the period before 1990 to 1990-1999, while many areas of the country presented a considerable decrease of infection risk after 2000. The infection risk kept increasing in most areas of the central and the southern parts of Lao PDR, while the north-central part firstly decreased and then increased. The east-central part of Cambodia showed an increasing trend between 2000-2009 and 2010 onwards, while the infection risk changed slightly in central parts of Vietnam.
Changes were calculated by the median of the posterior estimated distribution of infection risk for the former time period minus that for the latter time period divided by that for the former time period. The risk changes (A) between the period before 1989 and from 2010 onwards; (B) between the period before 1989 and 1990-1999; (C) between the period 1990-1999 and 2000-2009; and (D) between the period 2000-2009 and from 2010 onwards.
The population-adjusted estimated prevalence over the study region presents a trend down after 2000 (Figure 6 and appendix pp 13-17). At the country-level, the estimated prevalence in Thailand showed a fast decline from 2000 onwards, and took on a gradually decreasing change in Cambodia, while it maintained quite stable in Lao PDR and Vietnam during the whole study period. Particularly, we estimated that in 2018, the overall population-adjusted estimated prevalence of O. viverrini infection in the whole study region was 6.96% (95% BCI: 6.06%-8.06%), corresponding to 13.13 million (95% BCI: 11.43-15.21) infected individuals (Table 3). Lao PDR showed the highest prevalence (32.47%, 95% BCI: 28.75%-37.08%), followed by Thailand (10.63%, 95% BCI: 9.48%-11.76%), Cambodia (8.57%, 95% BCI: 5.83%-12.85%), and Vietnam (2.14%, 95% BCI: 0.89%-3.95%). Thailand had the largest numbers of individuals estimated to be infected with O. viverrini (7.34 million, 95% BCI: 6.55-8.13), followed by Lao PDR (2.26 million, 95% BCI: 2.00-2.58), Vietnam (2.06 million, 95% BCI: 0.86-3.81), and Cambodia (1.39 million, 95% BCI: 0.95-2.08).
Population-adjusted estimated prevalence and number of individuals infected with O. viverrini in endemic countries of Southeast Asia*
Discussion
In this study, we produced model-based, high-resolution risk estimates of opisthorchiasis across endemic countries of Southeast Asia through systematic reviewing and the advanced Bayesian geostatistical joint modeling approach. The disease is the most important foodborne trematodiasis in the study region (Sripa, et al.,2010), taking into account most of the disease burden of opisthorchiasis in the world (Fürst, et al.,2012). The estimates were obtained by systematically reviewing all possible geo-referenced survey data and applying a Bayesian geostatistical modeling approach that jointly analyzes point-referenced and area-aggregated disease data, as well as environmental and socioeconomic predictors. Our findings will be important for guiding control and intervention cost-effectively and serve as a baseline for future progress assessment.
Our estimates suggested that there was an overall decrease of O. viverrini infection in Southeast Asia, which may be largely attributed to the decline of infection prevalence in Thailand. This decline was probably on account of the national opisthorchiasis control program launched by the Ministry of Public Health of Thailand from 1987 (Jongsuksuntigul and Imsomboon,2003; Jongsuksuntigul, et al.,2003). Our high-resolution risk estimates in Thailand from 2010 onwards showed similar pattern as the climatic suitability map provided by Suwannatral and colleagues (Suwannatrai, et al.,2017). In this case, we estimated the prevalence of the population instead of the occurrence probability of the parasite, which arms decision makers with more direct epidemiological information for guiding control and intervention. The national surveys in Thailand reported a prevalence of 8.7% and 5.2% in 2009 and 2014, respectively (MOPH,2014; Wongsaroj, et al.,2014). However, we estimated higher prevalence of 17.81% (95% BCI: 16.04%-21.94%) and 10.63% (95% BCI: 9.48%-11.76%) in the periods 2000-2009 and from 2010 onwards, respectively. Even though the national surveys covered most provinces in Thailand, estimates were based on simply calculating the percentage of positive cases among all the participants (Wongsaroj, et al.,2014), and the remote areas might not be included (Maipanich, et al.,2004). Instead, our estimates were based on rigorous Bayesian geostatistical modeling of available survey data with environmental and socioeconomic predictors, accounting for heterogeneous distribution of infection risk and population density when aggregating country-level prevalence.
Our findings suggested that the overall infection risk of O. viverrini did not decline in Lao PDR during the study periods, consistent with conclusions drawn by Suwannatrai and colleagues (Suwannatrai, et al.,2018). We estimated that a total number of 2.26 (95% BCI: 2.00-2.58) million people living in Lao PDR was infected with O. viverrini, equivalent to that estimated by WHO in 2004 (WHO,2002). Besides, our risk mapping for Champasack province shares similarly risk map pattern produced by Forrer and colleagues (Forrer, et al.,2012). A national-scaled survey in Cambodia during the period 2006-2011 reported infection rate of 5.7% (Yong, et al.,2014), lower than our estimation of 8.57% (95% BCI: 5.83%-12.85%) from 2010 onwards. The former may underestimate the prevalence because more than 77% of participants were schoolchildren (Yong, et al.,2014). Another large survey in five provinces of Cambodia suggested a large intra-district variation, which makes the identification of endemic areas difficult (Miyamoto, et al.,2014). Our high-resolution estimates for Cambodia help to differentiate the intra-district risk. However, the estimates should be taken cautious due to large district-wide variances and a relatively small number of surveys. Indeed, O. viverrini infection was underreported in Cambodia (Khieu, et al.,2019), and further point-referenced survey data are recommended for more confirmative results.
Although an overall low prevalence was estimated in Vietnam (2.13%, 95% BCI: 0.89%-3.95%) from 2010 onwards, it corresponds to 2.06 million (95% BCI: 0.86%-3.81%) people infected, comparable to the number in Lao PDR, mainly due to a larger population in Vietnam. The risk mapping suggested moderate to high risk areas presented in central Vietnam, with the highest risk in Phu Yen province, particularly. This agreed with previous studies considering the province a “hotspot” (Doanh and Nawa,2016). Of note, even though there was no evidence of O. viverrini infection in the northern part of the country, Clonorchis sinensis, another important liver fluke species, is endemic in the region (Sithithaworn, et al.,2012). We didn’t provide estimates for Myanmar in case of large estimated errors. Indeed, only two relevant papers were identified by our systematic review, where one shows low to moderate prevalence in three regions of Lower Myanmar (Aung, et al.,2017), and the other found low endemicity of O. viverrini infection in three districts of the capital city Yangon (Sohn, et al.,2019). Nation-wide epidemiological studies are urgent for a more comprehensive understanding of the disease in Myanmar.
We identified several important factors associated with O. viverrini infection in Southeast Asia, which may provide insights for the prevention and control of the disease. The infection risk was higher in the entire community than that in schoolchildren, consistent with multiple studies (Aung, et al.,2017; Forrer, et al.,2012; Miyamoto, et al.,2014; Van De,2004; Wongsaroj, et al.,2014). HII, a measure of human direct influence on ecosystems (Sanderson, et al.,2002), showed a positive relationship with O. viverrini infection risk. A similar association has been found on HII and A. lumbricoides infection in South Asia (Lai, et al.,2019). People living in areas with a longer distance to the nearest open water bodies tended to have lower risk, consistent with the previous finding (Forrer, et al.,2012). We found annual precipitation negatively associated with O. viverrini infection, consistent with the previous study conducted in Southern Lao PDR (Forrer, et al.,2012). Higher precipitation might decrease the disease transmission by creating fast-flowing water unsuitable for intermediate host snail survival (McCreesh and Booth,2013). A negative association was found between O. viverrini infection and travel time to the nearest big city, suggesting the disease was more likely to occur in rural areas. The habit of eating raw or insufficiently cooked fish was more common in rural areas than that in big cities, which could partially explain our findings (Grundy-Warr, et al.,2012; Keiser and Jennifer,2019). Indeed, this culturally rooted habit is one of the determinants for human opisthorchiasis (Kaewpitoon, et al.,2008; Ziegler, et al.,2011). However, the precise geographical distribution of such information is unavailable and thus we could not use it as a covariate in this study.
Our estimate of the number of people infected with O. viverrini is higher than that of the previous study (13.1 million vs 8.6 million (Qian and Zhou,2019)) emphasizing the public health importance of this neglected disease in Southeast Asia, and suggesting that more effective control interventions should be conducted, particularly in the high risk areas. The successful experience in the intervention of Thailand may be useful for reference by other endemic countries of the region. The national opisthorchiasis control program, supported by the government of Thailand, applied interrelated approaches, including stool examination and treatment of positive cases, health education aiming at the promotion of cooked fish consumption, and environmental sanitation to improve hygienic defecation (Jongsuksuntigul and Imsomboon,2003). In addition, for areas with difficulties to reduce infection risk, a new strategy was developed by Sripa and colleagues, using the EcoHealth approach with anthelminthic treatment, novel intensive health education on both communities and schools, ecosystem monitoring, and active participation of the community (Sripa, et al.,2015). This “Lawa model” shows good effectiveness in Lawa Lake area, where the liver fluke was highly endemic (Sripa, et al.,2015). Furthermore, common integrated control interventions (eg, combination of preventive chemotherapy with praziquantel, improvement of sanitation and water sources, and health education) are applicable not only for opisthorchiasis but also for other NTDs, such as soil-transmitted helminth infection and schistosomiasis, which are also prevalent in the study region (Dunn, et al.,2016; Gordon, et al.,2019). Implementation of such interventions in co-endemic areas could be cost-effective (Baker, et al.,2011; WHO,2012).
Frankly, several limitations exist in our study. We collected data from different sources, locations of which might not be random and preferential sampling might exist. Nevertheless, we performed a risk-preferential sampling test and no significant preferential sampling was found, indicating the data good representative across the study region (appendix p 11). We assumed similar proportions of age and gender in different surveys, as most of which only reported prevalence aggregated by age and gender. Nevertheless, considering the possible differences in infection risk between the whole population and schoolchildren, we categorized survey types to the community- and school-based. We set clear criteria for selection of all possible qualified surveys and didn’t exclude surveys that only reported prevalence without numbers of examination or those reported prevalence in intervals without exact observed values. Data imputation was undertaken and sensitivity analysis showed that the setting of the imputation had little effects on the final results (appendix pp 18-22). Our analysis was based on survey data under different diagnostic methods. The sensitivity and specificity of the same diagnostic method may differ across studies (Charoensuk, et al.,2019; Laoprom, et al.,2016; Sayasone, et al.,2015), while different diagnostic methods may result in different results in the same survey. Due to the lack of enough information on the assessment of the quality and procedure of the diagnostic approach in each survey, we assumed similar diagnostic sensitivity and specificity across surveys (Lu, et al.,2017). In addition, most of the diagnostic methods in the surveys were based on fecal microscopic technique on eggs, which could not effectively distinguish between O. viverrini and minute intestinal flukes of the family Heterophyidae (eg heterophyid and lecithodendriid) (Charoensuk, et al.,2019; Sato, et al.,2010). Thus, our results may overestimate the O. viverrini infection risk in areas where heterophyid and lecithodendriid are endemic, such as Phongsaly, Saravane and Champasak provinces in Lao PDR (Chai, et al.,2013; Chai, et al.,2010; Sato, et al.,2010), Nan and Lampang provinces in Thailand (Wijit, et al.,2013), and Takeo province in Cambodia (Sohn,2011). There is an urgent need for the application of more powerful diagnostic practices with higher sensitivity and specificity to better detect the true O. viverrini prevalence, such as PCR (Lovis, et al.,2009; Lu, et al.,2017; Sato, et al.,2010). Nevertheless, because of the similar treatment and the prevention strategies of O. viverrini and minute intestinal flukes (Keiser and Utzinger,2010), our risk mapping is valuable also for areas co-endemic with the above flukes.
In conclusion, this study contributes to better understand the spatial-temporal characteristics of O. viverrini infection in major endemic countries of Southeast Asia, providing valuable information guiding control and intervention, and serving as a baseline for future progress assessment. Estimates were based on a rigorous geostatistical framework jointly analyzing point- and areal level survey data with potential predictors. The higher number of infected people we estimated highlights the public health importance of this neglected disease in the study region. More comprehensive epidemiological studies are urgently needed for endemic areas with scant survey data.
Materials and Methods
Search strategy, selection criteria and data extraction
We did a systematic review (registered in the International Prospective Register of Systematic Reviews, PROSPERO, No.CRD42019136281) following the PRISMA guidelines (appendix pp 43-44) to collect relevant publications reporting prevalence data of opisthorchiasis in Southeast Asia (Moher, et al.,2010). We searched PubMed and ISI Web of Science from inception to February 9, 2020, with search terms: (liver fluke* OR Opisthorchi*) AND (Southeast Asia OR Indonesia OR (Myanmar OR Burma) OR Thailand OR Vietnam OR Malaysia OR Philippines OR Lao PDR OR Cambodia OR Timor OR Brunei OR Singapore). We set no limitations on language, date of survey, or study design in our search strategy.
We followed a protocol for inclusion, exclusion and extraction of survey data (appendix pp 4-5). Firstly, we screened titles and abstracts to identify potentially relevant articles. Publications on in vitro studies, or absence of human studies or absence of disease studies were excluded. Quality control was undertaken by re-checking 20% of randomly selected irrelevant papers. Second, the full-text review was applied to potentially relevant articles. We excluded publications with following conditions: absence of prevalence data; studies done in specific patient groups (eg, prevalence on patients with specific diseases), in specific population groups (eg, travelers, military personnel, expatriates, nomads, displaced or migrating population), under specific study designs (eg, case report studies, case-control studies, clinical trials, autopsy studies); drug efficacy or intervention studies (except for baseline data or control groups), population deworming within one year, the survey time interval more than ten years, data only based on the direct smear method (due to low sensitivity) or serum diagnostics (due to unable to differ the past and the active infection). During the full-text review, the potential relevant cited references of the articles were also screened. Studies were included if they reported survey data at provincial level and below, such as administrative divisions of level one to three (ADM1: province, state, etc.; ADM2: city, etc.; ADM3: county, etc.), and at point-level (village, town, school, etc.). Duplicates were checked and removed. The quality assessment of each individual record included in the final geostatistical analysis was performed by two independent reviewers, based on a quality evaluation list (appendix pp 23-40).
We extracted data following the GATHER checklist (appendix pp 41-42) (Stevens, et al.,2016). Detailed information of records were extracted into a database, which includes literature information (eg, journal, authors, publication date, title, volume, and issue), survey information (eg, survey type: community- or school-based, year of survey), location information (eg, location name, location type, coordinates), and disease related data (eg, species of parasites, diagnostic method, population age, number of examined, number of positive, and percentage of positive). The coordinates of the survey locations were obtained from Google Maps (https://www.google.com/maps/). Considering that areas are much smaller of ADM2 or ADM3, compared to that of ADM1, we treated surveys aggregated in ADM2 or ADM3 as point-level georeferenced at division centroids referring to previous studies (Lai, et al.,2019), while surveys aggregated over ADM1 as area-level, in order to avoid large uncertainty in the later geostatistical analysis.
Missing data imputation was undertaken: (1) for surveys that only reported prevalence without numbers of examined, a sample size of 50 was assigned to point-referenced surveys, while a sample size of 500 was assigned to area-level surveys; (2) for surveys reported prevalence in intervals without exact observed values, the medians of the intervals were assigned.
Environmental, socioeconomic, and demographic data
The environmental data (ie, annual precipitation, distance to the nearest open water bodies, elevation, land cover, land surface temperature in the daytime and at night and normalized difference vegetation index), socioeconomic data (ie, human influence index, survey type, and travel time to the nearest big city), and demographic data of Southeast Asia were downloaded from open data sources (appendix p 6). All data were aligned over a 5×5 km grid across the Southeast Asia region using the “raster” package (https://cran.r-project.org/web/packages/raster) in R (version 3.5.0). Data at point-referenced survey locations were extracted using the same package. We linked the data to the divisions (ie, ADM1, ADM2 or ADM3) reported aggregated outcome of interest (ie, infection prevalence) by averaging them within the corresponding divisions. Details of data processing are available in the appendix (p 7).
Statistical analysis
As our outcome of interest derived from both point-referenced and area-aggregated surveys, a Bayesian geostatistical joint modeling approach was applied to analyze the area-level and point-level survey data together (Moraga, et al.,2017; Utazi, et al.,2019).
A latent spatial-temporal process following a zero-mean Gaussian distribution was introduced to model the covariance of data residuals in space and time. The survey years were grouped into four periods: before 1990, 1990-1999, 2000-2009, and from 2010 onwards and t = 1, 2,3,4 were set to represent the corresponding periods, respectively. We defined pit the probability of infection at location i and time period t, where i is the index either for the location of point-referenced data or of the area for area-level data. Let Yit and Nit be the number of positive and the number of examined individuals, respectively, at location i and time period t. We assumed that Yit follows a binomial distribution, that is, Yit ∼Bin(pit, Nit), where i = 1, …, nA, nA + 1, …, nA + np, nA the total number of areas for area-level surveys and np the total number of locations for point-referenced surveys. We modelled predictors on a logit scale of pit. A standard grid of 5×5 km was overlaid to each survey area resulting in a certain number of pixels representing the area. We assumed that survey locations and pixels within survey areas shared the same spatial-temporal process. Regarding to area-level data, , where
the vectors of covariate values for ith area in time period t with
and β are the corresponding regression coefficients.
is the size of the i th area and ω(s, t) the spatial-temporal random effects of pixels within the area. For point-referenced data,
, where i = nA + 1, …, nA + np,
the vectors of covariate values and ω(si, t) the spatial-temporal random effect for ith location in time period t. We assumed the spatio-temporal random effect ω(s, t) follow a zero-mean Gaussian distribution, that is ω∼GP(0, Kspace ⊗ Ktime), where the spatial covariance matrix Kspace was defined as a stationary Matérn covariance function
and the temporal covariance matrix as
with |ρ| < 1, corresponding to the autoregressive stochastic process with first order (AR1). Here D donates the Euclidean distance matrix, κ is a scaling parameter and the range
, representing the distance at which spatial correlation becomes negligible (<0.1), and Kv is the modified Bessel function of the second kind, with the smoothness parameter v fixed at 1. We formulated the model in a Bayesian framework. Minimally informative priors were specified for parameters and hyper parameters as following:
, and
, where m is the median distance between the predicted grids.
Additionally, we used the best subset variable selection method to find out the best combination of predictors for the final model (Karagiannis-Voules, et al.,2015). According to previous studies (Aung, et al.,2017; Forrer, et al.,2012; Miyamoto, et al.,2014; Wongsaroj, et al.,2014), the infection risk in community and school may be different, thus the survey type (ie, community- or school-based) was kept in all potential models, while the other ten environmental and socioeconomic variables mentioned above were put forth into the Bayesian variable selection process. The model with a minimum log score was chosen as the final model.
Model fitting and variable selection process were conducted through INLA-SPDE approach (Lindgren, et al.,2011; Rue, et al.,2009), using integrated nested Laplace approximation (INLA) package in R (version 3.5.0). Estimation of risk for O. viverrini infection during each time period was done over a grid with cell size of 5×5 km. The corresponding risk maps were produced using ArcGIS (version 10.2). In addition, the population-adjusted estimated prevalence and number of infected individuals in 2018 were calculated at the country- and provincial levels based on pixel-level estimates and gridded demographic information. Based on previous studies, for the provinces in Vietnam where there was no evidence of O. viverrini infection, we multiplied the estimated results by zero as the final estimated prevalence (Doanh and Nawa,2016). Model validation was conducted using the five-fold out-of-sample cross-validation approach. The predictive accuracy was assessed under the receiver-operating characteristic (ROC) curve by calculating the area under the curve (AUC) (Brooker, et al.,2001). Correlation coefficient between observed and estimated prevalence values was calculated (appendix p 10). Furthermore, a Bayesian geostatistical model only based on point-referenced data was fit and validated, to compared its performance with our joint modeling approach. Considering that the data in this study were sourced from different studies, a test for detecting preferential sampling of spatial-temporal processes is implemented (appendix p 11) (Watson,2020). In addition, a sensitivity analysis was conducted to evaluate the effects of imputation on missing data (appendix pp 18-22).
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Additional information
Funding
Author contributions
TTZ, HYT, and YSL made substantial contributions to the study concept and design. TTZ, YJF, and PND made substantial contributions to data collection. TTZ and YSL made substantial contributions to data analysis, and interpretation. TTZ and YSL were in charge of the manuscript draft. PND, SS, VK, CN, MBQ, YTH, and YSL made substantial revisions to the manuscript.
Acknowledgments
We are grateful to Dr Roy Burstein from Institute for Disease Modeling, Bellevue, Washington, USA for providing very good suggestions for the manuscript.
Competing interests
The authors declare that no competing interests exist.