Abstract
Background Comprehensive environmental risk characterization, encompassing physical, chemical, social, ecological, and lifestyle stressors, necessitates innovative approaches to handle the escalating complexity. This is especially true when considering individual and population-level diversity, where the myriad combinations of real-world exposures magnify the combinatoric challenges. The GeoTox framework offers a tractable solution by integrating geospatial exposure data from source-to-outcome in a series of modular, interconnected steps.
Results Here, we introduce the GeoTox open-source R software package for characterizing the risk of perturbing molecular targets involved in adverse human health outcomes based on exposure to spatially-referenced stressor mixtures. We demonstrate its usage in building computational workflows that incorporate individual and population-level diversity. Our results demonstrate the applicability of GeoTox for individual and population-level risk assessment, highlighting its capacity to capture the complex interplay of environmental stressors on human health.
Conclusions The GeoTox package represents a significant advancement in environmental risk characterization, providing modular software to facilitate the application and further development of the GeoTox framework for quantifying the relationship between environmental exposures and health outcomes.
By integrating geospatial methods with cutting-edge exposure and toxicological frameworks, GeoTox offers a robust tool for assessing individual and population-level risks from environmental stressors. GeoTox is freely available at https://niehs.github.io/GeoTox/.
1 Introduction
Risk characterization of multiple environmental stressors including physical, chemical, social, ecological, and lifestyle, is a top priority of the environmental health sciences [1]. There are numerous technical factors contributing to the complexity of the problem, including the infinite number of real-world exposure combinations and the inter-individual biological complexity of humans. At the intersection of exposure mixtures and biological complexity is the well-established result that most human diseases occur due to high-dimensional interactions between exogenous environmental exposures and endogenous biology such as genetics, epigenetics, and the microbiome [2].
To address the challenge of complex chemical exposures the exposome concept was introduced [3]. The exposome (exposomics) is measure of all environmental, social, lifestyle, and ecological exposures across the life course of an individual [2]. The main-stream philosophy for quantifying the exposome is through individual biological measurements [4] such as chemical characterization, epigenetic alterations, metabalomic derivations, and proteomic responses [5]. Advancements in biological and -omic based quantification is happening quickly [6]; nonetheless, there are immense analytical challenges ahead in large part due to (1) the spatiotemporal variability of exposures, (2) the sheer magnitude and variety of exposures, (3) the specificity of exposomic measurements to multiple endogenous and exogenous processes, and (4) the temporal variability and specificity of biological responses.
Geospatial approaches offer an attractive approach to quantify the exposome because external components such as the social and physical-chemical exposome are more accurately quantified with geospatial models of exposure. Described by Rappaport and Smith [7] and supported by Vermeulen et al. [2], toxic effects are mediated through chemicals that alter critical molecules, cells, and physiological processes inside the body. In the same spirit, it is clear that phenotypic outcomes (e.g. disease) at the individual or population level can only occur after a series of source, exposure, and biological dynamics (Figure 1). We refer to this as the source-to-outcome-continuum. It follows that if methods and data exist to quantify each step in the sequence, and each step can be integrated into each neighboring step, then individual and population outcomes can be quantified from spatiotemporally resolved environmental sources and exposures.
Conveniently, data and methods do exist for quantifying the source-to-outcome-continuum. The Adverse Outcome Pathway (AOP) [8] framework provides a linkage between perturbation of a specific biological target, pathway or process by a stressor, and an adverse outcome considered relevant to risk assessment. Many sources of data can be used to support AOP development, with new approach methodologies (NAMs) such as high-throughput in-vitro assays, small organism models, and in silico toxicology modeling approaches especially useful for probing specific targets. Teeguarden et al. [9] proposed the Aggregate Exposure Pathway (AEP) as an exposure analog to the AOP framework. The AEP framework aims to quantify the fate and transport of environmental chemicals through different media and, when appropriate, chemical transformations that occur in the environment. Additionally, the framework articulates the adsorption, distribution, metabolism, and excretion of chemicals that relate the external concentration and internal concentration of a chemical (or its active metabolites). Most importantly, the AEP framework defines these internal concentrations as the Target Site Exposures (TSEs), which are analogous to molecular initiating event (MIE) from the AOP framework. To summarize, the end process in an AEP is the beginning process for the AOP thus providing explicit methods for source-to-outcome-continuum modelling.
Recent work has demonstrated the potential for source-to-outcome-continuum modeling in NAMs based risk assessment. For example, Hines et al. [10] estimated a hazard index for humans, fish, and small mammals at hypothetical field site with exposures to ClO4−and its degradation products. Price et al. [11] posit source-to-outcome-continuum modeling as a more comprehensive approach for integrating chemical interactions in human health risk assessment. In Eccles et al. [12], we introduced a workflow, referred to as GeoTox, for characterizing the risk of perturbing molecular targets involved in adverse human health outcomes based on exposure to spatially referenced chemical mixtures by connecting the frameworks of the AEP and AOP. The objectives of this paper build upon the GeoTox framework as follows: (1) Develop open-source software to facilitate analysis of the geospatial toxicological workflow that features best-practices in software development including continuous integration, unit testing, ready tunability of key parameters, documentation, and (2) Analyze and demonstrate case-studies with varying individual, population, multiple assays end-points that impact downstream risk. Our goal is that the GeoTox software will allow researchers in the toxicological and geospatial research communities to build and expand application of the GeoTox concept.
The remainder of this manuscript is as follows: In section 2 we provide a review of the GeoTox framework (Section 2.1), an overview of the new, open-source GeoTox package to facilitate usage across the environmental and toxicological risk assessment community (Section 2.3), and proposed analyses for population, individual, and multiassay (Section 2.4) risk as quantified by internal molecular perturbation. Section 3 summarizes key results from the case studies. Section 4 provides a discussion on the GeoTox code development and case study results, GeoTox, and lastly, section 5 summarizes the package features and places it in the context of new approach methodologies development for improved exposomic risk characterization.
2 Materials and Methods
2.1 Review of GeoTox framework
Here, we provide a brief review of the GeoTox framework as described in Eccles et al. [12] including its key steps and data requirements. Figure 2 shows the overall workflow, where blue nodes represent an object and green, rounded nodes represent a methodology or function required to go from one object to the next. Throughout the manuscript, software names and functions are highlighted in code text. This differentiates GeoTox the software from GeoTox the framework.
First, geospatial modeling is used to estimate an external, geospatial referenced exposure concentrations due to presumed external sources. Eccles et al. [12] utilized publicly available chemical transport model external exposure data, but all external exposure assessment methods are possible including geostatistical models and personal monitoring. An important constraint is that publicly available or user-developed geospatial exposure chemical parameters (e.g. CASN, SMILES) must also be available in the subsequent databases and model. For example, while ozone (O3) is an established air pollutant with readily available geospatial exposure data, it is currently not amenable to PBPK models or high-throughput in vitro screening assays. Next, behavioral and physiological modeling is used to estimate internal concentrations from the presumed route of exposure. The standard source is the EPA Exposure Factors Handbook [13], which provides population distributions of physiological factors such as inhalation rates by age, sex, and body weight. Next, physiologically-based toxicokinetic (PBTK) models are used to estimate the internal, target organ concentration. The R package, httk [14], is the standard used in GeoTox due to its high-throughput nature and flexibility, but tailored PBTK models are also applicable. Then, in-vitro-to-in-vivo-equivalent (IVIVE) modeling, also based on PBTK models, is used to convert the human concentration to an in-vitro equivalent concentration. With the external exposure data now comparable to the high-throughput screening (HTS) assay data, individual and mixtures concentration-response modeling can be performed. A key concept here is that the HTS assay(s) chosen informs on the molecular target of interest through a molecular initiating event and is both biologically and risk relevant (e.g. Related to a cancer or outcome mode of action). Lastly, a risk assessment metric is chosen for the risk assessment, resulting in the final risk map. Eccles et al. [12] showed risk maps of the mixture response predicted by independent action (IA) [15] and generalized concentration addition (GCA) [16] and the hazard quotient (HQ)[17].
FAIR Code Development
GeoTox is created with many software best-practices, including findable, accessible, interoperable, and reproducible (FAIR) software development standards. The package is accessible as the source code is made completely open-source, hosted at the NIEHS GitHub (code base: https://github.com/NIEHS/GeoTox; package website: https://niehs.github.io/GeoTox/). The NIEHS GitHub repository and availability through the Comprehensive R Archive Network (CRAN) ensure its potential for long-term findability and accessibility.
GeoTox uses standardized style and linting through the lintr package [18] providing syntax consistency. The object-oriented approach is compatible with the tidyverse [19] syntax and style, allowing interoperability with R-based piping workflows (i.e. x | > f ()).
GeoTox was intentionally developed in a modular and extensible manner allowing the authors and community members such as geospatial scientists and toxicologists to contribute and build upon the code base. As an open-source project under the MIT license, we provide a contributor’s guide, news file for updates, and an on-going list of features or extensions that could be added by the authors or community members. An important contributing guide rule is the enforcement of continuous integration and continuous development (CI/CD) workflows on the GitHub repository. These workflows enforce rules and automatic checks to the code base when the authors or community members provide updates or new features. Additionally, we have created standardized templates for bug reports and feature requests on the GitHub repository. Together, these will aide in the on-going maintenance and further improvements while minimizing the overall technical debt.
Lastly, the testthat package is used to enforce test-driven development (TDD) in the creation and maintenance of GeoTox [20]. The unit tests verify that each function performs as designed and help minimize the chances of bugs – particularly semantic errors in which the program functions, but the calculation is not correct. Performance differs between functions and data sources, but can be generally categorized as expected successes, testing that the function does not return an error when provided valid parameters, and expected failures, testing that the function recognizes invalid parameters and returns an error.
2.3 GeoTox Package Overview
Here, we provide a general overview of the GeoTox package. Figure 3a shows the key data requirements for the GeoTox analysis workflow. GeoTox is not a geospatial exposure assessment tool, thus it is necessary to have georeferenced exposure data for the chemicals of interest. The molecular or biological target such as a molecular initating event or adverse outcome pathway key event anchors the framework as a mechanistic-based risk mapping approach. While traditional in-vivo data can be used, the examples here draw from the Integrated Chemical Environment (ICE) curated high-throughput in-vitro screening (cHTS) [21]. Section 2.3.7 discusses the availability and integration of cHTS data in GeoTox and section 4 discusses other data sources and potential future integration. Lastly, physiological and behavioral data is required to estimate an internal exposure from the external geospatial exposure. Section 2.3.7 details the physiological data built into GeoTox functions from the EPA Exposure Factors Handbook [13].
Figure 3b shows the general functions in GeoTox that provide a simple, tidy-friendly workflow. This overview workflow is provided in the package vignette titled “GeoTox Introduction”. First, a GeoTox object is initialized. All of the subsequent steps add components to the GeoTox object. The set boundaries function appends the sf spatial objects to the GeoTox object. The simulate population function is a wrapper that simulates and appends age, obesity, exposure, and steady-state plasma concentration (Css) data. Next, the fit hill function calculates maximum likelihood single chemical concentration-response estimates, which are then piped to the calculate response function to predict chemical mixture risk metrics including independent action [15] and generalized concentration addition [16] and the hazard quotient [17]. Optionally, the sensitivity analysis replicates the [12] Monte Carlo sensitivity analysis where each input parameters is perturbed across its range of values while the remaining parameters are held constant. Lastly, plot functions take the GeoTox object and provide easy ggplot stylized plots for geospatial maps, concentration-response, and sensitivity results.
2.3.1 GeoTox Object
The core development is the GeoTox object, a R-based S3 class object. S3 is R’s simplest and most flexible version of object-oriented programming [22]. It’s simplicity allows us to define the GeoTox object with flexible components that define each step in a GeoTox framework analysis. sf spatial objects provide georeferenced information to the GeoTox object and generic data frames and lists store information on each step in the GeoTox analysis framework.
Figure 4 shows an example GeoTox object printed to the R console. The header displays the geospatial regions and the simulated population within each region. Next, we see the object components listed as fields with their name, class, and dimensions. Details of each field are described in the following sections including the individual simulations based on the httk Monte Carlo population simulations from Ring et al. [23].
2.3.2 Concentration-Response Model Fitting
There are many R packages available for toxicological concentration-response model fitting including drc [24], tcpl2 [25], and toxicR [26, 27]. For internal consistency and completeness, we provide concentration-response model fitting for parametric hill models using maximum likelihood estimation comparable to tcpl2. The function fit hill() takes a data frame consisting of concentration response data and fits a 2 or 3 parameter hill function. GeoTox package reference and vignettes describe how the data frame input is organized to comply with the function syntax. The output is stored in the GeoTox object as the hill params field and is accessed when piped into the calculator functions (Section 2.3.4 for the subsequent analysis steps.
2.3.3 Simulate Functions
Simulating through Monte Carlo sampling of empirical and estimated parameter distributions underlies the GeoTox framework to capture geospatially resolved individuals and populations. Age, obesity, inhalation rates, and steady-state blood plasma concentrations of chemicals are covered in the GeoTox simulation functions. Following the methods outlined in Eccles et al. [12], for a given region such as a US county, age distributions and obesity prevalence are obtained via the US Census and the Centers for Disease Control PLACES data [28], respectively. Inhalation rates are simulated based on the age of the individuals and derived from Table 6.7 of the EPA Exposure Factors Handbook [13]. In section 4 we discuss potential future integration with more refined or user-defined inhalation rate data. Age and obesity simulation results are input into the toxicokinetic parameter simulation. The httk package is used to simulate toxicokinetic parameters such as Css, the steady-state plasma concentration in [23]. Additionally, as the Css simulation can be time consuming for a large sample size, we simulate a tractable number of individuals using httk (i.e. 1,000) and then utilize bootstrap resampling for inference.
2.3.4 Calculator Functions
A series of self-described calculator functions are developed to perform the key steps of the GeoTox pipeline. Starting from the beginning of the pipeline, the calc internal dose() estimates the internal target organ dose of a chemical given inhalation rate, body weight, and time. Currently, time defaults to 1 since the estimate for blood plasma concentration is based on a steady-state assumption achieved in 1 day. Future methodological and package developments will explore non-steady state assumptions. calc invitro concentration() estimates the in vitro equivalent concentration via a simple multiplication of the internal dose and the steady-state blood plasma concentration. To calculate the total mixture response metrics used as the basis of the risk assessment, the function calc response() estimates and returns the chemical mixture responses using the IA, GCA, and the HQ.
2.3.5 Sensitivity Analysis Functions
Every stage in the GeoTox framework includes uncertainty that contribute to the overall variability of the mapped mixture responses. The compute sensitivity() functions wraps the Monte Carlo uncertainty analysis around each step of the analysis to calculate and aide in visualization of the overall uncertainty. By default, a GeoTox analysis function is chosen to vary freely while the remaining functions and their respective parameters are fixed to a central value. This simple sensitivity functionality in GeoTox will allow researchers to explore the sources and impacts of uncertainty in their analysis and increase confidence in the risk mapping results with targeted experiments.
2.3.6 Plotting Functions
Visualization is an important part of exploratory analysis and results dissemination in the GeoTox framework. Here, we provide easy and extensible plotting functions based on the GeoTox object. GeoTox provides a basic plot() function that takes the GeoTox object as input and can create ggplot based plots for geospatial exposure, plot(geoTox, type = “exposure”), dose-response results, plot(geoTox, type = “hill”), single and multi-assay responses, plot(geoTox, type = “response”), and sensitivity analysis results, plot(geoTox, type = “sensitivity”).
2.3.7 Package Data
The GeoTox package comes with a dataset, geo tox data, that is the basis for vignettes demonstrating the package functionality. Importantly, the “GeoTox Package Data” vignette shows the steps for downloading, processing, and creating the package dataset, which can easily be modified to make adjustments to underlying exposure and population data and will aide users in developing their own workflows based on novel datasets. The package dataset contains external, geospatial air pollution exposure data from the 2019 USEPA AirToxScreen, which utilizes the Community Multiscale Air Quality (CMAQ) and AERMOD dispersion models to estimate annual average hazardous air pollutants at the census tract level [29]. Chemical CAS numbers and preferred names are standardized and linked using metadata from the EPA Comp-Tox dashboard [30]. Population age and obesity distributions are used from the US Census bureau and Center for Disease Control and Prevention PLACES data [28], respectively. Additionally, the code base includes example for downloading data sets curated for targeted toxicity endpoints by Integrated Chemical Environment (ICE) REST API [21, 31].
2.4 Case Studies
We showcase GeoTox with three novel analyses that demonstrate single assay, multiassay, and individual risk characterization. The analyses follow the steps outlined in our package vignettes available at the package website.
First, we replicate the Eccles et al. [12] analysis in North Carolina, but utilize an H2AX histone modification genotoxicity assay (assay id: TOX21 H2AX HTRF CHO Agonist ratio) which is a demonstrated in-vitro marker for genotoxic effects and potential in-vitro replacement for in-vivo based cancer relative potency factors [32]. This analysis highlights the extensibility of the GeoTox framework to characterize risk at a precise biological level such as the molecular initiating events or key events quantified by curated in-vitro assays.
Second, we demonstrate a new advancement in the GeoTox framework that allows the incorporation of multiple assay end-points into the geospatial risk mapping workflow. Utilizing the GeoTox package, we characterize the county-level, annual air toxic carcinogenic risk based on 200+ assays from the key characteristics of carcinogens (KCC) [33]. The assays included are based on the following KCC modes of action: KCC2, Genotoxic Effects; KCC4, Epigenetic Alterations; KCC5, Oxidative Stress; KCC6, Chronic Inflammation; KCC8, Receptor Mediated Effects; and KCC10, Cell Proliferation/Death/Energetics. GeoTox incorporates another level of summarization at the assay level such that multi-assay risk is quantified as the “p total quantile of the q assay-level quantiles”.
Lastly, we demonstrate the extensibility of GeoTox to analyze and visualize individual-level risk that may vary across populations. Population variability in GeoTox is based upon the inter-individual variability introduced in the httk package to accomodate individual varibility in toxicokinetic processing via Monte Carlo simulations[23]. Here, we visualize the individual variability of multiple-assay based GeoTox risk. We discuss how human experimental or epidemiological data can be used to characterise risk based on the GeoTox framework.
3 Results
3.1 Case Study 1: State-wide results from a single in vitro assay
Here, we present county-level, single-assay risk mapping results based on the H2AX histone modification genotoxicity assay (assay id: TOX21 H2AX HTRF CHO Agonist ratio), a potential in-vitro replacement for cancer relative potency factors. All three of the case study results include the intersection of chemicals available in the geospatial exposures, assay information, and valid toxicokinetic parameters. The dose-response results in figure 5A show 7 chemicals with valid hit-calls to the H2AX histone modification assay and valid toxicokinetic parameter information, which is easily produced with a package call to plot(geoTox, type = “hill”). Figure 5B shows the GCA median response mapped to the county level, which is produced with the default call to plot(geoTox). Median GCA mixture responses show risk to these 7 chemicals and this potential cancer target is low when compared to the range of the dose-response functions in figure 5A. The highest risk areas are generally rural counties along the Interstate-95 corridor and near the Virginia border, which indicates a rural confluence of volatile, gas-phase air pollutant exposure and population toxicokinetic characteristics. It is important to note that the chemical space available in this analysis is a small fraction of the overall exposomic chemical space. Lastly, Figure 5C highlights the simplicity of analyzing sensitivity analysis results with a call to plot(geoTox, type = “sensitivity”). This analysis highlights the ease of reproducing the analysis from Eccles et al. [12] and producing plots for evaluating chemical response, geospatially mapped risk, and parameter sensitivity.
3.2 Case Study 2: State-wide results from multiple assays
New advancements in the GeoTox framework and GeoTox package allow the incorporation of multiple assay end-points into the geospatial risk mapping work-flow, which facilitate a more complete mechanistic risk mapping by incorporating multiple biological end-points or modes of action. Figure 6 shows the summarized risk, quantified by the GCA-based hazard quotient (GCA.HQ.10), for all key characteristics of carcinogens (KCC). The rows show the assay-level summarization by the 10, 50, and 90 quartiles (A-10, A-Med, A-90), respectively. Assay-level calculations are summaries over the individual-level metrics (e.g. hazard quotient or assay response) simulated via Monte Carlo sampling from the httk package. The columns show the total summarization, which is the p-th quantile of the assay-level quantiles. This represents a summarization of multiple end-points or modes of action. Here, we chose fifth and tenth quantiles of the hazard quotient as they are conservative risk estimates that avoid the lowest point of departures which are likely capturing cytotoxicity and not the true mode of action [34]. Multi-assay analysis also allows investigation of risk by mode of action or adverse outcome pathways. Figures S1 and S2 in the supporting information show the muti-assay summaries by the KCC for genotoxicity and oxidative stress, respectively. Figure S3 shows the kernel densities of hazard quotients for each assay where each assay is grouped by the KCC mode of action. This analysis highlights the extensibility of the GeoTox framework and the new development of the GeoTox package by demonstrating mechanistically-informed risk mapping from multiple modes of action. For example, further investigations into mechanistically-informed risk can be easily conducted via the GeoTox package by visualizing combinations of the assay-level and total summarization by adjusting the assay quantiles and summary quantiles parameters, respectively.
3.3 Case Study 3: Characterization of population variability in risk estimated from diverse individuals
The GeoTox package is extensibile for estimating and geospatially mapping individual-level risk from chemical mixtures that manifests through individual toxicokinetic variability. This case study highlights how individual toxicokinetic characteristics give rise to the estimated population risks. Figure 7 highlights three North Carolina counties from figure 6 that nominally represent high (Brunswick County, FIPS = 31019), medium (Wake County, FIPS = 37183), and low (Ashe County, FIPS = 37009) risk from the multi-assay risk summarization based on the median or 90th quantile assay and 10th quantile total summarization. We show the individual level metrics as jittered points inside kernel density estimates of the hazard quotient, grouped by weight status (i.e. normal vs obese), county, and by the H2AX histone modification genotoxicity assay and multi-assay metrics. In the high risk county, we see that normal weight individuals’ multi-assay hazard quotient is shifted upwards compared to the medium and low counties. Additionally, the multi-assay hazard quotient for obese individuals in the high risk county has more variability (i.e. a broader, flatter density) than the obese individuals in the medium and low risk counties. In this example, individuals are simulated from underlying populations, but real-world epidemiological cohort data can used as long as geospatial location information and toxicokinetic parameters are known for the individuals. If individual-level spatial information is available such as a residential addresses, then that information can be added to the GeoTox object through the set boundaries() function where the region parameter is an sf point object.
4 Discussion
We present the GeoTox package for source-to-outcome-continuum modeling that integrates geospatial exposure data, toxicokinetic models, and curated high-throughput screening data in a modular, interconnected, and reproducible manner. The package accommodates individual and population level geospatial and toxicokinetic information. The modularity is intentional, so that users can study factors impacting their step(s) of interest. Further, this provides for integration with key models, such as httk, alternative dose-response estimation approaches [27, 35], and geospatial exposure models, as they continue to advance.
The GeoTox package implementation was designed to facilitate analysis of population-level variability and susceptibility to adverse effects from combined exposures to chemical mixtures. The population-level parameters impacting risk are bestowed through individual-level simulations from the httk package. As signals of variability can arise from several factors, GeoTox includes explicit parameters for factors including genetics (simulated css), life stage (age), physiology (inhalation rate), concurrent stressors (exposure), and co-morbidities (obesity). For complete analysis, provisions for sensitivity analysis of all parameters are also built-in. This allows comparison of simulated scenarios with parameters based on real (i.e. epidemiologically studied) populations.
New research will naturally expand and improve upon the parameters that impact the accuracy and uncertainty in GeoTox. For instance, improvements in chemical specific toxicokinetics will lead to comparable improvements in the geospatially mapped risk estimates. Additionally, expanding the chemical space for geospatial exposure maps will complete the exposomic landscape that the framework and package can reasonably capture. The current parameters impacting individual variablity in toxicokinetic processing are age, sex, and obesity. However, it was known that individual genetics such as single-nucleotide polymorphisms (SNPs) can impact toxicokinetic processing. For example, aldehyde dehydrogenase 2 (ALDH2) and aldehyde dehydrogenase 1B (ADH1B) are known variants that greatly impact alcohol metabolism and occurs often in East Asian ancestry [36]. Ginsberg et al. [37] describe six polymorphic enzymes that impact xenobiotic metabolism, which could impart individual-level variability in dose-response to environmental pollutants. Ford et al. [38] have proposed in-vitro NAM for quantifying individual-level variability to chemical responses and highlight the potential for single variants genome wide to impact the individual-level toxicity. If it is known how these variants impact toxicokinetic parameters such as hepatic clearance, then it will endow the framework with the ability to capture individual SNP related differences in risk.
With the introduction of multi-assay or multiple mechanistic endpoint analysis, GeoTox will support next-generation risk assessment (NGRA) approaches that utilize NAMs to reduce our reliance on in-vivo animal models, but still incorporate multiple-levels of biological activity and function. Our case study on mapping the total risk quantified by the KCC modes of action provides a straightforward approach to evaluate risk using NAMs that includes multiple routes or modes of toxicity as recommended by Schmeisser et al. [34]. Moreover, our analysis by each KCC category shows that population-level risk can vary based on the mechanistic endpoint. The simplicity of the analysis and plotting functions allow further exploration of causes and policies around mode-of-action variability in population-level chemical responses. Future NGRA developments that can be integrated into GeoTox include risk metrics that utilize a more complete or sophisticated mechanistic pathway information.
Arguably the most important aspect of GeoTox is our adherence to FAIR practices in software development that will enable an extensible, reproducible, documented, and maintained resource for the geospatial risk assessment community. GeoTox was developed and is maintained according to test-driven development principles, with a series of unit and integration tests ensuring that each function operates correctly on its own and within the cumulative workflow. With adherence to FAIR data principles GeoTox is a tested, reliable, and accessible tool which aims to reduce the barriers associated with integrating geospatial information into NGRA or providing geospatial analysts with news tools to integrate health analysis into their previously pure geospatial analyses. As with any sound science or software, minor fixes and methodological improvements will be incorporated through the continuous integration and development process. We provide the user community the forum to report problems and request new features through the code repository (https://github.com/NIEHS/GeoTox/issues).
5 Conclusion
We present GeoTox, an open-source, tested, reproducible, and extensible software designed for source-to-outcome-continuum modeling. The software provides researchers and practitioners with a tool for integrating geospatial information into NGRA. Our case studies demonstrate its functionality for geospatially mapping the combined chemical mixture risk quantified by (1) a single assay, AOP key event, or mechanistic endpoint, (2) multiple assay response or disease outcome mode of actions, and (3) individual-level assay-based dose-response. The software was designed with modularity and extensibility such that users can utilize any aspect of the workflow or extend its capabilities for addressing novel environmental, spatial, and human health research questions. We believe there is vast potential to utilize GeoTox to elucidate novel genomic and environmental risk questions at both the population and individual level. And most critically, the software is developed with computational best-practices such that it will me maintained and evolve with the needs of the community in addressing complex human health problems.
Data Availability
The software and package data is open-source. The large data examples from the manuscript are available upon reasonable request to the authors.
Acknowledgements
This work is supported by the National Institute of Environmental Health Sciences, Division of Translational Toxicology, Division of Intramural Research, and the Spatiotemporal Exposures and Toxicology group under project number ZIA ES103368-02.
6 List of Abbreviations
- AEP
- Aggregate Exposure Pathway
- AOP
- Adverse Outcome Pathway
- cHTS
- curated high-throughput screening
- GCA
- Generalized Concentration Addition
- httk
- high-throughput toxicokinetics
- KCC
- Key Characteristics of Carcinogens
- NGRA
- Next Generation Risk Assessment
- SNP
- Single Nucleotide Polymorphism