SUMMARY
This report compares two geocoding methods used by the Cohorts for Environmental Exposures and Cancer Risks (CEECR) consortium: ArcGIS Geocoding by Esri and the SAS GEOCODE procedure. The goal is to determine the comparability of data sets that use different approaches to link survey data with spatial surrogates of exposure to environmental and socioeconomic factors. ArcGIS and SAS GEOCODE were selected because both are used by one or more CEECR cohort study teams and both can be run locally offline, which minimizes the confidentiality concerns associated with online data linkage. Residential addresses from 3,238 Wisconsin residents in the Cancer & COVID Study and the Wisconsin in situ Cohort (WISC) were geocoded with both platforms and linked to eight publicly available datasets of environmental and socioeconomic factors at various geographic scales. Because the two platforms differ in their geocoding approaches, their validity and accuracy were compared to examine differences in the surrogate exposure measurements assigned on the basis of spatial location. ArcGIS offered higher specificity for matched addresses, with slightly more latitude/longitude point and street matches (97.7%) than SAS (95.9%); the remaining addresses matched at the ZIP code level. The two platforms showed high concordance in assignment at the county (99.6%), census tract (96.5%), and census block group (94.7%) levels. As a result, correlations between the platforms for exposure measures linked by census tract and block group were very strong for socioeconomic status, environmental justice, urban/rural residence, air pollution, proximity to industrial sites, and cancer risk (all intraclass correlation coefficients ≥98%). Slightly lower concordance was observed for point source linkages (intraclass correlation coefficients 96-97%).
Approximately 4% of addresses were mismatched, largely in rural areas, where census areas are larger and accurate geocoding base layers are less widely available than in urban areas. For researchers who already use SAS, the GEOCODE procedure can be a logical choice because it is included in base SAS software at no additional cost. Overall, SAS and ArcGIS perform comparably for the vast majority of study address locations.
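As an illustration of the two agreement measures named above, the following minimal sketch computes percent concordance of area assignments and an intraclass correlation coefficient for a linked exposure value. It is not the analysis code used for this report: the function names and example values are hypothetical, and the one-way random-effects form of the ICC is an assumption, since the summary does not state which ICC form was used.

```python
from statistics import mean


def percent_concordance(ids_a, ids_b):
    """Percent of addresses given the same area ID by both geocoders."""
    if len(ids_a) != len(ids_b) or not ids_a:
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(ids_a, ids_b))
    return 100.0 * same / len(ids_a)


def icc_oneway(x, y):
    """One-way random-effects ICC for two measurements per address.

    x and y hold the exposure value linked via geocoder A and geocoder B
    for the same addresses, in the same order. The choice of the one-way
    form is an assumption made for this sketch.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired measurements")
    k = 2  # two geocoders per address
    grand = mean(list(x) + list(y))
    pair_means = [(a + b) / 2 for a, b in zip(x, y)]
    # Between-address and within-address mean squares
    msb = k * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(x, y, pair_means)
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (msb - msw) / denom


# Hypothetical census tract IDs for four addresses from each platform
tracts_arcgis = ["55025000100", "55025000200", "55079000300", "55079000400"]
tracts_sas = ["55025000100", "55025000200", "55079000300", "55079000500"]
print(percent_concordance(tracts_arcgis, tracts_sas))

# Hypothetical linked exposure values for the same four addresses
print(icc_oneway([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Under these assumptions, an address geocoded to a different tract by the two platforms lowers the concordance directly, but it lowers the ICC only to the extent that the neighboring tract carries a different exposure value.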
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This report was supported by grant U24CA265813 from the National Cancer Institute and the National Institute of Environmental Health Sciences, and by grant P30CA014520 from the National Cancer Institute.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This project was approved by the University of Wisconsin Health Sciences Institutional Review Board.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes