Abstract
Fundus pictures of the eye allow for non-invasive inspection of the microvasculature system of the retina which is informative on cardiovascular health. Automated image processing enables the extraction of morphometric properties of this system as quantitative features that can be used for modelling disease risks.
Here we report the results of the largest genome-wide association study (GWAS) of retinal vessel tortuosity conducted to date using data from the UK Biobank (N=63,899). We identified 87 loci associated with this trait (85 of which are novel). The heritability of the trait was h2=0.23 (0.02). We carried out a replication study on a small independent population-based cohort, SKIPOGH (N=436). While the power of this study was too small to replicate individual hits, the effect size estimates correlated significantly between the two studies (Pearson correlation r=0.55, p=4.6E-6). Using LD score regression, we showed that the alleles associated with retinal vessel tortuosity point to a common genetic architecture of this trait with CVD and related traits.
Our results shed new light on the genetics of cardiovascular risk factors and disease.
Introduction
The fundus of the eye is covered by blood vessels which are essential for bringing oxygen and nutrients to the various tissues of the retina. Fundus photography allows easy and non-invasive inspection of this retinal microvasculature, and it is well known that there are disease-related changes in the morphometric properties of the vessels. The fundus is of interest beyond the field of ophthalmology, since pathological changes in the retinal vessels often reflect those in microvasculature of other organs of major importance. Indeed, based on the homology between the microvasculature in the retina and that found in other organs, retinal analysis has the potential of becoming a powerful screening tool for diseases elsewhere in the body, notably the brain1,2,3,4,5, kidney6,7 and ear8. The retinal microvasculature can therefore provide signs of systemic disease, including increased risk of diabetes9,10,11,12, obesity 13 and cardiovascular disease (CVD)14,15,16,17,18, specifically stroke16,19,19,20,21, coronary heart disease22, coronary artery disease23, hypertension11,24,25,20,26,27,28,29,30,31,32,33,34, atherosclerosis19,20,35 and myocardial infarction36,37. It is also informative of specific eye conditions such as Plus disease in the case of retinopathy of prematurity38,39,38.
In recent years, measuring retinal features in large genotyped cohorts has paved the way for studying the genetic underpinning of these phenotypes and Genome Wide Association Studies (GWAS) have already identified a number of loci40,41 associated with renal vessel size42,4344, optic disc morphology45,46 and vessel tortuosity23.
In this study we carried out the largest GWAS on median retinal vessel tortuosity to date, confirming two known variants and discovering 85 new ones. Our discovery cohort was the UK Biobank, which provides a large collection of retinal images suitable for automatic analysis of morphometric properties of the vasculature of the human eye47. We used data from the much smaller, yet independent, population-based cohort SKIPOGH48,49 for replication analysis. Many of the variants we identified with our GWAS have previously been associated with other traits, specifically CVD and some of its risk factors.
Results
Automated processing of retina images defines retinal tortuosity phenotype
We applied the ARIA50 software for automated processing of 175,821 images from 63,899 individuals available in the UK Biobank. We modified ARIA to operate in batch mode, annotating the blood vessels in each image by extracting a list of points along the midline of each vessel. Using these data, we measured the tortuosity for each vessel (or annotated segment thereof) in terms of the so-called Distance Factor, i.e. the ratio between the path length along the vessel and the distance between its start and end point, as was first suggested in51 used the median retinal vessel tortuosity over all annotated vessels (averaged over multiple images if available) as trait (see Methods for more details).
Retinal vessel tortuosity GWAS identifies 85 novel loci
Applying linear regression of quantile-normalized median retinal vessel tortuosity on the genotypes of the matching subjects imputed to a panel of 15M genetic variants, we identified 6481 significantly associated SNPs (see Supplementary File 1). Applying LD pruning with a threshold of R2 < 0.01 within a window of 500K bases to define independence, we obtained a list of 87 independent lead SNPs (the top 10 are listed in table 1, ordered by statistical significance, and a full list can be found in Supplementary File 2).
Two of the 6481 significant variants, namely rs1808382 and rs7991229, had previously been associated with retinal vessel tortuosity23, while the remaining 85 SNPS represent novel loci associated with this trait (see figure 1 for a Manhattan plot of our genome-wide signals). According to the GWAS Catalogue, there is a third known locus for retinal vessel tortuosity, namely rs73157566. Yet, the association signal of this SNP was only marginally significant and it did not replicate in the replication cohort of the study which reported it23. We also did not replicate this finding (for details about these three variants and the respective genomic regions, see Supplementary Material on known associations with retinal vessel tortuosity).
We also performed LD score regression to estimate heritability, obtaining a h2 = 0.2293 (SE = 0.0229). We did not observe any significant genomic inflation (slope = 1.0135, SE = 0.0103).
Trend of effect sizes replicates in the SKIPOGH cohort
We attempted replication of our lead SNPs from the UKBB analysis in the SKIPOGH cohort48,49 (436 individuals, multiple images per eye, for a total of 1,352 images). 60 out of the 87 lead SNPs were available for comparison. Given the limited sample size of the replication cohort, we lacked power to replicate the individual associations found in the discovery cohort, as none of them survived Bonferroni correction (p = 0.05/60 = 8.3E-4). Nevertheless, the effect size estimates using SKIPOGH data showed good concordance with those from the UKBB (see Supplementary File 3). First, 42 of 60 lead SNPs had the same sign of their effect size estimate in both studies (binomial test p = 5.3E-4). Second, we observed a Pearson correlation of r = 0.55 (p = 4.6E-6) across these estimates (see figure 2). Both results stay significant when removing outliers (see Supplementary Material, replication of effect sizes without outliers).
Tortuosity variants are associated with numerous diseases
A shared genetic basis of retinal tortuosity and Coronary Artery Disease had already been noted for locus rs1808382 (mapped to the ACTN4/CAPN12 genes), underlining the usefulness of retinal vascular traits as biomarkers for cardiovascular diseases23. We replicated this finding and asked to what extent it also applies to the large panel of new variants we associated with retinal vessel tortuosity. Querying the GWAS Catalogue52 for our hits revealed 9 loci linked to genes that had been reported as genome-wide significant in associations with other diseases including coronary heart disease, myocardial infarction, arterial hypertension, type 2 diabetes, chronic lymphocytic leukemia, Alzheimer’s disease, diverticular disease, glaucoma and myopia (see table 2). Besides these 9 loci, we also uncovered 26 additional SNPs with pleiotropic effects on various diseases which could not be confidently mapped to a specific gene (see full list in Supplementary Material variants associated with disease outcome). We next expanded our query to include phenotypes known to confer a disease risk. We report a list of 12 loci linked to genes influencing both tortuosity and disease risk factors (see table 3). Furthermore, we uncovered another 9 SNPs showing similar pleiotropic properties, which could not be confidently mapped to a specific gene (see Supplementary Material).
SNP-level statistics were aggregated to obtain gene-wise association scores using the tool Pascal53. The results of our gene-wise association are summarized in figure 3: red squares mark disease genes, reported in table 2. Similarly, green squares indicate risk factor genes from table 3.
Genetic signal is shared with hypertension and CVD
We extended our analysis of overlap with known genetic signal beyond variants with the same rsID, considering SNPs belonging to the same LD block (R2 > 0.8). Figure 4 shows how many of the variants identified by our retinal tortuosity GWAS had previously been reported as being associated with any of the large range of complex traits in the GWAS Catalogue52. A number of traits that stand out. Firstly, both diastolic (49) and systolic blood pressure (46) have been associated with many of our newly identified loci. Second, also pulse pressure and BMI share 19 associated loci. We note that both elevated blood pressure and BMI are well-known CVD risk factors, which we purposefully did not use as covariates in our GWAS, so as to be able to study the overlap in signal, even though correcting for these traits would have left the association signals largely unchanged (see Supplementary Material on Analysis of potential confounders). Furthermore, we observe a sizable number of tortuosity-associated variants overlapping with coronary artery disease variants, in line with what has recently been reported by a smaller scale GWAS on retinal vessel tortuosity23. For two additional phenotypes with a sizable overlap of trait associated variants, namely blood protein levels and bone mineral density, the relationship to tortuosity is less obvious, but might point to common pathogenic mechanisms. Similarly, for some of the other traits sharing several associated variants, notably, Type I and Type II diabetes, colorectal cancer, cholesterol, lung function, skin and eye pigmentation, and autoimmune diseases, there could be joint genetically modulated pathways, but some of these common associations may also just be spurious (see Supplementary File 4 for the full list of phenotypes and references to publications).
Discussion
GWAS Analysis
Adapting the retina image processing tool ARIA to run on a cluster facilitated the extraction of median retinal vessel tortuosity estimates for close 64 thousand subjects of the UK Biobank, enabling a GWAS for this trait with substantially increasing power compared to previous studies. This gain in power resulted in the identification of 85 novel loci and the replication of 2 out of 3 associations known from previous studies, providing a substantially improved picture of the genetic architecture of this trait.
We detected pleiotropic effects of 10 tortuosity variants associated with disease, specifically CVD-related diseases (Coronary Artery disease, Coronary Heart disease, Myocardial infarction, Hypertension), systemic diseases (diabetes, chronic lymphocytic leukemia, Alzheimer’s disease) and ophthalmological conditions (myopia, glaucoma). Our results only link these diseases through common associated genetic variants to retinal vessel tortuosity. While one might speculate that this trait may reflect pathological developments of blood vessels that are causally upstream of some of these diseases, establishing such causal links will require more work, including the application of Mendelian randomization54,55.
This study was subject to several limitations. First, our tortuosity measurements combine those of arteries and veins, while most ophthalmological studies distinguish between arterial and venular tortuosity. This compromise was made because we could not fully automate vessel classification. Indeed this is a difficult problem that still seems to require some expert input at least for some images or vessels, which would have prevented us from analysing such a large set of retinal images. Yet, the noise introduced by mixing arteries and veins, apparently was outweighed by the gain in sensitivity we achieved, as evidenced by the large number of associated loci. Subsequent studies using vessel annotations distinguishing their type may test whether these loci obtain different effect sizes for arterial and venular tortuosity. The second limitation of our study was that the software tool we used estimated tortuosity by the Distance Factor51, a global measure which may not be ideal to capture vessel pathology (that may be better described by more local measures such as curvature56, based on which, potentially more disease-relevant measures have been proposed57,58). We provide statistics about the distributions of vessel lengths in our dataset and repeated our GWAS using only a subset of relatively short vessel segments on which we measured Distance Factor tortuosity, but observed no dramatic change in the observed effect size estimates of our top hits (see Supplementary Material on Tortuosity of short vessels). Also, we did not adjust for spherical equivalent refractive error, which might have confounded our measurements to some degree. The third limitation of our study was the small size of the replication cohort, which prevented us from replicating any individual hits. Nevertheless, the effects sizes in the replication study correlated strongly with those in the discovery cohort, providing independent evidence that they were not driven by any artifact specific to UKBB.
In conclusion, our highly powered GWAS on median retinal vessel tortuosity identified 85 novel loci in the maintenance of the microvasculature system, or failure thereof, as precursors or symptoms of complex diseases.
Materials and Methods
Definition of tortuosity
Our study assessed tortuosity using the measurements provided by the ARIA software50. We estimated tortuosity as the total vessel length divided by the euclidean distance between the vessel segment endpoints.
A number of measures have been designed to estimate vascular tortuosity. The measure adopted by the ARIA tool, in particular, is reported in a recent review as the AOC measure (Arc Over Chord ratio)56. In an earlier work on retinal vascular tortuosity, this measure was referred to as Tau 159 (an equivalent formulation in which a unit of one is subtracted). The measure was originally proposed (in the context of the femoral artery) by Smedby et al. as the Distance Factor51
UK Biobank phenotypes
Our data was collected as part of the UK Biobank effort. The UK Biobank is a large-scale study that includes over half a million volunteers from the UK (502,505 participants, collection years 2006-2010). It includes a repeated assessment phase (20,000 participants, collection years 2012-2013). The age of participants ranges between 40 and 69 (median age 59), roughly balanced between sexes (229,122 males and 273,383 females).
175,821 fundus eye images (87,562 images of left eyes and 88,259 images of right eyes) were available at the time of required data extraction. We processed all images, including those from the reassessment time point. Other phenotypes were used to correct biases in the genetic associations (age, sex, PCs of genotypes) or to study correlation with disease and lifestyle (all other phenotypes). The following health statistics are of interest to interpret the medical implications of our analysis: 26,989 participants (5.3%) reported being diabetics, 16,787 (3.3%) reported being diagnosed with angina, 12,226 participants (2.4%) had a heart attack, 10,472 (2.0%) had deep-vein thrombosis (DVT, blood clot in leg), 8,216 (1.6%) were diagnosed with stroke, median DBP was 81 mm/Hg, median SBP was 136 mm/Hg, 227,360 (45.2%) received medication.
Data extraction
The tortuosity phenotype measure was extracted using a modified version of the software ARIA by Peter Bunkehad50. We modified this tool to run in batch mode dumping vessel statistics to disk in the process, processing images without the need for human interaction. The ARIA parameters used for vessel extraction were the default applied by the software for tests to images from the database REVIEW60.
We now describe the phenotype extraction quality control procedure. The ARIA tool was used to perform segmentation of blood vessels and measurement of both their tortuosity and diameters. The software is designed to perform vessel diameter measurements at regular intervals along the centerline of each vessel, so that the number of measured diameters could be used as a proxy for the total length of the vascular system depicted in one image. Images for which less than 11 thousand equally spaced diameters could be measured were discarded. This threshold was (conservatively) set by visual inspection to discriminate lower quality images that were too dark, too light or out of focus. Roughly two out of three images passed this quality control, with a total of 120,363 images being sent forward in the pipeline.
Postprocessing of the data consisted in averaging the values derived from the left and right eye of each participant (for the resulting distribution, refer to Supplementary Material on Tortuosity estimates in UKBB)
The data extraction pipeline was written in python and bash and was run on a cluster using SLURM.
UK Biobank genotype data
Around 488,000 participants were genotyped on Axiom arrays for a total of 805,426 markers. From this, about 96 million genotypes were imputed using a combined reference panel from the 1000 Genomes and UK10K projects61. The annotation used to report variant positions is the Genome Reference Consortium Human genome build 37 (known as GRChb37 or hg37). We subset the genotypes using the software BGENIX, shrinking the list of investigated variants to those that have been assigned an rsID (around 15 million SNPs). An additional Quality Control was performed via a postprocessing step on the GWAS output. We filtered out SNPs with MAF < 5E-4 (which given the sample size of 63,899 subjects, translates to having an average of over 30 individuals having at least one minor allele). We filtered out SNPs with imputation quality < 0.3, as used in Ref. [62].
The SKIPOGH study
We performed replication of the GWAS results in the SKIPOGH cohort48,49 (Swiss Kidney Project On Genes in Hypertension) which is a Swiss family-based population-based cohort that includes 1’042 participants (493 males and 549 females), aged between 18 and 96 years old, which have been extensively phenotyped at baseline and in a 3-year follow-up. Participants were recruited from three different locations in Switzerland, namely, Bern, Geneva and Lausanne. The genotyping was performed with the Illumina 2.5 omni chip, followed by an imputation based on HRC v1.1 panel using Minimach3. The annotation used to report variant positions is the Genome Reference Consortium Human genome build 37 (GRChb37).
Genome-wide association analysis
The raw tortuosity measures extracted from the image data were transformed in order to correct for confounding effects that would bias the genetic association analysis (for an overview of the confounder analysis that we performed, please refer to Supplementary Material on Analysis of potential confounders). Only variables that showed a statistically significant correlation to tortuosity were corrected for. Specifically, we applied the linear model:
A rank-based inverse normal transformation was applied to the residuals of this linear model and the GWAS was run on the output as a univariate linear regression without confounders (refer to Supplementary Material to inspect the resulting distributions).
The genetic association study was run using this software BGENIE63. The (unpruned) output consisted of 6481 significant SNPs (see Supplementary File 1).
The list of independent SNPs was calculated by performing LD pruning using the LDpair function of the R package LDlinkR64, selecting as a reference panel “GBR” (Great Britain) from the 1k genome project. Two SNPs were considered as independent if they had LD R2 < 0.01 or were more than 500K bases apart (see Supplementary File 2). This resulted in 87 independent lead SNPs. LD pruning was repeated with an alternative LD threshold of R2 < 0.1 (for comparison with other GWAS studies, where such value is used) resulting in a list of 124 significant SNPs (see Supplementary File 5).
The association analysis at replication stage for the SKIPOGH cohort was performed using the Emmax function of the Epacts software in order to account for family structure by using the kinship matrix in the model. Additionally, the recruitment center was included as covariable.
Summary plots were generated using the R packages qqman65 and the GWASTools66.
Heritability
We carried out LD score regression using the software LDHub67. The portion of phenotypic variance cumulatively explained by the SNPs was h2=0.2293 (0.0229). The measure of inflation was lambda_GC=1.1364; lambda GC measures the effect of confounding and polygenicity acting on the trait. The mean chi-square statistic mean_X2=1.2941. The LD Score regression intercept was 1.0135 (0.0103); an intercept close 1 indicates little influence of confounders (mostly of population stratification). The ratio of the proportion of the inflation in the mean X2 that is not due to polygenicity was 0.0459 (0.0352); a ratio close to 0 is desirable as it indicates low inflation from population stratification.
Shared genetic architecture with disease
SNP variants overlap with disease phenotypes (same rsID) was analysed using the EBI’s GWAS Catalogue52. We report independent SNPs in the tortuosity GWAS that are part of the GWAS Catalogue because associated to disease (or to a disease-related phenotype). This analysis was extended to genes and pathways using FUMA68. We list independent SNPs in the tortuosity GWAS who were in LD with SNPs that had already been reported in the GWAS Catalogue (see Supplementary File 4 and Supplementary File 6).
Data Availability
The data are made available by the UK Biobank Consortium.
Supplementary Material
Please refer to the Supplementary material file.
Author contributions
MT, MuB and SB designed the study. MT performed tortuosity measurements of the raw image data from the UKBB and the SKIPOGH cohort and post-processed it. MT carried out the GWAS and subsequent bioinformatics analyses, with the guidance of SB, NM and EP. TC performed the replication analysis in SKIPOGH. HAZ provided medical and ophthalmological expertise. MiB helped in quality controlling the retina images. MT and SB wrote the manuscript.
Funding
This work was supported by the Swiss National Science Foundation grant no. FN 310030_152724/1 to SB. The SKIPOGH study was supported by the Swiss National Science Foundation (#FN 33CM30-124087 to MuB).
Conflict of Interest statement
None declared.
Acknowledgements
Thanks to Micha Hersch for inspiring this project to the UK Biobank team for their support and responsiveness, and to all UKBB participants for sharing their data.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.
- 43.
- 44.
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.↵
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.