Diabetic retinopathy environment-wide association study (EWAS) in NHANES 2005-8 =============================================================================== * Kevin Blighe * Sarega Gurudas * Ying Lee * Sobha Sivaprasad ## SUMMARY **Background** Several circulating biomarkers are reported to be associated with diabetic retinopathy (DR). However, their relative contributions to DR compared to known risk factors, such as hyperglycemia, hypertension, and hyperlipidemia, remain unclear. In this data driven study, we used novel models to evaluate the associations of over 400 laboratory parameters with DR. **Methods** We performed an environment-wide association study (EWAS) of laboratory parameters available in National Health and Nutrition Examination Survey (NHANES) 2007-8 in individuals with diabetes with DR as the outcome (test set). We employed independent variable (‘feature’) selection approaches, including parallelized univariate regression modeling, Principal Component Analysis (PCA), penalized regression, and RandomForest™. These models were replicated in NHANES 2005-6 (replication set). **Findings** The test and replication set consisted of 1025 and 637 individuals with available DR status and laboratory data respectively. Glycohemoglobin (HbA1c) was the strongest risk factor for DR. Our PCA-based approach produced a model that incorporated 18 principal components (PCs) that had AUC 0.796 (95% CI 0.761-0.832), while penalized regression identified a 9-feature model with 78.51% accuracy and AUC 0.74 (95% CI 0.72-0.77). RandomForest™ identified a 31-feature model with 78.4% accuracy and AUC 0.71 (95% CI 0.65-0.77). On grouping the selected variables in our RandomForest™, hyperglycemia alone achieved AUC 0.72 (95% CI 0.68-0.76). The AUC increased to 0.84 (95% CI 0.78-0.9) when the model also included hypertension, hypercholesterolemia, hematocrit, renal and liver function tests. **Interpretation** All models showed that the contributions of established risk factors of DR especially hyperglycemia outweigh other laboratory parameters available in NHANES. **RESEARCH IN CONTEXT** What is already known about this subject? * ▪ There are >500 publications that report associations of candidate circulating biomarkers with diabetic retinopathy (DR). * ▪ Although hyperglycemia, hypertension, and hyperlipidemia are established risk factors, they do not always explain the variance of this complication in people with diabetes; DR also shares risk factors with other diabetes complications including markers of renal and cardiovascular disease. * ▪ ‘Holistic’ studies that quantify risk across all of these parameters combined are lacking. What is the key question? * ▪ It is unclear whether risk models for DR may be improved by adding some of these reported biomarkers - there is an unmet need to systematically evaluate as many circulating biomarkers as possible to help rank their associations with DR. What are the new findings? * ▪ We show that hyperglycemia is the strongest risk factor across all models. * ▪ We stratified the rest of the highest ranked parameters into groups related to diabetes control, renal and liver function, and hematocrit changes. How might this impact on clinical practice in the foreseeable future? * ▪ The importance of focusing on parameters beyond hyperglycemia control to reduce risk of progression from diabetes to DR is emphasized. ## INTRODUCTION Diabetes represents the most common cause of microvascular changes in the retina. The initial retinal lesions of diabetic retinopathy (DR) are microaneurysms but they can occur in eyes with and without diabetes (1-3). With increasing duration of diabetes, other lesions develop and co-exist in the retina such as retinal hemorrhages, exudates, intraretinal microvascular abnormalities and neovascularization of the retina or optic disc. Based on the presence of individual lesions or a constellation of them, DR severity level is graded from mild, moderate and severe non-proliferative diabetic retinopathy (NPDR) to proliferative diabetic retinopathy (PDR) (4, 5). Diabetic macular edema (DME) can occur in any stage of DR (5). In population-based studies, approximately a third of people with diabetes have DR (6, 7). The established systemic risk factors for DR are suboptimal control of hyperglycemia, hypertension and hyperlipidemia (8, 9). Hypertension can also cause some of these retinal lesions independent of diabetes (10). There are several laboratory parameters that have been shown to be abnormal in people with DR such as hyperuricemia (11), low vitamin D levels (12), low thyroxine levels (13), anemia (14), oxidative stress and inflammatory markers (15). In addition, DR is also associated with markers of diabetic kidney disease including microalbuminuria and serum creatinine (16, 17) and cardiovascular disease markers such as raised C-reactive protein (CRP) (18). Most of these associations and risks of DR are reported based on analysis of candidate laboratory-based serum or urinary markers. In addition to these risk factors, there are several other non-modifiable and modifiable risk factors that have been attributed to the development and progression of DR. Some of these include age of onset of diabetes, duration of diabetes, male sex, and ethnicity (19-21). There is an unmet need to rank these reported retinal, systemic and laboratory risk factors to understand their relative contributions or associations with DR in people with diabetes. The National Health and Nutrition Examination Survey (NHANES - [https://wwwn.cdc.gov/Nchs/Nhanes/](https://wwwn.cdc.gov/Nchs/Nhanes/)) was initiated in the 1960s in order to examine the health and nutritional status of US citizens and has been surveying the population up to the present time. Since 1999, it has examined ∼5000 citizens per year and includes various topics, including cardiovascular disease, diabetes, environmental exposures, eye diseases, hearing loss, infectious diseases, kidney disease, nutrition, etc. The data also contains several laboratory markers including environmental toxins, allergens, pollutants. In this study, we used an environment wide association study (EWAS) methodology (22-25) on NHANES 2007-8 to evaluate the rank order of systemic and laboratory risks of DR among individuals identified as having diabetes to evaluate their relative associations with DR. Our findings are then replicated in NHANES 2005/6. Our objective is not to only use previously reported risk factors but also provide new research avenues from this data driven agnostic modelling study. ## METHODS ### Study data preparation We used National Health and Nutrition Examination Survey (NHANES) 2007-8 as our primary cohort and 2005-6 as a replication cohort. Both datasets were prepared in the same fashion, however, for ease of interpretation, the following methods describe 2007-8. Specifically, three main categories of data were used: examination data (*Ophthalmology - Retinal Imaging data*; OPXRET_E), demographics data (DEMO_E), and laboratory data (**Figure 1** footnote). The main outcome of interest in the examination data was *4 levels retinopathy severity, worse eye* (OPDURL4) – this variable was recoded as binary with levels: no retinopathy; retinopathy (including mild NPR, moderate/severe NPR, and proliferative). All datasets were downloaded as SAS XPORT (xpt) format and read into R (v4.0.2) via the *Hmisc* package. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/22/2020.09.20.20198218/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/F1) Figure 1. Cohort selection process for NHANES 2007-8. Individuals with a missing value in the main outcome variable were removed before aligning the examination, demographics, and laboratory data via each individual’s respondent sequence number (SEQN). This dataset was then further filtered for only those individuals who had diabetes (**Figure 1**). Variables were removed from the data that had 0 variance (i.e. constant values) (**Supplementary Table 1**). Prior to any analysis, in addition, any variable that contained a single value occupying > 90% of total values was removed, as were variables that had > 90% missingness. Further specific filtering and encoding was then applied per dataset. [A] Examination data: variables that were different encodings of the main outcome were removed; variables that related to the status of the examination appointment were removed; OPDUHMA was removed, as it is a combination of 2 other variables that were retained (OPDUMA and OPDUHEM); variables related to glaucoma, for which there is already a single variable, were removed; variables related to the left or right eye where there was already a variable for ‘worse’ eye were removed; values encoded as missing were recoded as NA; and all other remaining variables were encoded as binary, with 0 representing the absence of the condition, and 1 representing any recorded presence (at any level) of the condition. [B] Demographic data: variables associated with interpreters and the language of the interview were removed; variables that were duplicates or different encoding of each were removed. [C] Laboratory data: categorical variables were removed and only continuous retained; duplicate variables related to the oral glucose tolerance test (OGTT_E) were removed; variables related to time since domestic activities (‘pump gas’, ‘shower’, etc.) were removed; variables that were duplicates or different encodings of each were removed; variables measured on the imperial system of weights and measures were removed if they had a corresponding variable in SI units. We focused only on continuous laboratory variables for the following reasons: 1, in NHANES, the majority of categorical variables are derived from the continuous variables; 2, our PCA-based approach can only work on continuous variables; 3, for RandomForest™, having continuous variables increases the number of splitting points in the data, and metrics of importance such as Gini are known to exhibit less bias on such data (26). ### Diabetes status To define the diabetes status for each individual, questionnaire data (DIQ_E) was used in addition to variables already included in the laboratory data. Diabetes status was then defined as an individual satisfying any of the following: Self-reported diabetes (DIQ010); on anti-diabetes drugs (DIQ070); taking insulin (DIQ050); fasting blood sugar (FBS) ≥ 6.1 (110mg/dl) (LBDGLUSI); random blood sugar (RBS) ≥ 11.1 (200mg/dl) (LBDSGLSI); oral glucose tolerance test (OGTT) ≥ 200mg/dl (LBDGLTSI); Glycohemoglobin (HbA1c) ≥ 6.5% (LBXGH). ### Covariates Age, ethnicity, and diabetes duration were used as covariates. Diabetes duration was calculated as age at screening minus the age at which the individual was first informed that he/she had diabetes. ### Statistical analysis Prior to statistical analysis, continuous laboratory variables were logged (loge) and then transformed into z-scores to ensure that these were on the same scale. In regression analysis, the complex sampling design of the NHANES dataset was accounted for through use of survey sampling weights via the *survey* package in R / CRAN. To do this, the following value-pairs were used with the *svydesign* function: (*id*, SDMVPSU; *strata*, SDMVSTRA; *weights*, WTMEC2YR; *nest*, TRUE). Univariate analysis was performed on all candidate predictors using a survey-weighted compute-parallelized logistic regression model via the R / Bioconductor package *RegParallel*, adjusting for age, ethnicity, and duration of diabetes separately. The Benjamini-Hochberg (27) procedure was used to control the type I error false discovery rate (FDR). A customized Manhattan plot was generated using *ggplot2*, while pairwise scatter and correlation plots were generated via a customized pairs plot. Finally, a heatmap was generated via the R / Bioconductor package *ComplexHeatmap*. As our study is also hypothesis-generating, multivariate approaches based on principal component analysis (PCA), penalized regression, and the RandomForest™ classification algorithm were additionally used. Variables were pre-filtered and prepared as per univariate testing. Principal component analysis was performed via the R / Bioconductor package *PCAtools*. After conducting PCA, each eigenvector was then independently regressed against retinopathy outcome via binary logistic regression and those that passed p≤0.05 were used to construct a multivariable model that was further tested in ROC analysis via the *pROC* package in R. Separately, as model complexity and multi-collinearity can arise from a large number of predictors, elastic net regularization (penalized regression with L1 and L2 penalties of the Lasso and Ridge methods) was used to reduce the number of predictor variables using *glmnet* in R / CRAN. To fit the model, 100x cross-validation was used and alpha (α) set to 0.5. The final chosen variables were those whose coefficients were not shrunk to zero – these were plot as violin plots with scatter overlays to show differences between non-DR and DR via *ggplot2*. To determine accuracy, model predictions were made on the data using the lambda (λ) one-standard-error rule using the *predict* function from the *stats* package in R. Finally, the RandomForest™ (RF) model was fitted via the *randomForest* R / CRAN package. For this, the dataset was divided randomly into 50% training and 50% validation. Prior to model fitting, the initial model was tuned using functionality provided by the *caret* package in R / CRAN, as follows: 1), a 10x cross-validation control function was defined via *trainControl* function; 2) the best value for ‘mtry’, i.e., the ideal number of variables to randomly sample, was determined using the *train* function across a search / tuning grid ranging between 1-40 and with Kappa as the metric; and 3) using the selected value of ‘mtry’, the ideal number of trees, ‘ntrees’, was determined also via the *train* function with selection metric based on Kappa. After the initial model was fit, variables with mean decrease in accuracy ≤ 1% were excluded and the model re-fit. This was then repeated in a recursive fashion until all variables with negative mean decrease accuracy were removed from the model. ### Final risk models Variables selected from RandomForest™ were grouped based on similarity of function or clinical use. Each group was then used to create independent univariate or multivariable binary logistic regression models with DR as the end-point. A single Wald test p-value was derived for each model using *wald*.*test* from the *aod* package. ROC analysis was performed using *pROC*. McFadden’s and Nagelkerke’s pseudo-R2 were derived via the *pscl* and *rms* packages, respectively. ### Role of the funding source The funders had no role in study design, data collection, data analysis, data interpretation, writing, editing the report, or the decision to submit for publication. ## RESULTS ### Study cohort In NHANES 2007-8, retinal imaging data is available for 3863 individuals, demographics data is available for 10149 individuals, and laboratory data is available for between 394 and 9307 individuals, depending on the individual laboratory dataset in NHANES (see **Figure 1** footnote). After aligning all data and filtering for those who had diabetes by our classification, 1025 individuals remained in our dataset. The selection process is illustrated in **Figure 1**, while **Table 1** provides an overview of the demographics of these individuals. View this table: [Table 1.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/T1) Table 1. Demographic overview of study cohort. For our replication cohort, NHANES 2005-6, we prepared laboratory data following the same filter criteria as NHANES 2007-8 and produced a final dataset of 2459 individuals, among which 637 (with retinopathy, 176; no retinopathy, 461) had diabetes. ### Retinal lesions of diabetic retinopathy To help validate our methodology and cohort selection, we aimed to determine retinal lesions that define DR. To this end, we identified 9 retinal lesions in NHANES 2007-8 that were statistically significantly associated with DR and survived to p-value adjustment for false discovery (**Table 2**). The top lesions were retinal microaneurysms (p≤0.0001), followed by retinal hard exudates (typically due to lipoprotein deposition in the retina and may be associated with macular edema) (p≤0.0001). Other key lesions at p≤0.0001 were retinal soft exudate (now termed cotton wool spots), retinal blot hemorrhages, intraretinal microvascular abnormalities (IRMA), and macular edema. In NHANES, retinal microaneurysms and retinal blot hemorrhages are encoded to be mutually exclusive, i.e., an individual is recorded as having retinal microaneurysms only when not accompanied with retinal blot hemorrhages, and *vice-versa* (**Table 2**). View this table: [Table 2.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/T2) Table 2. Retinal lesions that constitute diabetic retinopathy outcome. ### Univariate logistic regression analysis In total, 6 variables reached statistical significance in the unadjusted univariate analysis, 11 after adjustment for age, 2 after adjustment for ethnicity, and 7 for diabetes duration (**Supplementary Figure 1; Table 3**). Glycohemoglobin (HbA1c) was the only variable that was statistically significant in both the unadjusted and adjusted analyses. Other risk variables of note that reached statistical significance in the unadjusted analysis included serum glucose (mmol/L) (i.e., RBS), osmolality (mmol/Kg), urinary albumin (mg/L), and fasting glucose (mmol/L) (i.e., FBS). The only protective variable, i.e., negatively associated, was hemoglobin (g/dL). These variables indicate suboptimal diabetes control, abnormal kidney function and presence of anemia as risk factors for DR. There was evidence of co-correlation among these statistically significant variables from the unadjusted analysis (**Supplementary Figure 2**). View this table: [Table 3.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/T3) Table 3. Laboratory variables associated with retinopathy in individuals with diabetes. Interestingly, after adjustment for diabetes duration, the following variables reached statistical significance: HbA1c (%), osmolality (mmol/Kg), urinary iodine (µg/L), urinary cobalt (µg/L), urinary triclosan (ng/mL), urinary creatinine (µmol/L), and urinary barium (µg/L). ### Principal component analysis Unsupervised PCA using all laboratory variables revealed that 59 PCs could account for 80% or more variation in the dataset. Eighteen PCs were statistically significantly associated with DR at p≤0.05 via independent binomial regression models testing each PC (**Supplementary Table 2**). The top variables responsible for variation along these PCs included measures of blood glucose (HbA1c, random blood glucose and fasting blood glucose), kidney function markers (urinary albumin, blood urea nitrogen [BUN]), hematological markers (hematocrit, hemoglobin, red blood cell distribution width), inflammatory markers (CRP), white blood cell count, urinary nitrates, segmented neutrophil count), and toxic elements (urinary beryllium and cotinine) among others – these PCs were also statistically significantly correlated to microaneurysms, the previously-identified top retinal lesion, and the covariates used during univariate testing **(Supplementary Figure 3**). Through ROC analysis, these 18 PCs achieved AUC 0.796 (95% CI: 0.761-0.832). ### Penalized regression model We fitted an unbiased elastic-net penalized regression model to the laboratory variables and cross-validated it 100x. The model selected 9 variables whose coefficients were not shrunk to zero: urinary albumin, BUN, urinary cobalt, CRP, HbA1c, blood osmolality, serum potassium, systolic blood pressure, and urinary nitrate (**Figure 2**). Of note, these measurements mainly represent diabetes and blood pressure control and kidney function. This model had an accuracy of 78.51% and AUC 0.74 (95% CI: 0.72-0.77) when predicted on the same dataset on which the model was produced. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/22/2020.09.20.20198218/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/F2) Figure 2. Penalized regression-selected variables. Variables were selected from a 100x cross-validated model with α=0.5. Final variable selection was based on coefficients not shrunk to 0. Model accuracy was determined to be 78.4% accuracy and AUC 0.71 (95% CI: 0.65-0.77). ### RandomForest™ classification model From our RandomForest™ model, HbA1c was the single best predictor of DR (mean decrease accuracy, 31.94%; Gini, 21.75) (**Table 4**). However, other notable variables of appreciable accuracy were markers of diabetes control (FBS, RBS) inflammation (CRP), kidney function (potassium, BUN, creatinine and urinary albumin), hematological markers (hematocrit), systolic blood pressure, among others. The overall accuracy of the model on the validation cohort was 78.4% and AUC 0.71 (95% CI: 0.65-0.77). View this table: [Table 4.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/T4) Table 4. RandomForest™-selected variables (features). ### Replication cohort In the NHANES 2005-6 replication cohort, we performed penalized regression and RandomForest™ in the same way as per the 2007-8 cohort. Our penalized regression model identified urinary albumin (mg/L), cockroach IgE antibody (kU/L), HbA1c (%), hemoglobin (g/dL), and urinary nitrate (ng/mL), with a model accuracy of 73.16% and AUC 0.76 (95% CI: 0.73-0.78). RandomForest™ identified HbA1c (%) as the variable contributing most to accuracy (mean decrease 16.96%), with many other variables contributing appreciable accuracy to the overall model (**Supplementary Table 3**) - the overall model accuracy was 72.98% and AUC 0.68 (95% CI: 0.61-0.75). ### Final clinical risk models The 31 features identified by RandomForest™ (**Table 4**) were grouped into different categories of blood tests according to diabetes status, hematocrit values, blood pressure (BP), immune markers, renal function, sterols, toxins and metals, and liver function. When modeled against DR outcome, each group varied in performance; diabetes tests alone achieved AUC 0.72 (95% CI: 0.68-0.76). A final clinical risk model comprising diabetes tests, BP, renal and liver function tests, hematocrit values, circulating sterols and immune markers achieved AUC 0.84 (95% CI: 0.78-0.9) (p=0.00013) (Nagelkerke R2 0.36) (**Table 5**; **Figure 3**). ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/22/2020.09.20.20198218/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/F3) Figure 3. Final clinical risk models. Features from RandomForest™ were grouped logically based on similar function or clinical use (**Table 4**) and then tested independently in a univariate or multivariate regression model against DR outcome. A final risk model including markers of hypertension, hypercholesterolemia, renal and liver function tests, and hematocrit achieved AUC 0.84 (0.78-0.9). View this table: [Table 5.](http://medrxiv.org/content/early/2020/09/22/2020.09.20.20198218/T5) Table 5. Final clinical risk models. ## DISCUSSION This EWAS of NHANES 2007-8 data with DR outcomes in individuals with diabetes included an unbiased feature selection approach based on a rudimentary univariate regression enabled for compute parallelization, PCA, penalized regression, and RandomForest™ of a large number of laboratory parameters. In contrast, epidemiological studies are typically conducted based on pre-conceived hypotheses and involve a single or just a few variables. These methods can be scaled to datasets of any size and therefore provide ways of working with large clinical and epidemiological datasets for the purpose of searching for novel hypotheses that could then lead to further focused investigations. In our rudimentary approach, which is ultimately running many univariate models in a parallelized fashion, HbA1c was the only variable to reach statistical significance after adjusted for age, ethnicity, and diabetes duration. The relationship between HbA1c and DR has been explored extensively and was selected as the strongest risk factor in every approach we undertook, with a mean decrease accuracy of 31.94% via RandomForest™. Our penalized regression and RandomForest™ algorithms also identified an association between elevated systolic blood pressure —but not diastolic— and DR (mean decrease accuracy, 1.9%), again confirming literature (10, 28-30). Further risk variables identified by both penalized regression and RandomForest™ were renal function tests including BUN, urinary albumin, potassium, osmolality, and urinary nitrate. These confirm the strong association of DR with markers of impaired kidney function. Other known risk factors that contributed higher up in the ranking order include hematocrit (%) and cholesterol. Although HbA1c has the strongest association with DR, our study highlight how the addition of other clinical parameters, e.g., from renal and liver function and hematocrit can increase the sensitivity and specificity of predicting DR outcome, with our final clinical risk model achieving AUC 0.83 (95% CI: 0.77-0.89) (p=0.00012) (Nagelkerke R2 0.33), higher than any traditional diabetes control parameter in isolation or in combination. The EWAS methodology and our RandomForest™ approach of non-targeted recursive feature also indicates a small contribution from toxins and metals, including 3-hydroxyphenanthrene, 9-hydroxyfluorene, phthalates, blood o-Xylene, and blood nitromethane. Therefore, retina may be a target tissue for environmental contamination. Some of the associations provide directions to future mechanistic research in DR. For example, we found that retinal microaneurysms (FDR-adjusted p≤0.0001), the most statistically significant retinal lesions in individuals with DR, is already correlated with some of the variables such as HbA1c, CRP, BUN, beryllium, and hematocrit, suggesting early effects. In contrast, increased urinary cobalt, triclosan and barium became significant only when adjusted for duration of diabetes. Most of these parameters are also linked to risk of allergies and lung disease, an association that has not been previously explored systematically. As this is a cross-sectional study, a cause-effect relation cannot be established. Moreover, we are unable to rule out any confounding effects of any unmeasured factors. On the other hand, the main strength of the study is the use of the well characterised NHANES cohort in whom standardised protocols were used to measure laboratory parameters. We are not aware of any other association studies in DR where over 400 laboratory parameters were analysed simultaneously to develop multiple models. As the top variables of all four data driven agnostic models were similar, we also believe our findings are generalisable. ## CONCLUSION We confirm that DR is a complex disease and that the already established risk factors contribute significantly to the risk models of DR, with HbA1c being the strongest risk factor. Although our model provides an accuracy of approximately 80%, it also provides mechanistic insights into future research on DR including interrogating the interaction of low-ranking risk factors with more established factors in the models and highlights need to explore epigenetic screens to gauge better how risk factors influence gene expression. Most importantly, the study reinforces the need to control known risk factors of DR, especially hyperglycemia. ## Data Availability Data is available from The National Health and Nutrition Examination Survey (NHANES - https://wwwn.cdc.gov/Nchs/Nhanes/) ## FUNDING This work was funded by Global Challenges Research Fund and UK Research and Innovation through the Medical Research Council grant number MR/P027881/1. The research was supported] by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. ## AUTHOR CONTRIBUTIONS KB, SG and SS conceived and designed the study. KB, SG and YL analysed the data. KB is the study data guarantor. All authors interpreted the results and reviewed the manuscript. KB had full access to all of the data and takes responsibility for the integrity of the data and the accuracy of data analysis. All authors read and approved the final manuscript. ## ACKNOWLEDGEMENTS We thank the many thousands of NHANES study participants who, over the course of decades, have provided valuable information for epidemiological studies. * Received September 20, 2020. * Revision received September 20, 2020. * Accepted September 22, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## REFERENCES 1. 1.Feman SS. The natural history of the first clinically visible features of diabetic retinopathy. Trans Am Ophthalmol Soc. 1994;92:745–73. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7886883&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) 2. 2.Chao JR, Lai M-Y, Azen SP, Klein R, Varma R, Group tLALES. Retinopathy in Persons without Diabetes: The Los Angeles Latino Eye Study. Investigative Ophthalmology & Visual Science. 2007;48(9):4019–25. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiaW92cyI7czo1OiJyZXNpZCI7czo5OiI0OC85LzQwMTkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8yMi8yMDIwLjA5LjIwLjIwMTk4MjE4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 3. 3.Venkatramani J, Mitchell P. Ocular and systemic causes of retinopathy in patients without diabetes mellitus. BMJ (Clinical research ed). 2004;328(7440):625–9. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEyOiIzMjgvNzQ0MC82MjUiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8yMi8yMDIwLjA5LjIwLjIwMTk4MjE4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 4. 4.Singh R, Ramasamy K, Abraham C, Gupta V, Gupta A. Diabetic retinopathy: an update. Indian J Ophthalmol. 2008;56(3):178–88. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18417817&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) 5. 5.Wang W, Lo ACY. Diabetic Retinopathy: Pathophysiology and Treatments. Int J Mol Sci. 2018;19(6):1816. 6. 6.Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care. 2012;35(3):556–64. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czo4OiIzNS8zLzU1NiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzIyLzIwMjAuMDkuMjAuMjAxOTgyMTguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 7. 7.Heintz E, Wiréhn AB, Peebo BB, Rosenqvist U, Levin LA. Prevalence and healthcare costs of diabetic retinopathy: a population-based register study in Sweden. Diabetologia. 2010;53(10):2147–54. 8. 8.Group UPDSU. Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). UK Prospective Diabetes Study (UKPDS) Group. Lancet. 1998;352(9131):837–53. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(98)07019-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9742976&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000075857300006&link_type=ISI) 9. 9.Chatziralli IP. The Role of Dyslipidemia Control in the Progression of Diabetic Retinopathy in Patients with Type 2 Diabetes Mellitus. Diabetes Ther. 2017;8(2):209–12. 10. 10.Liu L, Quang ND, Banu R, Kumar H, Tham Y-C, Cheng C-Y, et al. Hypertension, blood pressure control and diabetic retinopathy in a large population-based study. PLoS One. 2020;15(3):e0229665–e. 11. 11.Chen X, Meng Y, Li J, She H, Zhao L, Zhang J, et al. Serum uric acid concentration is associated with hypertensive retinopathy in hypertensive chinese adults. BMC Ophthalmol. 2017;17(1):83-. 12. 12.Luo B-A, Gao F, Qin L-L. The Association between Vitamin D Deficiency and Diabetic Retinopathy in Type 2 Diabetes: A Meta-Analysis of Observational Studies. Nutrients. 2017;9(3):307. 13. 13.Kong X, Wang J, Gao G, Tan M, Ding B, Li H, et al. Association between Free Thyroxine Levels and Diabetic Retinopathy in Euthyroid Patients with Type 2 Diabetes Mellitus. Endocr Res. 2020;45(2):111–8. 14. 14.Merin S, Freund M. Retinopathy in severe anemia. Am J Ophthalmol. 1968;66(6):1102–6. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=5727647&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1968C274600010&link_type=ISI) 15. 15.Khan AA, Rahmani AH, Aldebasi YH. Diabetic Retinopathy: Recent Updates on Different Biomarkers and Some Therapeutic Agents. Curr Diabetes Rev. 2018;14(6):523–33. 16. 16.Moriya T, Tanaka S, Kawasaki R, Ohashi Y, Akanuma Y, Yamada N, et al. Diabetic retinopathy and microalbuminuria can predict macroalbuminuria and renal function decline in Japanese type 2 diabetic patients: Japan Diabetes Complications Study. Diabetes Care. 2013;36(9):2803–9. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czo5OiIzNi85LzI4MDMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8yMi8yMDIwLjA5LjIwLjIwMTk4MjE4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 17. 17.Chen YH, Chen HS, Tarng DC. More impact of microalbuminuria on retinopathy than moderately reduced GFR among type 2 diabetic patients. Diabetes Care. 2012;35(4):803–8. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czo4OiIzNS80LzgwMyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzIyLzIwMjAuMDkuMjAuMjAxOTgyMTguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 18. 18.Lim LS, Tai ES, Mitchell P, Wang JJ, Tay WT, Lamoureux E, et al. C-reactive Protein, Body Mass Index, and Diabetic Retinopathy. Investigative Ophthalmology & Visual Science. 2010;51(9):4458–63. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiaW92cyI7czo1OiJyZXNpZCI7czo5OiI1MS85LzQ0NTgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8yMi8yMDIwLjA5LjIwLjIwMTk4MjE4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 19. 19.Raymond NT, Varadhan L, Reynold DR, Bush K, Sankaranarayanan S, Bellary S, et al. Higher prevalence of retinopathy in diabetic patients of South Asian ethnicity compared with white Europeans in the community: a cross-sectional study. Diabetes care. 2009;32(3):410–5. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czo4OiIzMi8zLzQxMCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzIyLzIwMjAuMDkuMjAuMjAxOTgyMTguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 20. 20.Spanakis EK, Golden SH. Race/ethnic difference in diabetes and diabetic complications. Current diabetes reports. 2013;13(6):814–23. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s11892-013-0421-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24037313&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) 21. 21.Wong TY, Klein R, Islam FMA, Cotch MF, Folsom AR, Klein BEK, et al. Diabetic retinopathy in a multi-ethnic cohort in the United States. American journal of ophthalmology. 2006;141(3):446–55. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajo.2005.08.063&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16490489&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000236057900003&link_type=ISI) 22. 22.McGinnis DP, Brownstein JS, Patel CJ. Environment-Wide Association Study of Blood Pressure in the National Health and Nutrition Examination Survey (1999–2012). Scientific Reports. 2016;6(1):30373. 23. 23.Zhuang X, Guo Y, Ni A, Yang D, Liao L, Zhang S, et al. Toward a panoramic perspective of the association between environmental factors and cardiovascular disease: An environment-wide association study from National Health and Nutrition Examination Survey 1999-2014. Environ Int. 2018;118:146–53. 24. 24.Hall MA, Dudek SM, Goodloe R, Crawford DC, Pendergrass SA, Peissig P, et al. Environment-wide association study (EWAS) for type 2 diabetes in the Marshfield Personalized Medicine Research Project Biobank. Pac Symp Biocomput. 2014:200–11. 25. 25.Patel CJ, Rehkopf DH, Leppert JT, Bortz WM, Cullen MR, Chertow GM, et al. Systematic evaluation of environmental and behavioural factors associated with all-cause mortality in the United States national health and nutrition examination survey. Int J Epidemiol. 2013;42(6):1795–810. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyt208&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24345851&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000329870400041&link_type=ISI) 26. 26.Nembrini S, König IR, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34(21):3711–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty373&link_type=DOI) 27. 27.Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289–300. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2346101&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:A1995QE4&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QE45300017&link_type=ISI) 28. 28.Stratton IM, Kohner EM, Aldington SJ, Turner RC, Holman RR, Manley SE, et al. UKPDS 50: risk factors for incidence and progression of retinopathy in Type II diabetes over 6 years from diagnosis. Diabetologia. 2001;44(2):156–63. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s001250051594&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11270671&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000167005200004&link_type=ISI) 29. 29.Klein R, Klein BEK. Blood pressure control and diabetic retinopathy. Br J Ophthalmol. 2002;86(4):365–7. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MTI6ImJqb3BodGhhbG1vbCI7czo1OiJyZXNpZCI7czo4OiI4Ni80LzM2NSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzIyLzIwMjAuMDkuMjAuMjAxOTgyMTguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 30. 30.Zheng Y, Lamoureux EL, Lavanya R, Wu R, Ikram MK, Wang JJ, et al. Prevalence and risk factors of diabetic retinopathy in migrant Indians in an urbanized society in Asia: the Singapore Indian eye study. Ophthalmology. 2012;119(10):2119–24. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ophtha.2012.04.027&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22709419&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F22%2F2020.09.20.20198218.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000310581900027&link_type=ISI)