Genome-wide polygenic score with APOL1 risk genotypes predicts chronic kidney disease across major continental ancestries ========================================================================================================================= * Atlas Khan * Michael C. Turchin * Amit Patki * Vinodh Srinivasasainagendra * Ning Shang * Rajiv Nadukuru * Alana C. Jones * Edyta Malolepsza * Ozan Dikilitas * Iftikhar J. Kullo * Daniel J. Schaid * Elizabeth Karlson * Tian Ge * James B. Meigs * Jordan W. Smoller * Christoph Lange * David R. Crosslin * Gail Jarvik * Pavan Bhatraju * Jacklyn N. Hellwege * Paulette Chandler * Laura Rasmussen Torvik * Alex Fedotov * Cong Liu * Christopher Kachulis * Noura S. Abul-Husn * Judy H. Cho * Iuliana Ionita-Laza * Ali G. Gharavi * Wendy K. Chung * George Hripcsak * Chunhua Weng * Girish Nadkarni * Marguerite R. Irvin * Hemant K. Tiwari * Eimear E. Kenny * Nita A. Limdi * Krzysztof Kiryluk ## ABSTRACT **Introduction** Chronic kidney disease (CKD) is a common complex condition associated with significant morbidity and mortality in the US and worldwide. Early detection is critical for effective prevention of kidney disease progression. Polygenic prediction of CKD could enhance screening and prevention of kidney disease progression, but this approach has not been optimized for risk prediction in ancestrally diverse populations. **Methods** We developed and validated a genome-wide polygenic score (GPS) for CKD defined by estimated glomerular filtration rate (eGFR) <60 mL/min/1.73m2 using common variant association statistics from GWAS for eGFR combined with information on *APOL1* risk genotypes. The score was designed to ensure transferability across major continental ancestries, genotyping platforms, imputation panels, and phenotyping strategies, and was tested following ClinGen guidelines. The polygenic component of the score was developed and optimized using 28,047 cases and 251,772 controls (70% of UK Biobank participants of European ancestry), while the weights for *APOL1* effects were derived based on UK Biobank participants of African ancestry (967 cases and 6,191 controls). We tested the performance of the score in 15 independent testing cohorts, including 3 cohorts of European ancestry (total 23,364 cases and 117,883 controls), 6 cohorts of African ancestry (4,268 cases and 10,276 controls), 4 cohorts of Asian ancestry (1,030 cases and 9,896 controls), and 2 Hispanic/Latinx cohorts (1,492 cases and 2,984 controls). **Results** We demonstrated the risk score transferability with reproducible performance across all independent testing cohorts. In the meta-analyses, disease odds ratios per standard deviation of the score were estimated at 1.49 (95%CI: 1.47-1.50, P<1.0E-300) for European, 1.32 (95%CI: 1.26-1.38, P=1.8E-33) for African, 1.59 (95%CI: 1.52-1.67, P=1.3E-30) for Asian, and 1.42 (95%CI: 1.33-1.51, P=4.1E-14) for Latinx cohorts. The top 2% cutoff of the GPS was associated with nearly 3-fold increased risk of CKD across all major ancestral groups, the degree of risk that is equivalent to a positive family history of kidney disease. In African-ancestry cohorts, *APOL1* risk genotype and the polygenic risk components of the GPS had additive effects on the risk of CKD with no significant interactions. We also observed that individuals of African ancestry had a significantly higher polygenic risk score for CKD compared to other populations, even without accounting for *APOL1* variants. **Conclusions** By combining *APOL1* risk genotypes with the available GWAS for renal function, we designed, optimized, and validated a GPS predictive of CKD across four major continental ancestries. With the upper tail of the GPS distribution associated with disease risk equivalent to a positive family history, this score could be used for clinically meaningful risk stratification. Keywords * chronic kidney disease (CKD) * polygenic risk score (PRS) * GWAS * *APOL1* ## INTRODUCTION Chronic kidney disease (CKD) affects 10-16% of general population and has high morbidity and mortality1-3. In the US, CKD disproportionally affects African Americans (16.3%) when compared to European Americans (12.7%), Asian Americans (12.9%), or Hispanic Americans (13.6%) ([https://www.cdc.gov/kidneydisease/publications-resources/ckd-national-facts.html](https://www.cdc.gov/kidneydisease/publications-resources/ckd-national-facts.html)). CKD stage 3 or greater is defined by a chronic loss of glomerular filtration rate (GFR) to under 60 mL/min/1.73m2. Because this definition is based on estimated kidney function rather than markers of specific kidney injury, it captures an etiologically heterogeneous set of primary and secondary kidney disorders. As expected for a highly heterogeneous trait, CKD has a complex determination with both genetic and environmental contributions. The observational heritability of CKD based on large-scale analysis of medical records ranged from 25-44% depending on self-reported race and ethnicity, with higher heritability estimated for pedigrees with majority of members self-identifying as Black4. These heritability estimates are generally consistent with prior smaller family-based studies of CKD and glomerular filtration rate5-11. The substantial heritability of CKD is attributed to both monogenic12-14 and polygenic causes15-19. Moreover, in individuals of African ancestry, two common risk alleles (G1 and G2) in *Apolipoprotein L1* (*APOL1*) gene have been described to convey a large effect on the risk of kidney disease20,21. While heterozygotes for G1 or G2 alleles appear to be protected from trypanosomal sleeping sickness, kidney disease risk is conveyed under a recessive model in carriers of two risk alleles (G1G1, G2G2, or G1G2). Because of the selective pressure exerted by endemic trypanosomal species in certain parts of eastern and western Africa, G1 and G2 alleles are observed almost exclusively individuals whose ancestry can be linked to those areas22,23. In the US population, frequency of *APOL1* risk genotypes is estimated at approximately 15% in African Americans, 0.5-2% in Hispanic Americans, and <0.01% in European Americans24. These differences may be contributing to the higher prevalence of CKD in African Americans in the US, but additional non-*APOL1* genetic risk factors have not yet been elucidated. Genome-wide polygenic scores (GPS) have emerged as promising tools for genetic risk stratification that can enhance traditional risk models for complex diseases. This approach has been applied to a variety of traits, including coronary artery disease25-35, type 2 diabetes28-30, hypertension36,37, obesity38, schizophrenia33-35, and malignancies including breast, colorectal, prostate, and lung cancers39-46. One of the major limitations of the GPS approach is that existing GWAS are based predominantly on European cohorts and, as a result, most GPS do not perform well in more diverse cohorts, or in individuals with admixed ancestry47. Similar to other complex traits, GWAS for kidney function involved predominantly European cohorts. The latest study involved 765,348 participants, 75% of which were European, 23% East Asian, 2% African American, and <1% Hispanic17. Notably, this study did not capture the effects of *APOL1* risk variants because of their recessive inheritance and very low frequencies in non-African populations. The objective of the present study is to test if the existing knowledge on polygenic contributions to renal function is sufficient to build a clinical risk predictor for moderate-to-advanced CKD with adequate performance across diverse ancestral groups. We specifically aimed to design, optimize, and test a new GPS for clinical risk prediction of kidney disease that maximizes the performance across major continental ancestries. We combined information on *APOL1* risk genotypes with the latest GWAS for renal function to formulate a GPS that can reliably discriminate moderate to advanced CKD (stage 3 or greater) from population controls. In our approach, we took advantage of the power of the existing GWAS for a quantitative biomarker of renal function (serum creatinine-based eGFR) to predict a disease state. To demonstrate transferability across different genotyping and imputation platforms, and to document comparable predictive performance by ancestry, we performed rigorous testing of our GPS in 15 independent and ancestrally diverse case-control cohorts following ClinGen standards48. ## METHODS ### Study cohorts: genotyping, imputation, and quality control analyses #### Electronic Medical Records and Genomics (eMERGE) The eMERGE network provides access to EHR information linked to GWAS data for 102,138 individuals; detailed quality control analyses of genetic data have been described previously4,49,50. Briefly, GWAS datasets were imputed using the latest multiethnic Haplotype Reference Consortium (HRC) panel using Michigan Imputation Server51. The imputation was performed in 81 batches across the 12 contributing medical centers participating in eMERGE-I, II, and III. For post-imputation analyses, we included only markers with minor allele frequency (MAF)≥ 0.01 and *R*2 ≥ 0.8 in ≥ 75% of batches. A total of 7,529,684 variants were retained for the GPS analysis. For principal component analysis (PCA), we used FlashPCA52 on a set of 48,509 common (MAF>0.01) and independent variants (pruned in PLINK with --indep-pairwise 500 50 0.05 command). The G1 (1072A>G (rs73885319) and 1200T>G (rs60910145)) and G2 (1212-del6 (rs71785313)) alleles of *APOL1* were imputed separately using the TOPMed imputation server53. The allelic frequencies of G1 and G2 alleles were comparable to previous studies54 and were summarized in **Supplementary Table 1**. The analyses were performed using a combination of VCFtools, PLINK, and custom scripts in PYTHON and R55-57. #### UK Biobank (UKBB) UKBB is a large prospective cohort based in the United Kingdom that enrolled individuals ages 40-69 for the purpose of genetic studies58. This cohort is comprised of 488,377 individuals recruited since 2006, genotyped with high-density SNP arrays, and linked to electronic health record data. All individuals underwent genome-wide genotyping with UK Biobank Axiom array from Affymetrix and UK BiLEVE Axiom arrays (∼825,000 markers). Genotype imputation was carried out using a 1000 Genomes reference panel with IMPUTE4 software59-61. We then applied QC filters similar to eMERGE-III, retaining 9,233,643 common (MAF ≥ 0.01)variants imputed with high confidence (*R** ≥ 0.8). For principal component analysis by FlashPCA52, we used a set of 35,226 variants that were common (MAF>0.01) and pruned using the following command in PLINK --indep-pairwise 500 50 0.05. Similar to the eMERGE-III datasets, the *APOL1* G1 and G2 alleles were imputed separately using the TOPMed imputation server53. #### Bio*Me* Biobank The Bio*Me* Biobank is an electronic health record (EHR)-linked biorepository that has been enrolling participants non-selectively from across the Mount Sinai Health System (MSHS) in New York City since 2007. There are currently over 60,000 participants enrolled in Bio*Me* under and Institutional Review Board (IRB) approved study protocol (IRB 07–0529). Participants consent to provide DNA and plasma samples linked to their EHRs. Participants provide additional information on self-reported ancestry, personal and family health history through questionnaires administered upon enrollment. A total of 32,595 Bio*Me* participants were genotyped on the Illumina Global Screening Array (GSA) through a collaboration with Regeneron Genetics Center and 11,953 on the Illumina Global Diversity Array (GDA) through a collaboration with Sema4. Population groups were determined by self-reported race/ethnicity as published previosuly62. Participants were removed if genotype missing rates were > 5%, and if sex or ancestry information were missing. Variants were removed if genotype missing rates were >5% and if HWE was less than P < 1.00E-5 for the GSA data and p < 1.00E-6 for the GDA data within each ancestry group. Imputation (including G1 and G2 variants in *APOL1*) was performed using the TOPMed Imputation Server and the TOPMed Freeze 8 reference panel. Post-imputation variants with quality scores <0.7 were removed. Participants younger than 40 years old and individuals detected to be cryptically related (2nd degree or above) were removed from the analysis. We additionally removed a subset of Bio*Me* participants that were included in the original CKDGen GWAS discovery dataset17. After all QC steps, there were 9,154 Bio*Me* participants of European ancestry, 7,318 African ancestry, 11,606 Latinx ancestry, and 843 East Asian ancestry included in the analysis. #### Reasons for Geographic and Racial Differences in Stroke Study (REGARDS) REGARDS is a population-based, longitudinal study of incident stroke and associated risk factors of over 30,000 Black and White adults aged 45 years or older from all 48 contiguous US states and the District of Columbia63. REGARDS was designed to investigate reasons underlying the higher rate of stroke mortality among self-reported Black compared to White participants, as well as why residents of the Southeastern US had worse death rates compared to other US regions. By design, participants were oversampled if they were residents of the stroke belt or if they were Black6363626161. Participants completed a computer-assisted telephone interview to collect demographic information and medication adherence, and an in-home visit for blood pressure measurements and collection of blood and urine samples and have been contacted at six-month intervals to obtain information on incident stroke or secondary outcomes. Genotyping was performed on 8,916 self-identified Black participants using Illumina MEGA-EX arrays and imputed using the NHLBI TOPMed reference panel (Freeze 8). Participants were excluded with call rates less than 95%, if they were internal duplicates, had sex mismatches, or were outliers on principal component analysis (outside of 6 standard deviations), resulting in 8,669 participants with genotypes available for analysis. Imputed variants were inspected for their imputation quality scores (R2) and it was noted that more than 99% of the variants with MAF>1% had an imputation quality of 0.6 or higher. Given the high-quality of imputation for variants of MAF>1%, in order to retain maximum overlap of variants with the SNPs with PRS weights, genotypes with genotypic probability of 0.9 or higher were retained. *APOL1* alleles (G1 [rs73885319A>G, S342G] and G2 [rs71785313 TTATAA/– N388Y389/–]) were genotyped directly using TaqMan SNP Genotyping Assays (Applied Biosystems/ThermoFisher Scientific). #### The Hypertension Genetic Epidemiology Network (HyperGEN) HyperGEN is a cross-sectional, population-based study and component of the NHLBI Family Blood Pressure Program that was designed to identify genetic risk factors for hypertension and target end-organ damage due to hypertension64. The cohort is composed of White and Black sibships in which at least two siblings were diagnosed with hypertension (defined as either self-reported use of antihypertensive medications or SBP ≥140 mmHg and/or DBP ≥90 mmHg at two separate evaluations) before age 60, their unmedicated adult offspring, and age-matched controls. Later the study population was expanded to include other siblings of the original sibling pair as well as any offspring for a total sample size of N=5000. The Black participants underwent whole genome sequencing (WGS), through the National Heart Lung and Blood Institute (NHLBI) WGS program. In order to harmonize our imputation efforts with the array-based panels of the REGARDS, GenHAT and Warfarin (see below) studies, we compiled a set of non-monomorphic and non-multi-allelic SNPs with MAF >1% that were genotyped for GWAS as part of those studies. This yielded a total of 2,204,415 SNPs that were used as fence post markers for imputation. In order to maintain consistency HyperGEN samples were then imputed into the same version of TOPMed release2 reference panel that was adopted for imputing REGARDS, GenHAT and Warfarin. Imputed variants were inspected for their imputation quality scores (R2) and it was noted that more than 99% of the variants with MAF>1% had an imputation quality of 0.6 or higher. Given the high-quality of imputation for variants of MAF>1% and to retain maximum overlap of variants with the SNPs with PRS weights, genotypes with genotypic probability of 0.9 or higher were retained. Participants were excluded if they were younger than 40 years at enrollment, leaving 1,898 self-identified Black participants for analysis. *APOL1* genotypes were obtained directly from the WGS data. #### Warfarin Pharmacogenomics Cohort (WPC) WPC is a prospective cohort of first-time warfarin users aged 19 years or older starting warfarin for anticoagulation65,66. Warfarin therapy requiring a target international normalized ratio (INR) range of 2-3 was initiated in patients with venous thromboembolism, stroke/transient ischemic attacks, atrial fibrillation, myocardial infarction, and/or peripheral arterial disease. Patients requiring a higher intensity (INR 2.5 to 3.5) or lower intensity (INR 1.5 to 2.5) of anticoagulation were excluded. A detailed history was obtained that included baseline demographics, as well as medication history and compliance. Changes in INR, medications and laboratory parameters were documented at each clinical visit as reported previously. The WPC includes genotyped data using the Illumina MEGA-EX array, as well as an Illumina IM duo array for 599 and 297 Black participants, respectively. Imputation was performed using the NHLBI TOPMed r2 reference panel (Freeze 8). Imputed variants were inspected for their imputation quality scores (R2) and it was noted that more than 99% of the variants with MAF>1% had an imputation quality of 0.6 or higher. Given the high-quality of imputation for variants of MAF>1%, in order to retain maximum overlap of variants with the SNPs with PRS weights, genotypes with genotypic probability of 0.9 or higher were retained. This strategy retained highest-quality genotype calls for the SNPs that were employed for the PRS derivation. Ten principal components (PCs) were calculated using a program EIGENSOFT (version 6.1.4) selecting directly genotyped SNPs (MAF ≥ 5%, missing data < 5%, and HWE p-value >1E-04) based on pairwise linkage disequilibrium (LD, using r2 < 0.05). This resulted in 44,137 tagged SNPs in African American samples. *APOL1* information was obtained from genotypic array data; rs143830837 (bp hg38 36265995) was used as a proxy for rs71785313 (bp hg38 36265996) since these SNPs represent the same G2 variant and were recently merged in the NCBI dbSNP database ([https://www.ncbi.nlm.nih.gov/snp/?term=rs143830837](https://www.ncbi.nlm.nih.gov/snp/?term=rs143830837)). For this analysis, only participants aged 40 years or older were included, leaving a total of 448 self-identified Black participants. #### The Genetics of Hypertension Associated Treatments (GenHAT) Study GenHAT is an ancillary study to the Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial (ALLHAT)67. ALLHAT was a randomized, double blind, multicenter clinical trial with over 42,000 high-risk individuals with hypertension, aged 55 years or older, and had at least one additional risk factor for CVD. ALLHAT is the largest antihypertensive treatment trial to date and was ethnically diverse, enrolling over 15,000 Black participants68. Participants were randomized into four groups defined by the class of assigned antihypertensive medication including chlorthalidone, lisinopril, amlodipine, and doxazosin at a ratio of 1.7:1:1:1, respectively. The original GenHAT study (N=39,114) evaluated the effect of the interaction between candidate hypertensive genetic variants and different antihypertensive treatments on the risk of fatal and non-fatal CVD outcomes67. In an ancillary study to the original GenHAT study, participants self-identified as Black were genotyped for GWAS to better understand genes involved in response to chlorthalidone and lisinopril. Genotyping using Illumina MEGA-EX arrays was performed on 7,546 Black adults who met these criteria. Upon genotyping completion samples that failed or had low call rate (<95%) were excluded. Participants were also excluded with sex mismatches, or if they were an outlier on principal component analysis (outside of 6 standard deviations), which resulted in 6,919 Black GenHAT participants with genotypes available for analysis. Imputation was carried out using the NHLBI TOPMed release 2 (r2) reference panel (Freeze 8). Imputed variants were inspected for their imputation quality scores (R2) and more than 99% of the variants with MAF>1% had an imputation quality of 0.6 or higher. Given the high-quality of imputation for variants of MAF>1%, in order to retain maximum overlap of variants with the SNPs with PRS weights, genotypes with genotypic probability of 0.9 or higher were retained. *APOL1* information was extracted from the genotypic array data. Similar to WPC, rs143830837 variant was used as a proxy for the *APOL1* G2 allele. ### Ancestry definitions In UKBB and eMERGE-III datasets, the ancestry sub-cohorts were defined based on clustering on principal component analysis of genetic markers. We grouped all individuals into major continental ancestry clusters: European, African, Hispanic/Latinx, South Asian, and East Asian. This was done by projecting each sample onto the reference principal components calculated from the 1000G reference panel69. Briefly, we merged our UKBB and eMERGE samples with 1000G samples and kept only SNPs in common between the two datasets. We used common variants between UKBB and eMERGE with 1000G following command in PLINK --indep-pairwise 500 50 0.05. The numbers of pruned variants for UKBB and eMERGE were 35,091 and 43,080 respectively. We then calculated principal components (PCs) for the 1000G samples using FlashPCA and projected each of our samples onto those PCs. Ancestry assignments were then performed by co-clustering of the 1000G reference populations and ancestry memberships were verified by visual inspection of PCA plots. Ancestry in BioMe, REGARDS, HyperGEN, WPC, and GenHAT was determined by self-reported race/ethnicity, and PCA was subsequently performed for verification and to exclude outliers. ### CKD phenotyping and case-control definitions For CKD phenotyping based on EHR data, we used a computable CKD phenotype recently developed and extensively validated by the eMERGE-III network4. In the population-based UKBB datasets, we defined cases as those having CKD stage 3 or above (based on eGFR estimated with CKD-EPI equation70, and including patients on chronic dialysis or after a kidney transplant) and compared them to population controls. In all other case-control testing datasets, we defined cases using the same definition (CKD stage 3 or above based on CKD-EPI eGFR70, or patients on chronic dialysis or after a kidney transplant), while controls were defined as those without known CKD and eGFR > 90 mL/min/1.73m2 based on the latest serum Cr available. While reducing the overall sample size of our control datasets, the exclusion of individuals with CKD stage 2 (eGFR 60-90 mL/min/1.73m2) aimed to minimize any potential misclassification of the case-control status. Only individuals 40 years of age or older were included across all datasets for consistency with the UKBB ascertainment strategy. Additional covariates used in the predictive models included age, sex, diabetes (type I or II), and principal components of ancestry. The diagnosis of diabetes type I or II was added as an important covariate, given that it represents an established large-effect risk factor for CKD and kidney failure. The diagnosis of hypertension was not added as a covariate to avoid over-adjustment, since case-effect relationship of hypertension to CKD is usually not clear in electronic health records, and CKD itself represents the most common cause of secondary hypertension. ### Polygenic score design and optimization We used 70% Europeans of UKBB (28,047 cases and 251,772 controls) to optimize the GWAS-based polygenic component of the GPS by selecting the best model between two commonly used methods and a range of input parameters (“ Optimization Dataset” , **Table 1, Supplementary Table 2**). We used the summary statistics for 8.2 million common variants from the CKDGen consortium GWAS for eGFR17 in combination with the diverse LD reference panel from phase 3 1000G project (all populations, N=2,504)59. We first computed 7 candidate GPSes using the LDPred computational algorithm71 across the following range of rho (fraction of casual variants): 1.00E+00, 1.00E-01, 1.00E-02, 1.00E-03, 3.00E-01, 3.00E-02 and 3.00E-03. We also generated 12 pruning and thresholding (P+T) scores with r2=0.2 and P-value thresholds of 1.0, 1.00E-02, 1.00E-03, 1.00E-04, 1.00E-05, 1.00E-06, 1.00E-07, 1.00E-08, 3.00E-02, 3.00E-03, 3.00E-04 and 3.00E-05. Based on the above parameters, each GPS was expressed as a weighted sum of alleles with weights based on the GWAS for the eGFR study: View this table: [Table 1:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T1) Table 1: Summary of study cohorts used for GPS optimization and testing. ![Formula][1] where M is number of variants in the model and *β**j* is the weight based on GWAS summary statistics and the negative sign reflects an inverse relationship between eGFR and CKD. Each of the 19 scores derived above was subsequently assessed for discrimination of CKD cases from population controls in the first UKBB optimization dataset after adjustment for age, sex, diabetes status and four principal components of ancestry. The score with the best performance was defined by the maximal area under the receiver operator curve and the largest fraction of variance explained (**Supplementary Table 2**). The best performing score was normal-standardized (by subtracting control mean and dividing by control standard deviation) and advanced for testing in the second UKBB optimization cohort of African ancestry. ### Modeling the effects of *APOL1* risk genotypes To optimize trans-ethnic performance, our final score was further optimized using the second, smaller UKBB optimization dataset of African ancestry (967 cases and 6,191 controls). We aimed to assess if adding *APOL1* risk genotype (under a recessive model) enhanced CKD risk prediction. For this purpose, we first removed any variants in the *APOL1* region from the GWAS-based GPS equation. Next, we tested the GPS and *APOL1* risk genotype jointly for association with CKD in this dataset. The GPS (without *APOL1* region) and recessive *APOL1* risk genotypes both represented independently significant predictors of CKD before and after adjustment for age, sex, diabetes (Type I or II), and 4 principal components of ancestry. The risk effects of *APOL1* and GPS were additive, with one SD unit of the standard-normalized GPS conveying the risk that was approximately equivalent to *APOL1* risk genotype (**Supplemental Table 3**). We also tested for effect modification of *APOL1* risk genotype by the polygenic component in CKD prediction, but we detected no significant interactive effects (P interaction = 0.29). To best account for an independent additive effect of recessive *APOL1* risk genotypes, we therefore updated the CKD GPS for each subject using the following equation: ![Formula][2] ### Predictive performance in independent testing datasets The predictive performance of the final risk score formulation was assessed in 15 ancestrally diverse testing datasets, including 3 cohorts of European ancestry (by genetic or self-reported European ancestry, 23,364 cases and 117,883 controls in total), 6 cohorts of African ancestry (4,268 cases and 10,276 controls), 4 cohorts of Asian (East and South-West) ancestry (1,030 cases and 9,896 controls), and 2 Hispanic/Latinx ancestry cohorts (1,492 cases and 2,984 controls). We calculated a full set of standardized risk score performance metrics following eMERGE-IV and ClinGen guidelines48. Logistic regression models were used for predicting case-control status with adjustment for age, sex, diabetes (Type I or II), center and genotype/imputation batch (if relevant), and four principal components of ancestry using glm function in R. We used pROC R package to calculate the receiver operating characteristics area under curve (AUC), using logistic regression model with CKD case status as an outcome and the GPS, age, sex, center, batch, and four significant PCs as predictors. We calculated variance explained using the Nagelkerke’s pseudo-R2, including for the full model (GPS plus covariates), for the covariates-only model, and for the GPS component alone expressed as the R2 difference between the full and the covariates-only model. We also expressed the effect of standardized risk score as odds ratios (with 95% confidence intervals) per standard deviation unit of the control standard normalized risk score distribution in each of the validation cohorts. We examined the risk score discrimination at tail cut-offs corresponding to the top 20%, 10%, 5%, 2%, 1% of the GPS distribution by deriving odds ratios of disease for each tail of the distribution compared to all other individuals in each cohort. We also calculated sensitivities and specificities for each cut-off point in each testing cohort. The performance metrics were then meta-analyzed across ancestry-defined testing cohorts using an inverse variance weighed fixed-effects method to derive pooled performance metrics for each ancestral grouping72. Finally, we calculated ancestry-specific prevalence-adjusted positive and negative predictive values for each GPS cut-off based on pooled estimates of sensitivity and specificity and known CKD prevalence in US population ([https://www.cdc.gov/kidneydisease/publications-resources/ckd-national-facts.html](https://www.cdc.gov/kidneydisease/publications-resources/ckd-national-facts.html)). Statistical analyses were conducted using R version 3.6.3 (2019-02-29) software. ### Comparing GPS distributions in the 1000G reference populations To assess differences in the distributions of GPS by ancestry, we computed risk scores for the multiethnic reference of all 1000G phase 3 participants using our final optimized CKD GPS equation: ![Formula][3] where *w**J* are optimized weights from CKD GWAS summary statistics for each marker included in the score, *d**J* is dosage or genotypes (0, 1 and 2) for 1000G samples and m is the total number of variants from GWAS summary statistics included in our final GPS (m=471,316**)**. The distributions were examined visually in the form of histograms, and distributional differences by ancestry were tested using ANOVA. ### Post-hoc ancestry adjustment In order to express GPS effects on the same scale across ancestrally diverse individuals and select a single cut-off for clinical implementation of the GPS, we adjusted for differences in the first two moments of the GPS distributions by ancestry. Using multiethnic eMERGE cohorts, we tested two different regression-based genetic ancestry adjustment strategies that utilize 1000G (all populations) reference: **method 1** which adjusts for differences in mean and **method 2** which adjusts for both differences in mean and variance. For **method 1**, we first regressed the GPS of 1000G participants against the first five PCs as proposed previously73: ![Formula][4] Fitting the model to 1000G reference panel allows us to find *α*’s and generate residuals. Next, we used the predicated *α*’s to calculate the adjusted score for any individual projected onto the same PCA space: ![Formula][5] where ![Graphic][6] is the raw GPS, ![Graphic][7] is the predicted (ancestry-adjusted) mean, and *δ* is the residual standard deviation from the 1000G model (all populations). To adjust for ancestral differences in both mean and variance (**method 2**), we used the same method as above, but we also modeled residual variance (*δ*2) as a function of PCs of ancestry: ![Formula][8] Next, we used the predicated *α* ‘s and *β*’s to calculate the adjusted Zp score: ![Formula][9] Where ![Graphic][10] sis the predicted (ancestry-adjusted) residual variance. The distributional transformations achieved by these methods were examined visually. We then compared the effects of these adjustments for the top percentile cut-offs in eMERGE-III cohorts. We also assessed the overall GPS calibration across all eMERGE cohorts after final ancestry adjustment. ## RESULTS ### GPS Optimization The flowchart summary of the GPS derivation and testing strategy is provided in **Figure 1**. In the optimization step, a total of 19 candidate risk scores were generated based on multiethnic LD reference panel and summary statistics from GWAS for eGFR17. We then used a large optimization dataset comprised of 70% of European UK Biobank participants to select the best performing model (**Table 1, Supplementary Table 2**). The best model was based on the P+T method and involved 471,316 markers selected based on r2 = 0.2 and P < 0.03. The score was standardized to zero-mean and unit-variance based on ancestry-matched population controls. In the optimization dataset, the polygenic component of the risk score explained 3% of variance (R2), with one standard deviation of the score increasing CKD risk by 60% (OR=1.60, 95%CI 1.59-1.61, P <1.00E-300) after controlling for age, sex, diabetes, genotyping batch, and genetic ancestry (**Supplementary Table 2**). ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F1) Figure 1: Overview of the study design. The CKD GPS was designed based on CKDGen GWAS summary statistics for eGFR and a cosmopolitan LD reference panel of 1000 Genomes (all populations); optimization was performed in two stages using UKBB participants of European (optimization 1) and African (optimization 2) ancestries; GPS performance validation was conducted in 15 additional independent testing cohorts of diverse ancestries. The second optimization step involved testing for independent contributions of *APOL1* risk genotypes and included 7,158 UKBB participants of genetically defined African ancestry (967 cases and 6,191 controls). In the model adjusted for age, sex, diabetes (Type I and II), batch, and PCs of ancestry, we observed statistically significant independent effects of the polygenic component (OR per SD =1.16, 95%CI: 1.09-1.25, P=1.00E-04) and the recessive *APOL1* risk genotype (OR=1.19, 95%CI: 1.01-1.38, P=4.00E-02), but no significant multiplicative interactions between the two predictors (P interaction=0.29) (**Supplementary Table 3**). Given these findings, we subsequently modeled *APOL1* risk as additive to the polygenic component, with the recessive risk genotype effects approximately equivalent to one standard deviation of the standard normalized polygenic score (a weight of one standard deviation unit was selected because the β per standard deviation of the polygenic score and the β for APOL1 risk genotype were comparable in magnitude). ### Population differences in GPS distributions We next examined the distributions of the polygenic risk component (without *APOL1*), as well as the final combined GPS (with *APOL1*) in the reference populations of 1000 Genomes. We detected significant differences in the mean polygenic risk across reference populations (**Figure 2**, ANOVA P=3.40E-154), with a notable shift towards higher average risk in the African population compared to all other populations (P=4.92E-163). This shift became even more pronounced after the inclusion of *APOL1* risk genotype information in the combined GPS (P=1.58E-168). These results suggest that the polygenic risk for CKD is considerably higher in African compared to non-African populations independent of *APOL1*. ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F2) Figure 2: Risk score distributions in five 1000 Genomes populations: **(a)** raw polygenic score without *APOL1*; **(b)** ancestry-adjusted polygenic score without *APOL1* (method 1: mean only); **(c)** ancestry-adjusted polygenic score without *APOL1* (method 2: mean and variance); **(d)** raw combined GPS with *APOL1*; **(e)** ancestry-adjusted combined GPS with *APOL1* (method 1) and **(f)** ancestry-adjusted combined GPS with *APOL1* (method 2). AFR: African, AMR: Native American, EAS: East Asian, EUR: European, and SAS: South Asian. Given that the weights of the polygenic score equation are fixed and derived based on the GWAS performed predominantly in European cohorts, we hypothesized that these distributional differences are likely driven by a higher frequency of CKD risk alleles in African genomes. Therefore, we examined the overall frequency spectrum of CKD risk alleles included in the GPS between European and African reference populations (**Supplemental Figure S2**). As expected for variants selected based on European GWAS, we observed a greater number of risk alleles at the extremes of the frequency spectrum (RAF < 0.01 or > 0.099) in the African compared to European populations of 1000G. This observation is likely due to the routine use of MAF filter of 0.01 in the individual European GWAS discovery cohorts contributed to the GWAS meta-analysis for eGFR. Across all variants included in the score, the mean difference in risk allele frequencies (RAF) between African and European populations was positive (i.e. greater than the expected mean=0), and this rightward shift in the distribution of RAF difference is indicative of higher average frequency of risk alleles in African genomes. We further observed that the risk alleles with largest weights (effect sizes in GWAS) had a significantly higher frequency in African genomes compared to those with low effect sizes (P=0.02), or intermediate effect sizes (P=0.018) (**Supplemental Figure S2d)**. Thus, it appears that the observed GPS distributional shifts between European and African populations are driven predominantly by frequency differences of large effect risk alleles. ### GPS Testing in cohorts of European ancestry We next tested the final GPS in three European cohorts, including the remaining 30% of the UKBB (11,922 cases and 108,002 controls) and two large US-based European ancestry cohorts, eMERGE-III (10,572 cases and 8,030 controls) and Bio*Me* (870 cases and 1,851 controls). In the combined meta-analysis across the three testing cohorts, the GPS exhibited highly reproducible performance, with pooled OR per SD = 1.49, 95%CI: 1.47-1.50, P < 1.00E-300 and ROC AUC=0.75, 95%CI: 0.75-0.76 (**Supplemental Table 4**). While the UKBB testing cohort had nearly identical performance metrics to the UKBB optimization cohort (OR per SD = 1.60, 95%CI: 1.58-1.62, P < 1.00E-300) the effect sizes were slightly attenuated in the US-based cohorts (eMERGE-III: OR per SD=1.38, 95%CI:1.35-1.40, P=1.58E-83 and Bio*Me*: OR per SD=1.58, 95%CI:1.46-1.70, P=2.50E-14. We also note that the frequency of *APOL1* risk genotype was extremely low, thus its effect was negligible in European cohorts. ### GPS testing in cohorts of African ancestry The GPS was tested in six independent African ancestry cohorts, including eMERGE-III (1,143 cases and 1,600 controls), Bio*Me* (729 cases and 1,149 controls), HyperGEN (109 cases and 619 controls), REGARDS (1,055 cases and 4,314 controls), GenHat (924 cases and 2,454 controls) and Warfarin (308 cases and 140 controls). In the combined meta-analysis, the GPS had pooled OR per SD = 1.32, 95%CI: 1.26-1.38, P =1.77E-33 and ROC AUC=0.78, 95%CI: 0.77-0.79 (**Table 2 and Supplemental Table 5**). We note that the inclusion of *APOL1* risk genotype in the GPS considerably enhanced CKD risk prediction across all cohorts of African ancestry, substantially improving tail cut-off discrimination i.e., the risk for the top 2% was approximately 1.8-fold higher in the model without *APOL1* and 2.7-fold higher in the model with APOL1, when compared to the remaining 98% of individuals (**Table 3)**. View this table: [Table 2:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T2) Table 2: The performance metrics of the GPS in the testing cohorts meta-analyzed by continental ancestry. For performance testing in individual cohorts, please refer to Supplemental Tables 4-7. View this table: [Table 3:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T3) Table 3: Added value *APOL1* risk genotype to polygenic risk components in predicting CKD using extreme tail (98th percentile) of the risk score distribution in African-American (4,268 cases and 10,276 controls) and Hispanic (1,492 cases and 2,984 controls) cohorts. All effect estimates are adjusted for age, sex, diabetes, and principal components of ancestry. ### GPS testing in cohorts of Hispanic/Latinx ancestry The GPS was also tested in two Hispanic/Latinx cohorts from eMERGE-III (488 cases and 639 controls) and Bio*Me* (1,004 cases and 2,345 controls). The combined meta-analysis of these cohorts resulted in pooled OR per SD = 1.42, 95%CI: 1.33-1.51, P=4.10E-14 and ROC AUC 0.88, 95%CI: 0.85-0.89 (**Supplementary Table 6)**. As expected, due to high rates of African admixture in the Latinx population, the inclusion of *APOL1* risk genotypes in the GPS also improved predictive performance and tail discrimination of the GPS in these cohorts (**Table 3**). ### GPS testing in cohorts of Asian ancestry Lastly, we evaluated GPS performance in four diverse Asian cohorts including UKBB South Asian (797 cases and 7,698 controls), UKBB East Asian (74 cases and 1,740 controls), eMERGE-III East Asian (98 cases and 105 controls) and Bio*Me* East Asian (61 cases and 353 controls) cohorts. The combined meta-analysis revealed results that were very similar to Europeans, with pooled OR per SD = 1.59, 95%CI: 1.52-1.67, P =1.30E-30 and ROC AUC=0.90, 95%CI: 0.87-0.92 (**Supplementary Table 7**). *APOL1* risk genotypes were absent in the Asian cohorts, thus the modelled risk was entirely attributable to the polygenic component. ### Tail discrimination performance by ancestry For each individual testing cohort, we derived risk estimates comparing extreme tails of the risk score distribution to all other cohort members, and estimated sensitivity and specificity for a range of tail cutoffs (20%, 10%, 5%, 2% and 1%). These metrics were meta-analyzed by ancestry and are summarized in **Table 2 and Supplemental Tables 4-7**. Depending on ancestry, the top 2% tail of the risk score distribution was associated with 2-4 fold higher risk of CKD than the remaining 98% of individuals, including European (OR=3.09, 95%CI: 2.98-3.19, P=3.22E-98), African (OR=2.66, 95%CI: 2.01-3.51, P=4.93E-12), Latinx (OR=4.27, 95%CI: 2.21-8.25, P=1.59E-05) and Asian (OR=2.95, 95%CI: 1.93-4.50, P=5.91E-07) cohorts. We have therefore selected this cut-off as potentially clinically meaningful, since this degree of risk is approximately equivalent to the risk conferred by a positive family history of kidney disease. In **Table 4**, we summarize various metrics of diagnostic performance for this cut-off by ancestry, including sensitivity, specificity, as well as prevalence-adjusted positive and negative predictive values. For comparison, we also provide similar metrics for the cut-offs of top 5% of the risk score distribution in controls. View this table: [Table 4:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T4) Table 4: Pooled sensitivity, specificity, and prevalence-adjusted PPV and NPV for extreme tail cut-offs (95th and 98th percentile) of the GPS distribution in controls. The CKD prevalence was assumed to be 12.7% in European Americans; 16.3% in African Americans; 13.60% in Hispanic Americans, and 12.9% in Asian Americans as per the most recent CDC data (2021). ### Ancestry adjustments and GPS calibration We next compared the effect of two different ancestry adjustment methods on the GPS distributions in 1000G, eMERGE-III, and UKBB testing cohorts (**Figure 2 and Supplemental Figure S1**). Adjusting for mean only (**method 1**, see *methods*) eliminated major distributional shifts by ancestry, but did not fully resolve the observed tail differences. The ancestry adjustment **method 2** (adjusting for both mean and variance) results in comparable shapes of the GPS distributions by ancestry and facilitates the selection of a single trans-ancestry cut-off to define a “ high risk” group. Both methods result in comparably good risk score calibration when applied to the combined multiethnic eMERGE-III dataset (**Supplementary Figure S3)**. As a trade-off, however, the ancestry adjustment methods reduce tail discrimination at extreme cut-offs as summarized **Supplementary Table 8**. This trade-off appears most pronounced for **method 2** and for more admixed cohorts (African-American and Latinx). ### Comparison with existing studies While our manuscript was in preparation, an alternative version of a polygenic score for renal function was published by Yu et al.74 This score was based on the combined GWAS meta-analysis for eGFR involving the CKDGen consortium study17 and 90% of the UK Biobank, resulting in larger sample size but also higher proportion of Europeans (82%) contributing to summary statistics. Notably, the GPS designed by Yu et al. did not include *APOL1* risk genotype and was not optimized for transethnic prediction of CKD. The key design and validation differences between our GPS and the one by Yu et al. are summarized in **Supplementary Table 9**. Because the score by Yu et al. has not been tested in multi-ethic cohorts, we next tested its performance in our independent eMERGE-III cohorts (**Supplementary Table 10**). As expected given the design differences, the score by Yu et al. performed better in cohorts of European ancestry (R2 3.5% vs. 2.2%), comparably well in Hispanic cohorts (R2 1.2% vs. 1.1%), but was less predictive in cohorts of African (R2 1.0% vs. 1.4%) or East Asian ancestry (R2 1.0% vs. 2.6%). In the African ancestry testing cohort, individuals in the top 2% of the GPS distribution by Yu et al. had 92% increase in the risk of CKD (OR 1.92, 95%CI: 1.03-3.59, P=0.039), compared to 2.6-fold increased risk using the GPS developed in this study (OR 2.60, 95%CI: 1.38-4.90, P=0.0031). Lastly, we note that the African ancestry distribution of the GPS by Yu et al. was also shifted towards higher values in compared to other ancestries (**Supplementary Figure 4**), confirming that the observed distributional shift is independent of any specific method used to design the score. ## DISCUSSION We developed a genome-wide polygenic score for CKD that captures the most recent data on the polygenic determinants of renal function, predicts CKD across four major continental ancestry groups, and is transferable across various genotyping platforms and imputation methods. We also developed a continuous ancestry adjustment to allow for trans-ancestry standardization of the risk score. Our score was designed and validated following the ClinGen and the PRS Catalogue guidelines and can be implemented as a clinical test for individual level risk prediction48. Importantly, our testing studies demonstrated that extreme tails of the risk score distribution (top 2%) convey over 2-3-fold increase in the disease risk across the four major continental ancestries. From the clinical perspective, this magnitude of risk could be considered as actionable given its equivalence to a positive family history of kidney disease75. Although our risk score is based on a multiethnic GWAS for eGFR, we recognize that the allelic effects used to develop our GPS are heavily biased by the predominant Euro-Asian composition of the discovery GWAS, including approximately 75% European, 23% East Asian, 2% African, and only <1% Hispanic ancestry participants. Because there are currently no studies of similar size performed in African American and Hispanic participants that could be used to improve the accuracy of effect estimates in these populations, our model assumes fixed allelic effects across all ancestral groups. Nevertheless, we used a diverse linkage disequilibrium reference panel in order to improve the trans-ethnic performance of the score, and we further enhanced the model by including African ancestry-specific recessive *APOL1* risk genotypes known to have large effects on the risk of kidney disease. We demonstrated that *APOL1* risk genotypes (coded under a recessive model) have an additive effect with the polygenic component, and significantly enhance case-control discrimination in cohorts of African and Hispanic ancestry. We additionally tested two post-hoc ancestry corrections that improve comparability of the scores across diverse populations. Several important limitations of this work need to be discussed. First, we are most limited by the lack of large-scale GWAS for renal function in non-European populations, as well as small size of existing cohorts that could be used for performance optimization in non-Europeans76. As a result, the largest cohorts presently available for robust risk score optimization are also of predominantly European ancestry. The assumption of fixed allelic effects across different ancestral groups is likely inaccurate, because many disease-related lifestyle factors and environmental exposures correlated with ancestry could modify allelic effects. Although it is not possible to overcome this limitation in the present study, our GPS approach could be enhanced in the future by inclusion of larger non-European GWAS studies for eGFR or CKD. Second, the performance comparisons between different ancestral groups could be biased by differences in genotyping platforms and ascertainment methods employed by various biobanks and studies used for testing. For example, the eMERGE-III and Bio*Me* cohorts are ascertained among patients receiving care at tertiary medical centers in the US and are enriched in diseased individuals attending subspecialty clinics. As a result, these cohorts have higher rates of CKD and comorbidities when compared to population-based studies, such as UKBB. Third, the ancestry definitions varied in our testing cohorts. For eMERGE and UKBB, the ancestry was defined agnostically using genetic approaches. In contrast, our analysis of the Bio*Me*, REGARDS, GenHAT, HyperGEN, and WPC cohorts relied on self-report. Despite these cohort differences, the risk score performance was similar in most testing cohorts regardless of the specific ancestry definition employed to define testing cohort. Fourth, by design, our score models polygenic effects from GWAS for renal function as approximated by estimated GFR from serum Cr (filtration biomarker) rather than CKD itself. We recognize multiple limitations to the use of estimated GFR as a phenotype in GWAS, including the fact that serum Cr level is influenced by the rate of Cr production and metabolism in addition to kidney clearance. Accordingly, we designed the score to predict moderately advanced CKD (stage 3 and above) rather than a mild degree of renal dysfunction, to capture a clinically meaningful disease state. Notably, our risk score does not incorporate available information on the polygenic determination of albuminuria77 or primary kidney diseases78,79. However, GWAS for these traits that are published to date remain several orders of magnitude smaller in sample size compared to the GWAS for eGFR and thus incorporation of such data must await more powerful studies. Fifth, we observed significant differences in the mean and variance of the GPS distributions by ancestry. The observed shift in the mean GPS towards higher values in individuals of African ancestry is independent of *APOL1* and is driven by a higher average RAF in the African genomes. The inter-population RAF differences are greatest for the risk alleles with largest effects. This pattern may be consistent with polygenic adaptation, but the effects of uncorrected population stratification in the discovery GWAS may also potentially explain this phenomenon80. We note that the observed differences in the GPS distributions by ancestry represent a significant challenge for the clinical implementation of polygenic risk scores. The key problem is that it is not possible to select a single GPS threshold for all ancestries that results in the similar magnitude of risk. Therefore, we have explored several approaches that could be used to overcome this issue. One approach involves classifying anyone undergoing GPS testing into one of the four continental ancestry groups based on self-report, then using ancestry-specific risk score cut-offs to interpret the results. However, because of a potential for racial bias, the use of race in clinical algorithms has been discouraged.81 One could also classify a tested individual based on inferred genetic ancestry from SNP data and subsequently apply ancestry-specific cut-offs. This approach still categorizes individuals into distinct groups and can be inaccurate, especially for those individuals with highly admixed genomes. We have therefore tested two different regression-based ancestry correction methods that model a continuous spectrum of genetic ancestry based on the diverse reference panel of 1000 Genomes. We demonstrate that the reference population-based correction for both mean and variance can best align distribution tails for the purpose of selecting a single trans-ancestry cut-off to define individuals with comparable risk of CKD. This however, results in some performance trade-offs, especially in admixed populations. Although imperfect, this ancestry adjustment may be helpful in improving risk score standardization for clinical use in diverse populations. In summary, we derived, optimized, and tested a new GPS for CKD across major ancestries and suggested new methods for its trans-ethnic standardization. We demonstrated that the effects of the polygenic component and *APOL1* risk genotypes on the risk of CKD are additive. We also observed that the polygenic component of the score is significantly higher in individuals of African ancestry compared to other ancestral groups. Our study also demonstrated that individuals in the highest 2% of the risk score distribution have nearly 3-fold increase in the disease risk, the degree of relative risk that is equivalent to a positive family history of kidney disease. Our results suggest that at the extreme tail cutoff of the GPS may provide a clinically actionable genetic test for screening and early detection of CKD. Other potential applications of the GPS may include improved risk stratification of potential living kidney donors, or enhanced quality assessment of deceased donor kidneys in the setting of transplant. The clinical utility of the GPS in these clinical settings will require further testing in large prospective studies. ### Data sharing: GPS catalogue deposition The final formulation of the GPS for CKD along with the standardized metrics of performance were deposited in the GPS catalogue, accession number pending. #### Web Resources eMERGE: [https://emerge.mc.vanderbilt.edu/](https://emerge.mc.vanderbilt.edu/) UKBB: [https://www.ukbiobank.ac.uk/](https://www.ukbiobank.ac.uk/) BioMe: [https://icahn.mssm.edu/research/ipm/programs/biome-biobank](https://icahn.mssm.edu/research/ipm/programs/biome-biobank) PLINK: [https://www.cog-genomics.org/plink2](https://www.cog-genomics.org/plink2) PYTHON: [https://www.python.org/](https://www.python.org/) R: [https://www.r-project.org/](https://www.r-project.org/) KING: [http://people.virginia.edu/~wc9c/KING/](http://people.virginia.edu/~wc9c/KING/) FlashPCA: [https://github.com/gabraham/flashpca](https://github.com/gabraham/flashpca) Michigan Imputation Server: [https://imputationserver.sph.umich.edu](https://imputationserver.sph.umich.edu) Human Reference Consortium: [http://www.haplotype-reference-consortium.org](http://www.haplotype-reference-consortium.org) TOPMed Imputation Server: [https://imputation.biodatacatalyst.nhlbi.nih.gov/](https://imputation.biodatacatalyst.nhlbi.nih.gov/) LDPred: [https://github.com/bvilhjal/ldpred](https://github.com/bvilhjal/ldpred) GPS catalogue: [https://www.pgscatalog.org/](https://www.pgscatalog.org/) ## Data Availability The final formulation of the GPS for CKD along with the standardized metrics of performance were deposited in the GPS catalogue, accession number pending. ## Supplementary Information ![Supplemental Figure S1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F3.medium.gif) [Supplemental Figure S1:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F3) Supplemental Figure S1: Risk score distributions eMERGE-III and UKBB datasets: **(a)** ancestry PCA projection of UKBB participants on all 1000G samples; **(b)** the distribution of raw polygenic score without *APOL1* in UKBB by ancestry; **(c)** the distribution of ancestry-adjusted polygenic score (method 1: mean-adjusted) in UKBB by ancestry; **(d)** the distribution of ancestry-adjusted polygenic score (method 2: mean and variance-adjusted) in UKBB by ancestry. Panels **(e), (f), (g)** and **(h)** show the same analyses for the eMERGE-III dataset, respectively. ![Supplemental Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F4.medium.gif) [Supplemental Figure S2.](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F4) Supplemental Figure S2. Distribution of risk allele frequencies (RAF) and their effect sizes for the variants included in the GPS **(a)** comparison of RAF distributions for the risk variants included in the CKD GPS demonstrates higher frequency of rare (RAF<0.01) and common (RAF>0.99) risk alleles in African compared to European genomes (based on 1000G reference populations); this may be explained by the exclusion of variants with MAF<0.01 in European discovery GWAS; **(b)** highly skewed effect size (weight) distribution for the variants included in the GPS for CKD; **(c)** Distribution of RAF difference (AFR-EUR) demonstrating higher average frequency of risk alleles in African genomes (mean RAF difference = 0.002) and a slight rightward shift of the RAF difference distribution from the expected mean of 0; **(d)** Mean RAF difference (AFR-EUR) as a function of effect size binned into three categories (high, intermediate, and low) based on the observed distribution of effects sizes in panel b, demonstrating that the risk alleles with larger effect size have higher average frequency in African compared to European genomes. EUR: European and AFR: African. The bars represent 95% confidence intervals around the mean RAF difference estimate for each bin. ![Supplemental Figure S3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F5.medium.gif) [Supplemental Figure S3:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F5) Supplemental Figure S3: Final GPS calibration analysis in eMERGE-III cohorts combined: predicted risk (X-axis) as a function of the observed risk (Y-axis) in the multiethnic eMERGE-III dataset after ancestry adjustment with **(a)** method 1 and **(b)** method 2. ![Supplemental Figure S4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/10/26/2021.10.25.21265398/F6.medium.gif) [Supplemental Figure S4:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/F6) Supplemental Figure S4: Distributions of the raw (non-standardized) genome-wide polygenic score (GPS) by Yu et al. in the eMERGE-III dataset by ancestry. View this table: [Supplementary Table 1:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T5) Supplementary Table 1: Overall frequencies of *APOL1* G1 and G2 risk alleles and *APOL1* risk genotypes in African American and Latinx cohorts included in the study. View this table: [Supplementary Table 2:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T6) Supplementary Table 2: Association of candidate polygenic scores with CKD in the first UKBB optimization dataset (70% of UKBB Europeans). Odds ratio (OR) per standard deviation (SD) of each risk score, and area under the receiver-operator curve (AUC) were calculated in the UKBB optimization dataset of 279,819 Europeans with adjustment for age, sex, diabetes, first four principal components of ancestry and genotyping batch; AUC crude and variance explained are calculated for the risk score component alone without any covariates; r2: linkage disequilibrium pruning threshold; ρ-tuning parameter to model the proportion of variants assumed to be causal; the best performing score is highlighted in red. Variance explained is estimated as a Nagelkerke pseudo R2 and refers to the variance in case-control status. View this table: [Supplementary Table 3:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T7) Supplementary Table 3: Mutually-adjusted effects for *APOL1* risk genotype and the best GPS from the first optimization cohort when tested in the second optimization cohort of African ancestry. View this table: [Supplemental Table 4:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T8) Supplemental Table 4: GPS performance meta-analysis for testing cohorts of European ancestry (23,364 cases and 117,883 controls in total) View this table: [Supplementary Table 5:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T9) Supplementary Table 5: GPS performance meta-analysis for testing cohorts of African ancestry (4,268 cases and 10,276 controls in total). View this table: [Supplementary Table 6:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T10) Supplementary Table 6: GPS performance meta-analysis for testing cohorts of Latinx/Hispanic ancestry (1,492 cases and 2,984 controls in total). View this table: [Supplementary Table 7:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T11) Supplementary Table 7: GPS performance meta-analysis for testing cohorts of Asian continental ancestry (1,030 cases and 9,896 controls in total). View this table: [Supplementary Table 8:](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T12) Supplementary Table 8: Comparison of the changes in GPS effects for the top 1%, 2%, and 5% tail cut-offs using different ancestry adjustment methods in the eMERGE-III dataset. View this table: [Supplementary Table 9.](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T13) Supplementary Table 9. Comparison of the design, optimization, and testing strategies for the polygenic risk score published by Yu et al. with the score from the present study. View this table: [Supplementary Table 10.](http://medrxiv.org/content/early/2021/10/26/2021.10.25.21265398/T14) Supplementary Table 10. Comparison of the performance characteristics of the polygenic score published by Yu et al. with the score from the present study across the ancestry-defined eMERGE-III case-control cohorts. Adjusted AUC refers to the area under the receiver operating curve for the full model (PRS, age, sex, diabetes, and ancestry PCs). Crude AUC (in parenthesis) refers to the area under the receiver operating curve for the PRS predictor alone. Variance explained was estimated as a Nagelkerke pseudo-R2 and refers to the variance in case-control status explained by the PRS predictor alone. Performance differences in the African-ancestry testing cohort highlighted in red. ## ACKNOWLEDGEMENTS This work was funded by the National Human Genome Research Institute (NHGRI) Electronic Medical Records and Genomics-IV (eMERGE-IV grants 2U01HG008680-05, 1U01HG011167-01, 1U01HG011176-01). Additional sources of funding included UG3DK114926 (KK), RC2DK116690 (KK), R01LM013061 (CW, KK), K25DK128563 (AK) and UL1TR001873 (AK, KK), R01HL151855 (JBM), UM1DK078616 (JBM). The REGARDS (R01HL136666), HyperGEN (R01HL055673), GenHAT (R01HL123782), and WPC (R01HL092173, K24HL133373) studies were all supported by the National Heart, Lung, and Blood Institute (NHLBI). Parts of this study have been conducted using the UKBB Resource under UKBB project ID number 41849. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. ## Footnotes * **Conflicts of Interest:** None * Received October 25, 2021. * Revision received October 25, 2021. * Accepted October 26, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/) ## References 1. 1.Coresh J, Selvin E, Stevens LA, et al. Prevalence of chronic kidney disease in the United States. Jama-J Am Med Assoc 2007;298:2038–47. 2. 2.Nahas ME. The global challenge of chronic kidney disease. Kidney Int 2005;68:2918–29. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1523-1755.2005.00774.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16316385&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000233204300065&link_type=ISI) 3. 3.Collaborators GBDCoD. Global, regional, and national age-sex specific mortality for 264 causes of death, 1980-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet 2017;390:1151–210. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=doi:10.1016/S0140-6736(17)32152-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28919116&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 4. 4.Shang N, Khan A, Polubriaginof F, et al. Medical records-based chronic kidney disease phenotype for clinical care and “ big data” observational and genetic studies. Npj Digit Med 2021;4. 5. 5.Fox CS, Yang Q, Cupples LA, et al. Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol 2004;15:2457–61. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiam5lcGhyb2wiO3M6NToicmVzaWQiO3M6OToiMTUvOS8yNDU3IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMTAvMjYvMjAyMS4xMC4yNS4yMTI2NTM5OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 6. 6.Langefeld CD, Beck SR, Bowden DW, Rich SS, Wagenknecht LE, Freedman BI. Heritability of GFR and albuminuria in Caucasians with type 2 diabetes mellitus. Am J Kidney Dis 2004;43:796–800. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.ajkd.2003.12.043&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15112169&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000221104800004&link_type=ISI) 7. 7.Bochud M, Elston RC, Maillard M, et al. Heritability of renal function in hypertensive families of African descent in the Seychelles (Indian Ocean). Kidney Int 2005;67:61–9. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1523-1755.2005.00055.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15610228&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000225635000006&link_type=ISI) 8. 8.Mottl AK, Vupputuri S, Cole SA, et al. Linkage analysis of glomerular filtration rate in American Indians. Kidney Int 2008;74:1185–91. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ki.2008.410&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18854848&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000260061300013&link_type=ISI) 9. 9.Hunt SC, Hasstedt SJ, Coon H, et al. Linkage of creatinine clearance to chromosome 10 in Utah pedigrees replicates a locus for end-stage renal disease in humans and renal failure in the fawn-hooded rat. Kidney Int 2002;62:1143–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1523-1755.2002.kid557.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12234284&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000178042000004&link_type=ISI) 10. 10.Satko SG, Freedman BI. The familial clustering of renal disease and related phenotypes. Med Clin N Am 2005;89:447-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.mcna.2004.11.011&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15755461&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 11. 11.Wu HH, Kuo CF, Li IJ, et al. Family Aggregation and Heritability of ESRD in Taiwan: A Population-Based Study. American Journal of Kidney Diseases 2017;70:619–26. 12. 12.Groopman EE, Marasa M, Cameron-Christie S, et al. Diagnostic Utility of Exome Sequencing for Kidney Disease. New Engl J Med 2019;380:142–51. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1806891&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30586318&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 13. 13.Lata S, Marasa M, Li Y, Fasel DA, Groopman E, Jobanputra V. Whole-Exome Sequencing in Adults With Chronic Kidney Disease: A Pilot Study (vol 168, pg 100, 2018). Annals of Internal Medicine 2018;168:308-+. 14. 14.Groopman EE, Rasouly HM, Gharavi AG. Genomic medicine for kidney disease. Nature Reviews Nephrology 2018;14:83–104. 15. 15.Chambers JC, Zhang W, Lord GM, et al. Genetic loci influencing kidney function and chronic kidney disease. Nat Genet 2010;42:373–5. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.566&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20383145&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277179500005&link_type=ISI) 16. 16.Kottgen A, Glazer NL, Dehghan A, et al. Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet 2009;41:712–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.377&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19430482&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000266411700021&link_type=ISI) 17. 17.Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019;51:957–72. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0407-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31152163&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 18. 18.Liu CT, Garnaas MK, Tin A, et al. Genetic Association for Renal Traits among Participants of African Ancestry Reveals New Loci for Renal Function. Plos Genetics 2011;7. 19. 19.Okada Y, Sim X, Go MJ, et al. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Nature Genetics 2012;44:904-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2352&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22797727&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 20. 20.Genovese G, Friedman DJ, Ross MD, et al. Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science 2010;329:841–5. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzMjkvNTk5My84NDEiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8xMC8yNi8yMDIxLjEwLjI1LjIxMjY1Mzk4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 21. 21.Parsa A, Kao WL, Xie D, et al. APOL1 risk variants, race, and progression of chronic kidney disease. New Engl J Med 2013;369:2183–96. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1310345&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24206458&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000327801100005&link_type=ISI) 22. 22.Thomson R, Genovese G, Canon C, et al. Evolution of the primate trypanolytic factor APOL1. Proc Natl Acad Sci U S A 2014;111:E2130–9. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTExLzIwL0UyMTMwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMTAvMjYvMjAyMS4xMC4yNS4yMTI2NTM5OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 23. 23.Ko WY, Rajan P, Gomez F, et al. Identifying Darwinian Selection Acting on Different Human APOL1 Variants among Diverse African Populations (vol 93, pg 54, 2013). American Journal of Human Genetics 2013;93:191-. 24. 24.Nadkarni GN, Gignoux CR, Sorokin EP, et al. Worldwide Frequencies of APOL1 Renal Risk Variants. New Engl J Med 2018;379:2571-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMc1800748&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30586505&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 25. 25.Gladding PA, Legget M, Fatkin D, Larsen P, Doughty R. Polygenic Risk Scores in Coronary Artery Disease and Atrial Fibrillation. Heart Lung Circ 2019. 26. 26.Wunnemann F, Sin Lo K, Langford-Avelar A, et al. Validation of Genome-Wide Polygenic Risk Scores for Coronary Artery Disease in French Canadians. Circ Genom Precis Med 2019;12:e002481. 27. 27.Rao AS, Knowles JW. Polygenic risk scores in coronary artery disease. Curr Opin Cardiol 2019;34:435–40. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/HCO.0000000000000629&link_type=DOI) 28. 28.Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018;50:1219–24. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018–0183-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30104762&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 29. 29.Lall K, Magi R, Morris A, Metspalu A, Fischer K. Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet Med 2017;19:322–9. 30. 30.Udler MS, McCarthy MI, Florez JC, Mahajan A. Genetic Risk Scores for Diabetes Diagnosis and Precision Medicine. Endocr Rev 2019;40:1500–20. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1210/er.2019-00088&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 31. 31.Liu L, Kiryluk K. Genome-wide polygenic risk predictors for kidney disease. Nat Rev Nephrol 2018;14:723–4. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30279535&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 32. 32.Canadas-Garre M, Anderson K, Cappa R, et al. Genetic Susceptibility to Chronic Kidney Disease - Some More Pieces for the Heritability Puzzle. Front Genet 2019;10:453. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31214239&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 33. 33.Weinberger DR. Polygenic Risk Scores in Clinical Schizophrenia Research. Am J Psychiatry 2019;176:3–4. 34. 34.Reginsson GW, Ingason A, Euesden J, et al. Polygenic risk scores for schizophrenia and bipolar disorder associate with addiction. Addict Biol 2018;23:485–92. 35. 35.Power RA, Steinberg S, Bjornsdottir G, et al. Polygenic risk scores for schizophrenia and bipolar disorder predict creativity. Nat Neurosci 2015;18:953–5. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nn.4040&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26053403&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 36. 36.Hoffmann TJ, Ehret GB, Nandakumar P, et al. Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation. Nat Genet 2017;49:54–64. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3715&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27841878&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 37. 37.Ehret GB, Ferreira T, Chasman DI, et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nat Genet 2016;48:1171–84. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3667&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27618452&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 38. 38.Khera AV, Chaffin M, Wade KH, et al. Polygenic Prediction of Weight and Obesity Trajectories from Birth to Adulthood. Cell 2019;177:587–96 e9. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2019.03.028&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31002795&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 39. 39.Dai J, Lv J, Zhu M, et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir Med 2019;7:881–91. 40. 40.Aly M, Wiklund F, Xu J, et al. Polygenic risk score improves prostate cancer risk prediction: results from the Stockholm-1 cohort study. Eur Urol 2011;60:21–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.eururo.2011.01.017&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21295399&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 41. 41.Pashayan N, Duffy SW, Neal DE, et al. Implications of polygenic risk-stratified screening for prostate cancer on overdiagnosis. Genet Med 2015;17:789–95. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=doi:10.1038/gim.2014.192&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25569441&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 42. 42.Fritsche LG, Gruber SB, Wu Z, et al. Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative. American journal of human genetics 2018;102:1048–61. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 43. 43.Jeon J, Du M, Schoen RE, et al. Determining Risk of Colorectal Cancer and Starting Age of Screening Based on Lifestyle, Environmental, and Genetic Factors. Gastroenterology 2018;154:2152-64.e19. 44. 44.Huyghe JR, Bien SA, Harrison TA, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nature Genetics 2019;51:76–87. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0286-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 45. 45.Mavaddat N, Michailidou K, Dennis J, et al. Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. American journal of human genetics 2019;104:21–34. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2018.11.002&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30554720&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 46. 46.Seibert TM, Fan CC, Wang Y, et al. Polygenic hazard score to guide screening for aggressive prostate cancer: development and validation in large scale cohorts. BMJ 2018;360:j5757. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNjAvamFuMDlfNS9qNTc1NyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzEwLzI2LzIwMjEuMTAuMjUuMjEyNjUzOTguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 47. 47.Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics 2019;51:584–91. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0379-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30926966&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 48. 48.Wand H, Lambert SA, Tamburro C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 2021;591:211–9. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-021-03243-6&link_type=DOI) 49. 49.Stanaway IB, Hall TO, Rosenthal EA, et al. The eMERGE genotype set of 83,717 subjects imputed to ∼40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol 2019;43:63–81. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.22167&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30298529&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 50. 50.Khan A, Shang N, Petukhova L, et al. Medical Records-Based Genetic Studies of the Complement System. J Am Soc Nephrol 2021. 51. 51.McCarthy S, Das S, Kretzschmar W, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 2016;48:1279–83. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3643&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27548312&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 52. 52.Abraham G, Inouye M. Fast principal component analysis of large-scale genome-wide data. PLoS One 2014;9:e93766. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0093766&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24718290&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 53. 53.Das S, Forer L, Schonherr S, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3656&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27571263&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 54. 54.Nadkarni GN, Gignoux CR, Sorokin EP, et al. Worldwide Frequencies of APOL1 Renal Risk Variants. N Engl J Med 2018;379:2571–2. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMc1800748&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30586505&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 55. 55.Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 2007;81:559–75. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/519795&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17701901&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 56. 56.Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics 2011;27:2156–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btr330&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21653522&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000292778700023&link_type=ISI) 57. 57.Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 2010;26:1205–10. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq126&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20335276&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277225400011&link_type=ISI) 58. 58.Sudlow C, Gallacher J, Allen N, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 2015;12:e1001779. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1001779&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25826379&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 59. 59.Altshuler DM, Durbin RM, Abecasis GR, et al. A global reference for human genetic variation. Nature 2015;526:68-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 60. 60.Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics 2012;44:955-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2354&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22820512&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 61. 61.Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562:203-+. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 62. 62.Belbin GM, Cullina S, Wenric S, et al. Toward a fine-scale population health monitoring system. Cell 2021;184:2068–83 e11. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/J.CELL.2021.03.034&link_type=DOI) 63. 63.Howard VJ, Cushman M, Pulley L, et al. The reasons for geographic and racial differences in stroke study: objectives and design. Neuroepidemiology 2005;25:135–43. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1159/000086678&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15990444&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000231700200005&link_type=ISI) 64. 64.Williams RR, Rao DC, Ellison RC, et al. NHLBI family blood pressure program: methodology and recruitment in the HyperGEN network. Hypertension genetic epidemiology network. Ann Epidemiol 2000;10:389–400. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1047-2797(00)00063-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10964005&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000088943000007&link_type=ISI) 65. 65.Limdi NA, Brown TM, Shendre A, Liu N, Hill CE, Beasley TM. Quality of anticoagulation control and hemorrhage risk among African American and European American warfarin users. Pharmacogenet Genomics 2017;27:347–55. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28806200&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 66. 66.Limdi NA, Nolin TD, Booth SL, et al. Influence of kidney function on risk of supratherapeutic international normalized ratio-related hemorrhage in warfarin users: a prospective cohort study. Am J Kidney Dis 2015;65:701–9. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.ajkd.2014.11.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25468385&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 67. 67.Arnett DK, Boerwinkle E, Davis BR, Eckfeldt J, Ford CE, Black H. Pharmacogenetic approaches to hypertension therapy: design and rationale for the Genetics of Hypertension Associated Treatment (GenHAT) study. Pharmacogenomics J 2002;2:309–17. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/sj.tpj.6500113&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12439737&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 68. 68.Major cardiovascular events in hypertensive patients randomized to doxazosin vs chlorthalidone: the antihypertensive and lipid-lowering treatment to prevent heart attack trial (ALLHAT). ALLHAT Collaborative Research Group. JAMA 2000;283:1967–75. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.283.15.1967&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10789664&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000086436600031&link_type=ISI) 69. 69.Genomes Project C, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature 2015;526:68–74. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 70. 70.Levey AS, Stevens LA. Estimating GFR using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation: more accurate GFR estimates, lower CKD prevalence estimates, and better risk predictions. Am J Kidney Dis 2010;55:622–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/j.ajkd.2010.02.337&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20338463&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000276054500004&link_type=ISI) 71. 71.Vilhjalmsson BJ, Yang J, Finucane HK, et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American journal of human genetics 2015;97:576–92. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2015.09.001&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26430803&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 72. 72.Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190–1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq340&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20616382&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000281738900017&link_type=ISI) 73. 73.Khera AV, Chaffin M, Zekavat SM, et al. Whole-Genome Sequencing to Characterize Monogenic and Polygenic Contributions in Patients Hospitalized With Early-Onset Myocardial Infarction. Circulation 2019;139:1593–602. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 74. 74.Yu Z, Jin J, Tin A, et al. Polygenic Risk Scores for Kidney Function and Their Associations with Circulating Proteome, and Incident Kidney Diseases. J Am Soc Nephrol 2021. 75. 75.Zhang J, Thio CHL, Gansevoort RT, Snieder H. Familial Aggregation of CKD and Heritability of Kidney Biomarkers in the General Population: The Lifelines Cohort Study. Am J Kidney Dis 2021;77:869–78. 76. 76.Sirugo G, Williams SM, Tishkoff SA. The Missing Diversity in Human Genetic Studies (vol 177, pg 26, 2019). Cell 2019;177:1080-. 77. 77.Teumer A, Li Y, Ghasemi S, et al. Genome-wide association meta-analyses and fine-mapping elucidate pathways influencing albuminuria. Nat Commun 2019;10:4130. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-019-11576-0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31511532&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 78. 78.Kiryluk K, Li Y, Scolari F, et al. Discovery of new risk loci for IgA nephropathy implicates genes involved in immunity against intestinal pathogens. Nat Genet 2014;46:1187–96. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3118&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25305756&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 79. 79.Xie J, Liu L, Mladkova N, et al. The genetic architecture of membranous nephropathy and its potential to improve non-invasive diagnosis. Nat Commun 2020;11:1600. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-020-15383-w&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32231244&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) 80. 80.Sohail M, Maier RM, Ganna A, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife 2019;8. 81. 81.Vyas DA, Eisenstein LG, Jones DS. Hidden in Plain Sight - Reconsidering the Use of Race Correction in Clinical Algorithms. N Engl J Med 2020;383:874–82. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/nejmms2004740&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F10%2F26%2F2021.10.25.21265398.atom) [1]: /embed/graphic-2.gif [2]: /embed/graphic-3.gif [3]: /embed/graphic-4.gif [4]: /embed/graphic-5.gif [5]: /embed/graphic-6.gif [6]: /embed/inline-graphic-1.gif [7]: /embed/inline-graphic-2.gif [8]: /embed/graphic-7.gif [9]: /embed/graphic-8.gif [10]: /embed/inline-graphic-3.gif