Algorithms for the identification of prevalent diabetes in the All of Us Research Program validated using polygenic scores – a new resource for diabetes precision medicine
=============================================================================================================================================================================

* Lukasz Szczerbinski
* Ravi Mandla
* Philip Schroeder
* Bianca C. Porneala
* Josephine H. Li
* Jose C. Florez
* Josep M. Mercader
* Alisa K. Manning
* Miriam S. Udler

## ABSTRACT

**OBJECTIVE** The study aimed to develop and validate algorithms for identifying people with type 1 and type 2 diabetes in the All of Us Research Program (AoU) cohort, using electronic health record (EHR) and survey data.

**RESEARCH DESIGN AND METHODS** Two sets of algorithms were developed, one using only EHR data (EHR), and the other using a combination of EHR and survey data (EHR+). Their performance was evaluated by testing their association with polygenic scores for both type 1 and type 2 diabetes.

**RESULTS** For type 1 diabetes, the EHR-only algorithm showed a stronger association with T1D polygenic score (*p*=3×10⁻⁵) than the EHR+. For type 2 diabetes, the EHR+ algorithm outperformed both the EHR-only and the existing AoU definition, identifying additional cases (25.79% and 22.57% more, respectively) and showing stronger association with T2D polygenic score (DeLong *p*=0.03 and 1×10⁻⁴, respectively).

**CONCLUSIONS** We provide new validated definitions of type 1 and type 2 diabetes in AoU, and make them available for researchers. These algorithms, by ensuring consistent diabetes definitions, pave the way for high-quality diabetes research and future clinical discoveries.
[Figure1](http://medrxiv.org/content/early/2023/09/05/2023.09.05.23295061/F1)

**Why did we undertake this study?** This study was conducted to develop and validate algorithms for identifying type 1 and type 2 diabetes cases in the All of Us Research Program (AoU).

**What is the specific question(s) we wanted to answer?** Can accurate algorithms for type 1 and type 2 diabetes identification be developed and validated using AoU cohort Electronic Health Record (EHR) and survey data? Do the identified diabetes cases show association with polygenic scores in diverse populations?

**What did we find?** We developed a new validated type 1 diabetes definition and expanded upon the existing type 2 diabetes definition.

**What are the implications of our findings?** The developed algorithms can be universally implemented in AoU for identifying study participants for well-defined case-control diabetes studies.

## 1. Introduction

The All of Us Research Program (AoU) aims to collect data from at least one million individuals across the United States, creating a diverse health database for epidemiological and genomic studies (1). However, the lack of a readily available type 1 diabetes algorithm and the underutilization of all data sources in the existing type 2 diabetes algorithm limit the potential for AoU to contribute to diabetes research. This study addresses these gaps by developing and validating the first type 1 diabetes algorithm and an optimized type 2 diabetes algorithm in AoU, using both electronic health record and survey data. We validated these algorithms using polygenic scores (PSs) (2), assessing their performance across diverse ancestries within the AoU cohort. This work enhances the utility of AoU for high-quality diabetes research and future clinical discoveries.

## 2. Materials and methods

We analyzed data from 372,397 AoU participants enrolled by January 1, 2022, with EHR data available for 309,974 participants and whole genome sequencing (WGS) data for 98,590 participants (extracted from the AoU v6 dataset on November 3, 2022). Genetic ancestry classifications were defined by the AoU Research Program, resulting in subgroups labeled as African/African American (AFR), American Admixed/Latino (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). For the development of diabetes algorithms, we incorporated multiple data points available in AoU, including EHR-based diagnosis, diabetes medications, laboratory data, and self-reported diabetes diagnosis from survey data. More details on the cohort description and data extraction and processing can be found in the **Supplementary Methods**.

We developed algorithms to identify individuals diagnosed with type 1 and type 2 diabetes, along with individuals without a diabetes diagnosis, for use as “cases” and “controls” in diabetes studies. Two algorithm versions were created: EHR and EHR+. The EHR algorithms utilized EHR-derived diagnoses, medications, and laboratory measurement values and were developed based on previously reported algorithms (3,4). Modifications were made to ensure relevance to the American Diabetes Association’s diagnostic criteria (5) and to exclude certain patient categories to avoid misclassification. The EHR+ algorithms also included self-reported diagnosis obtained from survey data. For the type 1 diabetes case identification algorithm, we modified the algorithm proposed by the eMERGE Phase-IV Program (3) (**Figure 1A**). For the type 2 diabetes case identification algorithm, we modified the Northwestern University algorithm (4) (**Figure 1B**).
Moreover, we applied the algorithm for type 2 diabetes case identification available in the “Phenotype Library” of the AoU Researcher Workbench “Featured Workspaces” (labeled ‘AoU-T2D’) as a comparator for our developed algorithms. Finally, we developed a universal algorithm for the identification of control individuals without diabetes (**Figure 1C**), based on the Northwestern University type 2 diabetes control algorithm (4). Detailed descriptions of the algorithms used in our study are provided in the **Supplementary Methods**.

Figure 1. Algorithms for the identification of: A. type 1 diabetes cases; B. type 2 diabetes cases; C. controls without diabetes.

We implemented two published type 1 diabetes polygenic scores: ‘T1D-PS EUR’ from Sharp *et al*. (6), which consisted of 67 single nucleotide polymorphisms (SNPs) derived from European ancestry cohorts; and ‘T1D-PS AA’ from Onengut-Gumuscu *et al*. (7), with 12 variants derived from genetic associations in African-American cohorts. We also created a global extended type 2 diabetes score, ‘T2D-PS EUR’, using PRS-CS software, based on a meta-analysis of summary statistics from European ancestry cohorts (8,9). All scores underwent ancestry adjustment as per methods described by Khera *et al*. (10), using AoU genetic ancestry probabilities.

We performed logistic regression analyses to evaluate the accuracy of our diabetes definitions using disease-specific polygenic scores.
The analysis assessed the area under the receiver operating characteristic curve (AUC) and the incremental AUC, defined as the difference between the AUC of the full model including the PS and the AUC of the model including only the covariates, and used the DeLong test (11) to compare AUCs. We also evaluated diabetes risk for individuals with polygenic scores in the top 10%, 5%, and 2.5% of the distribution compared to those in the interquartile range. Further details on calculations and statistical analyses are in the **Supplementary Methods**.

## 3. Results

Demographics of the individuals with diabetes identified using the T2D-EHR, T2D-EHR+, T1D-EHR and T1D-EHR+ algorithms are summarized in **Table 1**. Corresponding information for cases identified by the existing AoU-T2D definition and for controls without diabetes identified by our algorithm is presented in **Supplementary Tables S2 and S3**, respectively. To determine the best-performing algorithm for type 1 diabetes and type 2 diabetes, we calculated the associations between diabetes case-control definitions and relevant polygenic scores, and compared odds ratios (ORs), AUCs and incremental AUCs (**Table 2, Figures 2A and 2B**).

[Table 1.](http://medrxiv.org/content/early/2023/09/05/2023.09.05.23295061/T1) Demographics of diabetes status in AoU, from the diabetes phenotyping algorithms, stratified by self-reported race.

[Table 2.](http://medrxiv.org/content/early/2023/09/05/2023.09.05.23295061/T2) Predictive accuracy of generated PSs using EHR and EHR+ algorithms for cases and universal control algorithm for controls without diabetes identification, in All of Us population, stratified by genetic ancestry subgroups. We exclude genetic ancestry subgroup results for subgroups with insufficient sample size according to the All of Us Data and Statistics Dissemination Policy (16).
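The incremental AUC used throughout the tables can be illustrated with a minimal sketch. It assumes that predicted probabilities from the covariate-only model and from the full model (covariates plus PS) are already available; the function names `auc` and `incremental_auc` and the toy numbers are hypothetical, chosen for illustration only, and are not the authors' code or AoU data. The AUC is computed here with the pairwise (Mann-Whitney) formulation rather than with any particular statistics package.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outranks a random control.

    Ties between a case and a control count as one half (Mann-Whitney form).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def incremental_auc(
    full_pred: Sequence[float],
    covariate_pred: Sequence[float],
    labels: Sequence[int],
) -> float:
    """AUC of the full model (covariates + PS) minus AUC of the covariate-only model."""
    return auc(full_pred, labels) - auc(covariate_pred, labels)


# Toy, made-up predicted probabilities (1 = case, 0 = control); not AoU data.
labels = [1, 1, 1, 0, 0, 0]
covariate_pred = [0.6, 0.4, 0.5, 0.5, 0.3, 0.6]
full_pred = [0.8, 0.5, 0.7, 0.4, 0.2, 0.6]

print(round(auc(covariate_pred, labels), 4))
print(round(auc(full_pred, labels), 4))
print(round(incremental_auc(full_pred, covariate_pred, labels), 4))
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; on a cohort of AoU scale a rank-based implementation would be the more practical choice.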
[Figure 2.](http://medrxiv.org/content/early/2023/09/05/2023.09.05.23295061/F3) Predictive accuracy of generated PS and diabetes risk discrimination at various percentage cutoffs (2.5%, 5%, or 10%) in AoU population, using developed algorithms. A) Receiver operating characteristic (ROC) curves for T1D-EHR (red) and T1D-EHR+ (blue) definitions, with the p-value of the DeLong test. B) ROC curves for T2D-EHR (red), T2D-EHR+ (blue), and AoU-T2D (green) definitions, with the p-value of the DeLong test, in the entire sample and in genetic-ancestry subgroups with sufficiently large case counts. C) Forest plot for high-risk T1D-PS EUR groups for the T1D-EHR definition (due to the limited number of individuals with genetic data in the AoU cohort (16), we were able to compare the risk of type 1 diabetes defined by developed algorithms only in the European ancestry subgroup). D) Forest plot for high-risk T2D-PS EUR groups for the T2D-EHR+ definition in the entire sample and in genetic-ancestry subgroups with sufficiently large case counts (we did not perform the analysis of extreme T2D-PS EUR thresholds in the East Asian and South Asian ancestry subgroups due to insufficient sample size (16)).

For type 1 diabetes, the T1D-EHR algorithm consistently outperformed the T1D-EHR+ algorithm across the full dataset as well as in all analyzed ancestry subgroups, as indicated by larger AUCs (DeLong *p*-value = 3×10⁻⁵ for the AUC in the entire AoU cohort, **Figure 2A**) and incremental AUC values (**Table 2**), particularly in the European ancestry subgroup. Due to insufficient sample size, we could not evaluate the performance of the T1D-PS EUR in East Asian and South Asian ancestry subgroups.
To evaluate the possible reasons for the superior performance of the T1D-EHR algorithm, we analyzed the clinical characteristics of individuals identified as having type 1 diabetes based on survey data but not EHR data. We observed that, across all ancestries, individuals identified as having T1D based on survey data alone had a higher BMI (*p*-value = 4.7×10⁻²⁶) and were older (*p*-value = 4.8×10⁻²⁵) than those identified by EHR data alone (**Supplementary Table S4**), which suggests that participants in this group are more likely to have T2D despite self-reporting T1D.

For type 2 diabetes, the predictive power of the T2D-PS EUR was significantly better when using the T2D-EHR+ algorithm than when using the T2D-EHR or AoU-T2D algorithm, both in the entire sample (DeLong *p*-values = 0.03 and 1×10⁻⁴, respectively; **Figure 2B, Supplementary Table S5**) and in each of the genetic ancestry subgroups. The T2D-EHR+ algorithm also identified more cases (**Table 2**).

We were also interested to see to what extent polygenic scores could classify individuals of diverse ancestries as high risk, using the best-performing type 1 diabetes and type 2 diabetes algorithms, as determined above (T1D-EHR and T2D-EHR+). We examined the risk of diabetes using progressively more extreme cutoffs of the PS distribution. For type 1 diabetes, in the overall AoU population, individuals within the top 10%, 5%, and 2.5% of the T1D-PS EUR distribution were 11.16, 16.09, and 25.02 times more likely to have the disease than individuals with a T1D-PS EUR within the 25th-75th percentile (**Figure 2C, Supplementary Table S6**). For type 2 diabetes, individuals within the top 10%, 5%, and 2.5% of the T2D-PS EUR distribution were 3.28, 3.68, and 4.16 times more likely to have type 2 diabetes than individuals with a T2D-PS EUR within the 25th-75th percentile (**Figure 2D, Supplementary Table S7**).

## 4. Discussion

To improve the utility of the AoU Research Program in advancing research in diabetes, we have constructed and validated algorithms for diabetes definition, and compared the performance of algorithms that included or excluded participant survey data (EHR vs EHR+). For type 1 diabetes, where no definition was previously available in AoU, we propose that the T1D-EHR algorithm is best performing, identifying more than 250 cases in the AoU cohort. For type 2 diabetes, the EHR+ definition was best performing, showing increased accuracy and identifying 6,661 (22.57%) more cases than the existing AoU-T2D algorithm.

The generation of accurate phenotype definitions from EHR data is a challenging but crucial step for any type of disease-related research that utilizes large-scale biobank data, including the research performed with AoU. Integrating both EHR and self-reported survey data may offer a more complete picture of the individual than either alone, particularly by addressing missing information in EHR data. In this study, we use a genetic tool, polygenic scores (12), to validate the accuracy of our newly developed algorithms and to compare across algorithms. Notably, the validity of our diabetes algorithms was confirmed by PS performance comparable to that reported in established research, both for type 1 (6) and type 2 diabetes (13), supporting the integrity of our case definitions and underscoring the value of AoU for diabetes research in a diverse population.

While we observed that including survey data increased both the accuracy (as validated by associations with polygenic scores) and the number of cases for type 2 diabetes, we found that for type 1 diabetes, inclusion of survey data resulted in poorer performance.
The higher BMI and older age of individuals identified as type 1 diabetes cases based on the survey, but not EHR data, compared to those identified by the EHR algorithm, raise the possibility that some of these individuals could, in fact, have type 2 diabetes, suggesting that the inclusion of survey data may inadvertently introduce noise into type 1 diabetes case identification. Thus, for type 1 diabetes, we concluded that the EHR algorithm should be preferentially applied for research in AoU. For type 2 diabetes, by contrast, we recommend the preferential use of the EHR+ algorithm, as it substantially increases the number of identified cases without introducing contamination by false-positive cases.

This study has several strengths, including the development and validation of algorithms that incorporate multiple data sources. We also note that the prediction accuracy of our algorithms varied across populations, highlighting an ongoing Eurocentric bias in genomic studies (14,15). By focusing on diverse groups, the AoU Research Program intends to address this bias.

In conclusion, we provide, for the first time, a validated type 1 diabetes definition for AoU and expand upon existing type 2 diabetes definitions to incorporate both EHR and survey data. We offer access to these harmonized algorithms to help facilitate and standardize diabetes research. Our algorithms, methods, and relevant analytical code will be readily available in the Research Workbench to be shared and implemented by other researchers working within the AoU Research Program.

## Supporting information

Supplemental material [supplements/295061_file03.pdf]

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

## Funding and Assistance

L.S. is supported by funds from the Ministry of Education and Science of Poland within the project “Excellence Initiative—Research University”, the Ministry of Health of Poland within the project “Center of Artificial Intelligence in Medicine at the Medical University of Bialystok” and American Diabetes Association grant 11-22-PDFPM-03. J.H.L. is supported by NIDDK K23 DK131345 and MGH ECOR Fund for Medical Discovery Clinical Research Award. J.C.F. is supported by NHLBI K24 HL157960. J.M.M. is supported by American Diabetes Association Innovative and Clinical Translational Award 1-19-ICTS-068, American Diabetes Association grant #11-22-ICTSPM-16 and by NHGRI U01HG011723. A.K.M. is supported by the Foundation for the National Institutes of Health with funding from AMP CMD RFP 2: GENERATION of New genetic, -omic, or biomarker data for Common Metabolic Diseases titled “Common metabolic disease genetic association analysis in the All of Us Research Program” and by NHGRI U01HG011723. M.S.U. is supported by NIDDK K23DK114551, NIDDK R03DK131249, and Doris Duke Foundation Award 2022063.

## Conflict of Interest

No potential conflicts of interest relevant to this article were reported.

## Author Contributions and Guarantor Statement

L.S., R.M. and P.S. researched data, wrote, reviewed, and edited the manuscript. B.C.P., J.H.L. and J.C.F. reviewed and edited the manuscript. J.M.M., A.K.M. and M.S.U. reviewed and edited the manuscript and are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

## Prior Presentation

Part of the results included in this article was presented at the All of Us Researchers Convention (March 30th, 2023).

## 5. Acknowledgments

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.

## Footnotes

* # These authors jointly directed this work.
* **Twitter Summary** “New study develops and validates type 1 and type 2 diabetes algorithms in the All of Us Research Program cohort, improving case identification for diabetes research. #diabetesresearch #AllOfUsResearchProgram”
* Received September 5, 2023.
* Revision received September 5, 2023.
* Accepted September 5, 2023.
* © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## 6. References

1. All of Us Research Program Investigators, Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, et al. The “All of Us” Research Program. N Engl J Med. 2019 Aug 15;381(7):668–76.
2. Chen CY, Lee PH, Castro VM, Minnier J, Charney AW, Stahl EA, et al. Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records. Transl Psychiatry. 2018 Apr 18;8(1):1–8.
3. Qu H, Roizen J, Mentch F, Connolly J, Hain H, Sleiman P, et al. Phenotype Algorithm for Type 1 Diabetes – eMERGE Phase-IV Program [Internet]. [cited 2023 Aug 3]. Available from: [https://phekb.org/phenotype/type-1-diabetes](https://phekb.org/phenotype/type-1-diabetes)
4. Pacheco J, Thompson W. Northwestern University Type 2 diabetes mellitus (T2DM) algorithms [Internet]. [cited 2023 Aug 3]. Available from: [https://phekb.org/phenotype/type-2-diabetes-mellitus](https://phekb.org/phenotype/type-2-diabetes-mellitus)
5. ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 2. Classification and Diagnosis of Diabetes: Standards of Care in Diabetes—2023. Diabetes Care. 2022 Dec 12;46(Supplement_1):S19–40.
6. Sharp SA, Rich SS, Wood AR, Jones SE, Beaumont RN, Harrison JW, et al. Development and Standardization of an Improved Type 1 Diabetes Genetic Risk Score for Use in Newborn Screening and Incident Diagnosis. Diabetes Care. 2019 Jan 11;42(2):200–7.
7. Onengut-Gumuscu S, Chen WM, Robertson CC, Bonnie JK, Farber E, Zhu Z, et al. Type 1 Diabetes Risk in African-Ancestry Participants and Utility of an Ancestry-Specific Genetic Risk Score. Diabetes Care. 2019 Jan 18;42(3):406–15.
8. Vujkovic M, Keaton JM, Lynch JA, Miller DR, Zhou J, Tcheandjieu C, et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat Genet. 2020 Jul;52(7):680–91.
9. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023 Jan;613(7944):508–18.
10. Khera AV, Chaffin M, Zekavat SM, Collins RL, Roselli C, Natarajan P, et al. Whole-Genome Sequencing to Characterize Monogenic and Polygenic Contributions in Patients Hospitalized With Early-Onset Myocardial Infarction. Circulation. 2019 Mar 26;139(13):1593–602.
11. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988 Sep;44(3):837–45.
12. Udler MS, McCarthy MI, Florez JC, Mahajan A. Genetic Risk Scores for Diabetes Diagnosis and Precision Medicine. Endocrine Reviews. 2019 Dec 1;40(6):1500–20.
13. Ge T, Irvin MR, Patki A, Srinivasasainagendra V, Lin YF, Tiwari HK, et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Medicine. 2022 Jun 29;14(1):70.
14. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019 Apr;51(4):584–91.
15. Barroso I. The importance of increasing population diversity in genetic studies of type 2 diabetes and related glycaemic traits. Diabetologia. 2021 Dec;64(12):2653–64.
16. All of Us Research Program. Data and Statistics Dissemination Policy [Internet]. Available from: [https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2020/05/AoU_Policy_Data_and_Statistics_Dissemination_508.pdf](https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2020/05/AoU_Policy_Data_and_Statistics_Dissemination_508.pdf).