PT - JOURNAL ARTICLE
AU - Thomas, Nicholas J
AU - McGovern, Andrew
AU - Young, Katherine G
AU - Sharp, Seth A
AU - Weedon, Michael N
AU - Hattersley, Andrew T
AU - Dennis, John
AU - Jones, Angus G
TI - Identifying type 1 and 2 diabetes in population level data: assessing the accuracy of published approaches
AID - 10.1101/2022.04.11.22273617
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.04.11.22273617
4099 - http://medrxiv.org/content/early/2022/04/13/2022.04.11.22273617.short
4100 - http://medrxiv.org/content/early/2022/04/13/2022.04.11.22273617.full
AB - Aims: Population datasets are increasingly used to study type 1 or 2 diabetes and inform clinical practice. However, correctly classifying diabetes type, when insulin treated, in population datasets is challenging. Many different approaches have been proposed, ranging from simple age or BMI cut-offs to complex algorithms, and the optimal approach is unclear. We aimed to compare the performance of approaches for classifying insulin treated diabetes for research studies, evaluated against two independent biological definitions of diabetes type. Method: We compared accuracy of thirteen reported approaches for classifying insulin treated diabetes into type 1 and type 2 diabetes in two population cohorts with diabetes: UK Biobank (UKBB) n=26,399 and DARE n=1,296. Overall accuracy and predictive values for classifying type 1 and 2 diabetes were assessed using: 1) a type 1 diabetes genetic risk score and genetic stratification method (UKBB); 2) C-peptide measured at >3 years diabetes duration (DARE). Results: Accuracy of approaches ranged from 71%-88% in UKBB and 68%-88% in DARE. All approaches were improved by combining with requirement for early insulin treatment (<1 year from diagnosis). When classifying all participants, combining early insulin requirement with a type 1 diabetes probability model incorporating continuous clinical features (diagnosis age and BMI only) consistently achieved high accuracy (UKBB 87%, DARE 85%).
Self-reported diabetes type alone had high accuracy (UKBB 87%, DARE 88%) but was available in just 15% of UKBB participants. For identifying type 1 diabetes with minimal misclassification, using models with high thresholds or young age at diagnosis (<20 years) had the highest performance. An online tool developed from all UKBB findings allows the optimum approach of those tested to be selected based on variable availability and the research aim. Conclusion: Self-reported diagnosis and models combining continuous features with early insulin requirement are the most accurate methods of classifying insulin treated diabetes in research datasets without measured classification biomarkers.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The Diabetes Alliance for Research in England (DARE) study was funded by the Wellcome Trust and supported by the Exeter NIHR Clinical Research Facility.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
1. Cohort name: Diabetes Alliance for Research in England (formerly Exeter Research Alliance for Diabetes)
2. Non-abbreviated, full name of Ethics Committee / Institutional Review Board (IRB) that assessed the ethics for the DARE cohort: Devon & Torbay Research Ethics Committee. REC Ref: 2002/7/118
3. Decision made by ethics oversight body: Approved
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Data availability: UK Biobank data are available through a procedure described at http://www.ukBiobank.ac.uk/using-the-resource/. DARE data are available through application to the Peninsula Research Bank (https://exetercrfnihr.org/about/exeter-10000-prb/).
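
Illustrative note (not part of the original record): the abstract reports overall accuracy and predictive values for classification rules judged against a reference definition of diabetes type. The sketch below shows how such metrics could be computed for one simple rule, assuming a rule that labels a participant as type 1 when diagnosed under 20 years of age and started on insulin within 1 year of diagnosis. The `Participant` fields, the function names, the combination of these two criteria into a single rule, and the toy cohort are all hypothetical; they are not the authors' code, model, or data.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout, for illustration only.
    age_at_diagnosis: float   # years
    years_to_insulin: float   # time from diagnosis to starting insulin, years
    reference_type: int       # 1 or 2, from a reference (e.g. biomarker) definition


def classify(p: Participant, age_cutoff: float = 20.0,
             insulin_window: float = 1.0) -> int:
    """Label type 1 if diagnosed young AND treated with insulin early, else type 2."""
    if p.age_at_diagnosis < age_cutoff and p.years_to_insulin < insulin_window:
        return 1
    return 2


def evaluate(participants: list[Participant]) -> dict[str, float]:
    """Overall accuracy and predictive values of the rule against the reference."""
    pairs = [(classify(p), p.reference_type) for p in participants]
    correct = sum(pred == ref for pred, ref in pairs)

    def predictive_value(label: int) -> float:
        # Among participants the rule assigns to `label`, the fraction
        # whose reference type is also `label`.
        called = [ref for pred, ref in pairs if pred == label]
        if not called:
            return float("nan")
        return sum(ref == label for ref in called) / len(called)

    return {
        "accuracy": correct / len(pairs),
        "pv_type1": predictive_value(1),
        "pv_type2": predictive_value(2),
    }


# Made-up toy cohort (not study data).
cohort = [
    Participant(12, 0.1, 1),
    Participant(18, 0.5, 1),
    Participant(35, 0.2, 1),
    Participant(55, 6.0, 2),
    Participant(62, 3.0, 2),
    Participant(19, 0.3, 2),
]
metrics = evaluate(cohort)
print(metrics)
```

Raising `age_cutoff` or relaxing `insulin_window` in this sketch trades type 1 predictive value against overall accuracy, which is the kind of trade-off the abstract describes when contrasting approaches for classifying all participants with approaches for identifying type 1 diabetes with minimal misclassification.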