RT Journal Article
SR Electronic
T1 Development and evaluation of codelists for identifying marginalised groups in primary care
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2024.09.11.24313391
DO 10.1101/2024.09.11.24313391
A1 Perchyk, Tetyana
A1 de Vere Hunt, Isabella
A1 Nicholson, Brian D
A1 Mounce, Luke
A1 Sykes, Kate
A1 Lyratzopoulos, Yoryos
A1 Lemanska, Agnieszka
A1 Whitaker, Katriina L
A1 Kerrison, Robert S
YR 2024
UL http://medrxiv.org/content/early/2024/09/13/2024.09.11.24313391.abstract
AB Background: Primary care electronic health records provide a rich source of information for inequalities research. However, the reliability and validity of the research derived from these records depend on the completeness and resolution of the codelists used to identify marginalised populations. Aim: The aim of this project was to develop comprehensive codelists for identifying ethnic minorities, people with learning disabilities (LD), people with severe mental illness (SMI) and people who are transgender. Design and setting: This study was a codelist development project, conducted using primary care data from the United Kingdom. Method: Groups of interest were defined a priori. Relevant clinical codes were identified by searching Clinical Practice Research Datalink (CPRD) publications, codelist repositories and the CPRD code browser. Relevant codelists were downloaded and merged according to marginalised group. Duplicates were removed and the remaining codes were reviewed by two general practitioners (GPs). Comprehensiveness was assessed in a representative CPRD population of 10,966,759 people by comparing the frequencies of individuals identified using the curated codelists with those identified using commonly used alternatives. Results: A total of 52 codelists were identified. After removal of duplicates and GP review, 1,420 unique codes were selected.
Compared with comparator codelists, an additional 48,017 (76.6%), 52,953 (68.9%) and 508 (36.9%) people with a LD, SMI or transgender code, respectively, were identified. The frequencies identified for ethnicity were consistent with expectations for the UK population. Conclusion: The codelists curated through this project will improve inequalities research by improving standards of identifying marginalised groups in primary care data. HOW THIS FITS IN: The reliability and validity of primary care data for inequalities research depend on the comprehensiveness of the codes used to identify people from marginalised groups. This study set out to develop comprehensive codelists for the identification of four key groups known to experience health inequalities. We developed comprehensive codelists for identifying ethnic minorities, people with learning disabilities, people with severe mental illness and people who are transgender, using a systematic approach. The codelists were validated by two general practitioners, assessed in a representative sample, and can now be used in primary care practice and research, both nationally and internationally. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work is funded by the National Institute for Health Research (NIHR) Policy Research Programme, conducted through the Policy Research Unit in Cancer Awareness, Screening and Early Diagnosis, NIHR206132. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This work was supported by Breast Cancer Now [Grant Ref no: 2023FebIFS1615].
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. The codelists developed through this work are available from the Open Science Framework: https://osf.io/8skze/