Data availability
Genetic and phenotypic data for the 100KGP study participants, the 100KGP Pilot study participants and the GMS participants are available through the Genomics England Research Environment via the application at https://www.genomicsengland.co.uk/join-a-gecip-domain. The following data were used: WGS data for 78,132 100KGP participants, 4,054 100KGP Pilot participants, 32,030 GMS participants (v3) and 13,037 NBR participants; HPO phenotype data from the ‘rare_diseases_participant_phenotype’ table (Main Programme v14), the ‘observation’ table (GMS v3) and the ‘hpo’ table (Rare Diseases Pilot v3); Specific Disease class data from the ‘rare_diseases_participant_disease’ table (Main Programme v13); ICD10 codes from the ‘hes_apc’ table (Main Programme v13); pedigree information from the ‘rare_diseases_pedigree_member’ table (Main Programme v13), the ‘referral_participant’ table (GMS v3) and the ‘pedigree’ table (Rare Diseases Pilot v3); and the explained/unexplained status of cases from the ‘gmc_exit_questionnaire’ tables (Main Programme v18, GMS v3). Accession codes for NBR data are given in ref. 19. CADD v.1.5 (https://cadd.gs.washington.edu/), gnomAD v.3.0 (https://gnomad.broadinstitute.org/) and Ensembl v.104 (http://may2021.archive.ensembl.org/index.html) were used for variant annotation. Expression data for U2 snRNA homologs were extracted from the file ‘GTEx_Analysis_2017-06-05_v8_RSEMv1.3.0_transcript_expected_count.gct.gz’, available from the GTEx Portal. Data presented in this paper were requested from the Genomics England Airlock on August 13, 2024 at 03:39 British Summer Time (BST). The manuscript was submitted to the Genomics England Publication Committee on August 21, 2024 at 23:51 BST and approved for submission on August 27, 2024 at 15:52 BST.