PT - JOURNAL ARTICLE
AU - Hiatt, Laurel
AU - Weisburd, Ben
AU - Dolzhenko, Egor
AU - VanNoy, Grace E.
AU - Kurtas, Edibe Nehir
AU - Rehm, Heidi L.
AU - Quinlan, Aaron
AU - Dashnow, Harriet
TI - STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci
AID - 10.1101/2024.05.21.24307682
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.05.21.24307682
4099 - http://medrxiv.org/content/early/2024/05/21/2024.05.21.24307682.short
4100 - http://medrxiv.org/content/early/2024/05/21/2024.05.21.24307682.full
AB - Approximately 3% of the human genome consists of repetitive elements called tandem repeats (TRs), which include short tandem repeats (STRs) of 1–6 bp motifs and variable number tandem repeats (VNTRs) of 7+ bp motifs. TR variants contribute to several dozen mono- and polygenic diseases but remain understudied and “enigmatic,” particularly relative to single nucleotide variants. It remains comparatively challenging to interpret the clinical significance of TR variants. Although existing resources provide portions of necessary data for interpretation at disease-associated loci, it is currently difficult or impossible to efficiently invoke the additional details critical to proper interpretation, such as motif pathogenicity, disease penetrance, and age of onset distributions. It is also often unclear how to apply population information to analyses. We present STRchive (S-T-archive, http://strchive.org/), a dynamic resource consolidating information on TR disease loci in humans from research literature, up-to-date clinical resources, and large-scale genomic databases, with the goal of streamlining TR variant interpretation at disease-associated loci.
We apply STRchive (including pathogenic thresholds, motif classification, and clinical phenotypes) to a gnomAD cohort of ∼18.5k individuals genotyped at 60 disease-associated loci. Through detailed literature curation, we demonstrate that the majority of TR diseases affect children despite being thought of as adult diseases. Additionally, we show that pathogenic genotypes can be found within gnomAD which do not necessarily overlap with known disease prevalence, and leverage STRchive to interpret locus-specific findings therein. We apply a diagnostic blueprint empowered by STRchive to relevant clinical vignettes, highlighting possible pitfalls in TR variant interpretation. As a living resource, STRchive is maintained by experts, takes community contributions, and will evolve as understanding of TR diseases progresses. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study did not receive direct funding. HD is supported by 5K99HG012796-02. LH is supported by 1F30CA284847-01. HR, BW, and GV were supported by NHGRI grant U01HG011755. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: STRchive is available at http://strchive.org/, with comprehensive data, metadata, and processing scripts available at https://github.com/dashnowlab/STRchive. All scripts for manuscript data analysis and figure generation are available at https://github.com/dashnowlab/STRchive_manuscript; publicly available data used for analyses is also hosted on this GitHub.
gnomAD tandem repeat data, including allele frequency distributions, per-sample genotypes, and other sample metadata, can be explored online at https://gnomad.broadinstitute.org/short-tandem-repeats?dataset=gnomad_r3 and is also available for download on the gnomAD website under 'v3 Downloads > Short Tandem Repeats': https://gnomad.broadinstitute.org/downloads#v3-short-tandem-repeats The long-read data from the Human Pangenome Reference Consortium is available from SRA project PRJNA701308 or https://humanpangenome.org/data/. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. STRchive is licensed under a Creative Commons Attribution 4.0 International License.