Abstract
Population allele frequency is crucially important for accurate interpretation of known and novel variants in medical genetics. Recently, several large allele frequency databases, such as the Genome Aggregation Database (gnomAD), have been created to serve as a global reference for such studies. However, frequencies of many rare alleles vary dramatically between populations, and population-specific allele frequency is often more informative than the global one. Many countries and regions, including Russia, remain poorly studied from the genetic perspective. Here, we report the first successful attempt to integrate genetic information between major medical genetic laboratories in Russia. We construct an open, large-scale reference set of genetic variants by analyzing 7,492 exome samples collected in two major Russian cities, Moscow and St. Petersburg. An approximately tenfold increase in sample size compared to previous studies allowed us to identify genetically distinct clusters of individuals within the admixed population of Russia. We highlight 47 known pathogenic variants that are overrepresented in Russia compared to other European countries. We also identify several dozen high-impact variants that are present in healthy donors despite either being annotated as pathogenic in ClinVar or falling within genes associated with autosomal dominant disorders. The constructed database of genetic variant frequencies in Russia has been made available to the medical genetics community through a variant browser at http://ruseq.ru.
Competing Interest Statement
Analysis of NGS data and construction of the catalogue of allele frequencies were funded jointly by the Genetics and Reproductive Medicine Center GENETICO Ltd. and CerbaLab Ltd. according to the partnership agreement dated 25.10.2021. All rights to the data and resources generated during the work are reserved to the aforementioned parties.
Funding Statement
The work was supported by the Systems Biology Fellowship to Y.A.B., the Presidential Fellowship for Young Scientists (grant no. SP-4503.2021.4) to Y.A.B., and D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, project 558-2019-0012 (AAAA-A19119021290033-1) of FSBSI.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study protocol was approved by the ethics committee of the D.O. Ott Institute of Obstetrics, Gynecology, and Reproductology (protocol 114, dated 14 December 2021) and conducted in accordance with the Declaration of Helsinki.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
The manuscript was updated to include the expanded cohort (7452 samples) from three laboratories, as opposed to two laboratories in v1 of this preprint.
Data Availability
Variant frequency-level data produced in the present study are available via a form submission at ruseq.ru.