RT Journal Article SR Electronic T1 Variability in proteoglycan biosynthetic genes reveals new facets of heparan sulfates diversity. A systematic review and analysis JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2022.04.18.22273971 DO 10.1101/2022.04.18.22273971 A1 Ouidja, Mohand Ouidir A1 Biard, Denis S.F. A1 Chantepie, Sandrine A1 Laffray, Xavier A1 Douaron, Gael Le A1 Huynh, Minh-Bao A1 Rebergue, Nicolas A1 Maïza, Auriane A1 Rubio, Karla A1 González-Velasco, Oscar A1 Barreto, Guillermo A1 De Las Rivas, Javier A1 Papy-Garcia, Dulce YR 2022 UL http://medrxiv.org/content/early/2022/04/25/2022.04.18.22273971.abstract AB Proteoglycans are complex macromolecules formed of glycosaminoglycan chains covalently linked to core proteins through a linker tetrasaccharide common to heparan sulfate proteoglycans (HSPG) and chondroitin sulfate proteoglycans (CSPG). Biosynthesis of a single proteoglycan requires the expression of dozens of genes, which together create the large structural and functional diversity reflected by the numerous diseases or syndromes associated with their genetic variability. Among proteoglycans, HSPG are the most structurally and functionally complex. To decrease this complexity, we retrieved and linked information on pathogenic variants, polymorphism, expression, and literature databases for 50 genes involved in the biosynthesis of HSPG core proteins, heparan sulfate (HS) chains, and their linker tetrasaccharide. This resulted in a new gene organization and biosynthetic pathway representation in which the phenotypic continuum of disorders such as linkeropathies and other pathologies could be predicted. Moreover, the ubiquitous NDST1, GLCE, HS2ST1, and HS6ST1 appeared to generate ubiquitous HS sequences essential for normal development and homeostasis, whereas the tissue-restricted NDST2-4, HS6ST2-3, and HS3ST1-6 appeared to generate specialized HS sequences mainly involved in responsiveness to stimuli. 
Supported by data on genetic polymorphism and clinical variants, we offer a new view of HSPG involvement in homeostasis, disease, vulnerability to disease, and behavioral disorders. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work has received funding from the ANR SkelGAG and from the European Union's Horizon 2020 Research and Innovation Program (grant agreement No 737390). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: For this study we used ONLY openly available human data that were originally located at: 1) the ClinVar archive (database ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/intro/), which gives information on correlations between genetic variations and overt phenotypes or health status with history of interpretations, 2) the Orphanet archive (Database Orphanet: https://www.orpha.net/), which considers clinical presentation based on published scientific articles and expert reviews, and 3) the dbSNP archive (database: https://www.ncbi.nlm.nih.gov/snp/), which is a free public archive for genetic variation within and across different species. For clinical variants, only monogenic variants (indels, deletions, duplications, insertions, and single-nucleotide variants) were considered. Clinical significance was further confirmed in web platforms including Online Mendelian Inheritance in Man (OMIM) (https://www.omim.org/) and the Database of Genomic Variants Archive (DGVa database: https://www.ebi.ac.uk/dgva/). 
For transcript analysis we used ONLY openly available human data that were originally located at: 1) the RNA-seq 32Uhlen project (database 32Uhlen: http://www.proteinatlas.org/humanproteome), which analyzed 32 different tissues from 122 human individuals, 2) the RNA-Seq CAGE (Cap Analysis of Gene Expression) in the RIKEN FANTOM5 project (database FANTOM5: http://fantom.gsc.riken.jp/data/), which analyzed several healthy adult human tissues, and 3) the ENCODE strand-specific RNA-seq of 13 human tissues from Michael Snyder's lab (Database ENCODE: https://www.encodeproject.org/). All these RNA-seq databases are considered in the Expression Atlas (http://www.ebi.ac.uk/gxa). The original raw and processed data files can be found in the ArrayExpress platform (https://www.ebi.ac.uk/arrayexpress/). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data produced in the present work are contained in the manuscript.