Abstract
Genome-wide assessment of genetic variation is becoming routine in human genetics, but functional interpretation of non-coding variants in both common and rare diseases remains extremely challenging. Here, we employed the massively parallel reporter assay ChIP-STARR-seq to functionally annotate the activity of >145 thousand non-coding regulatory elements (NCREs) in human neural stem cells (NSCs), modelling early brain development. Highly active NCREs show increased sequence constraint and harbour de novo variants in individuals affected by neurodevelopmental disorders. They are enriched for transcription factor (TF) motifs, including those of YY1 and p53 family members, and for primate-specific transposable elements, providing insights into gene regulatory mechanisms in NSCs. Examining episomal NCRE activity of the same sequences in human embryonic stem cells identified cell type differential activity and primed NCREs, accompanied by a rewiring of the epigenome landscape. Leveraging the experimentally measured NCRE activity and nucleotide composition of the assessed sequences, we built BRAIN-MAGNET, a functionally validated convolutional neural network that predicts NCRE activity based on DNA sequence composition and identifies functionally relevant nucleotides required for NCRE function. The application of BRAIN-MAGNET allows fine-mapping of GWAS loci identified for common neurological traits and prioritization of possible disease-causing rare non-coding variants in currently genetically unexplained individuals with neurogenetic disorders, including those from the Genomics England 100,000 Genomes Project, identifying novel enhanceropathies. We foresee that this NCRE atlas and BRAIN-MAGNET will help reduce missing heritability in human genetics by limiting the search space for functionally relevant non-coding genetic variation.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
RD was supported by a China Scholarship Council (CSC) PhD Fellowship (201906300026 to RD) for her PhD studies at the Erasmus Medical Center, Rotterdam, The Netherlands. KL was supported by a ZonMw PSIDER Doorbraken grant (grant 10250042110005), a Brain and Behavior Research Foundation Young Investigator award (grant 30787) and an NWO Veni grant (grant 501100003246). JP was supported by the Clinician Scientist program PRECISE.net funded by the Else Kroener-Fresenius-Stiftung and by the intramural TUEFF program (3049-0-0). TBH was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation 418081722, 433158657), and the European Commission (Recon4IMD GAP 101080997). GR was supported by the ZonMw Veni grant 1936320. Part of this research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK, and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. Some of the analysis involved external data generated by the ENCODE and Roadmap projects, which received funding from the National Institutes of Health (NIH) (grants U01ES017166, U54HG004570, U41HG006992 and U01ES017155).
The Barakat lab was supported by the Netherlands Organisation for Scientific Research (ZonMw Veni, grant 91617021; ZonMw Vidi, grant 09150172110002), a NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation, an Erasmus MC Fellowship 2017, and an Erasmus MC Human Disease Model Award 2018, and acknowledges other ongoing support for rare disease research from Stichting 12q, EpilepsieNL, CURE Epilepsy, Spastic Paraplegia Foundation, Inc and the Sophia Research Foundation (Stichting Sophia Kinderziekenhuis Fonds). Funding bodies had no influence on study design, results, data interpretation, or the final manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Ethics Committee of Erasmus MC University Medical Center gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
† Current affiliation: Institute of Biophysics, CNR, Trento, Italy
- author affiliations updated
- additional analysis added
- two additional co-authors added