Balanced chromosomal rearrangements offer insights into coding and noncoding genomic features associated with developmental disorders ===================================================================================================================================== * Chelsea Lowther * Mana M. Mehrjouy * Ryan L. Collins * Mads C. Bak * Olga Dudchenko * Harrison Brand * Zirui Dong * Malene B. Rasmussen * Huiya Gu * David Weisz * Lusine Nazaryan-Petersen * Amanda S. Fjorder * Yuan Mang * Allan Lind-Thomsen * Juan M.M. Mendez * Xabier Calle * Anuja Chopra * Claus Hansen * Merete Bugge * Roeland V. Broekema * Teppo Varilo * Tiia Luukkonen * John Engelen * Angela M. Vianna-Morgante * Ana Carolina S. Fonseca * Juliana F. Mazzeu * Halinna Dornelles-Wawruk * Kikue T. Abe * Joris R. Vermeesch * Kris Van Den Bogaert * Carolina Sismani * Constantia Aristidou * Paola Evangelidou * Albert A. Schinzel * Damien Sanlaville * Caroline Schluth-Bolard * Vera M. Kalscheuer * Maren Wenzel * Hyung-Goo Kim * Katrin Õunap * Laura Roht * Susanna Midyan * Maria C. Bonaglia * Anna Lindstrand * Jesper Eisfeldt * Jesper Ottosson * Daniel Nilsson * Maria Pettersson * Elenice F. Bastos * Evica Rajcan-Separovic * Fatma Silan * Frenny J. Sheth * Antonio Novelli * Eirik Frengen * Madeleine Fannemel * Petter Strømme * Nadja Kokalj Vokač * Cornelia Daumer-Haas * Danilo Moretti-Ferreira * Deise Helena de Souza * Maria A. Ramos-Arroyo * Maria M. Igoa * Lyudmila Angelova * Peter M. Kroisel * Graciela del Rey * Társis A.P. Vieira * Suzanne Lewis * Wang Hao * Jana Drabova * Marketa Havlovicova * Miroslava Hancarova * Zdeněk Sedláček * Ida Vogel * Tina D. Hjortshøj * Rikke S. Møller * Zeynep Tümer * Christina Fagerberg * Lilian B. Ousager * Bitten Schönewolf-Greulich * Mathilde Lauridsen * Juliette Piard * Celine Pebrel-Richard * Sylvie Jaillard * Nadja Ehmke * Eunice G. Stefanou * Czakó Marta * Kosztolányi György * Ashwin Dalal * Usha R. Dutta * Rashmi Shukla * Fortunato Lonardo * Orsetta Zuffardi * Gunnar Houge * Doriana Misceo * Shahid M. Baig * Alina Midro * Natalia Wawrusiewicz-Kurylonek * Isabel M. Carreira * Joana B. Melo * Laura Rodriguez Martinez * Miriam Guitart * Lovisa Lovmar * Jacob Gullander * Kerstin B.M. Hansson * Cynthia de Almeida Esteves * Yassmine Akkari * Jacqueline R. Batanian * Xu Li * James Lespinasse * Asli Silahtaroglu * Christina Halgren Harding * Lotte Nylandsted Krogh * Juliet Taylor * Klaus Lehnert * Rosamund Hill * Russell G. Snell * Christopher A. Samson * Jessie C. Jacobsen * Brynn Levy * Ozden Altiok Clark * Asli Toylu * Banu Nur * Ercan Mihci * Kathryn O’Keefe * Kiana Mohajeri-Stickels * Ellen S. Wilch * Tammy Kammin * Raul E. Piña-Aguilar * Katarena Nalbandian * Sehime G. Temel * Sebnem Ozemri Sag * Burcu Turkgenc * Arveen Kamath * Adriana Ruiz-Herrera * Siddharth Banka * Samantha L.P. Schilit * Benjamin B. Currall * Naomi Yachelevich * Stephanie Galloway * Wendy K. Chung * Salmo Raskin * Idit Maya * Naama Orenstein * Nesia Kropach Gilad * Kayla R. Flamenbaum * Beverly N. Hay * Cynthia C. Morton * Eric Liao * Kwong Wai Choy * James F. Gusella * Peter Jacky * Erez Lieberman Aiden * International Breakpoint Mapping Consortium (IBMC) * Danish Cytogenetic Central Registry Study Group * Developmental Genome Anatomy Project (DGAP) * Iben Bache * Michael E. Talkowski * Niels Tommerup ## ABSTRACT Balanced chromosomal rearrangements (BCRs), including inversions, translocations, and insertions, reorganize large sections of the genome and contribute substantial risk for developmental disorders (DDs). However, the rarity and lack of systematic screening for BCRs in the population has precluded unbiased analyses of the genomic features and mechanisms associated with risk for DDs versus normal developmental outcomes. Here, we sequenced and analyzed 1,420 BCR breakpoints across 710 individuals, including 406 DD cases and the first large-scale collection of 304 control BCR carriers. We found that BCRs were not more likely to disrupt genes in DD cases than controls, but were seven-fold more likely to disrupt genes associated with dominant DDs (21.3% of cases vs. 3.4% of controls; *P =* 1.60×10−12). Moreover, BCRs that did not disrupt a known DD gene were significantly enriched for breakpoints that altered topologically associated domains (TADs) containing dominant DD genes in cases compared to controls (odds ratio [OR] = 1.43, *P =* 0.036). We discovered six TADs enriched for noncoding BCRs (false discovery rate < 0.1) that contained known DD genes (*MEF2C, FOXG1, SOX9, BCL11A, BCL11B*, and *SATB2*) and represent candidate pathogenic long-range positional effect (LRPE) loci. These six TADs were collectively disrupted in 7.4% of the DD cohort. Phased Hi-C analyses of five cases with noncoding BCR breakpoints localized to one of these putative LRPEs, the 5q14.3 TAD encompassing *MEF2C*, confirmed extensive disruption to local 3D chromatin structures and reduced frequency of contact between the *MEF2C* promoter and annotated enhancers. We further identified six genomic features enriched in TADs preferentially disrupted by noncoding BCRs in DD cases versus controls and used these features to build a model to predict TADs at risk for LRPEs across the genome. These results emphasize the potential impact of noncoding structural variants to cause LRPEs in unsolved DD cases, as well as the complex interaction of features associated with predicting three-dimensional chromatin structures intolerant to disruption. ## INTRODUCTION Balanced chromosomal rearrangements (BCRs), including translocations, insertions, and inversions, are a unique class of rare genomic variation that occur roughly five-fold more frequently in individuals with developmental disorders (DDs) than in the general population.1–6 Delineation of BCR breakpoints has long represented an approach to discover novel disease genes,7,8 and has been accelerated by innovative methods using whole genome sequencing (WGS) with long-inserts (liWGS) to capture BCR breakpoints.9–16 Our previous WGS analyses suggested that 26.6% of cytogenetically visible BCRs contribute to risk for DDs due to direct gene disruption.11 The observation, while substantial, also implies that alternative mechanisms of disease are likely to be mediating additional genetic risk for DDs due to BCRs. However, the lack of sufficient sample sizes, and the virtual absence of large cohorts of unaffected control BCR carriers with sequence-resolved breakpoints, have precluded a systematic evaluation of the full spectrum of pathogenic mechanisms associated with BCRs to date. Beyond the direct disruption of disease-associated protein-coding genes, emerging evidence has begun to implicate a small number of noncoding elements in the etiology of DDs. Recent analyses have emphasized the challenges with the statistically rigorous genome-wide discovery of rare and *de novo* noncoding regulatory risk variants, including their small average effect sizes, the lack of a cipher equivalent to trinucleotide codons for variant interpretation, and the large number of noncoding functional categories that could be tested.17–19 Nonetheless, there are examples of noncoding variants with strong regulatory consequences and considerable influence on risk for DDs,19–21 including long-range positional effects (LRPEs) that result from disruption of or topological associating domains (TADs)20,24–26 or long intergenic noncoding RNAs (lincRNAs).22,23 The disruption of TADs, which are megabase-sized regulatory domains of folded chromatin that contain most *cis*-regulatory interactions,27–30 can lead to loss of physical connections between enhancers and their target genes24,25 and/or the generation of ectopic enhancer-promoter contacts through a process known as “enhancer adoption”.31,32 While relatively few studies have systematically evaluated the contribution of rare noncoding variants to risk for disease, BCRs represent a unique class of highly penetrant genomic variation from which we might begin to understand the mechanisms of pathogenic noncoding variants in DDs given the outsized impact of BCRs on genome structure and function.11,32,33 In this study, we analyzed 1,420 BCR breakpoints from 710 unrelated individuals, including 406 DD cases as well as the first large-scale sequence-resolved cohort of 304 unaffected BCR carriers (*i*.*e*., controls), and evaluated a range of mechanisms by which BCRs may increase risk for DDs. Our analyses revealed a series of significant chromosomal, genic, and noncoding loci associated with DDs, as well as features that distinguished BCRs occurring in DD cases versus unaffected controls. In addition to refining DD risk estimates for BCRs directly disrupting dominant DD genes, we identified six TADs significantly associated with recurrent disruption by noncoding BCRs in DD cases, representing strong-effect LRPEs in DDs. Collectively, 7.4% of our DD cohort harbored a noncoding BCR breakpoint that disrupted one of these six TADs. We also defined a subset of core genomic features that, when considered together, can aid in the interpretation of DD-associated LRPEs throughout the genome. ## RESULTS ### International aggregation, sequencing, and genome-wide analyses of BCRs in 406 DD cases and 304 controls We have previously shown that chromosomal rearrangements that appear balanced at cytogenetic resolution can involve extensive complexity ranging from multiple cryptic breakpoints to balanced chromosomal shattering, or chromothripsis, at sequence resolution.11,15,34 Here, we focused analyses on the most interpretable classes of BCRs by aggregating a cohort of 710 unrelated individuals harboring a “simple” BCR (*i*.*e*., breakpoints at two genomic positions without significant imbalance or additional complexity) initially identified by cytogenetic methods and subsequently resolved using either short-insert or long-insert WGS (**Fig. 1A**; **Supplementary Fig. 1**; **Supplementary Methods**). This cohort included 406 cases diagnosed with a DD or congenital anomaly in which a BCR was confirmed to have arisen *de novo* or segregate with phenotype and 304 unaffected control adults with no early-onset pediatric phenotype (see **Fig. 1B** and **Supplementary Table 1** for complete descriptions). Over 55% of the BCR breakpoints have not been previously published. As a comparison for these 710 empirically-identified BCRs (*n =* 1,420 breakpoints; **Supplementary Table 2**), we also generated a set of 30,400 simulated BCRs under the null hypothesis that breakpoints should be randomly distributed throughout the genome. These simulated BCRs were randomly sampled *in silico* from the genome while matching properties of the 304 BCRs empirically identified in controls, including structural variant (SV) type, inversion size, as well as excluding N-masked regions known to be inaccessible to short read alignments (**Supplementary Methods**).35,36 ![Fig. 1](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F1.medium.gif) [Fig. 1](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F1) Fig. 1 The properties of BCRs in the healthy human germline and DD cases (**A**) We mapped the breakpoints of 710 simple (*i*.*e*., two-breakpoint) BCRs that were originally detected with cytogenetic methods. Here, we provide the genome-wide BCR breakpoint density in 10Mb windows per chromosome. (**B**) These 710 BCRs were identified in the genomes of 406 individuals affected by DDs and 304 unaffected controls. For purposes of comparison, we also generated 30,400 synthetic BCRs *in silico* by resampling the distribution of control BCRs 100 times randomly from the genome. (**C**) BCR breakpoints were distributed across the chromosomes as expected between affected, control, and simulated subsets except for chromosome 14, which exhibited a significant enrichment of breakpoints in affected samples, and chromosome X, which was depleted of breakpoints in controls. All comparisons were Bonferroni-adjusted for 72 total tests. (**D**) Balanced translocations involving at least one sex chromosome were significantly depleted in control samples compared to either affected samples or simulated null expectations. (**E**) We conducted association tests per cytoband for BCR breakpoints in affected samples vs. simulated null expectations. After correcting for 862 independent tests, just one cytoband was significant at the Bonferroni-adjusted threshold of *P* ≤ 5.8×10−5): 17q24.3, which corresponds to a locus with well-described LRPEs in DDs involving *SOX9* and *KCNJ2*.24,25 (**F**) The distribution of autosomal BCR breakpoints in controls across a “meta-chromosome” arm (*i*.*e*., chromosome size-normalized position) was significantly different from those of affected samples or simulated null expectations, with controls exhibiting depletions near centromeres and enrichments very close to telomeres. (**G**) Loci corresponding to empirically identified autosome-autosome translocation breakpoints (*i*.*e*., those sequenced in affected or unaffected genomes) contacted each other in 3D within the nucleus of human embryonic stem cells 1.08-fold more frequently than expected vs. simulated null expectations. We first sought to understand the global patterns of BCRs throughout the genome by comparing the rates of breakpoints per chromosome between cases, controls, and simulations (**Supplementary Fig. 2**). We found that BCR frequency was approximately proportional to chromosome length with two exceptions: translocations were enriched on chromosome 14 in DD cases (*P =* 5.8×10−6 for cases vs. random simulations) and were depleted on chromosome X in controls (*P* = 1.4×10−6 for controls vs. random simulations) (**Fig. 1C** and **Supplementary Fig. 2**). The distribution of breakpoints across chromosome 14 did not appear to cluster in any particular location (**Supplementary Fig. 3**) and we did not find any features (*i*.*e*., compartment state, replication timing, recombination frequency, or gene disruption) that could account for the enrichment. In contrast, when we subset chromosome X, we observed that both cases and controls were 2.7-fold to 5.3-fold depleted for breakpoints on the q-arm (**Supplementary Fig. 2 and 3**). The Xq depletion in controls may be partly explained by the exclusion of males with oligo/ azoospermia and females with premature ovarian failure given their known association with X-autosome translocations.37,38 However, most of our DD cases were too young to be assessed for infertility, thus the Xq depletion in cases is unlikely to be related to a similar ascertainment bias. Overall, we identified that most (65%; 20/31) translocation breakpoints on chromosome X localized to the p-arm (length=58.6 Mb), which exhibited a 6.9-fold enrichment of case vs. control breakpoints (*P* = 0.002). Interestingly, 90.0% (18/20) of the Xp translocations identified in DD cases were found in females. Finally, controls were depleted for translocations involving either sex chromosome: just 2.1% of control translocations involved either chromosome X or Y, which was significantly less than translocations in DD cases (8.9%; *P* = 9.2×10−4) or randomly simulated BCR carriers (12.1%; *P* = 5.86×10−7) (**Fig. 1D**). Our assessments of BCR breakpoint distributions per chromosome led to two additional discoveries. First, a single cytoband on chromosome 17 was significantly enriched for BCRs in cases (*P* = 1.2×10−5 vs. random simulations) and surpassed a genome-wide significance threshold adjusted for all 862 cytobands tested across all chromosomes (**Fig. 1E**; **Supplementary Fig. 4**).39 This cytoband, 17q24.3, matches the location of a well-described pathogenic LRPE in DDs and congenital anomalies caused by SVs altering the local TAD organization and dysregulating *SOX9* and *KCNJ2*.24,25 Second, after transforming the position of each breakpoint into a percentile relative to the length of its corresponding chromosome arm (*i*.*e*., meta-chromosome), we found that translocation breakpoints in controls were biased towards the most distal ends of chromosomes (Kolmogorov-Smirnov test; *P* = 0.002 for control vs. simulation and *P* = 0.021 for control vs. cases; **Fig. 1F**; **Supplementary Fig. 5**). For example, control BCR breakpoints were roughly three-fold enriched within the terminal 2% of each chromosome arm: 2.7-fold vs. DD cases and 3.3-fold vs. random simulations. This might suggest that translocations occurring near telomeres–which do not rearrange most of the affected chromosome–are more likely to be tolerated in the general population without leading to severe disease. We also sought to identify genomic features that predispose to BCR formation by annotating all BCR breakpoints with features relating to chromosome maintenance (*e*.*g*., recombination rate, replication timing), chromatin accessibility, sequence context (*e*.*g*., repetitive elements, sequence homology), and three-dimensional (3D) nuclear organization (*e*.*g*., Hi-C contact frequency, nuclear compartment state).40 Most features showed no significant differences from expectations after correcting for multiple testing. One feature of note was that our empirically-observed translocations (*i*.*e*., those sequenced in DD cases and controls) were slightly more likely to form between pairs of chromosomes in close proximity to each other in 3D within the nucleus than predicted from simulated breakpoints that did not account for this biological organization (1.08-fold increase; *P* = 0.002; **Fig. 1G**). This result was true in a fetal lung fibroblast cell line (IMR90) and replicated in a second dataset derived from embryonic stem cells.40 These findings might suggest a weak influence on the formation of BCRs between chromosomal regions that co-localize within the nucleus, as suggested by analyses of tumor genomes and cytogenetic data from germline BCR carriers.41 ### BCRs in DD cases are strongly enriched for direct disruption of established disease genes Previous studies have demonstrated that BCRs confer substantial risk for DDs through direct disruption of haploinsufficient, developmentally critical genes.8,11,15 However, the absence of matched cohorts of unaffected control BCR carriers has historically hindered the quantification of disease risk contributed by gene-disruptive BCRs. Here, we annotated all BCR breakpoints for direct gene disruptions using Gencode v19 and compared the frequency of gene-disrupting autosomal BCRs between DD cases, controls, and random simulations.42 Most BCRs disrupted at least one protein-coding gene and there was no difference between cases and controls (68.1% of DD cases and 67.6% of controls) or expectations from random simulations (69.0% expected; **Fig. 2**). We further subdivided protein-coding genes into four tiers based on the evidence for their association with disease (**Supplementary Table 3**). Briefly, these included genes associated with dominant DDs (*i*.*e*., Tier 1, *n* = 812), genes associated with all other diseases (Tier 2, *n* = 3,129), mutationally constrained genes with no prior disease association (Tier 3; *n* = 1,257), and all remaining protein-coding genes (Tier 4; *n =* 15,188). We found a strong enrichment of cases with BCR breakpoints directly disrupting Tier 1 genes compared to controls (21.3% of cases versus 3.4% of controls; odds ratio [OR] = 7.45; 95% confidence interval [CI] = 3.74-16.50; Fisher’s exact test; *P =* 1.60×10−12) and compared to simulations (21.3% of cases vs. 6.6% of simulations; OR = 3.67; 95% CI = 2.80-4.76; *P =* 2.06×10−18), but not for Tiers 2-4. ![Fig. 2](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F2.medium.gif) [Fig. 2](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F2) Fig. 2 Risk conferred for DDs by BCRs disrupting genes and other transcribed loci We annotated all BCR breakpoints for predicted overlap with protein-coding and noncoding genes present in Gencode v19.14 Here, we further subset these BCRs based on the properties of the gene(s) disrupted at either breakpoint, including: any gene present in Gencode; protein-coding genes; “Tier 1” genes including those known to be associated with dominant DDs; “Tier 2” genes including all remaining disease-associated genes; “Tier 3” genes including all genes in the top decile of loss-of-function constraint47 but with no existing disease association; “Tier 4” genes including all remaining genes not captured in the preceding tiers; lincRNAs; all noncoding RNAs other than lincRNAs present in Gencode. For each subset of genes, we computed the rate of BCRs disrupting at least one qualifying gene between cases, controls, and simulated BCRs, and further computed the odds ratios of cases vs. controls and cases vs. simulated null expectations. Only BCRs disrupting Tier 1 genes were significantly enriched in cases vs. controls and cases vs. simulated BCRs after correcting for multiple comparisons. Motivated by the strong association between disruption of dominant DD genes and BCRs in cases, we systematically searched for genes that were recurrently disrupted by BCRs in cases beyond expectations by conducting association tests for each autosomal gene (**Supplementary Table 4**). These analyses identified four protein-coding genes disrupted in at least three independent DD cases and none in controls (**Table 1**). Among these, just one gene, *TCF4*,11,15 surpassed a strict exome-wide significance threshold (disrupted in 1.6% of DD cases vs. 0.01% of simulated BCRs; *P =* 4.3×10-10; OR = 149.5). Haploinsufficiency of *TCF4* is the dominant genetic cause of Pitt-Hopkins Syndrome and has been associated with autism spectrum disorder (ASD) and broadly defined neurodevelopmental disorders (NDDs).43,44 The three remaining protein-coding genes did not reach exome-wide significance but had suggestive (*P* ≤ 0.005) evidence of association with DDs in our analyses. These included two established DD genes (*AUTS2, MBD5*)45,46 and one candidate DD gene, *CDK6*, which was disrupted in three cases that presented with developmental delay (*n* = 3), speech delay (*n* = 2), microcephaly (*n* = 2), and cardiac defects (*n* = 1). *CDK6* is highly constrained against damaging point mutations,47 is ubiquitously expressed across tissues,48,49 and encodes a cyclin-dependent kinase with major roles in skin, blood, and breast cancers.50 *CDK6* has also been associated with a recessive form of primary microcephaly,51 but has not been previously associated with dominant germline disease. View this table: [Table 1](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/T1) Table 1 Genes recurrently disrupted by BCRs in DD cases A list of seven genes that are disrupted by BCRs from ≥3 DD cases and zero controls. The type (“biotype”), constraint information,47 P value (for case versus control and simulated breakpoint comparisons, respectively), and odds ratio (OR) for each gene are also shown. Only one gene, *TCF4*, met a strict exome-wide significance threshold. LOEUF, loss-of-function observed/expected upper bound fraction; Mis., missense; Pct, percent; BCR, balanced chromosomal rearrangement. ### Pathogenic positional effects from disruption of three-dimensional chromatin structures Our analyses identified an unambiguous, strong association between DDs and BCRs disrupting dominant DD genes; however, the majority of autosomal BCRs in DD cases did not disrupt a known disease gene (*n* = 289; 78.7%), and one-third (*n* = 113; 30.8%) did not disrupt any annotated protein-coding gene. We therefore considered three other models by which BCRs might confer DD risk through noncoding mechanisms based on: (i) their disruption of noncoding genes (*e*.*g*., lincRNAs), (ii) their linear distance from known disease genes, and (iii) the disruption of TADs containing known disease genes. We excluded all 78 (21.3%) cases and 10 (3.4%) controls with autosomal BCRs that directly disrupted a Tier 1 gene, which we reasoned would largely exclude the confounding influence of BCRs associated with pathogenic effects via direct gene disruption. We first tested whether direct disruption of noncoding genes could be responsible for pathogenic effects in DD cases and observed no difference in the fraction of cases versus controls that disrupted any subgroup of noncoding genes (**Fig. 2**). We next assessed whether pathogenic LRPEs could be predicted based on the absolute distance between disease genes and BCR breakpoints and observed no difference between DD cases and controls for proximity to any tier of genes (*e*.*g*., Tier 1 genes; Kolmogorov-Smirnov test; *P* = 0.159; **Fig. 3A-B**). These analyses confirm that linear distance to disease genes alone is insufficient to predict pathogenic LRPEs. However, when we tested the third model by comparing the fraction of cases to controls with a BCR breakpoint disrupting a TAD containing genes from each tier, we observed a significant effect for TADs containing Tier 1 genes (OR = 1.43; 95% CI = 1.68-3.18; Fisher’s exact test *P* = 0.033; **Fig. 3C-D**), supporting the role of 3D chromatin topology disruption in pathogenic LRPEs. Given these results, we next performed genome-wide analyses to define the TADs most strongly associated with DD phenotypes. ![Fig. 3](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F3.medium.gif) [Fig. 3](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F3) Fig. 3 Disruption of TADs, and not proximity to known dominant DD genes, is predictive of LRPEs (**A-B**) Fraction of cases and controls with an autosomal BCR breakpoint in proximity to a Tier 1 dominant DD gene (**Supplementary Table 3**) when direct disruption of Tier 1 genes are included (**A**) and excluded (**B**). P value corresponds to a two-sample Kolmogorov-Smirnov test. (**C-D**) Fraction of cases compared to controls with a BCR breakpoint directly disrupting TADs40 containing each of the four gene tiers when direct disruption of genes within that gene tier are included (**C**) and excluded (**D**). We searched for specific TADs associated with risk for DDs by evaluating each autosomal TAD identified from a fetal lung fibroblast (IMR90) cell line40 for an enrichment of case BCRs against a Poisson null model fit to the distribution of control breakpoints. Overall, we identified 26 recurrently disrupted TADs with suggestive evidence for association with DDs based on a Benjamini-Hochberg false discovery rate (FDR) q ≤ 0.1 (**Fig. 4** and **Supplementary Table 5**). Five of these TADs surpassed a Bonferroni-adjusted genome-wide significance threshold of 2.2×10−5, including three known LRPE loci at *MEF2C, FOXG1*, and *SOX9* (**Fig. 5**).11,14,52–54 These five TADs also remained significant when we compared BCRs from DD cases against simulated breakpoints, suggesting that our models were not simply capturing rearrangement hotspots. Consistent with the finding that most TAD boundaries are tissue-invariant,28,29,40 tissue source had no impact on the genome-wide significant TADs (**Supplementary Fig. 6**). Given that we only removed cases and controls with a direct disruption of a Tier 1 gene, our TAD results represented a combination of true LRPEs, genic effects not previously associated with DDs, and other unknown mechanisms of disease etiology. For example, one of the genome-wide significant TADs was altered by three cases that all directly disrupted *CDK6*, a novel candidate DD gene identified from our exome-wide gene association analysis, suggesting that it likely represents a genic effect and not a LRPE. ![Fig. 4](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F4.medium.gif) [Fig. 4](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F4) Fig. 4 Genome-wide enrichment of TADs disrupted by BCR breakpoints from DD cases (**A**) Genome-wide enrichment of BCR breakpoints in DD cases across 2,257 autosomal TADs.40 P values correspond to enrichments against a Poisson null model fit to the distribution of control breakpoints. The genome-wide significance threshold of 2.21×10−5 (denoted by the red line) was determined by correcting for the total number of autosomal TADs tested and the blue line represents the Benjamini-Hochberg FDR<0.1 cutoff. Known dominant DD genes (Tier 1) contained within each genome-wide significant TAD are reported in parentheses. The GM12878 Hi-C maps30 are shown for each of the six TADs significantly enriched for BCR breakpoints from DD cases: (**B**) 2p16.1 TAD containing *BCL11A*, (**C**) 5q14.3 TAD containing *MEF2C*, (**D**) 14q12 TAD containing *FOXG1* and *PRKD1*, (**E**) 14q32.2 TAD containing *BCL11B*, (**F**) 17q24.3 TAD containing *SOX9*, and (**G**) 2q33.1 TAD containing *SATB2*. The annotation tracks under the HiC maps include BCR breakpoints from DD cases and controls, Tier 1 DD genes (blue), all other protein-coding genes from Gencode v19 (gray),42 VISTA enhancers (pink),62 and UCEs (green).63 ![Fig. 5](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F5.medium.gif) [Fig. 5](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F5) Fig. 5 Hi-C analysis of DD case with complex BCR disrupting the TAD containing MEF2C (**A**) The 3D resequencing pipeline starts by using Hi-C data to call short variants (SNPs and indels) against a haploid reference. In this paper we used the DRAGEN software,79 but similar results can be achieved with other publicly available variant callers such as GATK.83 We then use the Hi-C alignment data as generated by Juicer84 in conjunction with the 3D-DNA phaser to phase the variants and produce chromosome-length haploblocks. The phased variants enable the generation of molecule-specific contact maps, which in turn allow for molecule-specific annotation of SVs. Using the assembly tools from the 3D-DNA/Juicebox Assembly Tools ecosystem we then create assisted assemblies congruent with the annotated SVs and remap the contact data against the new reference to allow for phased diploid epigenetic analyses. (**B**) Molecule-specific Hi-C contact maps showing DNA-DNA interactions in the vicinity of the *MEF2C* promoter in LCLs derived from patient DGAP101: “normal” haplotype Hi-C data mapped to hg19 reference genome (left); haplotype with a chromothriptic chromosome 5 (middle, notice the numerous signal depletions along the diagonal corresponding to breaks and off-diagonal enrichments in the signal corresponding to fusion points); chromothriptic haplotype remapped against a reference that accounts for the chromotriptic rearrangements (right). The 1D tracks show the phased SNP density, highlight the syntenic regions between the three maps (rainbow colors are reserved for sequences in the vicinity of *MEF2C* in the “normal” reference, while hatching corresponds to sequences juxtaposed into the genomic segment of interest from elsewhere on chromosome 5 in the affected haplotype), as well as show the position of the promoter and 16 known enhancers61 in the ‘standard’ human reference as well as in the SV-corrected reference. (**C**) A dotplot of the whole chromosome 5 showing the correspondence between the affected and the normal molecules (100Kb synteny blocks are used, with direct synteny blocks colored red, and inverted blocks colored blue). The position of the *MEF2C* promoter is highlighted with dashed lines. To identify candidate pathogenic LRPE loci among the 26 TADs with q ≤ 0.1, we required that each TAD: (i) be disrupted by a noncoding BCR breakpoint (*e*.*g*., does not disrupt a protein-coding gene) in at least 50% of cases contributing to the signal, (ii) contain a Tier 1 gene representing a plausible target gene, and (iii) have multiple cases disrupting the same TAD that present with phenotypes that are frequently observed in cases with direct disruption of the candidate target gene (**Supplementary Table 6**). This resulted in six candidate pathogenic TADs containing the known DD genes *SATB2, MEF2C, FOXG1, SOX9, BCL11B*, and *BCL11A* (**Fig. 4**). Supporting our statistical enrichments, five of the six significant TADs have been previously associated with pathogenic LRPE loci (*SATB2, MEF2C, FOXG1, BCL11B* and *SOX9*) based on individual case reports,11,14,52,54-56 suggesting that we are accessing *bona fide* LRPE signals. The novel candidate LRPE at 2p16.1 contains *BCL11A*, which encodes a zinc finger protein involved in the BAF SWI/SNF chromatin remodeling complex, and has been previously associated with an intellectual disability syndrome.57 All four of the cases with noncoding BCRs disrupting the TAD containing *BCL11A* presented with DD or ASD.57 Overall, these six TADs were disrupted by 30 cases (7.4%) and one control (0.32%; Fisher’s exact test; OR = 21.2; 95% CI 4.5-500.1; *P =* 7.12×10−7), suggesting that they represent highly-penetrant LRPE loci. ### Hi-C analyses of individuals with noncoding BCRs disrupting the TAD containing MEF2C We previously implicated the 5q14.3 locus as a putative pathogenic LRPE with *MEF2C* as the target gene based on a statistically significant enrichment of noncoding BCR breakpoints that all disrupted the same TAD containing *MEF2C* and observed down-regulation of this gene in multiple cases harboring the noncoding BCRs.11 Based on these data, we hypothesized that the disruption of 3D topological organization could represent the underlying mechanism for this LRPE. To functionally validate this hypothesis, we generated high-throughput chromatin conformation capture (Hi-C) data from lymphoblastoid cell lines from five cases harboring BCRs disrupting the TAD containing *MEF2C* and developed a 3D resequencing workflow (see **Supplementary Methods** and **Fig. 5A**) to facilitate analysis of the resulting data. The goal of this workflow was to use Hi-C datasets to: (i) identify single nucleotide polymorphisms (SNPs), small insertions and deletions (indels), and SVs, (ii) phase these variants onto chromosome-length haploblocks, thereby reconstructing the end-to-end sequences of each molecule, and (ii) use the resulting diploid assembly to generate homolog-specific 3D contact maps. We applied this workflow to two simple (one inversion and one translocation) and three additional complex BCRs from prior studies with noncoding breakpoints disrupting the TAD containing *MEF2C*.11,14 Each rearrangement was genotypically distinct with different resultant derivative chromosomes. While complex BCRs were excluded from other aspects of this study, we included three in our Hi-C analysis because their impact on 3D genome organization has not been previously examined in a homolog-specific manner.59,60 Comparing the results of rearrangement detection using Hi-C to those using liWGS, we found that the 3D resequencing pipeline detected 92.7% (*n =* 51/55) of the breakpoints found by the combination of both methods. Three of the four breakpoints missed by Hi-C were short segments (<10kb) of DNA that had been rearranged and inserted into a new position. The missed breakpoints were visible in the Hi-C map but had not been identified by the computational analysis. Conversely, liWGS detected 96.4% (*n* = 53/55) of the Hi-C identified breakpoints, failing to detect a breakpoint associated with a short interval, as well as a 78kb deletion. Crucially, Hi-C was able to reliably order and orient the rearranged sequences on each homolog, even when a breakpoint was missed (**Supplementary Table 7, Supplementary Figs. 7A-E**). By contrast, it is challenging to reliably order and orient the rearranged sequencing using liWGS data alone if breakpoints are missed. Taken together, these results demonstrate that Hi-C can be used to robustly generate both homolog-specific sequences and architectural maps. In all five BCR cases we examined, 3D resequencing via Hi-C also revealed significantly altered 3D organization of the rearranged homolog (**Supplementary Figs. 7A-E**). Moreover, we observed reduced frequency of contact between the *MEF2C* promoter and 16 distal enhancers60 (**Supplementary Table 8**), consistent with the dysregulation of *MEF2C* expression observed in the same cases. In the majority of the cases, the reduction in contact frequency appeared to result from the BCR greatly increasing the distance between the promoter and its enhancers (**Supplementary Fig. 7**). However, in one case with chromothripsis (DGAP101), the BCR breakpoints only had a modest effect on the linear distance between the *MEF2C* promoter and its enhancers. Instead, the promoter and enhancers were separated into distinct architectural domains through the creation of a new boundary, which likely prevented physical contact between the promoter and enhancers (**Fig. 5B-C**). The observed 3D remodeling suggests a reduced frequency of contact between sequences that influence *MEF2C* expression, providing a plausible explanation for how a noncoding BCR breakpoint can result in a DD phenotype through disruption of 3D genome architecture. ### Genomic features predict TADs associated with pathogenic LRPEs Motivated by our discovery of multiple genome-wide significant LRPE loci in DDs, coupled with our validation of functional changes in *MEF2C* and alterations to the 3D organization associated with the putative 5q14.3 LRPE, we exploited this unique BCR dataset to identify additional features contributing to pathogenic TAD disruptions that could be used for future LRPE predictions. As described above, we demonstrated that the genic content of TADs alone is insufficient to unequivocally predict the pathogenicity of an individual BCR, as 43.0% of BCRs in controls disrupted a TAD containing a Tier 1 gene. We also found that 9.4% of all TAD boundaries encompassing a Tier 1 gene were overlapped by at least one large polymorphic deletion in the genome aggregation database (gnomAD),61 which excludes adults with a history of early onset developmental conditions. Thus, additional genomic features beyond the presence of disease-associated genes are required to predict TADs preferentially disrupted by BCRs in DD cases. To identify genomic features that characterize TADs intolerant to disruption, we annotated all autosomal TADs40 in the genome with 54 features that can be broadly grouped into five categories: genes, *cis*-regulatory elements, primary sequence conservation, repetitive elements, and ‘other’ (**Figs. 6A** and **Supplementary Methods**). We defined 45 “positive” training TADs (disrupted by ≥2 BCR cases and zero BCR controls) and 261 “negative” TADs (disrupted by ≥1 BCR control and no BCR cases) and performed a univariate logistic regression for each of the 54 features, which identified 26 features at a FDR<0.05 (**Supplementary Table 9**). Next, given that many genomic features are highly correlated (**Supplementary Fig. 8**), we trained an elastic net regression on the positive and negative training TADs that included all 26 features from the univariate analysis and identified six features that were individually associated with case status after controlling for the effects of all other features: VISTA enhancers,62 ultraconserved elements (UCEs),63 transposon-free elements,64 TAD size,40 the presence of at least one Tier 1 gene, and primary sequence conservation65,66 (**Fig. 6B**). ![Fig. 6](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/16/2022.02.15.22270795/F6.medium.gif) [Fig. 6](http://medrxiv.org/content/early/2022/02/16/2022.02.15.22270795/F6) Fig. 6 Identification of genomic features associated with pathogenic LRPEs (**A**) An overview of the framework used to generate a model to predict pathogenic LRPEs across the genome based on disruption of 3D genome architecture. (**B**) Features identified to be potentially associated with TADs preferentially disrupted by BCRs from DD cases based on a BH-FDR<0.05 from a univariate logistic regression performed for each feature, ordered by effect size. (**C**) Evaluation of the LRPE model using an independent set of 372 TADs. Performance of TADs overlapping other common functional annotations are shown as a comparison. (**D**) To determine a cut-off score that could be used to identify TADs that are especially intolerant to disruption, we compared the fraction of cases to controls and simulated breakpoints that disrupt a TAD in each percentile and identified the inflection point at which a case enrichment (OR>1.5) begins to emerge. We tested this model’s predictive accuracy on an independent set of 372 TADs (see **Supplementary Methods** for selection criteria) that were not used in our training data and determined that these six features alone were moderately predictive of LRPE pathogenicity (area under the receiver operating characteristics curve=0.633; **Fig. 6C**). We next defined a “LRPE pathogenicity” cutoff score of ≥0.43 (TADs ranked in the top 10th percentile from our model) based on the point in which case-control and case-simulation enrichments surpassed OR>1.5 (**Fig. 6D**). We demonstrated that 75.0% of previously established LRPE loci (**Supplementary Table 10**) in the human genome that were not represented in our training dataset surpassed this score, a highly significant enrichment when compared to all other TADs in the genome (Fisher’s exact test; *P =* 2.55×10−5). These orthogonal approaches collectively confirmed that this relatively simple six-feature regression model was able to prioritize TADs likely intolerant to disruption. While much larger cohorts will be required to power more sophisticated predictive models, these analyses demonstrate the potential of this approach to improve noncoding variant interpretation and shed greater light on the genomic features associated with noncoding mechanisms of disease. ## DISCUSSION It has been well-established that *de novo* BCRs are associated with increased risk of congenital anomalies and a broad range of DDs,2,4,6,10 yet little is known about the pathogenic mechanisms through which this risk occurs outside of direct gene disruption. Using a large, aggregated cohort of cytogenetically-defined simple BCRs from which we derived sequence-resolved breakpoints, including several hundred BCRs from unaffected controls, our analyses reveal new insights into the mechanisms through which BCRs confer risk for DDs. These data identified an enrichment of BCRs impacting chromosomes 14 and Xp in DD cases, the latter of which was predominantly driven by X-autosome translocations in females. A previous cytogenetic study observed an association between telomeric breakpoints on the X chromosome, particularly at Xp22 and Xq28, and DDs in females with X-autosome translocations.67 Our analyses confirm and broaden these results by suggesting that this enrichment extends beyond Xp22 and likely encompasses the entire p arm. We further demonstrated that there is a seven-fold enrichment of BCR breakpoints that directly disrupt known DD genes in affected cases compared to controls, but that roughly 79% of DD BCR carriers cannot be explained by the direct disruption of currently recognized disease genes. We discovered TADs containing known DD genes that were recurrently disrupted by noncoding BCRs in DD cases far more frequently than expected by chance, suggesting that additional risk for DDs from BCRs are mediated through noncoding mechanisms, which represents an enticing area for future investigations. These data implicate disruption of 3D chromatin domains as a mechanism likely mediating some of these noncoding effects. Our analyses identified six TADs enriched for BCR breakpoints in DD cases beyond what would be expected by chance. Notably, all three of the most significant TAD associations that exceeded genome-wide significant thresholds and matched previously established pathogenic LRPEs (*MEF2C, FOXG1*, and *SOX9*).11,52,54 The three additional LRPE loci are particularly compelling candidates given that they harbor well-known DD genes, *BCL11A, SATB2*, and *BCL11B*,56,57,68–70 and are disrupted by cases with phenotypes that match those seen in individuals with direct disruption of these DD genes. We note that, collectively, 7.4% of DD cases in this cohort harbor noncoding BCRs that disrupt these six LRPEs. This result is also certainly an underestimate of the pathogenic impact of disruptions of 3D topology as it is restricted to only the six most prominent LRPE loci identified in our dataset. Many decades of disease gene discovery have biased our findings considerably towards identifying and prioritizing pathogenic gene disruptions, whereas much less is known about the molecular mechanisms, pathogenic processes, and genomic features of pathogenic noncoding variation. We anticipate that the fraction of BCR carriers associated with identifiable pathogenic LRPEs will continue to increase as future studies increase sample size and further clarify the essential features necessary for *cis*-regulatory disruption of disease-associated genes. To investigate the potential underlying mechanism of LRPEs due to 3D topology disruption, we performed Hi-C analyses on five cases with noncoding BCRs that disrupted our top putative LRPE locus, the TAD containing *MEF2C*. While Hi-C has been performed on human cells from a small number of DD cases harboring SVs,58,59 none of these studies have explicitly isolated the impact of heterozygous SVs on 3D topology. This is largely due the fact that existing computational workflows do not effectively combine the phased *de novo* assembly of genomes with the discovery of SVs. Our 3D resequencing pipeline merged these tasks into a single workflow that was able to recapitulate the structure of each BCR detected by liWGS. In addition, the Hi-C analyses provided data on the 3D architecture at the 5q14.3 locus, revealing extensive disruption to the normal TAD structure of the region. The Hi-C analyses demonstrated that the 5q14.3 BCRs disrupted the physical contact between the *MEF2C* promoter and enhancers via two mechanisms, either by increasing the linear distance between them, or creating a new boundary that prevented the promoter from interacting with the enhancers. The creation of a novel boundary is consistent with a recent study that functionally dissected the TAD containing *SOX9*, one of our genome-wide significant TADs, and demonstrated that repositioning of the TAD boundary itself via inversion or insertion of novel CTCF sites created a new boundary that decoupled the *SOX9* promoter from its enhancers, resulting in downregulation of *SOX9* and abnormal phenotypic outcomes in mice.24 Additional Hi-C analyses of different SV classes at various loci will be critical for determining the full range of potential LRPE mechanisms, the relative prevalence of each, and what features predispose certain genomic regions to their pathogenic effects. Our study further illustrates the complexity of interpretation of TAD disruption in human disease. We find that the disruption of TADs containing DD genes by control BCRs usually does not result in an appreciable disease phenotype, nor does the deletion of boudaries from these TADs as identified in population controls.61 This result refutes the utility of interpretation approaches that simply seek to match TAD disruption with the presence of a DD gene within the domain, particularly when assessing DD phenotypes that can be plausibly linked to hundreds of dominant disease genes throughout the genome. To systematically identify additional genomic features that were predictive of TAD intolerance, we leveraged our BCR cohort to build a model that prioritized six genomic features independently associated with TADs preferentially disrupted by DD cases. In addition to DD-associated (Tier 1) genes we also identified TAD size as being positively correlated with risk for DDs, which is consistent with developmentally regulated genes having complex *cis*-regulatory landscapes that likely occupy greater genomic space.71 We discovered an enrichment of primary sequence conservation and UCEs in TADs disrupted by BCRs from DD cases, which aligns with the report of an enrichment of UCEs in the vicinity of SVs associated with NDDs.63 Despite this initial progress towards identifying features associated with TADs intolerant to disruption, this model lacks sufficient predictive power to discriminate individual BCRs identified in cases from controls. However, we anticipate that larger sample sizes and improved noncoding functional predictions will eventually power increasingly sophisticated statistical models to aid in the clinical interpretation of noncoding BCRs. These data, together with our Hi-C analyses from the 5q14.3 locus, suggest that LRPEs are likely to be modified by the type of structural rearrangement as well as the complex interplay of genomic features within the TAD. In conclusion, these data demonstrate that BCRs exert highly penetrant effects in DDs through both coding and noncoding mechanisms and that the disruption of 3D chromatin structures is associated with pathogenic LRPEs. We provide statistical evidence to support highly penetrant LRPEs at previously known and novel loci. Our feature selection analysis demonstrates that additional features within the TAD structure will also be critical for identifying novel LRPEs as well as provide insights into underlying molecular mechanisms through which this risk for disease occurs. The ongoing aggregation of population-scale datasets through international biobanks promises to further define the features associated with LRPEs and three-dimensional structures that are intolerant to disruption by SVs in the human genome. ## METHODS & SUPPLEMENTARY INFO Detailed methods and supplementary information for this manuscript have been provided in a separate document, which will be linked directly from *medRxiv*. ## Supporting information Supplementary Information [[supplements/270795_file03.pdf]](pending:yes) Supplementary Tables [[supplements/270795_file04.xlsx]](pending:yes) ## Data Availability All reported breakpoints and case phenotypes have been made available in Supplementary Tables 1 and 2. ## AUTHOR CONTRIBUTIONS **Study design:** C.L., M.M.M, R.L.C., M.C.B, O.D., H.B., C.C.M., E.L., J.F.G., P.J., E.L.A., I.B., M.E.T., and N.T. **Sample recruitment and data generation:** M.M.M, Z.D., M.B.R., L.N.P., A.S.F, Y.M., A.L.T., J.M.M.M., X.C., A.C., C.H., M.B., R.V.B., T.V., T.L., J.E., A.M.V.M., A.C.S.F., J.F.M., K.T.A., J.R.V., K.V.D.B., C.S., C.A., P.E., A.A.S., D.S., C.S.B., V.M.K., M.W., H.G.K., K.O., L.R., S.M., M.B., A.L.T., J.E., J.O., D.N., M.P., E.B., E.R.S., F.S., F.J.S., A.N., E.F., M.F., P.S., N.K.V., C.D.H., D.M.F., D.H.S., M.A.R.A., M.M.I., L.A., P.M.K., G.D.R., T.A.P.V., S.L., W.H, J.D., M.H., M.H., Z.S., I.V., T.D.H., R.S.M., C.F., L.B.O., B.S.G., M.L., J.P., C.P.R., S.J., N.E., E.S., C.M., K.G., A.D., U.R.D., R.S., F.L., O.Z., G.H., D.M., A.M., N.W.K., I.M.C., J.B.M., L.R.M., M.G., L.L., J.G., K.B.M.H., C.D.A.E., Y.A., J.R.B., X.L., J.L., A.S., C.H.H., L.N.K., J.T., K.L., R.H., R.G.S., C.A.S., J.C.J., B.L., A.T., B.N., E.M., O.A.C., E.S.W., T.K., R.E.P.A., S.G.T., A.K., A.R.H., S.B., S.L.P.S., N.Y., S.G., W.K.C., S.R., I.M., N.O., N.K.G., K.R.F., B.N.H., C.C.M., K.W.C., P.J., I.B., and N.T. **Individual case breakpoint identification:** M.M.M, M.C.B, C.L., M.B.R., H.B., Z.D., R.L.C., A.M.V.M., H.D.W., C.A., E.B., F.S., N.K.V., G.D.R., Z.T., L.B.O., C.A.S., J.C.J., K.N., S.G.T., S.O.S., B.T., K.W.C., P.J., I.B., S.C., S.C.T., A.Z., M.I.M., A.U., B.A. and N.T. **Aggregation data analysis and interpretation:** C.L., R.L.C., H.B., M.M.M, M.C.B, P.J., M.E.T., and N.T. **HiC data generation and/or analysis:** O.D., H.G., D.W., C.L., K.M.S., C.S.B., M.E.T., and E.L.A. **Generation of figures and writing of the manuscript:** C.L., R.L.C., M.M.M, M.C.B, O.D., H.B., H.G., K.M., E.L.A., M.E.T., and N.T. All authors revised the manuscript. ## COMPETING INTERESTS M.E.T. receives research funding and/or reagents from Levo Therapeutics, Microsoft Inc, and Illumina Inc. E.L.A. receives in-kind support from IBM and Illumina Inc. M.M.M. and A.C.S.F. are employees of Illumina Inc. A.S.F. is employed by HERAX. All other authors declare no competing interests. ## ACKNOWLEDGMENTS We thank the families and their clinicians for their participation in this study. This work was supported by grants from the The Danish National Research Foundation (WJC048 to N.T.), the Lundbeck Foundation (2007-1172; 2009-3999; 2010-6206; 2013-14290 to N.T.), the Danish Council for Independent Research (4183-00482B to N.T.), the National Institutes of Health (GM061354 to M.E.T, C.C.M, J.F.G., and E.L.; HD081256 to M.E.T.; MH115957 to M.E.T.; HD099547 to M.E.T.; HD091797 to M.E.T.; UM1HG009375 to E.L.A; RM1HG011016-01A1 to E.L.A.; HD090780 to S.L.P.S.; R00DE026824 to H.B.; GM007748 and DC012466 to B.B.C.; T32HG002295 to R.L.C.), the Simons Foundation for Autism Research (#573206 to M.E.T.), the National Science Foundation (NSF PHY-2019745 to E.L.A.; GRFP #2017240332 to R.L.C), Massachusetts General Hospital (Fund for Medical Discovery Fundamental Research Fellowship Award to C.L.), the Canadian Institutes of Health Research (Postdoctoral Fellowship to C.L.), the Welch Foundation (Q-1866 to E.L.A.), a McNair Medical Institute Scholar Award (to E.L.A.), a US-Israel Binational Science Foundation Award (2019276 to E.L.A.), the Behavioral Plasticity Research Institute (NSF DBI-2021795 to E.L.A.), the Investigator Grant Award Program (IGAP) BC Children’s Hospital Research Institute (to S.L.), the Czech Ministries of Health and Education (NU22-07-00165 and LM2018132 to Z.S.), the Estonian Research Council grant (PRG471 to K.Õ.), the Genome BC Grant (GR007838 to S.L.), the New Zealand eScience Infrastructure (to C.A.S.), the The IHC Foundation, Rutherford Discovery Fellowship administered by the Royal Society of New Zealand (to J.C.J.), the São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo to A.C.S.F. and A.M.V.M.), the Council of Scientific & Industrial Research (to L.R.K.), the Fundamental Research Grant Scheme of the Malaysian Ministry of Higher Education (No. FRGS/1/2019/SKK08/UKM/02/9 to S.C.T.), the Polish National Centre of Science (No 2020/37/B/NZ5/00549 to M.K.), the funding support provided by Caroline Jones-Carrick and Collin Carrick (to H.G.K.), and the startup funding of the Qatar Biomedical Research Institute at Hamad Bin Khalifa University (to H.G.K.). We also thank Annemette Friis Mikkelsen and Bjarke Thomsen for their expert technical assistance; the Coriell Institute for Medical Research for providing biomaterials; and the Broad Institute Genomics Platform for generating a subset of the WGS data. * Received February 15, 2022. * Revision received February 15, 2022. * Accepted February 16, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## REFERENCES 1. 1.Jacobs, P. A., Melville, M., Ratcliffe, S., Keay, A. J. & Syme, J. A cytogenetic survey of 11,680 newborn infants. Ann. Hum. Genet. 37, 359–376 (1974). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1469-1809.1974.tb01843.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=4277977&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1974T031600001&link_type=ISI) 2. 2.Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2007.12.009&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18252227&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000253223900019&link_type=ISI) 3. 3.Nielsen, J. & Wohlert, M. Chromosome abnormalities found among 34,910 newborn children: results from a 13-year incidence study in Arhus, Denmark. Hum. Genet. 87, 81–83 (1991). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF01213097&link_type=DOI) 4. 4.Warburton, D. Outcome of cases of de novo structural rearrangements diagnosed at amniocentesis. Prenat. Diagn. 4 Spec No, 69–80 (1984). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/pd.1970040706&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=6463034&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 5. 5.Funderburk, S. J., Spence, M. A. & Sparkes, R. S. Mental retardation associated with ‘balanced’ chromosome rearrangements. Am. J. Hum. Genet. 29, 136–141 (1977). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=848489&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1977DB50700002&link_type=ISI) 6. 6.Warburton, D. De novo balanced chromosome rearrangements and extra marker chromosomes identified at prenatal diagnosis: clinical significance and distribution of breakpoints. Am. J. Hum. Genet. 49, 995–1013 (1991). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1928105&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1991GM87300011&link_type=ISI) 7. 7.Tommerup, N. Mendelian cytogenetics. Chromosome rearrangements associated with mendelian disorders. J. Med. Genet. 30, 713–727 (1993). [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo5OiJqbWVkZ2VuZXQiO3M6NToicmVzaWQiO3M6ODoiMzAvOS83MTMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xNi8yMDIyLjAyLjE1LjIyMjcwNzk1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 8. 8.Fantes, J. A. et al. FISH mapping of de novo apparently balanced chromosome rearrangements identifies characteristics associated with phenotypic abnormality. Am. J. Hum. Genet. 82, 916–926 (2008). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2008.02.007&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18374296&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000255103900010&link_type=ISI) 9. 9.Chen, W. et al. Mapping translocation breakpoints by next-generation sequencing. Genome Res. 18, 1143–1149 (2008). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjk6IjE4LzcvMTE0MyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzE2LzIwMjIuMDIuMTUuMjIyNzA3OTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 10. 10.Halgren, C. et al. Risks and Recommendations in Prenatally Detected De Novo Balanced Chromosomal Rearrangements from Assessment of Long-Term Outcomes. Am. J. Hum. Genet. 102, 1090–1103 (2018). 11. 11.Redin, C. et al. The genomic landscape of balanced cytogenetic abnormalities associated with human congenital anomalies. Nat. Genet. 49, 36–45 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3720&link_type=DOI) 12. 12.Talkowski, M. E. et al. Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. Am. J. Hum. Genet. 88, 469–481 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2011.03.013&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21473983&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 13. 13.Schluth-Bolard, C. et al. Breakpoint mapping by next generation sequencing reveals causative gene disruption in patients carrying apparently balanced chromosome rearrangements with intellectual deficiency and/or congenital malformations. J. Med. Genet. 50, 144–150 (2013). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6OToiam1lZGdlbmV0IjtzOjU6InJlc2lkIjtzOjg6IjUwLzMvMTQ0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 14. 14.Schluth-Bolard, C. et al. Whole genome paired-end sequencing elucidates functional and phenotypic consequences of balanced chromosomal rearrangement in patients with developmental disorders. J. Med. Genet. 56, 526–535 (2019). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6OToiam1lZGdlbmV0IjtzOjU6InJlc2lkIjtzOjg6IjU2LzgvNTI2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 15. 15.Talkowski, M. E. et al. Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell 149, 525–537 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2012.03.028&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22521361&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 16. 16.Chen, W. et al. Breakpoint analysis of balanced chromosome rearrangements by next-generation paired-end sequencing. Eur. J. Hum. Genet. 18, 539–543 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ejhg.2009.211&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19953122&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000276955500008&link_type=ISI) 17. 17.Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0107-y&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29700473&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 18. 18.An, J. Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, (2018). 19. 19.Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature25983&link_type=DOI) 20. 20.Kleinjan, D. A. & van Heyningen, V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 76, 8–32 (2005). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/426833&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15549674&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000225643600002&link_type=ISI) 21. 21.Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature08451&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19741700&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000269654600032&link_type=ISI) 22. 22.Allou, L. et al. Non-coding deletions identify Maenli lncRNA as a limb-specific En1 regulator. Nature 592, 93–98 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-021-03208-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 23. 23.Esteller, M. Non-coding RNAs in human disease. Nat. Rev. Genet. 12, 861–874 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrg3074&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22094949&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 24. 24.Despang, A. et al. Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 51, 1263–1271 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0466-z&link_type=DOI) 25. 25.Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature19800&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27706140&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 26. 26.Lupianez, D. G., Spielmann, M. & Mundlos, S. Breaking TADs: How Alterations of Chromatin Domains Result in Disease. Trends Genet. 32, 225–237 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.tig.2016.01.003&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26862051&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 27. 27.Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0195-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30202056&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 28. 28.Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzMjYvNTk1MC8yODkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xNi8yMDIyLjAyLjE1LjIyMjcwNzk1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 29. 29.Schmitt, A. D. et al. A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome. Cell Rep. 17, 2042–2059 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.celrep.2016.10.061&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27851967&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 30. 30.Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2014.11.021&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25497547&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000347922500019&link_type=ISI) 31. 31.Lupianez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2015.04.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25959774&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 32. 32.Lettice, L. A. et al. Enhancer-adoption as a mechanism of human developmental disease. Hum. Mutat. 32, 1492–1499 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/humu.21615&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21948517&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 33. 33.Ibn-Salem, J. et al. Deletions of chromosomal regulatory boundaries are associated with congenital disease. Genome Biol. 15, 423 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-014-0423-1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25315429&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 34. 34.Chiang, C. et al. Complex reorganization and predomnant non-homologous repair following chromosomal breakage in karyotypically balanced germline rearrangements and transgenic integration. Nat. Genet. 44, 390–7, S1 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2202&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22388000&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 35. 35.Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci. Rep. 9, 9354 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-019-45839-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31249361&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 36. 36.Navarro Gonzalez, J. et al. The UCSC Genome Browser database: 2021 update. Nucleic Acids Res. 49, D1046–D1057 (2021). 37. 37.Röpke, A. & Tüttelmann, F. MECHANISMS IN ENDOCRINOLOGY: Aberrations of the X chromosome as cause of male infertility. Eur. J. Endocrinol. 177, R249–R259 (2017). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZWplIjtzOjU6InJlc2lkIjtzOjEwOiIxNzcvNS9SMjQ5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 38. 38.Di-Battista, A., Moysés-Oliveira, M. & Melaragno, M. I. Genetics of premature ovarian insufficiency and the association with X-autosome translocations. Reproduction 160, R55–R64 (2020). 39. 39.Cheung, V. G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/35057192&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11237021&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 40. 40.Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376– 380 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature11082&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22495300&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000304099100044&link_type=ISI) 41. 41.Engreitz, J. M., Agarwala, V. & Mirny, L. A. Three-dimensional genome architecture influences partner selection for chromosomal translocations in human disease. PLoS One 7, e44196 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0044196&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23028501&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 42. 42.Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gky955&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30357393&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 43. 43.Zweier, C. et al. Haploinsufficiency of TCF4 causes syndromal mental retardation with intermittent hyperventilation (Pitt-Hopkins syndrome). Am. J. Hum. Genet. 80, 994–1001 (2007). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/515583&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17436255&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 44. 44.Fu, J. M. et al. Rare coding variation illuminates the allelic architecture, risk genes, cellular expression patterns, and phenotypic context of autism. medRxiv (2021) doi:10.1101/2021.12.20.21267194. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMS4xMi4yMC4yMTI2NzE5NHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 45. 45.Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(14)61705-0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25529582&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 46. 46.Kaplanis, J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2832-5&link_type=DOI) 47. 47.Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434– 443 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2308-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32461654&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 48. 48.GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318– 1330 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNjkvNjUwOS8xMzE4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 49. 49.Costa, D. et al. Balanced and unbalanced translocations in a multicentric series of 2843 patients with chronic lymphocytic leukemia. Genes Chromosomes Cancer (2021) doi:10.1002/gcc.22994. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gcc.22994&link_type=DOI) 50. 50.Nebenfuehr, S., Kollmann, K. & Sexl, V. The role of CDK6 in cancer. Int. J. Cancer 147, 2988–2995 (2020). 51. 51.Hussain, M. S. et al. CDK6 associates with the centrosome during mitosis and is mutated in a large Pakistani family with primary microcephaly. Hum. Mol. Genet. 22, 5199–5214 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddt374&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23918663&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000327800400010&link_type=ISI) 52. 52.Mehrjouy, M. M. et al. Regulatory variants of FOXG1 in the context of its topological domain organisation. Eur. J. Hum. Genet. 26, 186–196 (2018). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 53. 53.Wunderle, V. M., Critcher, R., Hastie, N., Goodfellow, P. N. & Schedl, A. Deletion of long-range regulatory elements upstream of SOX9 causes campomelic dysplasia. Proc. Natl. Acad. Sci. U. S. A. 95, 10649–10654 (1998). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiOTUvMTgvMTA2NDkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xNi8yMDIyLjAyLjE1LjIyMjcwNzk1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 54. 54.Wagner, T. et al. Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY-related gene SOX9. Cell 79, 1111–1120 (1994). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0092-8674(94)90041-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8001137&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1994PY08600020&link_type=ISI) 55. 55.Rainger, J. K. et al. Disruption of SATB2 or its long-range cis-regulation by SOX9 causes a syndromic form of Pierre Robin sequence. Hum. Mol. Genet. 23, 2569–2579 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddt647&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24363063&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 56. 56.Lessel, D. et al. BCL11B mutations in patients affected by a neurodevelopmental disorder with reduced type 2 innate lymphoid cells. Brain 141, 2299–2311 (2018). 57. 57.Dias, C. et al. BCL11A Haploinsufficiency Causes an Intellectual Disability Syndrome and Dysregulates Transcription. Am. J. Hum. Genet. 99, 253–274 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2016.05.030&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27453576&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 58. 58.Middelkamp, S. et al. Molecular dissection of germline chromothripsis in a developmental context using patient-derived iPS cells. Genome Med. 9, 9 (2017). 59. 59.Melo, U. S. et al. Hi-C Identifies Complex Genomic Rearrangements and TAD-Shuffling in Developmental Diseases. Am. J. Hum. Genet. 106, 872–884 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2020.04.016&link_type=DOI) 60. 60.D’haene, E. et al. A neuronal enhancer network upstream of MEF2C is compromised in patients with Rett-like characteristics. Hum. Mol. Genet. 28, 818–827 (2019). 61. 61.Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2287-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32461652&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 62. 62.Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–92 (2007). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkl822&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17130149&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000243494600019&link_type=ISI) 63. 63.McCole, R. B. et al. Structural disruption of genomic regions containing ultraconserved elements is associated with neurodevelopmental phenotypes. bioRxiv 233197 (2017) doi:10.1101/233197. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czo4OiIyMzMxOTd2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzE2LzIwMjIuMDIuMTUuMjIyNzA3OTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 64. 64.Simons, C., Makunin, I. V., Pheasant, M. & Mattick, J. S. Maintenance of transposon-free regions throughout vertebrate evolution. BMC Genomics 8, 470 (2007). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2164-8-470&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18093339&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 65. 65.Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjIwLzEvMTEwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTYvMjAyMi4wMi4xNS4yMjI3MDc5NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 66. 66.Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pcbi.1001025&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21152010&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 67. 67.Schmidt, M. & Du Sart, D. Functional disomies of the X chromosome influence the cell selection and hence the X inactivation pattern in females with balanced X-autosome translocations: A review of 122 cases. American Journal of Medical Genetics vol. 42 161–169 (1992). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ajmg.1320420205&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1733164&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992GY98900004&link_type=ISI) 68. 68.Punwani, D. et al. Multisystem Anomalies in Severe Combined Immunodeficiency with Mutant BCL11B. N. Engl. J. Med. 375, 2165–2176 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1509164&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27959755&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 69. 69.Prasad, M. et al. BCL11B-related disorder in two canadian children: Expanding the clinical phenotype. Eur. J. Med. Genet. 63, 104007 (2020). 70. 70.Zarate, Y. A. & Fish, J. L. SATB2-associated syndrome: Mechanisms, phenotype, and practical recommendations. Am. J. Med. Genet. A 173, 327–337 (2017). 71. 71.Grubert, F. et al. Landscape of cohesin-mediated chromatin loops in the human genome. Nature 583, 737–743 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2151-x&link_type=DOI) 72. 72.Hanscom, C. & Talkowski, M. Design of large-insert jumping libraries for structural variant detection using Illumina sequencing. Curr. Protoc. Hum. Genet. 80, 7 22 1–9 (2014). 73. 73.Collins, R. L. et al. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol. 18, 36 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-017-1158-6&link_type=DOI) 74. 74.Gilling, M. et al. Breakpoint cloning and haplotype analysis indicate a single origin of the common Inv(10)(p11.2q21.2) mutation among northern Europeans. Am. J. Hum. Genet. 78, 878–883 (2006). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/503632&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16642442&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000236756200013&link_type=ISI) 75. 75.Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq033&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20110278&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275243500019&link_type=ISI) 76. 76.Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature 583, 96–102 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2434-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32581362&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 77. 77.Hoencamp, C. et al. 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science 372, 984–989 (2021). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNzIvNjU0NS85ODQiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xNi8yMDIyLjAyLjE1LjIyMjcwNzk1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 78. 78.Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjExOiIzNTYvNjMzMy85MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzE2LzIwMjIuMDIuMTUuMjIyNzA3OTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 79. 79.Dudchenko, O. et al. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv 254797 (2018) doi:10.1101/254797. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czo4OiIyNTQ3OTd2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzE2LzIwMjIuMDIuMTUuMjIyNzA3OTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 80. 80.Miller, N. A. et al. A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome Med. 7, 100 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13073-015-0221-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26419432&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 81. 81.Gürsoy, G. et al. Data Sanitization to Reduce Private Information Leakage from Functional Genomics. Cell 183, 905– 917.e16 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2020.09.036&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33186529&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 82. 82. Roadmap Epigenomics, Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14248&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25693563&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F16%2F2022.02.15.22270795.atom) 83. 83.Knight, P. A. & Ruiz, D. A fast algorithm for matrix balanci g. IMA J. Numer. Anal. 33, 1029–1047 (2012). 84. 84.Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv 201178 (2018) doi:10.1101/201178. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czo4OiIyMDExNzh2MyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzE2LzIwMjIuMDIuMTUuMjIyNzA3OTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 85. 85.Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cels.2016.07.002&link_type=DOI)