ABSTRACT
Background Rare copy number variants (CNVs) are pathogenic for neurodevelopmental disorders (NDDs) and cause neurocognitive impairment. In aggregate, NDD CNVs may be present in up to 2% of population cohorts with implications for neuropsychiatric disease risk and cognitive health. However, analyses of NDD CNVs in biobanks or population cohorts have been hindered by limited clinical or cognitive phenotypes, and a lack of ancestral diversity. In the current proof-of-concept study, NDD CNV carriers were recontacted from BioMe, a multi-ancestry biobank derived from the Mount Sinai healthcare system, to enable ‘deep phenotyping’ beyond electronic health record outcomes.
Methods From the BioMe biobank, 892 adult participants were recontacted, including 335 harboring NDD CNVs, 217 with schizophrenia and 340 neurotypical controls as comparators. Clinical and cognitive assessments were administered to each recruited participant.
Results Seventy-three participants completed study assessments (mean age=48.8 years; 66% female; 36% African, 26% European, 34% Hispanic), or 8% of the recontacted subset, including 30 NDD CNV carriers across 15 loci. Among NDD CNV carriers, assessments indicated 40% with mood and anxiety disorders, 30% with learning disorders, and 13% with a history of special education. NDD CNV carriers were significantly cognitively impaired compared to controls on digit span backwards (Beta=-1.76, FDR=0.04) and digit span sequencing (Beta=-2.01, FDR=0.04).
Conclusions Feasibility of “recall-by-genotype” from a multi-ancestry biobank was established for NDD CNV carriers, along with comparator groups. The current study corroborated past reports of cognitive impairment associated with NDD CNVs, while elucidating clinical phenotypes for recalled individuals. Future “recall-by-genotype” studies may further facilitate clinical characterization of disease-relevant genomic variants.
INTRODUCTION
Rare copy number variants (CNVs), genomic microdeletions or microduplications greater than one kilobase, are known to be pathogenic for multiple neurodevelopmental disorders (NDDs), including schizophrenia, autism spectrum disorder (ASD), intellectual disability, and developmental delay.1–5 NDD CNVs are variable in penetrance and expressivity, with carriers ranging from unaffected to severely affected, and reports indicate that genome-wide polygenic risk and CNV burden may influence penetrance.6–8 Overall, pathogenic NDD CNVs likely underlie up to 20% of ASD cases, 14% of cases of developmental delay and intellectual disability, and a smaller fraction, 1-3%, of schizophrenia cases.9–13
NDD CNVs are also known to affect neurocognition, a quantitative trait that may independently result in functional impairments and mediate neuropsychiatric disease risk.9, 14 Variable neurocognitive impairments of NDD CNV carriers have been reported in clinically-ascertained cohorts, including for 22q11.2 deletion underlying velocardiofacial syndrome, perhaps the most investigated NDD CNV.14, 15 Yet, population-cohort and biobank studies have demonstrated that NDD CNVs also affect neurocognition in neurotypical, healthy controls.16–18 Within an Icelandic population cohort, 167 controls harboring neuropsychiatric CNVs were impaired in cognitive performance across neurocognitive domains, intermediate between controls without CNVs and individuals with idiopathic schizophrenia.16 Effect size varied by CNV locus, and six NDD CNV loci were associated with verbal or performance IQ with large effects (0.73-3.51 SD). A more recent UK Biobank analysis of ∼150,000 participants queried the effect of NDD CNVs at 53 loci on seven cognitive tests, finding impairment across cognitive domains, with a reduction of 0.1 to 0.5 SD compared to non-NDD CNV carriers, varying by CNV locus.17 In a subsequent analysis of ∼420,000 UK Biobank participants without a neurodevelopmental disorder, 24 of 33 NDD CNVs were associated with reduced performance on at least one of seven cognitive tests, including processing speed, general intelligence, working memory and executive function.18
Though individual NDD CNVs are rare, their combined prevalence may be up to 2% of population cohorts.17–21 Therefore, in populations unselected for neuropsychiatric disorders, studies of neurocognitive effects of NDD CNVs may have important public health implications for cognitive health, including early cognitive interventions, and may elucidate factors mediating neuropsychiatric disease risk. However, previous studies of neurocognition of NDD CNVs in research biobanks derived from healthcare systems or population cohorts have several limitations. First, many studies include individuals of mostly European ancestry, precluding generalizability and limiting elucidation of cognitive effects in diverse ancestries.16–18 Second, some neurocognitive domains have not been assessed, such as social cognition, which encompasses mentalizing, cognitive empathy, emotional perception and processing, often impaired in multiple NDDs, contributing to unique variance in functional outcomes.22, 23 Third, many biobanks or population cohorts do not contain neurocognitive data, and the feasibility of recontacting individuals for secondary assessments has not been clearly established.
A recent analysis of the BioMe biobank, a multi-ancestry biobank derived from the Mount Sinai healthcare system, with genotype data linked to electronic health records (EHR) of primarily adults with a mean age of 50 years, reported 2.5% prevalence of NDD CNVs among approximately 25,000 participants.24 For NDD CNV carriers, enrichments for congenital disorders and major depressive disorder were identified, as well as associations with obesity and increased body mass index. However, the previous report was limited to EHR outcomes, so some phenotypes relevant to NDD CNVs were not reported, including neurocognition, diagnoses from structured scales, and retrospective developmental histories. Therefore, in the current study, adult NDD CNV carriers were recalled from the BioMe biobank, by implementing a novel, ‘genotype-first’ recruitment strategy, for expanded ‘deep phenotyping’. The clinical and neurocognitive assessments of NDD CNV carriers were compared to individuals with idiopathic schizophrenia and neurotypical controls, each without NDD CNVs, also recalled from the biobank and included as comparators, as per previous reports.16–18
METHODS AND MATERIALS
IRB Approval
The BioMe Biobank is an EHR-linked biobank of ∼55,000 participants derived from the Mount Sinai Health System (New York, NY), with recruitment since 2007 from clinics across various specialties, predominantly outpatient medical clinics, as approved by the Institutional Review Board (IRB) of the Mount Sinai School of Medicine.24, 25 As per initial IRB approval, BioMe participants consented to be ‘recontacted’ for future research studies. For the current study, the IRB approved BioMe data access and chart review, and recontact of participants for study assessments, but without return of genetic information or neurocognitive results. The current study was conducted from November 12, 2020, until March 23, 2023.
Study Recruitment
Within the BioMe biobank, three groups of adults, ages 18-65, were identified for recontact with the following criteria: (i) NDD CNVs: Harboring NDD CNVs.24 (ii) Schizophrenia: At least two documented International Classification of Diseases (ICD-10) codes for schizophrenia or schizoaffective disorder, and without NDD CNVs. (iii) Neurotypical controls: Without ICD-10 codes for neuropsychiatric or neurodevelopmental disorders and without NDD CNVs. An upper age limit of 65 years was applied to minimize the influence of age-related cognitive decline.
Biobank participants in each group were contacted by mail, email or phone (Supplementary Methods). Individuals who responded were screened for study eligibility, with three exclusion criteria: 1) active, unstable, severe medical illness for which there is no current medical care; 2) substance use disorder or alcohol use disorder, active in the past 6 months; 3) a known medical or neurological condition that could affect cognition. Prior to study assessment, chart review of medical and psychiatric history was performed (by a board-certified psychiatrist, RB) to confirm schizophrenia or control status, and to phenotype NDD CNV carriers as unaffected or affected by neuropsychiatric disorder, as per accessible EHR records (Supplementary Methods).
Clinical and Cognitive Assessments
Due to the COVID-19 pandemic, study assessments were performed remotely, rather than in-person (Supplementary Methods). A clinical assessment included the Mini International Neuropsychiatric Interview (MINI v 7.0.2), a brief structured diagnostic interview for DSM-5, as well as a medical, psychiatric, and retrospective developmental history (administered by one board-certified psychiatrist, RB).26 A cognitive assessment (administered by NZ) included multiple cognitive domains (Table 1, Supplementary Methods). The reliability and validity of each cognitive test, as well as evidence to support remote administration, have been previously demonstrated for use in schizophrenia spectrum disorders and in other populations (Supplementary Methods).27–34
Statistical Analyses
Cognitive raw scores were used as outcomes for linear regression analyses and group comparisons, with age, sex and ancestry included as covariates. All analyses were performed using R, version 4.0.4. Statistical significance was reported using the Benjamini-Hochberg false discovery rate (FDR) for multiple testing.
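As an illustration of the multiple-testing step, the Benjamini-Hochberg adjustment can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the study's R code: the function name `bh_adjust` and the example p-values are hypothetical and are not study data.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for seven cognitive tests (illustrative only, not study data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.74]
example_fdr = bh_adjust(example_p)
```

A test would then be called significant when its adjusted value falls below the chosen FDR threshold. In R, the same adjustment is available through `p.adjust` with `method = "BH"`.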
RESULTS
Study Recruitment
From the BioMe biobank, 892 participants were recontacted for study participation (Figure 1): 335 individuals harboring NDD CNVs, 217 individuals with schizophrenia, and 340 neurotypical controls. There was an initial response rate of 18% (15% of NDD CNV carriers, 20% of individuals with schizophrenia, 19% of controls), a yield of 160 participants. However, 53 individuals were not screened for eligibility, due to subsequent lack of response or an expressed lack of interest, including 26 from the control group. Of 107 individuals who were screened for eligibility, 13% were found to be ineligible, due to active substance use or illness that could confound cognitive measures, with the highest rate of exclusion within the schizophrenia group. A total of 93 individuals were consented, of whom an additional 20 were excluded after clinical assessment, 18 due to group diagnostic misclassification (study assessment did not corroborate chart review). Thus, 73 individuals, or 8% of the overall recontacted subset, were retained and completed the study, including 30 NDD CNV carriers (9.0% of recontacted), 20 individuals with schizophrenia (9.2% of recontacted), and 23 neurotypical controls (6.8% of recontacted).
Utilization of the healthcare system did not appear to bias recruitment: the study sample had an average of 125 encounters, compared to an average of 133 encounters for others recontacted (t=0.30, p=0.76). Proximity to the most recent clinical encounter correlated with successful recruitment, as enrolled study participants were seen on average 12 weeks prior to recontact, compared to the recontacted subset not enrolled, who had a clinical encounter on average 57 weeks prior to recontact (t=7.1, p=1.1×10⁻¹⁰). A female skew in the overall biobank (62% female) increased for the recontacted NDD CNV carriers (65% female) and increased further for the 30 NDD CNV carriers recruited (87% female) (Table 2, Supplementary Table 1). There was an ancestry skew in those recontacted for the current study compared to the overall biobank: the schizophrenia group was relatively increased in African (47%) and decreased in European (8%) ancestries, and the control group was relatively decreased in European (15%) and increased in Hispanic (40%) ancestries (Supplementary Table 1).
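The retention percentages reported above follow directly from the stated counts, and a short Python sketch can reproduce them. The variable and function names are illustrative; all counts are taken from the text.

```python
# Counts reported in the text: participants recontacted and retained, per group.
recontacted = {"NDD CNV": 335, "schizophrenia": 217, "controls": 340}
retained = {"NDD CNV": 30, "schizophrenia": 20, "controls": 23}


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Retention per group and overall, as a percentage of those recontacted.
retention = {group: pct(retained[group], recontacted[group]) for group in recontacted}
overall = pct(sum(retained.values()), sum(recontacted.values()))

# Funnel steps: 160 responders, 53 never screened, 93 consented,
# 20 excluded after clinical assessment.
screened = 160 - 53
completed = 93 - 20
```

Laying the funnel out this way makes it easy to confirm that the per-group counts sum to the 892 recontacted and the 73 retained participants.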
Of 73 participants who completed study assessments, with a mean age of 48.8 years, there was no significant difference in age across the three comparator groups (p=0.58) (Table 2, Supplementary Table 1). In contrast to the NDD CNV group (87% female), the schizophrenia group was mostly male (45% female), resulting in a significant difference in sex across the groups (p=0.003). Self-reported ancestry (overall 36% African, 26% European, 34% Hispanic) did not differ significantly across groups overall (p=0.55) or by ancestry group, African (p=0.68), European (p=0.16) and Hispanic (p=0.49).
Study Sample, NDD-CNV Status
Among 30 NDD CNV carriers, 16 harbored duplications and 14 harbored deletions, across 15 unique NDD CNV loci (Supplementary Table 2). Some NDD CNV loci were enriched in the subset that completed assessments compared to the overall biobank, including, for example, TAR duplication, 15q11.2 deletion and 16p13.11 deletion (Supplementary Table 2). Neither the participants with schizophrenia nor the neurotypical controls harbored NDD CNVs.
Clinical Assessments
Of 30 NDD CNV carriers, twelve had a mood or anxiety disorder upon initial chart review (Table 3, Supplementary Table 3). The MINI scale did not corroborate EHR-derived diagnoses for three individuals but identified mood or anxiety disorders in three other individuals for whom none was reported in EHRs. Therefore, overall, 40% of NDD CNV carriers had a mood or anxiety disorder, as validated by MINI, including five with major depressive disorder and two with obsessive compulsive disorder (Supplementary Table 3). One TAR duplication carrier had bipolar disorder with psychotic features, but there were no other cases of psychosis or schizophrenia among the NDD CNV carriers. One 16p11.2 distal deletion carrier had a history of seizure disorder. Childhood speech delay warranting speech therapy was reported in three NDD CNV carriers, and one 1q21.1 deletion carrier reported global developmental delay, including delayed motor skills such as walking. Further, formally diagnosed learning disorders were reported by approximately 30% of NDD CNV carriers, and a history of special education was reported for 13%, from elementary school through high school, across multiple years and multiple subject areas.
For 20 individuals with schizophrenia, MINI scale administration confirmed a psychotic disorder. One participant with schizophrenia reported a history of seizure disorder. Fifty percent reported a history of special education and an additional 10% reported extreme difficulty in school, but no diagnosed learning disorder. Twenty percent reported speech therapy during childhood, and one individual reported developmental delay of motor skills, ‘walking late’. For 23 neurotypical controls, the MINI scale corroborated the absence of major psychiatric disorders. None had a history of seizure disorder, developmental delay, or speech therapy during childhood.
Overall, the highest level of education differed across groups: the schizophrenia group had the lowest mean of 11.9 years, NDD CNV carriers were intermediate with 14.5 years, and controls had the highest mean of 15.3 years (F-statistic=6.24, p=0.003), with pairwise significant differences between schizophrenia and NDD CNVs (p=0.02), and schizophrenia and controls (p=0.003), but not between NDD CNVs and controls (p=0.68). As for occupational status, within the NDD CNV and control groups, each participant reported a history of employment without disability; however, 20% within the schizophrenia group reported disability and lack of employment. All NDD CNV carriers and controls reported the ability to live independently, but the schizophrenia group had a lower functional status, with 20% living in supervised residences or unable to live independently.
The study also elucidated medical phenotypes qualitatively, by EHR review and clinical interview (Supplementary Table 4), in contrast to a previous ICD-code based analysis.24 Some medical phenotypes known to be significant from previous biobank and case reports were observed: obesity (TAR duplication and 16p11.2 distal deletion), hypertension (16p13.11 duplication and 16p11.2 deletion), and neuropathy (17p12 (HNPP) deletion), as well as cardiac anomaly (22q11.2 deletion). Three NDD CNV carriers reported congenital anomalies, two cardiac anomalies (22q11.2 deletion and 2q13 (NPHP1) deletion) and one renal anomaly (15q11.2 deletion); none were reported in the schizophrenia or control groups. No participants in the current study were of short stature.
Neurocognitive Assessments
To maximize power, NDD CNV carriers were collapsed across loci into one group. Each of seven cognitive tests was regressed against group while adjusting for age, sex, and ancestry. There were no significant differences between NDD CNV carriers and controls (when aggregating across all 15 NDD CNV loci), though for each of seven tests, NDD CNV carriers trended in an intermediate direction, impaired compared to controls and higher performing than the schizophrenia group (Figure 2A). The schizophrenia group performed significantly worse than the NDD CNV group on digit span forward (Beta=-2.04, FDR=0.03), digit span backwards (Beta=-1.96, FDR=0.03), and HVLT-R Total Recall (Beta=-5.04, FDR=0.01) (Figure 2A, Supplementary Table 5). The schizophrenia group also performed significantly worse than controls on digit span forward (Beta=-3.10, FDR=4.56×10⁻⁴), digit span backwards (Beta=-3.24, FDR=2.79×10⁻⁴), digit span sequencing (Beta=-3.07, FDR=4.56×10⁻⁴), and HVLT-R Total Recall (Beta=-6.12, FDR=4.56×10⁻⁴) (Supplementary Table 5). The performance of each NDD CNV carrier across seven cognitive domains is summarily ranked (Supplementary Table 6, Supplementary Figure 1), with cognitive performance varying by locus; notably, the 22q11.2 deletion carrier was among the worst performing of the NDD CNV group and the 1q21.1 duplication carrier was the highest performing.
In an alternative analysis, the NDD CNV group was subset to loci included in previously-reported UK Biobank (UKBB) analyses of NDD CNVs, thereby excluding four loci (15q13.3 (CHRNA7) duplication, 17p12 deletion, and 2q13 (NPHP1) deletion/duplication), resulting in 22 NDD CNV carriers of 11 unique loci (Supplementary Table 7).17, 18 Cognitive analyses of this more stringent subset of 22 NDD CNV carriers again found intermediate performance, with controls performing significantly higher than NDD CNV carriers on two tests, digit span backwards (Beta=1.76, FDR=0.04) and digit span sequencing (Beta=2.01, FDR=0.04), and the schizophrenia group significantly impaired compared to NDD CNV carriers on HVLT-R Total Immediate Recall (Beta=-4.5, FDR=0.05) (Figure 2B, Supplementary Table 5).
In the current sample, ancestry was correlated with social cognition (r²=0.36, p=6.5×10⁻⁷) and sex was correlated with both social cognition (r²=0.12, p=0.003) and verbal learning (r²=0.11, p=0.004) (Supplementary Figure 2). Including education as an additional covariate in the analyses, which is correlated with group status as well, reduced the significance of the between-group differences for each cognitive test, while maintaining the trend in directionality of between-group effects (Supplementary Table 8).
DISCUSSION
The current report describes a novel, genotype-first, targeted recruitment of NDD CNV carriers recalled from BioMe, a multi-ancestry, healthcare-system derived biobank, to enable ‘deep phenotyping’ beyond EHR outcomes. Notably, to date, the advent of research biobanks, with genotype data linked to EHRs, has yielded numerous in silico genomic and phenotypic analyses, as well as the development of novel analytic methods; in contrast, however, “recall-by-genotype” studies have been exceedingly rare despite their potential importance in elucidating genomic risk factors.35–38 As research biobanks (from healthcare systems and population cohorts) continue to proliferate and increase in scale, opportunities for “recall-by-genotype” will become increasingly apparent across disease and variant class, including neuropsychiatry, though feasibility has not yet been established. In the current, proof-of-concept “recall-by-genotype” study, 15% of recontacted adult NDD CNV carriers responded to recruitment, as well as 20% of recontacted individuals with schizophrenia and 19% of recontacted controls. Within the NDD CNV group, more females than males responded to recontact, furthering a sex bias in biobank participation. Subsequent dropout and study exclusions reduced the study sample to 8% of the initially recontacted subset, including 9% of the initially contacted NDD CNV group. The recontact rates were higher than predicted, given that the healthcare system from which the biobank is derived is not a closed system; that is, biobank participants may receive care from other healthcare systems or relocate geographically after biobank enrollment.
To date, the few published biobank “recall-by-genotype” reports have focused on pathogenic variants for diseases for which actionable therapeutic or prophylactic options may be available, such as for American College of Medical Genetics and Genomics (ACMG) secondary or other findings.39 For example, 21 Estonia biobank participants who harbor rare, deleterious variants pathogenic for familial hypercholesterolemia within the LDLR, APOB, or PCSK9 genes were recontacted for counseling, intervention with statin treatment and longitudinal follow-up.40, 41 In another study germane to the current report, the prevalence and penetrance of 31 CNVs pathogenic for neuropsychiatric disorders were analyzed in the Geisinger MyCode biobank, among 90,595 individuals, mostly adults of European ancestry. Though neurocognitive outcomes were not within the EHRs, 66.4% of CNV carriers had at least one EHR code for a neuropsychiatric disorder, including mood and anxiety disorders.20 The genetic findings of CNVs for nine loci were returned to a subset of 141 Geisinger MyCode participants, with overall positive reactions reported to the disclosure of this genetic information. Less common, however, are “recall-by-genotype” studies for secondary research, such as the one described herein, without immediate actionability or specific return of results. Notably, the current pilot study did not return CNV results to adult participants, in contrast to the Geisinger MyCode CNV study, as per local IRB protocol. Additional studies are needed to further benchmark “recall-by-genotype” and to elucidate feasibility of recontact, including limitations, as well as alternative approaches for genetic disclosure. For example, future “recall-by-genotype” study designs may recall participants for longitudinal follow-up studies, rather than cross-sectional studies as herein described, or recontact family members for further query of genomic variants of interest.
The clinical phenotyping of 30 NDD CNV carriers across 15 unique CNV loci was under-powered for statistical associations (as previously reported for ICD codes for NDD CNV carriers in this biobank), but permitted an in-depth, qualitative evaluation. Overall, 70% of the 30 NDD CNV carriers harbored at least one neuropsychiatric or developmental phenotype, including 40% with mood or anxiety disorders, 13% with speech or motor developmental delays, 27% with learning disorders, and 13% with a history of special education. There was one case of bipolar disorder with psychosis, but no other psychotic disorders were identified among the NDD CNV carriers, even for carriers of the CNV loci reported to confer schizophrenia risk at especially high effect (i.e. 1q21.1 deletion, 16p11.2 distal deletion, 22q11.2 deletion). Overall, these findings corroborate past reports of variable penetrance of NDD CNVs, and also highlight the potential role of NDD CNVs in mood and anxiety disorders, suggested in some past reports, but with less well-established evidence than for neurodevelopmental disorders.9, 42, 43 Interestingly, half of the mood or anxiety disorders identified during the study assessment were discordant with EHRs, and furthermore, EHRs did not contain the developmental history elicited. This suggests potential shortcomings of relying solely on EHR data for characterizing genetic variants within biobanks and indicates a role for recall of individuals for supplemental clinical evaluations.
Corroborating past reports, cognitive analyses indicated that NDD CNV carriers were significantly impaired compared to neurotypical controls on tests of executive function and working memory (digit span backwards and digit span sequencing), but higher performing than individuals with schizophrenia on immediate recall (verbal learning).16–18 As a validation of study measures, individuals with schizophrenia were significantly impaired compared to neurotypical controls on tests of working memory and executive function, attention and immediate recall, concordant with reports of individuals with schizophrenia performing (on average one standard deviation) below controls.16, 44 The multi-ancestry cohort analyzed here was more diverse than those of many past reports of NDD CNV carriers, which were of predominantly European ancestry.16–18 The study also tested social cognition (not included in previous UKBB analyses), although no group differences were detected. A modest correlation of social cognition with ancestry was observed. Education level differed across groups, as expected, and is a confounding variable associated with both group status and cognitive outcome. By comparison, previously reported UKBB analyses did not include years of education as a covariate in analyzing cognitive effects of NDD CNVs.17, 18
This study had several limitations. The clinical assessments were qualitative and under-powered for statistical associations or for evaluation of NDD CNV pathogenicity. While the cognitive analyses yielded some significant between-group differences, other differences may not have been detected due to sample size. Sex and ancestry were included as covariates in pooled analyses, but the study was not powered for sex- or ancestry-stratified analyses. Remote assessments were conducted due to the COVID-19 pandemic, so future recall studies may incorporate in-person assessments including physical or neurological exams, and further phenotyping. Population normative data for cognitive tests were derived from in-person testing. Some of the clinical assessments relied on participant self-report, subject to recall bias. The study excluded some conditions known to result in cognitive impairment, but did not control for other potential confounders, such as psychotropic medication burden. Cognitive analyses included a subset of known, recurrent NDD CNV loci, but overall genome-wide CNV burden and rare non-recurrent CNVs were not considered. Cognitive analyses were not stratified by CNV locus, and included individuals both affected and unaffected by neuropsychiatric and developmental phenotypes.
Overall, the current study demonstrated feasibility of “recall-by-genotype” from a healthcare system derived biobank, for individuals harboring NDD CNVs of rare frequency, as well as individuals from comparator groups, schizophrenia and controls, while identifying some factors influencing recruitment, such as time elapsed since the most recent healthcare encounter. The proof-of-concept recall study supplements past biobank and other reports of NDD CNVs, by expanding the range of phenotypes assessed, including retrospective developmental history and cognitive data, in a multi-ancestry context. Interestingly, the current “deep phenotyping” identified clinical features for some participants that were discordant with EHR-derived diagnoses. Cognitive results corroborated past reports, with NDD CNVs across multiple loci associated with impairment of executive function and working memory. Future studies may further implement “recall-by-genotype” strategies in sequenced cohorts to identify individuals harboring NDD CNVs or other pathogenic variants, for more thorough clinical characterization and follow-up research opportunities.
Data Availability
The de-identified clinical data summarized herein was obtained from biobank participants in a research study, but is not publicly available, as per IRB-approval and guidance.
DISCLOSURES
The authors report no biomedical financial interests or potential conflicts of interest.
ACKNOWLEDGEMENTS
The study was supported by K23MH112955 (PI: Birnbaum). Dr. Birnbaum is also supported by R21MH137536. Dr. Mahjani is supported by a grant from the Beatrice and Samuel A. Seaver Foundation.
We thank individuals within the BioMe Biobank for their participation. We thank colleagues from the Institute of Personalized Medicine at the Icahn School of Medicine for facilitating recall of the BioMe biobank participants: Amanda Merkelson, Sheryl Cruz and Alanna Gomez. We are grateful to Dr. Zhongyang Zhang for his perusal of the manuscript and comments.