Abstract
Parkinson’s disease (PD) etiology is associated with genetic and environmental factors that lead to a loss of dopaminergic neurons. However, the functional interpretation of PD-associated risk variants and how other midbrain cells contribute to this neurodegenerative process are poorly understood. Here, we profiled >41,000 single-nuclei transcriptomes of postmortem midbrain tissue from 6 idiopathic PD (IPD) patients and 5 matched controls. We show that PD-risk variants are associated with glia- and neuron-specific gene expression patterns. Furthermore, Microglia and astrocytes presented IPD-specific cell proliferation and dysregulation of genes related to unfolded protein response and cytokine signalling. IPD-microglia revealed a specific pro-inflammatory trajectory. Finally, we discovered a neuronal cell cluster exclusively present in IPD midbrains characterized by CADPS2 overexpression and a high proportion of cycling cells. We conclude that elevated CADPS2 expression is specific to dysfunctional dopaminergic neurons, which have lost their dopaminergic identity and unsuccessful attempt to re-enter the cell cycle.
Introduction
Parkinson’s disease (PD) is a neurological disorder that is characterized by a progressive loss of neuromelanin-containing dopaminergic neurons (DaNs) in the substantia nigra (SN)1,2. Age, genetic, and environmental factors contribute to PD pathogenesis, but disease pathology and etiology remain mostly unknown3. Approximately 95% of PD patients do not harbor an interpretable genetic cause; therefore, they are classified as idiopathic PD (IPD)4. This lack of knowledge about the molecular mechanisms underlying neurodegeneration in PD represents a major challenge for developing effective therapies5.
Our current understanding of IPD cellular and molecular pathophysiology largely relies on experimental models that lack adequate representation of the disease complexity. For instance, toxin-induced animal models neither capture the human brain’s nature nor the multifactorial aspect of the disease6. Also, IPD-patient derived pluripotent stem cell (iPSC) models recapitulate molecular mechanisms underlying IPD pathogenesis but lack the cellular composition dynamics found in the human brain. Moreover, several transcriptomic studies using human postmortem midbrain tissue have investigated the transcriptional programs disrupted in IPD. Most of these studies used bulk RNA-seq approaches, which fail to dissect cell-type-specific contributions to disease pathology7. The recent development of single-cell sequencing technologies offers the possibility to overcome these challenges. In particular, transcriptional profiling of single cells (scRNA-seq) or nuclei (snRNA-seq) has proved to be an effective strategy to obtain a global view of disease-associated changes at an unprecedented resolution8.
In this study, we performed snRNA-seq of postmortem adult human midbrain tissue of IPD patients and age-matched healthy controls to obtain an unbiased and global view of the cell-type composition and cellular phenotype of IPD at single-cell resolution.
Results
We sampled adult human postmortem midbrain from five IPD cases, for which pathology reports describe a severe neuronal loss in the SN and no family history of PD (table S1). We confirmed the PD cases’ idiopathic nature by SNP-Chip profiling of 179.467 known variants associated with neurological diseases, including PD9, which did not reveal a genetic etiology (table S2). We sampled six control midbrain to match the IPD patient characteristics. The average age of IPD patients and healthy control individuals were ∼77 and ∼81 years, respectively, and both groups had similar postmortem intervals, IPD ∼22, and controls ∼16 hours (table S1).
We sequenced single nuclei from frozen ventral sections of human postmortem midbrains (Fig. 1A). Using six to eight sections of each individual’s left hemi-midbrain, we prepared single nuclei suspension by scraping off the tissue from the glass slides and using a modified version of the standard 10X Genomics nuclei isolation protocol (see methods). Single-nuclei barcoded cDNA library was prepared using the 10X Genomics Chromium System and sequenced on the NovaSeq 6000 Illumina platform following the manufacturer recommended protocols. We obtained ∼2000-6000 high-quality nuclei per sample with an average of ∼7,600 transcripts and ∼2,700 genes per nucleus after filtering out poorly sequenced nuclei and potential duplets (Fig. 1B). They comprise 22,433 and 19,002 single nuclei from control individuals and IPD patients, respectively (Fig. 1C).
We embedded the 41,435 nuclei transcriptomes into two dimensions using the uniform manifold approximation and projection (UMAP) algorithm. UMAP plots are shown before and after batch correction (fig S1A, B). We found that this overall cluster structure was mostly driven by cell-type identity and the inter-sample variability. Interestingly, patient and control cells segregated together within the main clusters (fig. S1B). To account for this inter-individual variation for cell-type identification, we followed the Seurat3 sample integration protocol (see methods) (Fig. 1D). Using this corrected PCA embedding and the unsupervised and network-based Louvain clustering approach, we found that the human midbrain comprises 12 major cell types (Fig. 1D, E, fig S1 C).
The human midbrain is composed of glial, neuronal, vascular, and microglia cells (Fig. 1D, E). We annotated most cell clusters by manually comparing well-known marker genes in the literature and the identified marker genes of each cell cluster (fig. S1C, table S3). Oligodendrocytes, the most abundant cell-type in the midbrain (Fig. 1F), are characterized by the expression of MOBP10, oligodendrocyte precursors cells (OPCs) highly express VCAN11. Expression of AQP4 is characteristic of astrocytes12 and FOXJ1 of ependymal cells13 (Fig. 1E-G, fig. S1C). Also, immune and vascular cells displayed a highly specific expression of well-known marker genes. CD74 in microglia14, CLDN5 in endothelial cells15,16, and PDGFRB in pericytes17 (Fig. 1E-G, fig. S1C). Regarding neuronal cells, we identified four cell-types, excitatory (SLC17A6)18, inhibitory (GAD2)18, GABAergic (GAD2/GRIK1)19,20 and, dopaminergic neurons (TH)21 (Fig. 1E-G, fig. S1C). We also discovered a neuronal cluster of 120 cells, which we could not annotate, characterized by high expression of CADPS2 (CADPS2high cells) (Fig. 1E-F, fig. S1C). These cells almost exclusively originated from IPD patients (IPD, 98.4%; control, 1.6%) (Fig. 1G).
To gain mechanistic insight into genetic risk factors for PD, we evaluated if PD-associated variants are enriched in genomic regions with genes whose expression pattern is cell-type-specific. We found that PD risk variants are associated with microglia and neuronal-specific genes (dopaminergic, excitatory, and inhibitory) (Fig. 1H, table S4). They are less associated with astrocytes and OPCs cell-type-specific genes (Fig. 1H, table S4). We prioritized the cell-type-specific and PD-risk associated genes based on their enrichment contribution for each cell-type. For instance, we found that LRRK2 showed the highest association with microglia (Fig. 1I, table S5) and SNCA was the most prominent PD-associated gene in DaNs (Fig. 1J). These findings are in line with previous reports, which identified that PD-associated mutations in alpha-synuclein promote Lewy bodies formation in DaNs22 and supports the role of LRRK2 mutations in the activation of microglia in PD23. These results, therefore, constitute a valuable resource to prioritize PD-associated variants functionally.
To investigate the cell-type composition changes of the midbrain associated with IPD, we followed three approaches. We compared IPD and control cell density distributions in the two-dimensional UMAP representation (Fig. 2A-B) and compared the IPD and control distributions of the cell-type proportions per sample (Fig. 2C, 2F, 4A, and S3). Altogether, these results revealed an increase in the fraction of microglia and astrocytes and a decreased fraction of oligodendrocytes in IPD midbrains compared to controls (Fig. 2, S3, and table S6). Performing multi-labeling immunofluorescence analysis24, we confirmed the increased fraction of microglia and astrocytes in IPD midbrains. We examined paraformaldehyde-fixed paraffin-embedded sections from the right hemi-midbrain of the age and gender-matched pair C3 and IPD3 (cp. table S1). We labeled microglia and astrocytes with antibodies against their marker proteins IBA1 and GFAP, respectively (Fig. 2D, G). Automated image analysis confirmed an increase in microglia and astrocyte areas as well as in IBA1 and GFAP protein abundance in IPD3 compared to C3 (Fig. 2E, H). Interestingly, the microglia increased fraction was higher in the SN compared to other midbrain regions (Fig. 2E upper panel; fig. S2 and table S7). Moreover, image analysis of the microglia cellular shape (fig. S2D) revealed an IPD-related decrease in microglial branching exclusively in the SN, indicating cellular activation25 (Fig. 2E lower panel and table S8). We also confirmed the reduced fraction of oligodendrocytes in IPD (fig. S3, table S6) by quantitative immunofluorescence analysis targeting the PLP1 protein (Fig. S4). We detected a significant reduction of the PLP1-positive area and PLP1-intensity in the IPD3 midbrain sub-areas SN, and T/T compared to C3 (fig. S4, table S7). In contrast, the other midbrain cell types, OPCs, pericytes, ependymal, excitatory, inhibitory, and GABAergic cells, did not display significant differences associated with IPD (fig. S3). To our surprise, also, no loss in the number of DaNs was evident in the IPD samples, which is thought to be a critical factor in the pathogenesis of PD (fig. S3).
A closer look at the number of nuclei profiled indicated that DaNs only comprised 0.18% of the total cell count impeding a reliable comparison between IPD patients and controls (Fig. 1F). Therefore, we performed quantitative immunofluorescence imaging analysis of neuromelanin-positive areas in IPD3 and C3 and could confirm a 45.9% reduction in neuromelanin-positive nigral DaNs in IPD patients compared to controls (fig. S5A; table S9). Moreover, in our study, 15 µm tissue sections, rather than tissue blocks, were used to isolate nuclei from control and IPD midbrain tissue. With a diameter of 10-20 µm, DaN nuclei are among the largest in the brain. Thus, when analysing nigral sections of IPD3 and C3 for TH-positive areas with and without nuclei, it became apparent that only a small subset of DaNs also contained (intact) nuclei (fig. S5B, C). This technical limitation likely contributed to the low number of DaNs detected by snRNA-seq. To validate DaN degeneration in the IPD tissue with an independent approach, we quantified the area of MAP2+/TH+/NM+ and MAP2+/TH-/NM+ neurons (fig. S5D). This analysis confirmed a significant decrease in DaNs and revealed an increase in the abundance of neuromelanin-containing neurons positive for MAP2 but negative for TH in IPD.
We also investigated how other clinical characteristics, in addition to condition (IPD), affect the midbrain cellular composition. For this, we modeled the percentage of each cell-type as a function of age, PMI, sex, and condition. We used beta-regression modeling to estimate the coefficients of these clinical features (fig. S6). Sex and condition (IPD) appeared to be the sample characteristics with the highest impact on the midbrain cellular composition. For instance, the most significant coefficients were the loss of DaNs and the gain of CADPS2high cells associated with IPD (fig. S6). Male sex was negatively associated with the neuronal cells, inhibitory, excitatory, GABA and DaNs. Also, negatively related to the fraction of ependymal cells. Microglial abundance, in addition to the IPD condition, was positively associated with male sex. Together, these data suggest that PD pathophysiology might affect the midbrain cellular composition in a sex-dependent manner.
To assess if the increased fraction of microglia and astrocytes in IPD is associated with a specific molecular phenotype, we sub-clustered these cell-types to independently identify cellular subpopulations and reconstruct their activation trajectories. We identified seven microglia subpopulations characterized by the expression of a few marker genes (Fig. 3A). The three biggest subpopulations are defined by the high expression of P2RY12, GPNMB, and HSP90AA1 (Fig. 3B). Given that these three sub-populations conform a continuum in the UMAP projection and both GPNMB and HSP90AA1 genes are markers of microglia activation, we estimated a cell trajectory structure comprising these major sub-populations using the DDRTree method of monocle3 (Fig. 3C). We then organized cells along this trajectory (pseudotime), starting from the trajectory node that maximizes the distance to the GPNMB and HSP70AA1 trajectory branch tips (Fig. 3C). This microglia activation trajectory spans from P2RY12high cells towards two activation branches, one containing GPNMBhigh cells and another with cells highly expressing HSP90AA1 or IL1B (Fig. 3C). IPD cells differentially distribute along the microglia activation trajectory and IPD microglia tend to be in an activated state compared to controls (Fig. 3D). While P2RY12 is highly abundant in resting microglia26, GPNMB27, HSP9028 and IL-1β29 are involved in the inflammatory response and have previously been linked to neurodegeneration28,30,31, supporting the notion that the IPD-specific upregulation of GPNMB and HSP90AA1 are makers of microglial activation. To further characterize the molecular phenotype of these two activated microglia states, we identified genes whose expression was associated with the activation trajectory and functionally enriched them to gene-ontology molecular functions (Fig. 3D, E).
This analysis revealed that these subpopulations are involved in the secretion of cytokines and the stress response to unfolded proteins (Fig. 3E). Next, we identified the genes whose expression was differentially upregulated in IPD microglia across the activation trajectory. We intersected the upregulated genes and the activation-trajectory associated genes in microglia and identified 29 genes linked with the IPD differential activation of the midbrain microglia (Fig. 3F), several of which have previously been associated with IPD32–34.
Following this approach we also characterized the astrocytes subpopulations, reconstructed the astrocyte activation trajectory and identified gene signature associated with IPD differential activation (Fig. 3G-L). We identified five astrocytes subpopulations characterized by high expression of VAV3, LRRC4C, ELMO1, ADGRV1, and CD44 (Fig. 3G-H). We recovered the astrocyte activation trajectory based on the main cell types comprising VAV3high, LRRC4Chigh, and CD44/S100A6high subpopulations (Fig. 3I). Given that CD44 expression implicates reactive astrogliosis35 we ordered cells on the activation trajectory by setting the root in the trajectory graph-node that maximizes the distance from the CD44high branch end. These results implied an astrocyte activation transition from LRRC4Chigh to high CD44high subpopulations (Fig. 3I). Indeed, IPD astrocytes were highly enriched at the end of the astrogliosis trajectory compared to control astrocytes (Fig. 3J). We further characterized the molecular phenotype of the CD44high astrocyte activated stated by enriching GO molecular functions to the highly upregulated genes across the astrocytes activation trajectory (Fig. 3J-K). The CD44high subpopulation was related to the unfolded protein response (UPR) pathway, which has recently been linked to a specific astrocyte reactivity state that is detrimental to the survival of neurons36 (Fig. 3K). Next, we calculated IPD-differentially upregulated genes, which were also highly expressed towards the end of the astrogliosis trajectory (Fig, 3L) and identified 34 genes associated with IPD differential astrogliosis (Fig. 3L). These genes include several heat-shock proteins that have previously been shown to co-localize with alpha-synuclein deposits in the human brain37.
Finally, with this cell-type-specific approach, we investigated the oligodendrocyte diversity and reconstructed its differentiation trajectory (Fig. S7). We identified five subpopulations characterized by the expression of ATP6V02, OPALIN, TRPM3, ST6GAL1, and RBFOX1 (fig. S7A-B). The inferred trajectory based on subpopulations recovers differentiation trajectory spanning from FRY/OPALINhigh cells towards RBFOX1/S100Bhigh cells (fig. S7B-C). OPALIN (also named as Tmem) is a marker of myelinating oligodendrocytes38, while S100B has been associated with glial stress response in PD postmortem midbrain39. When comparing IPD oligodendrocyte density across this trajectory, we found a reduced fraction of myelinating OPALINhigh cells compared to controls (Fig. S7D). An overlay of the IPD differentially expressed genes and such defining the oligodendrocyte trajectory identified 216 and 330 down-regulated and upregulated genes in the IPD and across the trajectory. Downregulated genes are associated with neuronal maintaining pathways, while upregulated genes are related to the response to unfolded protein pathways (fig. S7E-F).
We then focused on deciphering the identity of the CADPS2high cells, which were almost exclusively detected in the IPD patients (Fig. 4A). This cell population pseudo-bulk-transcriptome clusters together with the neuronal cell types, excitatory, inhibitory, GABAergic, and DaNs (fig. S1I). In accordance with this global transcriptome similarity, CADPS2high cells positively express the neuronal marker genes MAP2 and SCN2A (Fig. 4B). Performing immunofluorescence labeling with antibodies for CADPS2, MAP2, and TH, we confirmed the existence of a neuronal cell population with an IPD-specific increase of CADPS2 expression in the SN (Fig. 4C, E). Primary, SN is the midbrain region where we detected a CADPS2 intensity higher in IPD patients than control subjects (Fig. 4C). Therefore, we then characterized the CADPS2 intensity in the SN neurons. We found that CADPS2 intensity was higher in IPD than controls in the TH-negative neurons harboring neuromelanin deposits (MAP2+/TH-/NM+) (Fig. 4D-E). Given that the TH positive neurons (MAP2+/TH+/NM+) do not display CADPS2 intensity differences (Fig. 4D), we concluded the CADPS2high cells represent nonfunctional dopamine neurons. In accordance with their snRNA-seq profiles, cells with high CADPS2 expression were positive for MAP2 but had low TH levels (Fig. 4B, E). Besides, CADPS2high cells showed high TIAM1 expression (Fig. 4B), a key regulator of neuronal differentiation of dopaminergic precursor cells40. In agreement with this finding, cell-cycle labeling analysis indicates that while the majority of excitatory, inhibitory, GABAergic, and DaNs neurons were in a resting state (G0/G1), a considerable proportion of CADPS2high neurons had entered the cell cycle (G2/M/S) (Fig. 4F). We then co-embedded the DaN and CADPS2high cell populations with the developmental trajectory of the human dopamine neurons (6-11 weeks-old). The DaNs from our study predominantly clustered with differentiated DaNs (Fig. 4G, H) while the CADPS2high neurons were transcriptionally more similar to embryonic neuroblasts (Fig. 4G, H). These results might indicate a cell cycle re-entry and partial re-activation of the developmental machinery in CADPS2high neurons.41
Discussion
In this study, we describe a single-cell atlas of the human midbrain from IPD patients and age-matched controls. Applying snRNA-seq to postmortem midbrain tissue, we profiled the transcriptomes of more than 41,000 single-nuclei with the aim to identify cell type- and disease-specific molecular signatures associated with IPD.
In the current dataset glia made up ∼80% of all sequenced cells, enabling an in-depth analysis of their contribution to the pathogenesis of IPD. We identified a disease-specific upregulation of microglia, which mediate the innate immune defense in the brain. During microgliosis, microglia amplify, secrete cytokines and undergo morphological changes42. Suggestive of an activated state, we detected less ramified microglia in IPD postmortem tissue using a quantitative immunofluorescence approach. Moreover, we identified a significant PD risk variant enrichment in microglia showing the strongest association with the PD gene LRRK2. The kinase LRRK2 is most abundant in immune cells and may contribute to inflammasome formation via the phosphorylation of Rab GTPases43. In addition, by inferring the activation trajectories of the microglial subpopulations, we observed a transition of cells from resting into an activated state. In agreement with this finding, pathway analyses highlighted cytokine signalling and, likely upstream of this, an induction of the UPR pathway. We also found chaperones and heat-shock proteins overexpressed along the disease trajectory, which when they are released from the cell can act as damage-associated molecular patterns (DAMPs) that trigger an immune reaction44.
Astrocytes can equally act as immune effector cells in the brain by releasing proinflammatory cytokines45, likely explaining why we observed a PD risk variant enrichment in these cells. Furthermore, when modelling astroglial activation trajectories, we detected reactive astrogliosis specifically in patient cells35. As for microglia, pathway analysis along the trajectory identified the UPR pathway, which has recently been described to influence the astrocytic secretome.36 Neurotrophic factors released from reactive astrocytes were shown to accelerate neuronal demise36 - a disease mechanisms that has not gained much attention in PD research so far.
Our snRNA-seq data also showed a trend towards decreased oligodendrocyte numbers in IPD midbrain - a finding that we could validate by immunohistochemistry. In the white matter, oligodendrocytes generate myelin sheets, which provide insulation of axons and ensure saltatory conduction46. However, since PD has long been considered a “gray matter” disease, oligodendrocytes only recently gained attention in the field. Combining the latest genome-wide association study (GWAS)47 with snRNA-seq results from IPD and control midbrain sections, we did not observe a significant PD risk association for oligodendrocytes.
However, trajectory inference analysis revealed a transition from high OPALIN to high S100B expression subpopulations. While S100B was shown to control the maturation process of oligodendrocytes48, the protein has also been linked to neurodegeneration. S100B overexpression in response to cytokine injections mediates dystrophic neurite formation in an Alzheimer rat model49. Accordingly, the oligodendrocyte-specific upregulation of S100B observed in the IPD midbrains may be the result of enhanced cytokine release from microglia. Taken together, these results further implicate glial cells in the propagation of neuroinflammatory and neurodegenerative processes in IPD.
When focusing on DaNs in our snRNA-seq study, we did not observe a significant loss in IPD tissue. However, a meaningful comparison in IPD patients and controls was likely hampered by the low abundance of DaNs in general - they only constituted 0.18% of the total cell count. By contrast, automated image analysis of immunofluorescence labelled IPD and control midbrain sections indicated a significant loss of neuromelanin containing DaNs. This result was in line with the neuropathological reports, which described severe DaN degeneration in all IPD patient samples (cp. table S1). Thus, technical limitations may have caused the underrepresentation of DaNs in the transcriptomic data: first, we used midbrain slices of a thickness of ∼15µm. By contrast, with an average diameter of 10-20 µM, DaN nuclei are rather large. Hence, a considerable proportion of nuclei may not have remained intact during the sectioning process - a prerequisite for high-quality snRNA-seq results. Second, when we previously applied scRNA-seq to midbrain organoids, the detected number of TH-positive neurons was well below the actual DaN count50. One possible explanation for this discrepancy could be that the majority of TH mRNA transcripts lack or have a very short poly(a) tail and are therefore not recognized by snRNA-seq51. Despite these constraints, our cell-type specific analysis for PD variants in midbrain tissue revealed an enrichment in DaNs.
Finally, we discovered a new disease-specific cell state that is defined by translational similarity to midbrain DaNs but with low TH levels and high CADPS2 expression. Immunofluorescence analysis revealed that CADPS2high neurons are localized to the SN and that they harbour neuromelanin deposits. CADPS2 (Calcium-dependent activator protein for secretion 2) functions in the uptake and storage of vesicular monoamines52,53 such as dopamine and has previously been linked to genetic PD54. In the presence of the LRRK2 mutation G2019S, CADPS2 expression was found to be enhanced54. However, in our patient samples LRRK2 mutations were excluded. In addition to high levels of CADPS2, the IPD-specific neuronal population was characterized by elevated TIAM1 expression. TIAM1 (T-cell lymphoma invasion and metastasis 1) regulates midbrain DaN differentiation via the Wnt/Dvl/Rac1 pathway40. Wnt signalling is crucial in the development of DaNs and has become a major target for regenerative treatment approaches in PD55. Accordingly, increased TIAM1 expression in CADPS2high neurons may suggest that these neurons revert to an earlier developmental stage. This is in line with our observation that, unlike DaNs, a large proportion of CADPS2high neurons was found in a non-resting state. An incomplete cell-cycle re-entry of DaNs has previously been reported in postmortem brain samples from IPD patients41. Further, this study showed that the inability to re-start a developmental program mediates DaN demise41. Here, we may have observed a similar phenomenon: DaNs that are highly metabolically active accumulate neuromelanin in an accelerated fashion until a certain threshold is reached. Beyond this tipping point, the neuromelanin deposits become detrimental56, resulting in a partial activation of the developmental machinery and, subsequently, the death of these DaNs. Supporting this hypothesis, CADPS2high neurons clustered with neural progenitor cells and neuroblasts when we co-embedded our single-cell results onto scRNA-seq data from human embryos. Thus, we speculate that the TIAM1/CADPS2high population of cells represents DaNs that are in the process of dying.
In summary, our study revealed several novel aspects of IPD pathology. Using snRNA-seq in postmortem midbrain tissue from patients and matched controls, we identified a disease-specific upregulation of microglia and astrocytes as well as a loss of oligodendrocytes. In addition, we discovered a novel neuronal cell state that was almost exclusively identified in IPD midbrain tissue. These cells were characterized by low TH levels but high expression of CADPS2 and TIAM1. Based on our scRNA-seq and quantitative imaging data, we conclude that these neurons lost their dopaminergic identity and unsuccessfully attempted to re-enter the cell cycle. Our findings strengthen the role of neurodevelopmental and neuroinflammatory mechanisms in the pathogenesis of PD opening new avenues for novel therapeutic strategies.
Materials and Methods
Human brain tissue cryosectioning
Frozen human postmortem midbrain tissue sections and associated clinical and neuropathological data were supplied by the Parkinson’s UK Brain Bank and the Newcastle Brain Tissue Resource. According to the neuropathological procedure, after removing the brainstem and cerebellum, the brain hemispheres were divided down the midline with the hemi-midbrain associated with each hemisphere. The left hemi-midbrain was removed with a transverse section by taking a line from just behind the mammillary body through the superior colliculus. This midbrain block was then snap-frozen at −120°C and cryosectioned at ∼15 μm thickness in the transverse plane. The resulting sections were stored at −80°C.
Patients and controls gave written informed consent with the brain banks which, together with the ethics review panel of the University of Luxembourg, approved the study.
Sample preparation for nuclei isolation
Six to eight sections were combined from one individual for nuclei isolation. Nuclei were isolated by adapting the published 10X Genomics® protocol for ‘Isolation of Nuclei for Single Cell RNA Sequencing’ In brief, the tissue was lysed in a chilled lysis buffer (10 mM Tris-HCl, 10mM NaCl, 3mM MgCl2, 0,1% NonidetTM P40). Then, the suspension was filtered and nuclei were pelleted by centrifugation. Nuclei pellets were then washed in ‘nuclei wash and resuspension buffer’ (1xPBS, 1% BSA, 0.2U/μl RNase inhibitor), filtered and pelleted again. Nuclei pellets were suspended in DAPI solution (1.5 μM DAPI in 1xPBS) and incubated for 5 minutes prior to sorting. After dissociation, single DAPI-positive nuclei were FACS-sorted using a FACSDiva Cell Sorter (BD Biosciences).
Library preparation and sequencing
Sorted nuclei were processed using the Chromium Next GEM Single Cell 3’ Kit v3.1 to generate the cDNA libraries. The quality of cDNA was assessed using the Agilent 2100 Bioanalyzer System. Sequencing was performed on Illumina NovaSeq 6000-S2.
Transcript quantification and filtering
FASTQ files were generated from the raw base call (BCL) outputs with the Cell Ranger (10X Genomics) mkfastq pipeline v.3.0. From this, we obtained a gene-barcode UMI count matrix per sample using the Cell Ranger (10X Genomics) count pipeline v.3.0 using default parameters. The Cell Ranger count pipeline only considers exon-mapping reads during UMI-counting. Also, single-nuclei sequencing readouts are enriched in intronic regions. To account for this, we used the Cell Ranger recommended variation of the human reference transcriptome (hg38), where introns are annotated as exons. We retained barcodes with more than 1500 UMIs and 1000 genes, as well as less than 10% of mitochondrial-encoded (mtDNA) and 10% of ribosomal gene counts. We kept genes detected in more than three barcodes. Also, we filtered out ribosomal and mtDNA-encoded genes. We then used Scrubblet57 to identify potential multiplet barcodes. We kept barcodes with an estimated duplet score smaller than 0.15 for downstream analysis.
Normalization, sample integration, and cell clustering
To identify the major cell types comprising the human midbrain, we combined the samples in a single embedding following the Seurat v358 integration workflow. First, each sample was normalized using the SCTransform approach59. Cell-cycle phase assignment was performed based on this normalized expression matrix. We used the Seurat CellCycleScoring function and the Seurat v3 reference genes for the S and G2/M cell-cycle phases. To determine the between-sample cell-anchors, we used the FindIntegrationAnchors Seurat function with the top 4000 consistently, most variable genes in all samples, which were identified with the SelectIntegrationFeatures function. We then used the IntegrateData Seurat function to obtain a combined and centered expression matrix. Principal component (PC) analysis was done on this centered expression matrix. The top 25 PCs were used to build a Shared Nearest Neighbor (SNN) cell graph, which was then clustered using the Louvain algorithm (resolution = 1.5) implemented in the Seurat FindClusters function. The top 25PCs were embedded into two dimensions using the Uniform Manifold Approximation and Projection (UMAP) algorithm60. We identified marker genes for each cluster by using the ROC method of the Seurat FindAllMarkers function. The top marker genes were used to assign cell-type annotations manually for each cell cluster. We compared the cell types by correlating their pseudo bulk profiles. The resulting gene-cell-type matrix was normalized (Transcript Per Million) and log2 transformed. The Pearson correlation estimates among the normalized cell-type profiles were used as the input distance matrix for hierarchical clustering.
Differential cell-type composition
We estimated the differential cell type composition by comparing the UMAP embeddings and the cell type proportions between IPD and control samples. We considered the two-dimensional kernel cell density of the IPD and control cells independently on the first two UMAP components using the kde2d function (bins = 100) implemented in the MASS R package61. The IPD log2 differential UMAP density was calculated. Also, for each cell type, we compared the proportion of cells per sample between the IPD patients and control individuals. We assessed this difference with the Student’s t-test implemented in the t.test function of the R stats package62. Furthermore, we used the beta-regression model to estimate the contribution of the sample clinical features (e.g., condition, postmortem interval [PMI], age, sex) on the cell proportion variation. We modeled the cell type proportion using the betareg R package63.
Sub-clustering, trajectory reconstruction, and differential gene expression in microglia, astrocytes and oligodendrocytes
We subset cell-type-specific UMI raw counts. To identify the cell type subpopulations, we integrated cells from different samples following the Seurat3 Reciprocal Principal Component Analysis based protocol considering the top 1000 highly variable genes for astrocytes and oligodendrocytes or the top 500 highly variable genes. Then we used the unsupervised and network-based Louvain clustering approach based on the top 25 principal components of the integrated datasets. Marker genes were defined as described before. We reconstructed the cellular activation trajectories following the monocle3 approach. Briefly, cells from different samples were integrated and factor size normalized. The sample effect was removed using the Mutual Nearest Neighbor method64. Then the highly variable genes defined before were embedded in the first 25 principal components used for dimensionality reduction and trajectory inference using the learn_graph function implemented in the monocle v3 R package8. Pseudotime ordering was done in a supervised manner by rooting the trajectory in the graph node that maximizes the distance to the known activated cell subpopulation. We identified cell-type-specific perturbed genes in IPD by fitting negative binomial distributions to each gene using the fit_models function implemented in the monocle v3 R package8. IPD differential expression coefficient with q < 0.05 were considered as differentially expressed genes. Highly variable genes associated with the cell trajectories were identified by using the spatial correlation analysis Moran’s I approach implemented in the graph_test function of the monocle v3 R package8. Functional enrichment analysis of the differentially perturbed genes was done using Enrichr65.
Human embryonic and adult dopamine neurons co-embedding
We merged the raw UMI count matrix comprising the human developmental trajectory of dopamine neurons available in GSE76381 GEO accession and the UMI count matrix containing the adult dopamine and CADPS2high neurons reported in this study in a single dataset. To remove the study effect, we followed the Seurat3 Reciprocal Principal Component Analysis based Seurat3 integration protocol using the embryonic cells as reference and considering the top 4000 highly variable genes among the embryonic cells. Based on the integrated top 25 Principal Components, we embedded these cells into two UMAP dimensions. Cluster analysis of these cell-types was performed using the BuildClusterTree implemented in Seurat3.
Automated image analysis
Immunofluorescence images of human postmortem midbrain sections were acquired with Carl Zeiss Axio Observer Inverted Microscope Z1 and analyzed in Matlab (Version 2019B, Mathworks). Automated in-house developed image analysis algorithms segment the fluorescent cell areas (neurons, astrocytes, microglia, oligodendrocyte) extracting features such as area, perimeter and mean fluorescence intensity. The segmentation of all neurons was computed by convolving the raw MAP2 channel with a Gaussian filter. By setting a pixel threshold, all high intensity neurite areas were detected and subtracted from the image. The MAP2 mask was generated by setting the pixel threshold, followed by bwareaopen to remove small connected components. The detected areas were further segmented with watershed to dissociate small objects from the cell areas, which are removed by another bwareaopen filter in order to generate the final MAP2 mask. The segmentation of dopaminergic neurons was computed by convolving the raw TH channel with a Gaussian filter. By setting a pixel threshold, TH-positive cell areas were detected followed by bwareaopen to remove small connected components in order to generate a TH area mask. The neuromelanin mask was computed by identifying areas below the selected pixel threshold and removing the small connected components with bwareaopen. In addition, we identified the MAP2+/TH+/NM+ and MAP2+/TH-/NM+ area masks by subtracting and adding the respective individual mask areas. The protein level of CADPS2 was identified by overlaying a respective mask (MAP2+, MAP2+/TH+/NM+ and MAP2+/TH-/NM+ mask) over the raw CADPS2 channel and detecting the mean intensity of the pixels in the mask area. Each measured intensity was then normalized to the area size. The segmentation of astrocytes and microglia was computed by selecting a pixel threshold, followed by an imfill filter in order to generate the cell area masks for GFAP or IBA1, respectively. Further, the skeleton of the IBA1 mask was generated with a thinning function to identify the branching of the mask. Because of the massive population of oligodendrocytes, we generated the mask by selecting a pixel threshold to identify all the PLP1 positive areas without segmentation. The results were visualized with ggplot2 in R 4.0.0.
Genotyping of PD cases using NeuroChip
DNA samples from all PD cases underwent genotyping at the Institute of Human Genetics at the Helmholtz Zentrum München using the Illumina (San Diego, CA) NeuroChip66. Standard genotype data quality control (QC) steps were carried out67. SNP imputation was carried out on our NeuroChip data using the Michigan Imputation Server68 to produce a final list of common (minor allele frequency ≥ 1%) variants for further analyses. Imputed SNP positions were based on Genome Reference Consortium Human 37/human genome version 19 (GRCh37/hg19). All cases were screened for disease-associated variants in known major PD genes (SNCA, LRRK2, DJ-1, PRKN, GBA, PINK1, ATP13A2, VPS35, MAPT, DCTN1, DNAJC6, SYNJ1, VPS13C and MAPT) covered by the NeuroChip.
Cell type association with genetic risk of PD
Association analysis of cell type-specific expressed genes with genetic risk of PD was performed using Multi-marker Analysis of GenoMic Annotation (MAGMA) v1.07, in order to identify disease-relevant cell types in the midbrain. MAGMA is a gene set enrichment analysis method, which tests the joint association of all SNPs in a gene with the phenotype, while accounting for LD structure between SNPs69. Competitive gene set analysis was performed on SNP p-values from the GWAS summary statistics of Nalls et al. (excluding 23andMe)47 and the publicly available European subset of 1000 Genomes Phase 3 was used as a reference panel to estimate LD between SNPs. SNPs were mapped to genes using NCBI GRCh37 build (annotation release 105). Gene boundaries were defined as the transcribed region of each gene. An extended window of 35 kb upstream and 10 kb downstream of each gene was added to the gene boundaries70. ZSTAT is the Z-score (mean zero and unit SD) for the gene, based on its p-value. It was used to determine the gene association in the gene level analysis. If a gene is not differentially expressed, Z-scores will have approximately zero mean and unit variance. Gene sets used in MAGMA were filtered for: FDR-corrected p-values < 0.05, percentage of cells of the cluster where the expression was detected > 0.5, and logFC > 0.25.
Machine learning cross-validation of cell type annotation
To quantitatively assess the cell-cluster definition and annotation, we implemented a stratified cross-validation machine learning approach. Briefly, we removed the sample effects on the combined UMI count dataset using Harmony71. For normalization, we used the loess transform72 to fit a smooth curve between mean and variance using the log transformed data. We then scaled the data with the fitted mean and standard deviation. The identified marker genes (Table S3) were selected as the features of the model. We considered the median cell number of each cell-type to subsample the dataset as we had few cell-types (DaNs and CADPS2high) underrepresented. To ensure equal label composition in the training and the test sets, we split the data using scikit-learn73 StratifiedKFold with 70% of the data as training and 30% as test dataset and a 5-fold cross validation. We did a dimensional reduction with truncatedSVD to 30 components. These 30 components were classified based on scikit-multilearn’s ensemble classification74 which uses louvain-based clustering75,76 and a random forest classification to account for the clustered and sparse nature of the scRNAseq data. The predicted cell types were then compared to the manually curated cell label assignments using a confusion matrix.
Data Availability
Raw data for the 11 samples presented in this study are available in the Gene Expression Omnibus (GEO) with accession number GSE157783. Single-cell human embryonic midbrain dataset was available in the GSE76381 GEO accession.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157783
General
We express our gratitude to the tissue donors and their families for their generous participation.
Funding
S.Sm. and S.P. received funding from the Luxembourg National Research Fund (FNR) within the PARK-QC DTU (PRIDE17/12244779/PARK-QC). A.G. is supported by the FNR within the framework of the ATTRACT (Model IPD, FNR9631103) career development program. C.M.M. is supported by grants to the Newcastle Brain Tissue Resource from UK MRC (MR/L016451/1), the Alzheimer’s Society and Alzheimer’s Research Trust through the Brains for Dementia Research Initiative, and from National Institutes for Health Research Biomedical Research Centre Newcastle. Moreover, A.G., P.M. and Z.L. obtained FNR funding as part of the National Centre of Excellence in Research on Parkinson’s disease (NCER-PD, FNR11264123) grant, and A.G. and P.M. from the FNR Core MiRisk-PD (C17/BM/11676395) grant. M.S. is supported by grants from the Deutsche Forschungsgemeinschaft (DFG) (SP1532/3-1, SP1532/4-1 and SP1532/5-1), the Max Planck Foundation and the Deutsches Zentrum für Luft-und Raumfahrt (DLR 01GM1925). The Parkinson’s UK Brain Bank is funded by Parkinson’s UK, a charity registered in England and Wales (258197) and in Scotland (SC037554).
Author contributions
S.Sm., C.A.P-M., A.G. and M.S. designed the study. Experiments were performed by S.Sm., C.D., S.Sa., J.J. and J.H.; sequencing was performed by B.T.; gene expression data was analysed by C.A.P-M and S. B.. Imaging data analysis was performed by S.Sm., S.P., J.J. and P.A.; tissue collection and neuropathological analysis was carried out by C.M.M.; IPD genetic profiling was performed by S.P., Z.L. and P.M. PD genetic variant enrichment was accomplished by Z. L., C.A.P-M., and P. M.. S.Sm., C.A.P-M., J.C.S., P.M., A.G. and M.S. wrote the manuscript; all authors contributed to the critical revision of the manuscript.
Competing interests
There are no competing interests among the authors.
Data and materials availability
Raw data for the 11 samples presented in this study are available in the Gene Expression Omnibus (GEO) with accession number GSE157783. Single-cell human embryonic midbrain dataset was available in the GSE76381 GEO accession.