Abstract
Microglia and perivascular macrophages, myeloid-origin resident immune cells in the human brain, play crucial roles in Alzheimer’s disease (AD)1–4. However, the field lacks a unified taxonomy describing their heterogeneity and plasticity5. To address this, we applied single-cell profiling to two independent, demographically diverse cohorts. The first comprises 543,012 viable myeloid cells from 137 unique postmortem brain specimens, while the second consists of 289,493 myeloid nuclei from 1,470 donors. Collectively, they cover the human lifespan and varying degrees of AD neuropathology. We identify 13 transcriptionally distinct myeloid subtypes, including the “GPNMB” subtype that proliferates with AD. We distinguish two contrasting homeostatic microglial states in AD and with aging: the first (“FRMD4A”) wanes over time, while the second (“PICALM”) becomes more prevalent. By prioritizing AD-risk genes, including PTPRG, DPYD, and IL15, and placing them into a regulatory hierarchy, we identify common upstream transcriptional regulators, namely MITF and KLF12, that regulate the expression of AD-risk genes in the opposite directions. Through the construction of cell-to-cell interaction networks, we identify candidate ligand-receptor pairs, including APOE:SORL1 and APOE:TREM2, associated with AD progression. We show polygenic risk for AD predisposes and prioritize the GPNMB subtype as a therapeutic target of early intervention. Our findings delineate the relationship between distinct functional states of myeloid cells and their pathophysiological response to aging and AD, providing a significant step toward the mechanistic understanding of the roles of microglia in AD and the identification of novel therapeutics.
Main
Despite the quantifiable neuropathology of β-amyloid plaques (Aβ) and neurofibrillary tangles (NFTs)6, the exact neurobiological mechanisms underlying Alzheimer’s disease (AD) remain elusive. Brain myeloid-origin immune cells, including microglia and perivascular macrophages (PVMs), play crucial roles in the pathogenesis of AD1–4,7–10, providing neuroprotective benefits by clearing lesions, but also exacerbating the disease through the induction of excessive neuroinflammation11. While previous studies utilizing single-nucleus/-cell RNA sequencing (snRNA-seq/scRNA-seq) have made significant progress describing complex functional roles of murine and human microglia in AD4,12–15, challenges with characterizing the wide spectrum of microglial heterogeneity and identifying more nuanced AD-associated subtypes still remain, largely due to limited sample sizes and differences in the single-cell technologies used. Among the issues that arise is the failure of nuclear fractions in snRNA-seq from frozen tissue to capture key genes related to microglial adaptation and response to pathogenic lesions16. Moreover, microglia are highly reactive cells, and describing their adaptive nature using scRNA-seq in cells isolated from fresh tissue is challenging17. To overcome those limitations, we present two independent human myeloid cohorts generated at single-cell resolution from the prefrontal cortex (PFC). In the first cohort, we isolated viable ex-vivo human myeloid cells from fresh postmortem PFC and deeply profiled both nuclear and cytoplasmic RNA. The second cohort focused on the breadth of the transcriptome, profiling human myeloid nuclei from a large number of demographically diverse frozen cortical tissues. By considering both the depth and the breadth of the human myeloid transcriptome, we establish a reproducible taxonomy and demonstrate the importance of microglia and PVM plasticity throughout the lifespan, across different stages of AD pathological and clinical severity, and genetic liability.
Uniformly processed atlas of human myeloid cells
In total, we profiled 832,505 human myeloid cells from the PFC of 1,607 unique donors. The first dataset, named FreshMG, includes samples from fresh autopsy tissue specimens of 137 unique postmortem donors recruited from two brain banks and contains individuals displaying varying degrees of AD neuropathology as well as controls (Figure 1A). FreshMG donors are aged between 26 and 107 years (average 80.7 years), comprising 76 females and 61 males (Figure S1A-J). To enrich for myeloid cells, viable CD45+ cells were isolated via fluorescence-activated cell sorting (FACS). In addition, for a subset (n=3 donors, each with 8 technical replicates), we profiled surface-level protein markers using CITE-seq18, using a panel of 163 unique antibodies, resulting in a total of 161 scRNA-seq libraries from fresh brain specimens. Following rigorous QC and initial clustering, we found a large, relatively homogeneous, cluster of myeloid cells along with small subsets of co-purified immune cells, such as monocytes, neutrophils, T, NK, and B cells. The myeloid cluster consisted of 543,012 microglia and PVMs robustly expressing 23,740 genes (Figure S2A).
The second dataset, named PsychAD, consists of frozen prefrontal cortex specimens and includes cases and controls from a cohort of 1,470 unique donors (Figure 1B). PsychAD donors were aged between 0 and 108 years, (average 71.3 years), comprising 761 females and 709 males (Figure S1M-V). Frozen samples were subject to snRNA-seq profiling from which microglia and PVMs were sorted in silico after basic clustering. After rigorous QC, we identified 289,493 microglia and PVM nuclei robustly expressing 34,890 genes (Figure S2B). Next, we aligned and harmonized the scale of clinical variables to facilitate annotation of both datasets (Methods) and saw a strong positive correlation with measures of the severity of AD neuropathology, namely diagnostic certainty of AD, Consortium to Establish a Registry for Alzheimer’s Disease (CERAD)19, and Braak stage20 (Figure S1K, W). In contrast, the clinical measures of dementia severity were less well correlated with AD.
Establishing a reproducible taxonomy of human myeloid cells
Our primary objective was to establish a comprehensive cellular taxonomy that is robust and reproducible; however, cross-validating these independent and large-scale single-cell datasets, each with a distinct transcriptomic origin (whole cell vs. nuclei), posed technical challenges. To overcome these, we devised an iterative cross-validation strategy, which involved establishing a reference state and validating it independently until both datasets were in agreement (Methods). Utilizing the FreshMG dataset, which provides comprehensive transcriptomic profiles from both nuclear and cytosolic fractions, we identified functionally distinct phenotypes of microglia and PVMs. Subsequently, we cross-validated the presence of these reference subtypes in the frozen specimen snRNA-seq PsychAD dataset. Our iterative process converged on 13 functionally distinct subtypes of human myeloid cells (Figure 1C, S2C), and comparison between FreshMG and PsychAD revealed a high degree of consistency between the two cohorts, as evidenced by an average Pearson correlation of 0.77 across all identified subtypes (Figure 1D). This rigorous methodology ensured the accuracy and reliability of our cellular taxonomy, laying a solid foundation for further analyses.
We then grouped the cells using two levels of taxonomic hierarchy; the 13 distinct subtypes under six broad functional subclasses of human myeloid cells: Homeostatic (green), Adaptive (blue), Proliferative (yellow), AD-Associated or ADAM (red), ex-vivo Activated Microglia or exAM (pink), and PVM (orange) (Figure 1C). Each subtype is associated with specific markers that not only aid in their identification but also hint at their functional significance (Figure 1E, S3A). Within the homeostatic microglia subclass, we highlight two subtypes, FRMD4A and PICALM, both of which are associated with the regulation of GTPase activity. Homeostatic microglia make up the largest proportion of myeloid cells (Figure 1F) and express microglia-specific canonical markers such as P2RY12 and CX3CR1. The FRMD4A subtype uniquely expresses CECR2 and NAV2, with other genes pointing towards cellular maintenance, phagocytosis, cell migration, and adhesion. The PICALM subtype shows elevated expression of PICALM and ELMO1, suggesting roles in the regulation of the immune response. We also identified 7 specialized microglial subtypes, each exhibiting unique adaptive responses to neuro-environmental cues. In general, the gene signatures across these adaptive microglia underscored an enhancement in antigen processing and presentation programs and the facilitation of MHC protein complex assembly. The CCL3 subset is characterized by the upregulation of chemotactic genes, most notably the inflammatory cytokines CCL3, CCL4, and interleukin 1 beta (IL1B). In addition, the IFI44L subtype is enriched in interferon-inducible genes, like IFIT1, IFIT2, and IFIT3, suggesting a role in the antiviral innate immune response. The AIF1, HIF1A, and HIST clusters share a common gene program related to immunoglobulin-mediated immune response, while the TMEM163 cluster focuses on antigen processing and presentation via MHC II. The final adaptive cluster, HSPA1A, is enriched for gene signatures responsible for adaptive response to unfolded protein, which is characterized by elevated activity of heat shock proteins and cellular stress response, with a potential role in AD neuropathology21. In addition, we identified a subtype, the GPNMB, which is predominantly observed in individuals with AD22,23. These AD-associated microglia (ADAM) feature elevated expression of GPNMB, MITF, and PTPRG genes, and functional enrichment analysis suggests increased phagocytic activity is a hallmark of these cells. Consistent with previous studies12, we also identified a cluster of proliferative cells, MKI67, that is highly enriched in cell-cycle dependent genes (STMN1, MKI67, TOP2A). Lastly, we report a cluster, ERN1, showing specific expression of ERN1 and PLK2 genes that resemble activation patterns of exAM17 (Supplementary Information). In addition to microglial subtypes, we identified a PVM cluster, named CD163, expressing a unique set of known PVM-specific markers, notably CD163 and F13A1. The CD163 cluster displayed a significant enrichment of genes involved in endocytic processes, emphasizing its priming for receptor-mediated endocytosis and phagocytosis.
While we observe a close similarity between ADAM and PVM clusters (Figure S2C), we found a clear separation between the two when we enriched for conserved murine disease-associated microglia (DAM) signatures as well as human DAM signature from iPSC-derived microglia23,24 (Figure S2F).
Validation of the myeloid taxonomy using CITE-seq, human brain biopsies, and spatial phenotyping analysis
Since the cellular taxonomy is primarily derived from the transcriptome, we assessed the preservation of the functional hierarchical structure at the protein level using CITE-seq, jointly quantifying the transcriptome and 163 unique cell-surface proteins. Using this approach, we identified unique patterns of protein markers for each myeloid subtype (Figure S2D). For example, within the homeostatic microglia subclass, the FRMD4A subtype expressed CD99 and ITGB3, while the PICALM subtype expressed CLEC4C and TNFRSF13C. Likewise, the PVM cluster showed distinct surface markers, CD163 and CCR4, while the ADAM cluster was specific for CD9 and CD44. To validate the spatial context of PVMs, we conducted deep single-cell phenotyping and spatial analysis using multiplexed imaging assay (Akoya PhenoCycler) and demonstrated, via staining for CD163, that PVMs colocalize around blood vessels (Figure 1G).
To further validate the robustness of our human myeloid taxonomy, in addition to the FreshMG and PsychAD cohorts, we generated an additional, independent, scRNA-seq dataset, called LivinMG. For this dataset, brain biopsies were obtained from 25 unique human donors (26 libraries; 97,828 cells after QC) diagnosed with spontaneous intracerebral hemorrhage (ICH)25. The brain tissue was collected during treatment and processed in an identical manner to the fresh autopsy material. It’s important to note that cortical biopsy samples were obtained from a site distal to the site of the hemorrhage and, in the absence of a secondary diagnosis, are considered neurotypical controls. To assign microglial subtype labels, we employed scANVI26 to transfer the established myeloid taxonomy from the FreshMG dataset (Figure S2E). We confirmed that all 13 subtypes were present in all three datasets, thus validating that the taxonomy is consistent, irrespective of the tissue source (Figure 1F).
Variation in human myeloid subtypes during aging
After determining 13 distinct subtypes of human brain myeloid cells, we examined the compositional variation of myeloid subtypes that are associated with aging in a subset of neurotypical donors who were free of dementia and diagnostic neuropathology from the FreshMG and PsychAD datasets. We normalized the subtype count ratio data using the centered log-ratio transformation and modeled using a linear mixed model, accounting for technical and demographic variables (Methods). Notably, the two homeostatic microglia subtypes displayed opposing trajectories with respect to aging (Figure 2A, B). The FRMD4A subtype showed progressive decline while the PICALM subtype showed a gradual increase with age. In addition, we saw an overall increase in the proportions of the ADAM and PVM subtypes with age (Figure 2A left, B). In contrast, we observed an age-related decline in the CCL3 subtype, indicating a possible reduction of chemotactic microglia in older brains. In parallel, we investigated sex-dependent variation in human myeloid subtypes, with or without taking age into consideration, but did not find any statistically significant compositional differences between males and females (Figure 2A middle, right).
Next, we investigated differential gene expression patterns associated with normal aging (Figure 2C). One notable finding was the up-regulation of the MS4A6A gene, a member of the MS4A family of cell membrane proteins, which are involved in the regulation of calcium signaling and have been implicated in neurodegenerative processes27. The age-related gene expression changes for both homeostatic subtypes were enriched with actin filament-based process and actin cytoskeleton organization pathways, supporting their proposed roles in cell adhesion and migration (Figure S3B). The CD163 subtype was associated with the upregulation of cell adhesion processes as well as pathways related to cell proliferation. The gene signatures in the GPNMB subtype were enriched with immune response and activation. Overall, the increased involvement of PVM and ADAM subclasses indicated an upregulation of inflammatory responses in older individuals.
Variation in transcriptional landscape of human myeloid cells during onset and progression of AD
Next, we examined the variation of gene expression during the early stages of AD and its progression. To minimize the effect of younger brains, we limited the analysis to donors 40 years and older, resulting in a dataset composed of 134 donors from the FreshMG and 1,314 donors from the PsychAD cohort. We first evaluated the involvement of myeloid subtypes using the centered log-ratio transformed count ratio data after accounting for technical and demographic variables (Methods).
Overall, irrespective of different measures of AD phenotypes (dx_AD, CERAD, Braak, and Dementia), we observed robust changes in subtype proportions in both FreshMG and PsychAD cohorts (Figure 3A). Similar to normal aging, two homeostatic subtypes showed opposing trends, where the FRMD4A subtype showed a progressive decline with increasing AD burden while the PICALM subtype showed a gradual increase (Figure 3B). The deviation point where the two subtypes diverged was observed during the early stages of Braak, between 0 and 1, likely before the onset of AD. Likewise, we observed a consistent increase in the proportion of the PVM subtype.
The overall compositional variation of AD phenotypes was comparable to aging except for the GPNMB subtype (Figure 3C). The GPNMB subtype was an outlier and showed the largest effect size across all 4 AD phenotypes, suggesting that proliferation of the GPNMB subtype is a hallmark of AD.
Next, we evaluated genes exhibiting differential expression patterns across four different measures of AD neuropathology (Figure 3D). Our analysis led us to discover candidate AD risk genes, including PTPRG, DPYD, and IL15, which displayed upregulation across all measures of AD severity. Pathway enrichment analysis revealed the PICALM and GPNMB subtypes share common pathways related to the regulation of cell adhesion (Figure S3C). In contrast, both the FRMD4A and CD163 subtypes appear to be associated with negative regulation of cell projection organization.
Given the strong compositional shifts and gene signatures for AD phenotypes, we tested the presence of AD signatures using bulk microglia RNA-seq data (BulkMG; Methods). First, we created myeloid subtype signatures from both the FreshMG and PsychAD datasets by aggregating gene expression by subtype. We then compared the resulting subtype signatures to BulkMG gene expression data, stratified by AD case and control status. Interestingly, the Pearson correlation between subtypes and AD diagnosis clearly reflected the compositional shifts we observed across multiple AD phenotypes (Figure 3E). The FRMD4A, TMEM163, CCL3, and HSPA1A signatures closely correlated with the BulkMG from controls, while the PICALM, CD163, GPNMB, and HIF1A signatures closely matched those from AD cases. These results independently validate the observed changes in the myeloid transcriptome during the onset and progression of AD.
To understand the dynamic changes that take place during the onset and progression of AD at a molecular level, we expanded our analysis from using discrete donor-level clinical variables, such as CERAD scores or Braak staging, to a continuous pseudotime measure by ordering cells along a disease trajectory. Here, we estimated Braak-stage-informed ancestor-progenitor relations between observations through transport maps between neighboring disease stages using Moscot28, a toolbox for optimal transport-based analyses of single-cell data. We then quantified cell-cell transition probabilities, computed putative drivers, and constructed the disease-stage-informed pseudotime with CellRank 229 (Figure S5A; Methods; Supplementary Information). Overall, we observed an increase in pseudotime with disease progression (Figure S5E). Stratified by subtypes, we observed that PICALM homeostatic microglia were assigned larger pseudotime values, compared to FRMD4A homeostatic cells (p-value < 0.001, Figure 3F), indicating their association with disease progression and aligning with the compositional variation of AD phenotypes observed earlier. To identify potentially critical time points for disease progression, we compared changes in pseudotime across disease stages for each myeloid subtype (Methods). This analysis revealed that the change was most pronounced starting from Braak stage 3 (Figure 3G). Similarly, when the disease pseudotime was stratified by CERAD stages, the largest change occurred between CERAD stages 3 and 4.
Upstream master regulators of human myeloid cells in AD
After identifying potential AD risk genes, we analyzed myeloid gene regulatory networks (GRNs) in an attempt to discover key upstream transcriptional regulators. Using SCENIC30,31, we constructed GRNs based on expression data and known transcription factor (TF) binding motifs and defined units of regulatory hierarchy (regulons) (Figure 4A). Subsequently, we assessed the enrichment of the regulon for each myeloid subtype independently (Methods), revealing high concordance between the FreshMG and PsychAD cohorts (Figure 4B). We then derived combined regulon enrichment scores using meta-analysis (Methods) and observed strong regulon subtype-specificity (Figure 4C). The FRMD4A and PICALM homeostatic subtypes were defined by enrichment of KLF12, GLIS3, and BACH2 regulons, while the PICALM, CD163, and GPNMB subtypes displayed exclusive enrichment of the MITF regulon. To link inferred regulons to differentially expressed AD genes, we performed enrichment tests using 4 different types of AD risk signatures (Methods). Notably, the target genes of MITF, KLF12, and GLIS3 TFs were significantly associated with AD risk profiles in the PICALM, FRMD4A, GPNMB, and HIF1A subtypes (Figure 4D). MITF was preferentially enriched with upregulated AD signatures, whereas KLF12 and GLIS3 were more preferentially associated with downregulated AD signatures. Visualization of the joint MITF-KLF12-GLIS3 regulon network with AD risk genes revealed coordinated modulation of both up and down-regulated candidate risk genes (Figure 4E). These findings collectively suggest the coordinated activity of MITF, KLF12, and GLIS3 in regulating AD risk gene expression in disease-associated microglia states. Functional enrichment analysis revealed MITF, KLF12, and GLIS3 target genes involved in key biological processes for microglia function such as phagocytosis, cytokine production, and cellular response (Figure 4F). We find that MITF distinctly regulates phagocytic-related pathways in line with previous findings from in-vitro models23. In summary, by integrating SCENIC network inference with the analysis of differentially expressed genes in AD, we nominate MITF, KLF12, and GLIS3 as potential upstream master regulators of gene expression changes relevant to AD pathogenesis.
Cellular crosstalk between human myeloid subtypes
To gain mechanistic insights into how different human myeloid subtypes communicate with each other and mediate AD risks, we investigated the change of cell-to-cell interactions (CCIs) at different stages of AD using the LIANA framework32 (Figure S6C). For each individual, we inferred the magnitude, specificity, and directionality of cell-to-cell communication using gene expression profiles and known ligand-receptor interactions. As expected, we observed strong concordance between the magnitude of CCI activities from FreshMG and PsychAD cohorts (Figure 5A). The correlation analysis revealed homeostatic, PVM, and ADAM subtypes had better consistency and sufficient statistical power, whereas rare subtypes like MKI67 and CCL3 had lower activity and inconsistency possibly due to limited statistical power. By evaluating the CCI magnitude scores as a function of all 4 AD phenotypes using a linear mixed model, we identified differential CCIs associated with AD (Figure 5B, S6D). We found contrasting candidate CCIs (APOE→SORL1 and MRC1→PTPRC) working in opposing directions. The APOE→SORL1 interactions were more frequent and up-regulated in AD and were prioritized as the top AD-relevant CCI, while MRC1→PTPRC interactions were down-regulated in AD. Notably, we also observed the APOE→TREM2 interaction among the top CCIs associated with AD. To test for genetic association, we stratified the CCIs by AD case-and control-associated ligands and receptors and performed the gene-set analyses using GWAS data (Figure 5C). Interestingly, we found a strong association of AD-associated receptors with AD risk but not with ligands. Visualizing the CCIs as directional networks in the context of different myeloid subtypes placed the GPNMB subtype as the hub mediating AD risks (Figure 5D). Finally, the pathway enrichment analysis indicated interactions involved with the ADAM_GPNMB subtype were enriched with lipid metabolism and regulation of proteolysis (Figure 5E).
Polygenic disease risk scoring of human myeloid subtypes
We estimated the polygenic disease risk at single-cell resolution using GWAS disease-associated genes and scored myeloid subtypes responsible for heritable risks (scDRS; Methods; Figure S7A). We extended the analysis to a set of the brain related diseases beyond AD including schizophrenia (SCZ), bipolar disorder (BD), major depressive disorder (MDD), autism spectrum disorder (ASD), multiple sclerosis (MS), amyotrophic lateral sclerosis (ALS), and Parkinson’s disease (PD). The polygenic risk scores for each trait were highly reproducible between the FreshMG and PsychAD cohorts with AD and MS having the greatest correlation (Figure S7B). The meta-analysis of both FreshMG and PsychAD cohorts indicated that the 9 subtypes of myeloid cells were significantly associated with heritable AD risk, which was the largest of all brain diseases followed by MS, SCZ, and MDD (Figure 6A). Notably, the GPNMB subtype had the widest coverage showing significant heritable risks for all 8 diseases. When we evaluated the association of polygenic AD risk scores with the compositional variation observed in AD, we observed a positive correlation (Pearson’s r=0.55) indicating cells with greater heritable risks become enriched in AD (Figure 6B). This also suggested heritable risks might play a role in driving the compositional changes of myeloid subtypes. To ensure that the heritable risk estimates were not driven from the AD cases, we also calculated the AD risk scores after splitting donors by AD cases and controls (case-control concordance, Methods; Figure S7A). We showed that the polygenic AD risk scores were concordant between AD cases and controls, and thus showing that the estimates were independent from AD status suggesting the heritable risks could be inherent to myeloid subtypes (Figure S7C).
Next, we calculated per-donor AD polygenic risk scores (PRS) and evaluated their association with changes in myeloid subtype composition using a linear mixed model (PRS, Methods; Figure S7A). The GPNMB subtype was among the most positively and significantly changed (Figure S7D), and we observed a similar compositional variation between AD phenotype (dx_AD) and PRS, suggesting that changes in myeloid subtypes were partially driven by AD genetic risk predisposition (Figure 6C circle). We further showed this association was not purely driven by the AD status alone as the same compositional variation was observed using a disease-free subset (Figure 6C triangle).
Discussion
The human myeloid cell atlas presented here underscores the importance of their functional plasticity throughout life, reflecting their ability to adapt to their microenvironment progressively and reversibly. Of particular interest is their behavior in the context of normal aging and disease. The contrasting regulatory interplay between two homeostatic microglial subtypes exemplifies the dual role of myeloid cells, providing neuroprotective benefits in the early stages of AD, while exacerbating the phenotype, via neuroinflammation, as the disease progresses. We find that the shift in balance between FRMD4A and PICALM homeostatic subtypes is associated with the onset of AD and suggest that this imbalance might play a role in compromising the efficacy of their respective neuroprotective roles. Identifying these early changes in gene expression could be crucial for the early diagnosis and treatment of AD.
Our comprehensive analyses uncover striking similarities between normal aging and AD pathology. We speculate that the natural aging process is accelerated in AD, and follows a similar trend for all subtypes, with the exception of ADAM. We show that the ADAM subset is enriched with AD-associated gene signatures and the related transcriptomic changes are unique to AD and go beyond normal age-related changes. Through our GRN analysis, we prioritize MITF as the master regulator of AD risk signatures, governing the expression of numerous AD-associated genes, including APOE, DPYD, TREM2, and PTPRG13,33,34. The MITF network is specifically enriched for phagocytic activity markers and has previously been studied as a key regulator of homeostatic microglia functions, in particular for driving autophagic states, and for allowing microglia to move, sense, and clear Aβ/Tau proteinopathies22,23,35–37. Moreover, through our CCI analysis, we prioritize ligand-receptor interactions mediating the aggravation of AD progression and place ADAM as key players. The significant enrichment of AD genetic risk loci (APP, TREM2, SORL1, SORT1, ABCA1, TSPAN14) within the prioritized receptors of AD-associated CCIs suggests potential mechanisms behind the genetic susceptibility to AD. Similarly, our heritability analysis further strengthens the hypothesis that there might be a genetic basis for the observed molecular changes. Finally, based on polygenic risk scores, we show that a portion of AD-associated transcriptomic changes are inherent, independent of AD diagnosis.
Taken together, our findings shed light on the intricate molecular changes underlying the plasticity of human myeloid cells that predispose them to AD. As the number of people living with AD is expected to double by 2050, there is an urgent need to address the enormous societal impact and serious challenges that AD poses to our healthcare and economic systems. The data presented here are a valuable resource for identifying novel therapeutic strategies to combat this debilitating neurodegenerative disorder.
Methods
Sources and description of human biosamples
All brain specimens were obtained through informed consent via brain donation programs at the respective organizations. All procedures and research protocols were approved by the respective ethical committees of our collaborator’s institutions. The FreshMG samples (n=137) were taken from 96 fresh postmortem autopsy samples obtained at the Mount Sinai/JJ Peters VA Medical Center NIH Brain and Tissue Repository (NBTR) in the Bronx, NY. An additional set of 41 fresh postmortem autopsy samples was obtained from participants in the Religious Orders Study or Rush Memory and Aging Project (ROSMAP) at Rush Alzheimer’s Disease Center (RADC) in Chicago, IL. Both studies were approved by an Institutional Review Board of Rush University Medical Center and all participants signed informed and repository consents and an Anatomic Gift Act38. The PsychAD cohort comprises 1,470 donors from three brain banks, Mount Sinai NIH Brain Bank and Tissue Repository (MSSM; 1,023 samples), NIMH Human Brain Collection Core (HBCC; 295 samples), and ROSMAP (RUSH; 152 samples). Finally, LivingMG biopsies were collected from patients undergoing procedures for intracerebral hemorrhage evacuation (STUDY-18-01012A), as described previously10.
Collection and harmonization of clinical, pathological, and demographic metadata
Since the brain tissue specimens were collected from three different sites, the available clinical data varies as a function of source. As such, we used the following scheme to harmonize available clinical, pathological, and demographic metadata: the CERAD scoring scheme for neuritic plaque density19 was harmonized for consistency across multiple brain banks, where the scores range from 1 to 4, with increasing CERAD number corresponding to an increase in AD burden; 1=no neuritic plaque (normal brain), 2=sparse (possible AD), 3=moderate (probable AD), 4=frequent (definite AD). Samples from ROSMAP used consensus summary diagnosis of no cognitive impairment (NCI), mild cognitive impairment (MCI), and dementia and its principal cause, Alzheimer’s dementia39–41. MSSM/VA samples used clinical dementia rating (CDR), which was based on a scale of 0-5; 0=no dementia, 0.5=questionable dementia (very mild), 1=mild dementia, 2=moderate dementia, 3=severe dementia, 4=profound dementia, 5=terminal dementia. After consulting with clinicians, we created a harmonized ordinal variable where dementia is categorized into three levels of cognitive decline, independent of AD diagnosis; 0=no cognitive impairment, 0.5=MCI (mild cognitive impairment), and 1=dementia. In addition to AD phenotype, we collected comprehensive demographic (age, sex, and genetic ancestry) and technical variables (brain bank, sequencing facility, sequence pooling information, postmortem interval (PMI; measured in minutes), APOE genotype) to describe each cohort (Figure S1, Table S1).
Clinical diagnosis of AD
For analysis comparing donors with AD cases and neurotypical controls, a binary clinical diagnosis variable for AD, dx_AD, was defined as follows. Individuals with CERAD 2, 3, or 4, Braak ≥ 3, and CDR ≥ 1 for MSSM/VA or Alzheimer’s dementia for ROSMAP were classified as AD cases. Controls were defined as individuals with CERAD 1 or 2 and Braak 0, 1, or 2.
Measuring AD neuropathology
For analysis comparing donors with pathologic AD, the following variables were used to measure the severity of AD neuropathology. CERAD score19. A quantitative measure of Aβ plaque density where 1 is normal, 2 is possible AD, 3 is probable AD, and 4 is definite AD39. Braak AD-staging score measuring progression of neurofibrillary tangle neuropathology (Braak & Braak-score, or BBScore). A quantitative measure of the regional patterns of neurofibrillary tangle (NFT) density across the brain, where 0 is normal and asymptotic, 1-2 indicate initial stages where NFT begins to appear in the locus coeruleus and the transentorhinal region, 3-4 indicate progression to limbic regions, such as the hippocampus and amygdala, and 5-6 indicate NFT are widespread, affecting multiple cortical regions42–44.
Measuring cognitive impairment
For analysis comparing donors with AD-related dementia, the following variable was used to measure the severity of cognitive impairment.
Clinical assessment of dementia
A harmonized variable of cognitive status based on CDR scale for MSSM/VA or NCI, MCI, Alzheimer’s dementia for ROSMAP. We used the three-level ordinal categories of clinical dementia to measure the severity of dementia, in which 0 indicates no dementia, 0.5 indicates minor cognitive impairment, and 1.0 indicates definite clinical dementia.
Isolation and fluorescence-activated cell sorting (FACS) of microglia from fresh brain specimens (FreshMG and LivingMG)
Fresh brain tissue specimens were placed in tissue preservation buffer (Miltenyi Biotech) and stored at 4°C for ≤48hrs before processing using the Adult Brain Dissociation Kit (Miltenyi Biotech), according to manufacturer’s instructions. RNase inhibitors (Takara Bio) were used throughout cell preparation. Following de-myelination (Miltenyi de-myelination kit – Miltenyi Biotech) cells were incubated in antibody (CD45: BD Pharmingen, Clone HI30 and CD11b: BD Pharmingen, Clone ICRF44) at 1:500 for 1 hour in the dark at 4°C with end-over-end rotation. Prior to fluorescence-activated cell sorting (FACS), DAPI (Thermoscientific) was added at 1:1000 to facilitate the selection of viable cells. Viable (DAPI negative) CD45/CD11b positive cells were isolated by FACS using a FACSAria flow cytometer (BD Biosciences). Following FACS, cellular concentration and viability were confirmed using a Countess automated cell counter (Life technologies).
Isolation and fluorescence-activated nuclear sorting (FANS) of nuclei from frozen brain specimens (PsychAD), with hashing
All buffers were supplemented with RNAse inhibitors (Takara). 25mg of frozen postmortem human brain tissue was homogenized in cold lysis buffer (0.32M Sucrose, 5 mM CaCl2, 3 mM Magnesium acetate, 0.1 mM, EDTA, 10 mM Tris-HCl, pH8, 1 mM DTT, 0.1% Triton X-100) and filtered through a 40 µm cell strainer. The flow-through was underlaid with sucrose solution (1.8 M Sucrose, 3 mM Magnesium acetate, 1 mM DTT, 10 mM Tris-HCl, pH8) and centrifuged at 107,000 g for 1 hour at 4°C. Pellets were resuspended in PBS supplemented with 0.5% bovine serum albumin (BSA). 6 samples were processed in parallel. Up to 2M nuclei from each sample were pelleted at 500 g for 5 minutes at 4°C. Nuclei were re-suspended in 100 µl staining buffer (2% BSA, 0.02% Tween-20 in PBS) and incubated with 1 µg of a unique TotalSeq-A nuclear hashing antibody (Biolegend) for 30 min at 4°C. Prior to FANS, volumes were brought up to 250 µl with PBS and 7aad (Invitrogen) added according to the manufacturer’s instructions. 7aad positive nuclei were sorted into tubes pre-coated with 5% BSA using a FACSAria flow cytometer (BD Biosciences).
scRNA-seq and CITE-seq library preparation (FreshMG and LivingMG)
Following FACS, 10,000 cells were processed using 10x Genomics single cell 3’ capture reagents (10x Genomics), according to the manufacturer’s instructions. In parallel, CITE-seq was performed on a subset of samples (n=3 donors, n=8 replicates per donor) using the TotalSeq™-A Human Universal Cocktail (BioLegend) with 154 unique cell surface antigens, including principal lineage antigens, and includes 9 isotype control antibodies to survey surface antigens. CITE-seq was performed according to the manufacturer’s instructions. For the CITE-seq experiment, a total of 80,000 cells were loaded on 10x Genomics B chips (10,000 of each uniquely barcoded sample aliquot per B chip lane), with a total targeted recovery of around 40,000 cells.
snRNA-seq and hashing library preparation (PsychAD)
Following FANS, nuclei were subjected to 2 washes in 200 µl staining buffer, after which they were re-suspended in 15 µl PBS and quantified (Countess II, Life Technologies). Concentrations were normalized and equal amounts of differentially hash-tagged nuclei were pooled. A total of 60,000 (10,000 each) pooled nuclei were processed using 10x Genomics single cell 3’ v3.1 reagents (10x Genomics). Each pool was run across x2 10x Genomics lanes to create a technical replicate. At the cDNA amplification step (step 2.2) during library preparation, 1 µl 2 µm HTO cDNA PCR “additive” primer v3.1 was added (PMID: 30567574). After cDNA amplification, supernatant from 0.6x SPRI selection was retained for HTO library generation. cDNA library was prepared according to the 10x Genomics protocol. HTO libraries were prepared as previously described (PMID: 30567574). cDNA and HTO libraries were sequenced at NYGC using the Novaseq platform (Illumina).
Processing of scRNA-seq data (FreshMG)
We developed a tracking platform to record all technical covariates (such as 10x Genomics lot kit number, dates of different preparations, viable cell counts, etc.) and quality metrics derived from data preprocessing.
Alignment
Paired-end scRNA-seq reads were aligned to the hg38 reference genome and the count matrix was generated using 10x cellranger count (v7.0.0). Subsequently, we used the CellBender45 to carefully separate out true cells from empty droplets with ambient RNA from raw unfiltered cellranger output.
QC
We performed the downstream analysis by aggregating gene-count matrices of multiple samples. A battery of QC tests was performed to filter low-quality libraries and non-viable cells within each library using Pegasus (v1.7.0)46. Viable cells were retained based on UMI (1,000 ≤ n_UMI ≤ 40,000), gene counts (500 ≤ n_genes ≤ 8,000), and percentage of mitochondrial reads (percent_mito ≤ 20). We also checked for possible contamination from ambient RNA, a fraction of reads mapped to non-mRNA like rRNA, sRNA, pseudogenes, and known confounding features such as lncRNA MALAT1. Further filtering was carried out by removing doublets using the Scrublet method47. After filtering, the retained count matrix was normalized and log-transformed. Batch correction. We assessed the correlation between all pairs of technical and biological variables using Canonical Correlation Analysis and used the Harmony method48 to regress out unwanted confounding variables such as the source of brain tissue. Clustering. From the kNN graph calculated from the PCA, we clustered cells in the same cell state using Leiden49 clustering. We use UMAP50 for the visualization of resulting clusters. Cells identified as T cells, NK cells, monocytes, neutrophils, oligodendrocytes, and astrocytes were removed, and those identified as microglia and PVMs were carried forward for subsequent taxonomic analysis.
Processing of snRNA-seq data (PsychAD)
Alignment
Samples were multiplexed by combining 6 donors in each nuclei pool using hashing, and each biosample was processed in duplicate to produce technical replicates. Paired-end snRNA-seq libraries were aligned to the hg38 reference genome using STAR solo51,52 and multiplexed pools were demultiplexed using genotype matching via vireoSNP53. After per-library count matrices were generated, the downstream processing was performed using pegasus v1.7.046 and scanpy v1.9.154.
QC
We applied rigorous three-step QC to remove ambient RNA and retain nuclei for subsequent downstream analysis. First, the QC is applied at the individual nucleus level. A battery of QC tests was performed to filter low-quality nuclei within each library. Poor-quality nuclei were detected by thresholding based on UMI (1,179 ≤ n_UMI ≤ 200,000; determined based on median absolute deviation of n_UMI distribution), gene counts (986 ≤ n_genes ≤ 15,000; determined based on median absolute deviation of n_genes distribution), and percentage of mitochondrial reads (percent_mito ≤ 1). We also checked for possible contamination from ambient RNA, the fraction of reads mapped to non-mRNA like rRNA, sRNA, and pseudogenes, as well as known confounding features, such as the lncRNA MALAT1. Second, the QC was applied at the feature level. We removed features that were not robustly expressed in at least 0.05% of nuclei. Lastly, the QC was applied at the donor level. We removed donors with very low nuclei counts, which can introduce more noise to the downstream analysis. We also removed donors with low genotype concordances. Further filtering was carried out by removing doublets using the Scrublet method47.
Batch correction
We assessed the correlation between all pairs of technical variables using Canonical Correlation Analysis and used the Harmony method48 to regress out unwanted variables such as the effect of brain tissue sources.
Clustering
Highly variable features were selected from mean and variance trends, and we used the k-nearest-neighbor (kNN) graph calculated on the basis of harmony-corrected PCA embedding space to cluster nuclei in the same cell type using Leiden49 clustering algorithms. We used UMAP50 for the visualization of the resulting clusters.
Isolation of myeloid cells
Identified cell-type clusters were annotated based on manual curation of known gene marker signatures obtained from Human Cell Atlas and human DLPFC study55. Classes of immune cells, including Microglia and PVM, were isolated and subjected to myeloid subtype annotation and downstream analysis.
Processing of bulk RNA-seq data (BulkMG)
Fresh brain tissue specimens were placed in tissue preservation buffer (Miltenyi Biotech) and stored at 4°C for ≤48hrs before processing using the Adult Brain Dissociation Kit (Miltenyi Biotech), according to manufacturer’s instructions. RNase inhibitors (Takara Bio) were used throughout the cell prep. Following de-myelination (Miltenyi de-myelination kit – Miltenyi Biotech) cells were incubated in antibody (CD45: BD Pharmingen, Clone HI30 and CD11b: BD Pharmingen, Clone ICRF44) at 1:500 for 1 hour in the dark at 4°C with end-over-end rotation. Prior to fluorescence-activated cell sorting (FACS), DAPI (Thermoscientific) was added at 1:1000 to facilitate the selection of viable cells. Viable (DAPI negative) CD45/CD11b positive cells were isolated by FACS using a FACSAria flow cytometer (BD Biosciences). Following FACS, cellular concentration and viability were confirmed using a Countess automated cell counter (Life technologies).
RNA was extracted from aliquots of up to 100,000 FACS-sorted microglia using the Arcturus PicoPure RNA isolation kit (Applied Biosystems). RNA-sequencing libraries were generated using the SMARTer Stranded Total RNA-Seq Kit v2 (Takara Bio USA). Libraries were quantified by Qubit HS DNA kit (Life Technologies) and by quantitative PCR (KAPA Biosystems) before sequencing on the Hi-Seq2500 (Illumina) platform obtaining 2x100 paired-end reads.
Count matrices were generated using Kallisto pseudo-mapping56 using the standard Genecode v38 reference (starting with 235,227 transcripts for 60,535 unique genes). For gene-level analyses, 21,856 features were retained for downstream analyses after filtering for features with CPM>1 in at least 15% of samples. Correct identity of the samples was confirmed by concordance between the genetic variants obtained from RNA-seq with those obtained from ATAC-seq, or directly available genotypes, as available.
Spatial validation using Akoya PhenoCycler
FFPE sections from both AD and control cases were used for the Akoya PhenoCycler experiment. The experiments were performed according to the manufacturer’s protocol, with the Neuroinflammation Module, Neuroscience Core Panel and Immune Module provided by Akoya. Briefly, samples were deparaffinized and hydrated. For antigen retrieval, samples were boiled in Tris-EDTA pH 9 for 20 minutes in a programmable pressure cooker. Samples were stained in Antibody Cocktail Solution containing antibodies (Table S5) and PhenoCycler Blocking Buffer. Following staining, samples were washed, fixed, and loaded on the PhenoCycler, with data generated using the automatic workflow. Akoya PhenoCycler results were saved as .qpproj files. and protein expression quantified using QuPath57. After the sections were annotated, cells were segmented with the QuPath extension StarDist fluorescent cell detection script, with dsb2018_paper.pb as a training model. Protein expression was quantified using raw channel intensity with spatial boundaries of cells inferred by export measurement and export detection commands using QuPath.
Compositional variation analysis
We applied Crumblr method (https://diseaseneurogenomics.github.io/crumblr) for testing the variation of cell type composition. Analysis of count ratio data (i.e., fractions) requires special consideration since data is non-normal, heteroskedastic, and spans a low-rank space. While counts can be considered directly using Poisson, negative binomial, or Dirichlet-multinomial models for simple regression applications, these can be problematic since they 1) can be very computationally expensive, 2) can produce poorly calibrated hypothesis tests, and 3) are challenging to extend to other applications. The widely used centered log-ratio (CLR) transform from compositional data analysis makes count ratio data more normal and enables use the of linear models and other standard methods. Yet CLR-transformed data is still highly heteroskedastic: the precision of measurements varies widely. This important factor is not considered by existing methods. crumblr uses a fast asymptotic normal approximation of CLR-transformed counts from a Dirichlet-multinomial distribution to model the sampling variance of the transformed counts. crumblr enables incorporating the sampling variance as precision weights to linear (mixed) models in order to increase power and control the false positive rate. crumblr also uses a variance stabilizing transform (vst) based on the precision weights to improve the performance of PCA and clustering.
Differential gene expression analysis
We applied Dreamlet for differential expression analysis. Building from the previously developed statistical tool Dream58, it applies linear mixed models to the differential expression problem in single-cell omics data. It starts by aggregating cells by the donor using a pseudobulk approach59,60 and fits a regression model and cell. For each feature and cell cluster, the following mixed model was applied: Gene expression ∼ Intercept + age + (1|sex) + (1|ancestry) + PMI + (1|batch) + (1|source) + phenotype, where categorical and numerical variables were modeled as random and fixed effects, respectively. We ran gene set analysis using the full spectrum of gene-level t-statistics61.
Constructing gene regulatory networks
We inferred gene regulatory networks with the pySCENIC (v 0.12.1)30,31 pipeline using a concatenated dataset of FreshMG and PsychAD microglia cohorts. We followed the standard SCENIC expression preprocessing; log normalizing expression counts and selecting highly variable genes (3,192 total) while accounting for batch correction between datasets with scanpy (v1.9.3). The pySCENIC’s GRNboost2 (arboreto v0.1.6) method was utilized for gene regulatory network inference. The pySCENIC’s cisTarget function with Human motif database v10 (https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl) was used to enrich for gene signatures and pruned based on cis-regulatory cues using default settings. The “aucell” positional argument was utilized to find the enrichment of regulons across single cells.
To compare regulon enrichment between subtypes, the resulting AUCell matrix was z-score normalized. To assess the concordance between regulon enrichment across subtypes between FreshMG and PsychAD cohorts, we computed the Pearson correlations between normalized z-scores using the pandas corrwith function. For each dataset, we computed the normalized regulon enrichment Z-score, and performed a meta-analysis between two datasets using Stouffer’s method.
To test whether there is a significant association between the target genes of a TF and disease signatures, we performed Fisher’s exact tests between SCENIC GRN target genes and AD risk gene signatures, based on Dreamlet DEG analysis. Disease-associated genes with FDR < 0.05 were selected based on four different measures of AD severity, namely, case-control diagnosis, Braak, CERAD, and dementia status. The top 3 regulons were prioritized based on the overall enrichment of the AD DEG signature. The similarity between target genes of regulons was evaluated using the Jaccard similarity index. We found the target genes of the top 3 TFs shared high similarity and clustered distinctly from other regulons (Figure S4A). Node centrality was calculated using PageRank analysis, which measures a ranking of the nodes in the graph based on the structure of the incoming links (Figure S4B). The mean estimate and -log10FDR of target risk genes in SCENIC regulons are visualized using the importance score edge weights from GRNboost2 with networkX (v3.1). For each regulon, a list of SCENIC GRN TF target genes after cisTarget pruning were obtained and, using gseapy (v1.0.5)62, tested for gene-set enrichment based on the Gene Ontology Biological Processes 2023 reference62. GO terms were clustered based on their Ward distance between -log10FDR values.
Constructing a pseudotemporal trajectory of AD
We followed the same steps when constructing Braak-stage-informed pseudotimes for the FreshMG and PsychAD datasets, respectively. First, cells that were not assigned a Braak stage were excluded and the data subsetted to contain either only cells annotated as adaptive, homeostatic, ADAM, or PVM. We then computed a k-nearest-neighbor graph using SCANPY54 (scanpy.pp.neighbors, k=30, n_pcs=15) on the PCA embedding, regressed with Harmony. Finally, transport maps were computed between pairs of consecutive disease stages using moscot’s TemporalProblem28.
Identification of driver genes with CellRank 2
To identify drivers of disease progression within each cell type, we first estimated a cell-cell transition matrix using the RealTimeKernel implemented in the CellRank framework29. Disease stage heterogeneity within each Braak stage was accounted for by including cell-cell similarity (conn_weight=0.1). Since moscot relies on entropical regularization to solve the underlying optimal transport problem, it returns dense transport maps, leading to negligible transition probabilities and large memory requirements when computing the transition matrix. To render our downstream analysis computationally feasible, we thresholded the constructed transition matrix using the RealTimeKernel’s automatic thresholding scheme (threshold=”auto”).
Based on the RealTimeKernel-derived transition matrix, we estimated terminal states with the GPCCA estimator29,63. In the case of adaptive and homeostatic cells, we computed terminal states at the resolution of subclasses (MG_Adapt, MG_Homeo), for the PVM subset based on the subtype (PVM_CD163, ADAM_GPNMB) annotation. We confirmed the estimated terminal states as biologically plausible by quantifying the cell type and disease stage purity, respectively. Given cells (n=30 in this study) identifying a terminal state, the disease stage purity is defined as With C denoting the set of cells identifying the terminal state, Bj the set of cells with Braak stage J, and |.| the cardinality of the set. Cell type purity is defined similarly.
We computed fate probabilities towards each terminal state next and identified driver genes by correlating gene expression with fate probabilities as outlined in the corresponding CellRank tutorial. To compare the gene ranking between the FreshMG and PsychAD cohorts, we subsetted to genes present in both datasets and computed the Pearson correlation coefficient between the gene-specific correlations and by CellRank: with sample mean r̄.
Pseudotime construction
DPT64 is traditionally calculated on a symmetric connectivity matrix, the constructed transition matrix is, however, non-symmetric. To compute DPT based on our Braak-stage-informed transition matrix, we, thus, symmetrized via and row-normalized entries. Following, we computed diffusion pseudotime using SCANPY’s scanpy.tl.dpt function with default values. The corresponding initial cells were identified via extreme points in diffusion components65. To verify that the constructed pseudotime recapitulates our findings that homeostatic FRMD4A cells decline with disease progression, while the number of homeostatic PICALM cells increases, we stratified pseudotime by subtype (Figure 5C). Next, we performed a Welch’s t-test to validate that homeostatic FRMD4A cells are assigned significantly smaller pseudotime values compared to the set of homeostatic PICALM. We performed similar Welch’s t-tests to assess the change between consecutive Braak stages. For each dataset, we first computed the T-statistics from Welch’s t-test and then combined them across datasets using Stouffer’s method: Considering T-statistics TfreshMG and TpsychAD, the reported T-statistic T given by As a final analysis, we correlated fate probabilities with gene expression for each lineage to identify putative driver genes for each lineage. Here, we included genes present in both datasets. To confirm concordance between the two independent cohorts, we computed the Spearman correlation between the two rankings.
Inferring cell-to-cell interactions
CCI analysis relies on inference of ligand-receptor interactions given a gene expression matrix with annotated cell types. By comparing to a known set of ligand-receptor (LR) pairs (reference “resource”), CCI can be inferred and scored through a variety of methods. LR pairs across cell types that are expressed above a set threshold and are differentially co-expressed, are inferred to represent interactions between the cell types. The LIANA32 framework combines several established CCI inference methods and reference resources. We used the Python implementation of LIANA (v0.1.8).
For each donor in FreshMG and PsychAD, we separately ran the standard rank_aggregate pipeline (resource_name=’consensus’, expr_prop=0.1, min_cells=5, n_perms=1000). This utilizes the RobustRankAggregate66 algorithm for aggregating scores from several methods (CellPhoneDB, Connectome, log2FC, NATMI, SingleCellSignalR, and CellChat)32,67–71 into a single magnitude score for each CCI. We used the standard “consensus” CCI reference resource provided by LIANA. For a given pair of cell types and pair of genes to be considered for CCI scoring, each gene must be found in the reference resource, and be expressed in at least 10% of each involved cell type and in at least 5 cells. We evaluated CCI using the rank aggregate magnitude score for cell types annotated at the subtype level. We negative log-transformed these scores so that low magnitude CCI are scored close to 0 and normalized the score distribution. Additional post-processing was then performed to filter out CCI that were not expressed in both FreshMG and PsychAD.
To assess the concordance between FreshMG and PsychAD CCI scores, we first computed the average CCI score for each CCI across donors within each cohort. We then grouped the average scores by each possible pair of subtypes and measured the Spearman rank correlation. We additionally computed an activity z-score for the frequency of CCI involving each subtype pair (Figure 5A).
To determine whether CCI are differentially expressed in AD, we applied Dream13 to construct a linear mixed model and account for technical and biological covariates (batch, age, sex). Linear mixed model regression was performed separately for FreshMG and PsychAD across four measures of AD progression: diagnosis (dx_AD), CERAD score, Braak stage, and dementia status. We then also used Dream to meta-analyze CCI that are significantly differentially expressed across both cohorts and to estimate the log fold change effect of each CCI for each AD progression measure (Figure 5B, Supplementary Figure 6D).
To evaluate whether significantly differentially expressed CCI are enriched in AD genome-wide association studies (GWAS), we utilize the Multi-marker Analysis of GenoMic Annotation (MAGMA)72. The MAGMA gene set was constructed from significantly differentially expressed CCI LR pairs. These include CCI with at least a false discovery rate (FDR) of 0.05 significance, as determined by the Dream regression meta-analyses across each AD progression measure. We normalized the log2 fold change (log2FC) values for these CCI for each progression measure and split the CCI into AD (log2FC > 0) and CTRL (log2FC < 0) groups. Top LR pairs for AD and CTRL were then ranked by selecting the LR pairs involved in the CCI with the largest absolute normalized log2FC values. The ranked AD and CTRL LR pair gene sets were analyzed by the MAGMA for enrichment across several GWAS (Figure 5C). We generated a network diagram to highlight the top-ranked three AD and CTRL CCI in our gene set (Figure 5D).
To investigate the biological processes AD-relevant CCI are involved in, we performed a gene set enrichment analysis using the cell2cell73 package and the human Gene Ontology Biological Processes 2023 reference74,75 The AD-relevant CCI gene set was constructed in the same way as the MAGMA analysis gene set. We aggregated these CCI by the sender and receiver subtype to improve statistical power, and then computed normalized enrichment scores for each process based on our AD-relevant LR pair gene set. We prioritized processes that pass the 0.05 FDR significance threshold (Figure 5E).
Analysis of genetic heritability of AD polygenic risk
We established a standardized pipeline for Multi-marker Analysis of GenoMic Annotation (MAGMA) followed by single-cell Disease-Relevance Scoring (scDRS). MAGMA incorporates the association p-values of genetic variants from the latest AD genome-wide association study (GWAS)9. We applied MAGMA using a standard window of 35kbp upstream and 10kbp downstream around the gene body. We executed scDRS using the top 1000 gene weights, sorted by Z score. The MAGMA and scDRS pipeline was run separately on FreshMG and PsychAD single-cell cohorts using the following parameters. MAGMA was run using -snp-loc g1000_eur.bim (SNP location file corresponding to the Phase 3 1000 Genome Project) and --gene-loc NCBI38.gene.loc (gene location file from NCBI build 38). Both files were obtained from https://ctg.cncr.nl/software/magma.
To justify the use of the top 1000 genes for scDRS, genes were sorted by MAGMA z-score and top 200, 500, and 1000 genes were used for testing the concordance between cohorts in subsequent scDRS analysis. scDRS command scdrs.preprocess was run with the default parameters of n_mean_bin=20, n_var_bin=20w, while scdrs.score_cell was run using n_ctrl=100. Best concordance between FreshMG and PsychAD cohorts of average scDRS scores (per myeloids subtype) was achieved using top 1000 genes (Pearson’s R of top n MAGMA genes; n=200 was 0.85, n=500 was 0.94, n=1000 was 0.97, hence we proceeded with the analysis using top 1000 MAGMA genes).
We tested the following GWAS summary stats in scDRS/MAGMA pipeline: AD9, MS76, PD77, MDD78, ASD79, BD80, SCZ81 and ALS82. The scDRS scores were highly reproducible between the FreshMG and PsychAD cohorts, with AD having the greatest correlation (r=0.97), followed by MS (r=0.92), PD (r=0.83), MDD (r=0.79), ASD (r=0.71) and ALS (r=0.66). The lowest correlation was noted for BD (r=0.60) and SCZ (r=0.42), both of which are neuropsychiatric diseases. This emphasized a high reproducibility of myeloid cell heritability estimates in AD, MS, and PD, all of which are neurodegenerative diseases with progressive damage to neuronal cell types, and whose primary pathogenic mechanisms involve non-neuronal cells, including microglia and PVM. Neuropsychiatric diseases such as SCZ and BP, whose primary risks are enriched in synaptic dysfunctions of neurons, had poor correlations (Figure S7B). Per cluster association z-scores (scDRS assoc_mcz) were obtained using scdrs.method.downstream_group_analysis function. Stouffer’s method for meta-analysis was used to combine FreshMG and PsychAD scDRS association z-scores using the number of cells per cell cluster as cluster weights. For each cohort, scDRS permutation p-values per cluster were combined using Stouffer’s method on p-values with weights (cell counts of clusters). Meta p-values were further corrected for multiple testing using FDR correction (both per cell type and using all cell types combined). A more stringent correction was achieved using the per-cell type method hence we applied this correction.
Case-control residual analysis of heritability estimates
To test for concordance in heritability estimates, we separated cases and controls and repeated MAGMA/scDRS pipe for both FreshMG and PsychAD single-cell cohorts. We calculated meta z-scores for cases and controls separately using Stouffer’s method with weights (cluster’s cell counts). We correlated meta-z-scores of cases and controls using linear regression and derived residuals as a deviation from the regression line, indicative of per-cell cluster heritability. For both cohorts, per-cell pseudotime scores were correlated with per-cell scDRS scores for every microglia subtype and the correlation coefficient was calculated using Spearman’s test. FreshMG and PsychAD coefficients were combined in a meta value by applying Stouffer’s method and the number of cells per cell cluster as cluster weights.
Processing of genotypes
The FreshMG cohort genotype data consisted of samples from the Mount Sinai and Rush brain banks, as has been previously described (PMID: 35931864). The PsychAD cohort genotype data consisted of samples from the Mount Sinai brain bank. For both cohorts, DNA extraction and genotyping were performed as described previously (PMID: 35931864). In brief, genomic DNA was extracted from buffy coat or frozen brain tissue using the QIAamp DNA Mini Kit (Qiagen), according to the manufacturer’s instructions. Samples were genotyped using the Infinium Psych Chip Array (Illumina) at the Mount Sinai Sequencing Core. Pre-imputation processing consisted of running the quality control script HRC-1000G-check-bim.pl from the McCarthy Lab Group (https://www.well.ox.ac.uk/~wrayner/tools/), using the Trans-Omics for Precision Medicine (TOPMed) reference (PMID: 33568819). Genotypes were then phased and imputed on the TOPMed Imputation Server (https://imputation.biodatacatalyst.nhlbi.nih.gov). Samples with a mismatch between one’s self-reported and genetically inferred sex, suspected sex chromosome aneuploidies, high relatedness as defined by the KING kinship coefficient83 (KING > 0.177), and outlier heterozygosity (+/-3SD from mean) were removed. Additionally, samples with a sample-level missingness > 0.05 were removed and calculated within a subset of high-quality variants (variant-level missingness ≤ 0.02).
Samples of European (EUR) ancestry, as defined by assignment to the EUR superpopulation described by the 1000 Genomes Project84,85, were isolated using a 3SD ellipsoid method. Genotypes were first merged with GRCh38 v2a 1000 Genomes Project data (https://wellcomeopenresearch.org/articles/4-50)85 using BCFtools version 1.986. PLINK 2.087 was then used to calculate the merged genotypes’ principal components (PCs), following filtering (minor allele frequency (MAF) ≥ 0.01, Hardy-Weinberg equilibrium (HWE) p-value≥1×10−10, variant-level missingness ≤ 0.01, regions with high linkage disequilibrium (LD) removed) and LD pruning (window size = 1000 kb, step size = 10, r2 = 0.2) steps. An ellipsoid with a radius of 3SD, corresponding to the 1000 Genomes Project EUR superpopulation, was constructed using the first three genotype PCs. Only samples that fell within this ellipsoid (FreshMG: n=178, PsychAD: n=759) were retained for subsequent variant-level filtering. Autosomal bi-allelic variants with an imputation R2 > 0.8, HWE p-value≥1×10−6, and variant-level missingness ≤ 0.02 were retained. Genotypes were then annotated with ancestry-specific MAF values from the National Center for Biotechnology Information’s Allele Frequency Aggregator (ALFA) (https://ftp.ncbi.nih.gov/snp/population_frequency/latest_release/). Only variants with an ancestry-specific ALFA MAF ≥ 0.01 (FreshMG: n=10,828,658, PsychAD: n=18,490,352) were retained.
PRS calculation
Polygenic risk scores (PRS) were calculated on the FreshMG and PsychAD cohort samples using AD GWAS summary statistics9. The PRS-CS-auto method88 was used to apply continuous shrinkage priors to the effect sizes from these summary statistics. A EUR LD reference panel provided by the developers of PRS-CS was utilized (https://github.com/getian107/PRScs), which draws from 1000 Genomes Project data85. The following PRS-CS default settings were used: parameter a in the γ-γ prior = 1, parameter b in the γ-γ prior = 0.5, MCMC iterations = 1000, number of burn-in iterations = 500, and thinning of the Markov chain factor = 5. The global shrinkage parameter phi was set using a fully Bayesian determination method. Individual-level PRS were calculated using PLINK 2.087.
Transcriptional variation with PRS
PRS analysis was performed using the Dreamlet package (v0.99) that applies a linear mixed model with precision weights to fit a regression model. Instead of applying a fixed contrast between two coefficients, we used AD polygenic risk scores (PRS) as a continuous variable to test for the presence of non-linear effects on variance. We subtracted admixed donors from this analysis using the PCA analysis of the first 5 genotype PCs (2 individuals were removed from PsychAD and 1 from the FreshMG cohort as clear outliers in the PCA plots). In addition, we removed a cluster of 200 individuals that were clustering separately on the PC1/PC2 plot in psychAD as admixed individuals with a percentage of a non-EUR ancestry ranging from 2.5%-10% (mainly AFR ancestry). Similarly, we removed 6 more donors from FreshMG that were outliers and showed decreased EUR and increased AFR ancestry. The removal of admixed individuals allowed us to achieve a high level of concordance in Dreamlet PRS analysis between FreshMG and PsychAD cohorts. For the PsychAD cohort, we considered the following covariates to model the variance: source, log(n_counts), PMI, age, AD_Bellenguez PRS, sex, dx, genotype PCs 1-5. For the FreshMG cohort, we used the same set of covariates, in addition to the sequencing batch as a covariate. We further performed a fixed-effect meta-analysis with the R metafor package using effect sizes and standard error from the Dreamlet output (metafor rma function with parameters: yi = logFC, sei = logFC / t, method = “FE”). Meta p-values were further corrected for multiple testing using FDR correction (using the per-cell type correction).
To test whether the observed cell proportions change with PRS as a continuous variable, we applied the crumblr R package to each cohort and performed analysis of count ratio data with precision-weighted linear mixed models. Similar to the Dreamlet analysis, we modeled PRS as a continuous variable and tested for non-linear effects. FreshMG and PsychAD effect sizes were combined in a meta value using Stouffer’s method and the number of cells per cell cluster as cluster weights.
Supplementary Information
Assessment of ex-vivo activation in Microglia
Microglia are highly reactive cells and prolonged exposure to non-physiological conditions could induce unintended responses89,90. It has previously been shown that enzymatic dissociation of brain tissue performed at elevated temperatures can stimulate microglial cells, termed ex-vivo activated microglia (exAM)17,91,92. Failing to account for dissociation-induced changes can compromise the interpretation of microglial biology93. Although artificially induced, we wondered if the proportion of exAM might reflect microglial reactivity to pathological lesions and, as such, could provide valuable insights into disease mechanisms. In this study, we are uniquely positioned to interrogate the impact of exAM as myeloids from the FreshMG cohort were isolated via enzymatic dissociation at 37°C while samples from the PsychAD cohort, consisting of flash frozen tissue, were processed using an ice-cold, enzyme free, dissociation buffer. Assessing the extent and prevalence of the exAM signature in our datasets, we observed a distinct subtype in the FreshMG dataset, called MG_exAM_ERN1, which accounts for about 14.8% of all myeloid cells, and shows up-regulation of genes implicated in cell-cell adhesion, including ERN1, PLK2 (serine/threonine-protein kinase), CSKMT, and SNHG5. Notably, PLK2 is an enzyme regulating synaptic activity and has been implicated in stimulating Aβ production94,95. We note that the PsychAD dataset had very few cells identified as the ERN1 subtype, which belongs to the exAM subclass. This result suggests a number of possibilities. Given that the transcripts in the PsychAD dataset originate from the nuclear fraction of frozen nuclei and are free from enzymatic treatment or dissociation bias, it’s possible that the exAM cluster is predominantly derived from fresh tissue and is artificially induced during processing17. In addition, the transcripts that define the ERN1 subtype may be predominantly cytoplasmic and are, thus, missing from the PsychAD samples. Moreover, we acknowledge that we observe a substantial compositional variation among three different cohorts. However, to properly assess the extent of compositional variation, we need to model this by accounting for various technical effects, including the dissection bias. In conclusion, this comparative analysis confirmed the robustness of the human microglial taxonomy, independent of tissue source, agonal state, and postmortem interval (PMI).
Disease trajectory of human myeloid cells
Understanding the dynamic changes that take place during the onset and progression of AD at a molecular-level requires modeling of gene expression change along the measures of disease progression and identifying corresponding putative drivers. Analyses solely based on donor-level clinical covariate, such as CERAD scores or Braak staging, are limited due to the discrete nature of these variables. However, we reasoned that each labeled disease stage contains observations spanning the range of disease development and sought to identify drivers and order cells along a disease pseudotime trajectory.
Trajectory inference allows us to expand our ability to describe molecular changes at single-cell resolution. Trajectory inference methods have been remarkably successful in describing normal developmental processes faithfully and identifying regulatory mechanisms based on single-cell sequencing data64,96,97. Classical pseudotime algorithms order cells along a developmental axis, with early and late cells being assigned low and high pseudotime values, respectively. To take advantage of advances in experimental design and different sources of information, alternative methods have been developed. To link observations across experimental time points, optimal transport-based solutions have been proposed to assign each measurement from one experimental time point its likely future state in the following via a probabilistic assignment in the form of transport maps28,98. These couplings can then be used to quantify cellular change and determine the fate and putative drivers using CellRank 2’s RealTimeKernel29. Using it together with and GPCCA estimator29,63, we automatically inferred the terminal states of the disease dynamics. As AD onset and progression do not result in the emergence or disappearance of novel cell types, we expected to recover the major subclasses of the data, i.e., adaptive, homeostatic, ADAM, or PVM. In concordance with this ground truth, we recovered the terminal states accordingly and observed a high terminal state purity defined by the fraction of cells with the correct cell type and Braak stage six (terminal state purity was 1.0 for all subclasses used). Following, we computed driver genes of these respective fates by correlating gene expression with fate probabilities (Methods). We replicated the same analysis using the PsychAD and observed high concordance between the correlations between gene expression and fate probabilities associated with each gene (Figure S5B). While CellRank 2 identifies putative lineage drivers, it does not align cells along the disease trajectory. To construct this orthogonal information, we used the transition matrix computed by the RealTimeKernel to compute a Braak-stage-informed pseudotime for the two subsets of the data in a similar fashion as previously proposed for experimental time points by the CellRank 2 study29 (Methods). We tested if the gene expression changes over pseudotime inferred on the FreshMG cohort agrees with the corresponding change in the PsychAD dataset. As expected, the inferred change in gene expression showed high concordance between the two independent cohorts for both homeostatic and adaptive lineages (Figure S5C, D).
Data availability
The single-cell dataset, clinical metadata, and analysis outputs are available via the AD Knowledge Portal (https://adknowledgeportal.org). The AD Knowledge Portal is a platform for accessing data, analyses, and tools generated by the Accelerating Medicines Partnership (AMP-AD) Target Discovery Program and other National Institute on Aging (NIA)-supported programs to enable open-science practices and accelerate translational learning. The data, analyses, and tools are shared early in the research cycle without a publication embargo on secondary use. Data is available for general research use according to the following requirements for data access and data attribution (https://adknowledgeportal.org/DataAccess/Instructions). For access to data described in this manuscript see: https://www.synapse.org/#!Synapse:syn52795287
Code availability
All the source codes used in this study are available via GitHub: https://github.com/DiseaseNeuroGenomics/scMyeloidAD
Author contributions
Conceptualization: P.R., J.F.F., V.H., D.L.
Methodology: P.R., J.F.F., D.L., G.H.
Investigation: S.P.K., Z.S., S.A., M.A., C.C., A.H., J.F.F.,
Formal analysis: D.L., C.P., C.S., M.P., P.W., R.K., J.B., P.N.M., S.Z., K.T., D.M.
Validation: X.W., J.F.F.
Resources: V.H., D.A.B., C.P.K., K.G.B., R.S.
Writing: D.L., J.F.F., P.R., with contributions from all authors.
Visualization: D.L.
Supervision: P.R., D.L., J.F.F., G.H., V.H., F.J.T., G.V., G.C.Y.
Project administration: D.L., P.R.
Funding acquisition: P.R., V.H., D.L.
All authors read and approved the final draft of the paper.
Competing interests
F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd, Omniscope Ltd, Cellarity, and has ownership interest in Dermagnostix GmbH and Cellarity. The remaining authors declare no competing interests.
Materials & Correspondence
Correspondence to Donghoon Lee and Panos Roussos.
Figure Legends
Supplementary Tables
Supplementary Table 1. Myeloid taxonomy of the FreshMG dataset.
Supplementary Table 2. Myeloid taxonomy of the PsychAD dataset.
Supplementary Table 3. Clinical metadata of the FreshMG dataset.
Supplementary Table 4. Clinical metadata of the PsychAD dataset.
Supplementary Table 5. List of Akoya antibodies.
Supplementary Table 6. Compositional variation meta-analysis for Aging.
Supplementary Table 7. Compositional variation meta-analysis for AD phenotypes.
Supplementary Table 8. Differential gene expression meta-analysis for Aging.
Supplementary Table 9. Differential gene expression meta-analysis for AD phenotypes.
Supplementary Table 10. Pseudotime estimates for the FreshMG dataset.
Supplementary Table 11. Pseudotime estimates for the PsychAD dataset.
Supplementary Table 12. GRN adjacencies.
Supplementary Table 13. Regulon enrichment analysis for AD signatures.
Supplementary Table 14. Differential CCI meta-analysis.
Supplementary Table 15. scDRS scores from meta-analysis.
Acknowledgments
We would like to express our deep gratitude to the patients and their families who generously donated the invaluable biological material essential for the success of this study. We are profoundly indebted to their participation and commitment to advancing scientific knowledge and improving human health.
We acknowledge the National Institute on Aging for their generous support in funding this research with the following NIH grants: R01AG065582 (PR, VH), R01AG067025 (PR, VH), and R01AG082185 (PR, VH, DL). ROSMAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917.
U01AG46152, U01AG61356. ROSMAP resources can be requested at https://www.radc.rush.edu. We also thank Stefano Marenco and Pavan Auluck from the Human Brain Collection Core (HBCC) at the National Institute of Mental Health-Intramural Research Program for their contribution to the resource. The HBCC is supported by the NIMH-IRP under project ZIC MH002903.
Reference
- 1.↵
- 2.
- 3.
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.
- 37.↵
- 38.↵
- 39.↵
- 40.
- 41.↵
- 42.↵
- 43.
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.
- 69.
- 70.
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵