ABSTRACT
Identifying kidney disease mechanisms often requires comparing samples from disease states with healthy reference tissues. However, the effect of variations in sample procurement, storage and donor baseline characteristics of reference samples has thus far not been evaluated. Three distinct kidney reference sample types were evaluated for integrity and injury biomarkers and in their ability to define differentially expressed genes (DEGs) when compared to three different diabetic kidney disease (DKD) states. Unaffected parts of tumor nephrectomies (TN), pre-transplant living donor biopsies (LD), and percutaneous kidney biopsies from healthy volunteers (HC) served as sources for reference tissue. Single cell gene expression profiles showed differences in the expression of injury or disease markers and the proportion of immune and proximal cell states. TN exhibited the highest expression of early stress response genes. A gene set associated with procurement effect in post-operative biopsies (LD and TN) was identified. An age-associated transcriptional signature was extracted from the reference data. Controlling for age and tissue procurement effects, immune-related pathways were found to be most enriched in DKD when compared to HC. Energy-related processes were enriched in DEGs from DKD versus LD. TN samples exhibited more underlying pathology than LD. The pathway analyses using the DEGs underscore the importance of accounting for appropriate confounding factors in differential expression analyses between disease and reference samples.
TRANSLATIONAL STATEMENT Integrated single-cell data analysis of three reference sample types—needle biopsy from young healthy kidney tissue, pre-perfusion biopsy from transplant kidneys, and cancer-free tissue from tumor-nephrectomies—revealed distinct transcriptional profiles influenced by the biopsy procurement method and age. These differences impacted findings in diabetes-related kidney disease versus reference comparisons highlighting the importance of accounting for these differences in interpreting analyses and identifying disease mechanisms.
INTRODUCTION
Due to the highly heterogeneous nature of kidney diseases, our understanding of the disease pathogenesis remains incomplete. Advances in genomics technologies have facilitated the unveiling of molecular mechanisms underlying kidney diseases1–4. However, these findings have relied heavily on comparisons between disease and healthy reference tissue. Given the challenges of obtaining disease-free kidney tissue from healthy humans, a wide variety of sources for reference tissue have often been used. Further, factors such as the sample procurement process, preservation method, tissue processing technology, and the age and underlying physiological condition of the tissue donor could impact the quality of the reference tissue and the corresponding gene expression profiles5.
With the rapid development of molecular analysis tools, understanding the advantages and limitations of different reference tissue sources is critical6. In this study, we compared single-cell gene expression data from three reference tissue sources: Percutaneous kidney biopsies from healthy controls (HC), unaffected parts of tumor-nephrectomies (TN), and living kidney donor biopsies (LD).
We utilized single-cell gene expression data of kidney biopsy samples from patients with type 2 diabetes with diabetic kidney disease (DKD), patients with type 1 diabetes for > 25 years with no evidence of kidney disease referred henceforth as diabetes mellitus resilient (DMR), and youth with early onset type 2 diabetes and kidney disease (early DKD) as disease comparators to determine how choice of reference type could impact identification of underlying molecular mechanisms in disease progression (Figure 1).
Overview of the analysis approach
MATERIALS & METHODS
Human data
Key clinical and demographic characteristics of participants who contributed kidney biopsy samples in the three reference and three diabetes-related groups are summarized in Table 1. For this study, we have used data from samples collected by the Kidney Precision Medicine Project (KPMP), which focuses on molecular mechanisms in acute kidney injury (AKI) and chronic kidney disease (CKD)7. From the KPMP cohort, nine LD, twenty-one DKD, and six DMR biopsy samples were used in this study. LD and TN samples were obtained through post-operative surgical biopsies, whereas healthy control samples (HC), early DKD, and DKD samples were collected via needle biopsies.
The TN samples comprised tumor-free kidney cortical tissue from nephrectomy specimens (PRECISE cohort, University of Michigan). The 9 samples were selected from donors with high eGFR (>80), low pathology, no diabetes and no hypertension.
The 9 early DKD samples were from the Impact of Metabolic Surgery on Pancreatic, Renal and Cardiovascular Health in Youth with Type 2 Diabetes (IMPROVE-T2D, NCT03620773) study and the Renal Hemodynamics, Energetics and Insulin Resistance in Youth Onset Type 2 Diabetes Study (RENAL HEIR, NCT03584217) and 12 HC from the Control of Renal Oxygen Consumption, Mitochondrial Dysfunction, and Insulin Resistance (CROCODILE) study (NCT04074668).
For identifying the genes associated with age using weighted correlation network analysis, (WGCNA)8, we used data from LD. In addition to the 9 KPMP LD samples, we included 13 samples from the University of Michigan living donor cohort generated for the Human Kidney Transplant Transcriptomic Atlas.
After procurement with identical protocols (dx.doi.org/10.17504/protocols.io.7dthi6n) all samples were processed by the KPMP Precision Medicine Through Interrogation of RNA in the Kidney (PREMIERE) tissue interrogation site personnel and core facility staff.
Single cell data generation
Single cell dissociation of the samples was performed using Liberase TL at 37°C. Detailed sample processing protocol can be found in: dx.doi.org/10.17504/protocols.io.7dthi6n. The single cell suspension was immediately transferred to the University of Michigan Advanced Genomics Core facility for further processing. 10X Genomics single cell RNA sequencing (scRNAseq) of the samples was done according to dx.doi.org/10.17504/protocols.io. The raw sequences files were aligned to the reference genome (GRCh38 (hg38)) and the feature-barcode matrices were generated by Cell Ranger program embedded in 10X Genomics analysis software.
Single cell data analysis
The read count matrices were initially processed using SoupX (v1.5.0) to correct for ambient mRNA contamination. Data processing was performed using the Seurat R package (version 5.0). Quality control was enforced with cutoffs of >500 and <5000 genes per cell, and <50% mitochondrial reads per cell. Each sample group underwent normalization, variable gene identification, and principal component analysis independently. Subsequently, the integration of all six sample types was achieved via reciprocal principal component analysis (RPCA).
Dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP) and unsupervised clustering (resolution 0.4) were conducted on the integrated dataset. Cell clusters were annotated based on markers of major nephron segments, as well as endothelial, interstitial, and immune cell types. Further analyses were performed on a down-sampled integrated object containing 20,000 randomly selected cells from each sample type (the integrated dataset containing a total of 120,000 cells from all 6 sample types together).
Trajectory analysis
To illustrate the separation of cell states in proximal cells, we utilized the Slingshot R package9 to perform a trajectory/pseudotime analysis. This method orders cells along a trajectory or lineage, with each cell receiving a specific pseudotime value that indicates its position along that trajectory. The Slingshot R package, designed for modeling developmental trajectories in single-cell RNA sequencing data, incorporates prior knowledge through supervised graph construction9. For our analysis, we used Seurat’s Uniform Manifold Approximation and Projection (UMAP) embeddings to construct the pseudotime trajectories.
Post-operative tissue procurement effect analysis
The gene set upregulated in post-operative sample types (LD and TN) compared to other groups (HC, DMR, early DKD, and DKD) was identified via differential expression analysis using DESeq210 on pseudobulk RNA counts. The final gene set was selected after manually checking for overexpression across all samples in LD and TN groups to avoid sample-to-sample variation. The interaction network of this gene set was constructed using STRING 11 and visualized as a network in Cytoscape12. Additionally, gene ontology biological processes enrichment analysis for this gene set was conducted using the ClusterProfiler R package13.
Age effect
WGCNA8 on sample-level pseudo bulk read counts from the 21 LD samples were used to generate gene modules. Genes from all modules showing positive or negative correlations (correlation > 0.5) with age were compiled into a positive and negative age correlated gene set. For the positive and negative age correlated gene sets, we calculated an ‘Age_gene_Score’ using the ‘AddModuleScore’ function in the integrated Seurat object.
Adjusting for procurement and age effect in differential expression analyses
We conducted differential expression analyses between disease and reference types using the DESeq2 R package on pseudo-bulk mRNA counts to identify disease-associated genes. The pseudo-bulk counts were generated by aggregating raw read counts from each sample. We performed differential expression analyses in three ways: without adjusting for confounding factors, adjusting for age, and adjusting for both age and procurement effect. Each of the three disease groups was compared separately with the three reference groups. To include the procurement effect as a covariate, we calculated a score based on the expression of 25 genes positively associated with post-operative tissue procurement. These scores were generated by creating Z-scores from the RNA expression of the genes and then using the average Z-score as the composite score.
Functional annotation
We utilized Qiagen Ingenuity Pathway Analysis (IPA)14 software to compare the canonical pathways enriched in the significantly differentially expressed genes (DEGs) (adjusted p-value < 0.05) from disease versus reference type analyses. Additionally, using decoupleR15 directly on the integrated single cell dataset, we inferred the enrichment of pathway activities in all 6 sample types.
RESULTS
Single cell analysis
In our down-sampled integrated single-cell dataset, comprising 120,000 cells from 67 samples across 6 sample types, we annotated 25 distinct cell types (Figure 2a). Within these, we identified two cell states for proximal tubule cells—healthy and adaptive/maladaptive (PT and aPT). The aPT state was specifically marked by the over-expression of vascular cell adhesion molecule 1 (VCAM1), a known indicator of adaptive/maladaptive proximal cells1. For the thick ascending loop of Henle cells (TAL), we identified three states: TAL1, TAL2, and aTAL, with the aTAL state being characterized by high expression of prominin 1 (PROM1)1. Importantly, no batch effect was observed in the integrated dataset (Figure 2b).
a. UMAP showing the annotated cell types from the integrated analysis of a total of 120,000 cells from 6 sample types (20,000 cells each from Healthy control “HC”, Living donor “LD”, Tumor-nephrectomy “TN”, Diabetic mellitus resistor “DMR”, early diabetic kidney disease “early DKD”, Diabetic kidney disease “DKD”). b. UMAP of the integrated dataset showing successful integration of HC, LD, TN, DMR, early DKD and DKD. c. stacked barplot showing the proportion of the 6 sample types in each of the cell type identified. d. Violin plot showing the specific markers for the annotated cell types. Abbreviations include: Podocyte, POD; Parietal epithelial cell, PEC; Proximal tubule, PT; adaptive/maladaptive, aPT; descending thin loop of Henle, DTL; ascending thin loop of Henle, ATL; thick ascending loop of Henle, TAL; adaptive/maladaptive thick ascending loop of Henle, aTAL; distal convoluted tubule, DCT; connecting tubule, CNT; Principal cell, PC; Intercalated type A, IC-A; Intercalated type B, IC-B; transient between PC and IC, tPC-IC; Endothelial cell EC; Efferent and Afferent arteriolar endothelial cells EC-AEA; Glomerular endothelial cell, EC-GC; Fibroblast, FIB; Vascular smooth muscle cell, VSMC; Pericyte, P; Mesangial cell, MC; Myeloid cell, MYL; B cell, B; T cell, T; Natural killer T cell, NK T/C.
While the proportions of most cell types were relatively consistent across the 6 sample types, notable exceptions included the ascending thin limb of Henle (ATL), immune cells (B, T, and Myeloid (MYL), and adaptive/maladaptive cell states (aPT and aTAL) (Figure 2c). The disproportion of ATL cells is likely due to the predominantly cortical nature of the biopsies used in this study, as ATL is a nephron segment of the medulla not regularly sampled in percutaneous biopsies. Among the reference groups, HC had the smallest proportion of adaptive/maladaptive cell states. Figure 2d shows the specific expression of markers used to annotate the cell types
Quality control features and disease markers
The proportion of reads mapping to mitochondrial genes within a cell serves as an essential quality control metric to identify potentially damaged or poor-quality cells7. Single-cell analysis of kidney tissue typically shows a high mitochondrial read content. In this study, we included only cells with 50% or fewer mitochondrial reads. The density plot (Figure 3a) indicates that HC had the fewest cells with a high percentage of mitochondrial reads. Conversely, LD and TN exhibited a higher percentage of cells with elevated mean mitochondrial read content, surpassing even the disease groups.
a. Violin density plot showing the percentage of mitochondrial reads per cell in the 6 sample types studied. Healthy control (HC) has the lowest number of cells with high percentage of mitochondrial reads. Living donors (LD) and tumor-nephrectomies (TN) have the highest number of cells with high percentage of mitochondrial reads. b. Dot plot with the scaled expression levels of early stressors (ATF3, DUSP1, FOS, JUN and EGR1) in the 6 sample types studied. c. Dot plots and d. violin plots showing the expression levels of healthy (EGF & UMOD) and injured/disease kidney (IL2, IL6, HAVCR1, LCN2, MMP7, TIMP1 and VCAM1 markers in the 6 sample types.
Early stress response genes can be activated within minutes after stimulation16, 17. Highest expression of early stressors including Activating transcription factor 3 (ATF3), Dual specificity phosphatase 1 (DUSP1), Early growth response 1 (EGR1), Fos proto-oncogene (FOS) and Jun proto-oncogene (JUN) were observed in TN (Figure 3b). Both LD and TN had higher expression of these genes than DMR and early DKD. Notably, relatively few cells expressed these stressor genes in HC (Figure 3b).
Expression levels of epidermal growth factor (EGF) and uromodulin (UMOD) markers of healthy kidneys18, 19, were the lowest in DKD. Pseudo bulk differential expression analysis of EGF and UMOD between the reference groups did not show any significant differences. The expression levels of kidney injury or disease markers including hepatitis A virus cellular receptor 1 (HAVCR1), interleukin 2 (IL2,) interleukin 6 (IL6), lipocalin 2 (LCN2), matrix metalloproteinase 7 (MMP7), TIMP metallopeptidase inhibitor 1 (TIMP1) and VCAM11, 20–23 is shown in Figure 3c and 3d. HC was the only reference group with undetectable or very low expression of these genes. LD and TN had comparable expression of these genes.
Immune cells and proximal cell states
Immune cell enrichment was over five times greater in DKD compared to HC and LD (Figure 4a). Figure 4b shows the row-scaled heatmap of the log fold changes from the differential immune cell composition analyses performed between DKD and the reference groups; all comparisons were significantly different (adjusted p value < 0.05). Low B cell composition in HC was indicated by the high log fold change in DKD versus HC. Similarly, the heatmap indicates high B and T cell composition in TN compared to HC and LD. Supplementary Table 1 provides the number of cells in each cell type for every sample group.
a. Proportion of immune cell types found in each of the 6 sample types (HC, LD, TN, DMR, early DKD and DKD). The number on top of each bar is the total number of immune cells identified in each cell type. b. Row-scaled heatmap of the log fold changes from the differential immune cell composition analyses performed between DKD and the reference groups; all comparisons were significantly different (adjusted p value < 0.05.). Log fold changes indicate T and B cells were highest in TN samples while HC had fewest B cells. c. The two proximal cell states ordered along a trajectory. d. The proximal cell states identified in the 6 sample types ordered along the trajectory. Among reference samples, LD had the most adaptive cell state cells.
The pseudotime ordering of cells from the proximal cell type along a one-dimensional trajectory (Figures 4c-d) revealed similarities among the three reference tissues. Figure 4c illustrates the trajectory of the two PT cell states across all sample types. In Figure 4d, the PT cell state trajectory is shown separately for each studied sample type; the cell state ordering in HC, LD, and TN appeared similar. Notably, HC also contained the adaptive/maladaptive cell state. However, the proportion of the adaptive cell states in HC were lower compared to the other two reference sample types (Supplementary Figure 1). As expected, the DKD cell states displayed the most distinct ordering along the trajectory, with a high number of adaptive/maladaptive state cells.
Effect of post-operative tissue procurement
Differential expression analysis between the two different biopsy procurement method groups, along with manual curation, identified a set of 25 genes (Supplementary Table 2) that exhibited significantly elevated expression in the post-operative surgical biopsy group (Figure 5a).
Using bulk mRNA analysis, a gene set that was upregulated in post-operative tissue procurement method compared to percutaneous needle biopsies was identified. LD and TN samples were acquired by post-operative biopsy procedure and HC, DMR, early DKD and DKD samples were by percutaneous needle biopsies. a. Violin density plot showing the level of the score calculated at cell level for the 25 genes that were upregulated in post-operative tissue biopsies. b. String interaction network showing direct interaction of 19/25 genes. c. Violin plot showing the elevated expression of the score in every LD and TN sample compared to the needle biopsy samples. d. Top 5 enriched Gene Ontology biological processes for the gene set.
Network analysis using STRING11 revealed direct interactions among 19 of these 25 genes (Figure 5b), with enrichment in the Jun N-terminal kinase (JNK) and p38 mitogen-activated protein kinase (MAPK) pathways (Figure 5d). Notably, every LD and TN sample demonstrated markedly higher expression of this 25 gene set score without much variability between individual samples compared to the percutaneous biopsy tissues from HC, DMR, early DKD, and DKD. (Figure 5c).
Age effect
Figure 6a presents Venn diagrams illustrating the overlap of DEGs between disease and reference groups. The number of unique DEGs in DMR vs HC (1428 genes) that did not overlap with any comparison was higher (> 1.5 times) compared to that of DMR vs LD and DMR vs TN. Similarly, for DKD vs HC, the number of unique DEGs (2004 genes) was more than twice that for DKD vs LD and DKD vs TN. In contrast, for early DKD vs HC, where both groups had an average age below 26, the number of unique DEGs was lower than in early DKD vs LD and early DKD vs TN. The average age in the LD, TN, DMR, and DKD groups was above 37 years (Table 1). These observations point to a potential age effect contributing to the DEGs.
a. Venn diagrams showing the overlap of significantly (adjusted p value < 0.05) differentially expressed genes (DEGs) in DMR (left panel) versus the three reference samples (HC, LD and TN); early DKD (middle panel) versus the three reference samples; and DKD (right panel) versus the three reference samples. b. Bar plot showing the age distribution in each of the 6 sample types. c. Dot plot showing the scaled expression of the scores calculated for the genes that positively and negatively associated with age. The gene scores from the negatively associated genes were enriched in endothelial cells and the gene scores from the positively associated genes were enriched in PC, PT and TAL cell states. d. Plot showing the significant correlation between the scores from the positively age-associated genes and age of the living donors. e. Plot showing the significant correlation between the scores from the negatively age-associated genes and age of the living donors. f. Top enriched Gene Ontology biological processes for the positively age-associated gene set. g. Top enriched Gene Ontology biological processes for the negatively age-associated gene set.
Figure 6b shows the age distribution per sample type. HC and early DKD were considerably younger than all other sample types. DKD were the oldest. We identified two sets of genes that positively (n = 58) and negatively associated (n = 96) with age (correlation >0.5) from the WGCNA analysis of pseudo bulk read counts of LD (Supplementary Table 3).
Based on the mRNA expression levels in single cell data, Age_gene_scores at cell level were calculated for these positively and negatively age correlated genes. Density plot of the gene scores show enrichment of the positively correlated genes in the PT, TAL and principal cells and the negatively correlated genes were enriched in endothelial, podocytes and fibroblast cells (Figure 6c). Figures 6d and 6e show significant correlation (p<0.0002, p<0.001) between age and the mean Age_gene_scores calculated at sample level. Gene ontology biological processes enriched for positively regulated genes included methylation, macro molecule modification and microtubule polymerization; negatively correlated genes were enriched for protein kinase c signaling, lymph vessel development, endothelial cell proliferation etc. (Figure 6f, 6g).
Disease versus reference groups
Figures 7a and 7b present Venn diagrams illustrating the overlap of DEGs comparing DKD to reference types. Specifically, Figure 7a displays the overlap with adjustments for age alone, while Figure 7b shows the overlap with adjustments for both age and procurement method.
a. Venn diagram illustrating the overlap of age-adjusted, significant differentially expressed genes (DEGs) (adjusted p-value < 0.05) between DKD and the three reference types. b. Venn diagram depicting the overlap of DEGs, adjusted for age and procurement effects (adjusted p-value < 0.05), between DKD and the three reference types. c. Heat map displaying enrichment scores for the top 5 canonical pathways from IPA for the significant DEGs in DKD versus HC, DKD versus LD and DKD versus TN. Scores are shown after adjusting for age and procurement effects. d. Heat map displaying the top 5 enriched canonical pathways from IPA for the significant DEGs between DKD versus HC. Scores are shown for the three types of analyses: without confounding factor adjustment, adjusted for age, and adjusted for age and procurement effects. e. Bar plot presenting the top 5 enriched canonical pathways from IPA for the significant DEGs between early DMR and LD. Scores are shown for the three types of analyses: without confounding factor adjustment, adjusted for age, and adjusted for age and procurement effects. f. Dot plot showing SIRT1 expression levels across the sample groups. g. Bar plot presenting the top 5 enriched canonical pathways from IPA for the significant DEGs between early DKD and LD. Scores are shown for the three types of analyses: without confounding factor adjustment, adjusted for age, and adjusted for age and procurement effects. h. Dot plot illustrating ARAP1 expression levels in the sample groups studied.
Supplementary Figure 2 shows the significantly enriched gene ontology biological processes for the 1792, 320 and 501 significant DEGs that were unique to HC, LD and TN groups in the differential expression analyses with DKD (Figure 7b).
Figure 7c illustrates the top 5 processes from the IPA analysis of DEGs between DKD and three reference groups: HC, LD, and TN, after adjusting for age and procurement effects. The top enriched pathways were all immune-related. The DKD versus HC comparison exhibited the highest enrichment in immune processes, while the DKD versus TN comparison showed the lowest enrichment. In the DKD versus LD analysis, enrichment in immune-related processes markedly decreased after adjustments for age and procurement effects. Interestingly, some of the energy related pathways were more enriched in DKD versus LD comparison than DKD versus HC or DKD versus TN after adjusting for the confounding factors (Supplementary Figure 3).
Figure 7d displays the pathway enrichment scores for DEGs in the DKD versus HC comparison. The scores remained similar when adjusted for only age and when adjusted for both age and procurement effects, as both DKD and HC samples were obtained via needle biopsies, rendering procurement effects irrelevant.
Figure 7e highlights the top 5 pathways for DEGs from the DMR versus LD comparison, with and without the adjustment for confounding factors. Adjusting for these factors reduced enrichment in all pathways except for the Sirtuin signaling pathway. Figure 7f shows that Sirtuin 1 (SIRT1) expression is more enriched in the DMR group compared to all other groups studied. The enrichment score for neutrophil degranulation significantly decreased after adjusting for age and procurement effects.
Figure 7g highlights the top 5 pathways for DEGs from the early DKD versus LD comparison, both with and without confounding factors. The top 5 enrichment pathways included Rho GTPase cycle, Senescence pathway, CLEAR signaling, superpathway of inositol phosphate components and protein kinase A signaling. Adjusting for just age increased the enrichment score in all processes except inositol phosphate components pathway. The increased effect disappeared when adjusted for both age and procurement effect except for Rho GTPase cycle. The expression of ArfGAP with RhoGAP Domain, Ankyrin Repeat And PH Domain 1 (ARAP1) a key player in Rho GTPase cycle was enriched markedly in early DKD group compared to other groups (Figure 7h).
Pathway activity
Figure 8a presents a heat map illustrating pathway activity enrichment based on scaled activity scores across different sample types. In the reference groups, phosphoinositide 3-kinase (PI3K) was the only pathway enriched in HC. In contrast, multiple pathways were activated in both LD and TN; specifically, tumor protein p53 (P53), trail, and transforming growth factor beta (TGFβ) pathways were enriched in both groups. Additionally, hypoxia was enriched in LD, while mitogen-activated protein kinases (MAPK), nuclear factor kappa-B (NF-κB), PI3K and wingless-related integration site (WNT) were enriched in TN.
a. Heatmap showing the enrichment of top pathway activities inferred by decoupleR analysis on the single cell expression of the 6 sample types. b. Hierarchical clustering using the aggregate read counts at the sample type level. c. Hierarchical clustering of the age and procurement effect adjusted aggregate read counts.
Hierarchical clustering of aggregated pseudo read counts at sample group level without adjusting for confounding factors is shown in Figure 8b. After accounting for these confounding factors, the DKD and DMR groups clustered separately from all other sample groups studied (Figure 8c).
DISCUSSION
Reference samples are crucial in disease pathogenesis studies, but obtaining healthy human kidney samples is particularly challenging6. As a result, researchers frequently rely on kidney reference sources that may not closely align with disease samples. In this study, we compared the molecular profiles of reference samples from three different kidney tissue sources to address this issue. We integrated and compared transcript profiles from single-cell data obtained from HC, LD, and TN samples. Although the samples originated from different projects, tissue dissociation and single cell RNA sequencing were carried out using the same KPMP - Tissue Interrogation Site personnel and core facility staff. This consistency allowed for integration without batch effects.
Kidney tissue procurement, handling, and dissociation can lead to early stress effects and associated transcriptional changes24. However, since all samples in this study were stored and processed using an identical protocol, the observed differences are likely attributable to tissue procurement methods or the physiological and clinical conditions of the donors. Samples were obtained via percutaneous needle biopsy (HC, early DKD, DMR, DKD) and post-operative tissue biopsy (LD, TN). In HC compared to TN and LD, lower mitochondrial read content per cell and reduced expression of early stress markers were encountered. The mean percentage of mitochondrial read content per cell for LD and TN exceeded that of needle biopsy disease samples (Figure 3a), indicating that kidney cells start losing their viability during the procedure in post-operative biopsies. Furthermore, we identified a gene set associated with the post-operative tissue procurement method that was consistently higher in every LD and TN samples compared to percutaneous needle biopsy samples (Figure 5). As expected, the enriched pathways for this gene set were stress associated JNK and p38MAPK cascades.
The TN samples included non-cancerous tissue from donors with high eGFR (>80), minimal pathology, no diabetes, and no hypertension. However, the higher proportion of B, and T cells in TN samples compared to HC and LD (Figures 4a and 4b) suggests underlying pathology. The low expression of UMOD and EGF in the DMR group may be attributed to having only six samples, along with a lower TAL cell proportion compared to other groups (Figure 2c).
The low or nonexistent expression of disease markers, including IL2, IL6, LCN2, MMP7, and VCAM1, in HC indicates the pristine quality of the tissue (Figure 3c and 3d). The minimal kidney damage marker expression and low B cell infiltration signal in HC can be attributed to both the biopsy procurement and the younger age of this group compared to LD and TN.
Interestingly, LD exhibited higher expression of HAVCR1, which encodes the kidney injury marker 1 (KIM1), in more cells compared to HC and TN (adjusted p-value < 0.05). HAVCR1 is significantly expressed in the adaptive/maladaptive proximal cell state, which is more prevalent in LD and TN than in HC (Supplementary Figure 1).
Across the sample groups a gene set associated with post-operative tissue procurement and age could be characterized. Elevated scores from the gene set linked to post-operative procurement effects were consistently observed in every sample from HC and TN, compared to all other percutaneous biopsy samples. The genes that were positively associated with age were found to be enriched for methylation. Increase in methylation with age has been previously shown25. The enrichment of endothelial associated processes for the genes negatively associated with age is also supported by published studies26, 27.
Incorporating age and procurement effects as confounding factors in the differential expression analysis minimized their impact on gene expression (Figures 6 and 7a-e). Several immune-related pathways were highly enriched among the DEGs in the DKD versus reference sample analyses, highlighting the complex role of immune cells in disease progression (Figure 7c, Supplementary Figure 2). DEGs from the DKD versus HC comparison showed the highest enrichment for immune-related processes whereas, the DKD versus TN comparison showed overall lowest enrichment. This suggests that many of these top processes, as shown in Figures 7c, may already be elevated in TN compared to HC and LD.
Adjusting for the procurement effect in comparisons between disease biopsies and healthy controls did not significantly alter the IPA pathway enrichment scores (Figure 7d), indicating the specificity of the procurement effect gene set (both disease and HC were surgical needle biopsies).
Enrichment of oxidative phosphorylation was observed exclusively in DKD versus LD, compared to DKD versus HC and TN suggesting high levels of these activities in HC and TN (Supplementary Figure 2). Energy-related activities are known to be higher in younger individuals compared to older adults28, 29, which could explain the elevated levels of these processes in HC. Neighboring cells around a tumor often exhibit high levels of oxidative phosphorylation (OXPHOS) due to the ‘reverse Warburg effect’30. This phenomenon could account for the increased energy-related processes in TN.
Adjusting for confounding factors did not alter the enrichment score for the Sirtuin signaling process in the DEGs from the DMR versus LD analyses suggesting its role in DMR (Figure 7e). The role of SIRT1 in diabetes and insulin sensitivity have been previously implicated by several studies 31–34. Moreover, the expression of SIRT1 was most enriched in DMR compared to all other sample groups in our analysis (Figure 7f). Another interesting observation in the comparative analyses was the enrichment of Rho GTPase cycle process in DEGs from early DKD versus LD (Figure 7g). The confounding factor adjustments increased the enrichment score of Rho GTPase cycle in early DKD versus LD analyses. The specific over-expression of ARAP1 a protein involved in Arf and Rho signaling 35 in early DKD (Figure 7h) further supports the observation from IPA enrichment analyses. ARAP1 has been implicated in kidney fibrosis, particularly in DKD36, ARAP1 knockdown attenuated kidney injury and dysfunction in diabetic mice36.
The absence of enriched activity in ‘Pathway Responsive GENes’ (PROGENy37) except for PI3K in HC indicates that these samples are unperturbed (Figure 8a). The enrichment of PI3K in HC is likely due to its crucial role in physiological cell function such as growth and metabolism38. Conversely, the enrichment of stress related activities including p53, Trail, and TGFβ activities in LD and TN, suggest a potential response to the effect of post-operative tissue procurement. Additionally, the enrichment of NFKβ and MAPK activities in TN indicates a possible underlying pathology in these samples.
One limitation of this study is the small number of samples per group, which prevented matching sex distributions, shown to impact health and disease39. Another limitation is the lack of post-surgical disease biopsy samples to clearly demonstrate the procurement effect adjustment in differential expression analyses.
In conclusion, HC tissue sourced from younger donors, appeared to be the most pristine. In contrast, TN samples displayed more underlying molecular pathology compared to LD. Comparative analyses of top enriched pathways for differentially expressed genes highlight the importance of accounting for appropriate confounding factors in differential expression analyses between disease and reference samples to achieve credible results.
Funding
The study is funded by the following grants from the NIH: The KPMP is funded by the following grants from the NIDDK: U01DK133081, U01DK133091, U01DK133092, U01DK133093, U01DK133095, U01DK133097, U01DK114866, U01DK114908, U01DK133090, U01DK133113, U01DK133766, U01DK133768, U01DK114907, U01DK114920, U01DK114923, U01DK114933, U24DK114886, UH3DK114926, UH3DK114861, UH3DK114915, UH3DK114937. Additional funding sources are K23 DK116720, R01 DK132399, UC2 DK114886, and P30 DK116073, JDRF (2-SRA-2019-845-S-B, 3-SRA-2022-1097-M-B), the Boettcher Foundation, and in part by the Intramural Research Program at NIDDK and the Centers for Disease Control and Prevention (CKD Initiative) under inter-agency agreement #21FED2100157DPG (P.B), U54DK083912, P30 DK081943/HCA: Kidney Seed Network (M.K), U2C DK114886, R21 DK126329 and Chan-Zuckerberg Foundation - CZF2019-002447 (M.B.), K23 DK125529 (A.S.N). This work was also supported by the George M. O’Brien Michigan Kidney Translational Resource Center funded by NIH/NIDDK grant U54DK137314.
Disclosures
Dr. Kretzler reports grants and contracts through the University of Michigan with the National Institutes of Health (NIH), Chan Zuckerberg Initiative, Breakthrough T1D, Alport Foundation, amfAR, AstraZeneca, NovoNordisk, Eli Lilly, Gilead, Janssen, Boehringer-Ingelheim, Moderna, European Union Innovative Medicine Initiative, Certa, Chinook, Angion, RenalytixAI, Travere, Regeneron, IONIS, Maze, Sanofi, Dimerix, Roche-Genentech and Vera Therapeutics. He has received consulting fees through the University of Michigan from Janssen, NovoNordisk, Otsuka, Alexion, Variant Bio and Novartis. Dr. Kretzler served on the NIH-NCATS council, is the committee chair for the American Society of Nephrology Program and is on the board of Nephcure Kidney International. In addition, he has a patent PCT/EP2014/073413 “Biomarkers and methods for progression prediction for chronic kidney disease” licensed. Dr. Bjornstad reports serving or having served as a consultant for AstraZeneca, Bayer, Bristol-Myers Squibb, Boehringer Ingelheim, Eli-Lilly, LG Chemistry, Sanofi, Novo Nordisk, and Horizon Pharma. P.B. also serves or has served on the steering committees, advisory boards and/or data safety committees of AstraZeneca, Bayer, Boehringer Ingelheim, Lilly, Novo Nordisk, and XORTX. Dr. Rosas received research funding paid to her institution from Bayer and the National Institutes of Diabetes and Digestive and Kidney Diseases (NIDDK) and benefiting from an enrichment program (P30 DK03836) during the conduct of the study. She is a member of the steering committee for the FINE-ONE Study and attended advisory board meeting for Fresenius and Bayer. Dr. Sedor has received research funding paid to his institution, outside of this study, from NIDDK and the National Institute of Allergy and Infectious Diseases; for clinical trials from Vertex, Chinook and Maze. He has consulted for Maze and Boehringer Ingelheim; and receives royalties from Sanofi for US Issued Patent US/11,645,753 (ML segmentation Kidney biopsies). Dr. Kimmel received royalties and advances from Elsevier and Mayo Clinic Press. Outside of current work, Dr. Naik reports consulting fees from Vera and a patent pending, Targeting the IGF-1 signaling pathway to extend organ lifespan. Dr. Hodgin received, outside of the current study, research funding from NIH, Janssen, Astra Zeneca, Gilead, Novo Nordisk.
Supplementary Materials
Supplementary acknowledgement The KPMP consortium members
Supplementary Tables
Supplementary Table S1. Reference samples cell type summary
Supplementary Table S2. Processing effect gene set
Supplementary Table S3. Age effect gene sets
Supplementary Figures
Supplementary Figure S1. Proportion PT and TAL cell states
Supplementary Figure S2. Enriched Gene Ontology biological processes
Supplementary Figure S3: Heatmap of the comparative enrichment of energy related pathways
Acknowledgement
We are deeply indebted to the generosity of participants volunteering to donate tissue primarily for research purposes despite receiving no direct immediate benefit to their clinical care. We thank the clinical coordinators for their efforts in participant enrollments and biopsy tissue procurement. All the members of the Kidney Precision Medicine Project consortium are listed in supplementary acknowledgement.