Abstract
Spatial transcriptomics technology remains resource-intensive and unlikely to be routinely adopted for patient care soon. This hinders the development of novel precision medicine solutions and, more importantly, limits the translation of research findings to patient treatment. Here, we present DeepSpot, a deep-set neural network that leverages recent foundation models in pathology and spatial multi-level tissue context to effectively predict spatial transcriptomics from H&E images. DeepSpot substantially improved gene correlations across multiple datasets from patients with metastatic melanoma, kidney, lung, or colon cancers compared with the previous state of the art. Using DeepSpot, we generated 1 792 TCGA spatial transcriptomics samples (37 million spots) of the melanoma and renal cell cancer cohorts. We anticipate this to be a valuable resource for biological discovery and a benchmark for evaluating spatial transcriptomics models. We hope that DeepSpot and this dataset will stimulate further advancements in computational spatial transcriptomics analysis.
1 Introduction
Spatial transcriptomics provides valuable insights into tissue-specific properties both in normal physiology [1–3] and in disease progression and treatment [4–8]. However, the technology is limited to sequencing small tissue regions (e.g., a 6.5 × 6.5 mm capture area for Visium, 10x Genomics™), and it is both resource-intensive and error-prone, making its routine adoption in clinical settings unlikely in the near future [9, 10]. This hinders the development of new precision medicine solutions and, more importantly, limits the translation of basic research findings to patient diagnosis and treatment. Therefore, robust and cost-effective methods are urgently needed to enable simple, affordable, and reliable analysis of spatially resolved gene expression data in routine clinical workflows and retrospective cohorts.
Recent advances in deep learning have demonstrated that high-resolution hematoxylin and eosin (H&E)-stained slides can be used as input for computational models to efficiently predict bulk RNA expression [11–13]. Building on this, the increased availability of spatial transcriptomics data [5, 14–16] enabled the development of models to predict spatial transcriptomics from H&E images. However, accurately predicting spatial transcriptomics profiles remains challenging. Existing frameworks either struggle to leverage morphological details and spatial context effectively (especially those adapted from bulk RNA prediction), are computationally expensive, or are limited to predicting only a small number of genes. For example, ST-Net uses a convolutional neural network (DenseNet-121 [17], pre-trained on ImageNet [18]) followed by a fully connected linear layer to predict the expression of 250 genes [19]. While Jaume and Doucet et al. [14] utilized recent pathology foundation models (e.g., UNI [20], Phikon [21]), pre-trained on extensive H&E datasets, to extract spot tile features, they focused on predicting only 50 genes using either ridge regression or random forest. Other methods, such as BLEEP [22], use contrastive learning to create a low-dimensional joint embedding and require access to the full training data during inference to perform k-nearest neighbor mapping. Furthermore, methods based on vision transformers represent the H&E image as a sequence of tiles and encode their spatial location through positional embeddings of absolute pixel coordinates to predict ∼750 genes [23–25]. However, the gigapixel resolution of pathology slides makes full-image processing impractical, requiring tile subsampling and potentially leading to the loss of important details.
By contrast, the spatial tissue context, proven effective for representing spatial transcriptomics data [5], remains underutilized for predicting spatial transcriptomics from H&E images. Therefore, a new and reliable method is needed to overcome the aforementioned challenges, advance spatial transcriptomics prediction and enable its effective integration into clinical practice.
To address this, we developed DeepSpot, a novel deep-learning model that utilizes recent pathology foundation models and spatial multi-level tissue context to effectively predict spatial transcriptomics from H&E images. We compared its performance against previous state-of-the-art methods on multiple datasets sequenced using Visium from 10x Genomics™: 24 samples with kidney cancer, 18 with metastatic melanoma, 8 with colon cancer, and 6 samples enriched with tertiary lymphoid structures (TLS) from kidney and lung cancer. We demonstrated substantially improved gene correlations, which enabled downstream spatial transcriptomics analysis leading to biomarker and pathway discovery. Furthermore, we applied DeepSpot to 1 792 TCGA slide images with corresponding bulk RNA-seq. This not only served as an out-of-distribution validation but also generated a large multi-modal spatial transcriptomics dataset with over 37 million spots from patients with melanoma or kidney cancer. This dataset represents a unique resource that significantly enriches the available spatial transcriptomics data for TCGA samples, providing valuable insights into the molecular landscapes of cancer tissues (Figure 1).
DeepSpot predicts spatial transcriptomics from H&E images by leveraging recent foundation models in pathology and spatial multi-level tissue context. 1: DeepSpot is trained to predict 5 000 genes, with hyper-parameters optimized using cross-validation. 2: DeepSpot can be used for de novo spatial transcriptomics prediction or for correcting existing spatial transcriptomics data. 3: Validation involves nested leave-one-out patient cross-validation and out-of-distribution testing. We predicted spatial transcriptomics from TCGA slide images, aggregated the data into pseudo-bulk RNA profiles, and compared them with the available ground truth bulk RNA-seq. 4: DeepSpot generated 1 792 TCGA spatial transcriptomics samples with over 37 million spots from melanoma or kidney cancer patients, enriching the available spatial transcriptomics data for TCGA samples and providing valuable insights into the molecular landscapes of cancer tissues.
2 Results
2.1 DeepSpot leverages pathology foundation models and spatial tissue context
We introduce DeepSpot, a novel deep-learning model that leverages recent foundation models in pathology to effectively predict spatial transcriptomics from H&E images. DeepSpot employs a deep-set neural network [26] to model transcriptomic spots as bags of sub-spots and integrates multi-level tissue details along with spatial neighborhood morphology. This integration, built on the robust representations of pre-trained H&E models, significantly enhances the accuracy and granularity of gene predictions from H&E images at minimal extra cost (Figure 2, S1). The model is trained on spatial transcriptomics data to predict the 5 000 most variable genes in a multi-output regression setting. First, the H&E slides are split into tiles, each corresponding to a transcriptomic spot. For each spot, we create a bag of sub-spots by splitting it into non-overlapping sub-tiles to capture local tissue morphology, and a bag of neighboring spots to capture the global tissue environment. A pre-trained pathology foundation model (e.g., UNI [20], Phikon [21], H-optimus-0 [27]) is then used to extract tile feature representations. These features are fed into DeepSpot's ϕspot module, whose learned representations are concatenated and passed to the gene head predictor, ρgene, to predict the spatial transcriptomics profiles. Ultimately, the predicted digital spatial transcriptomics can be used for various downstream tasks, including spot phenotyping, gene expression analysis, and pathway and biomarker discovery.
Workflow of DeepSpot: H&E slides are first divided into tiles, each corresponding to a spot. For each spot, we create a bag of sub-spots by dividing it into sub-tiles that capture the local morphology, and a bag of neighboring spots to represent the global context. A pre-trained pathology foundation model extracts tile features, which are input to the model. The concatenated representations are then fed into the gene head predictor, ρgene, to predict spatial gene expression.
The motivation for modeling transcriptomic spots as bags of sub-spots is that each spot contains between 1 and 10 cells (10x Genomics™, Visium). This approach captures finer details within the target spot, effectively overcoming the resolution limitations of the sequencing technology, while also learning the contributions of individual sub-spots. Additionally, DeepSpot integrates the global tissue environment by pooling neighboring spots and jointly learning the tissue landscape. Building on the strong foundation of recent models for H&E images, the deep sets architecture is particularly well-suited for representing spatial transcriptomics data due to its permutation invariance and permutation equivariance properties [26]. These properties make it a robust framework for modeling spatial transcriptomics, as it captures the relationships between spots through biologically meaningful spatial patterns rather than the arbitrary order in which the spots are presented. Furthermore, the shared weights of the ϕspot module are optimized simultaneously across multiple tasks, leading to better generalization and more efficient learning in the data-limited setting of spatial transcriptomics (Figure 6A).
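The two deep-set properties mentioned above can be checked numerically. The following minimal sketch assumes a hypothetical shared encoder `phi` (one randomly initialized linear layer with ReLU) applied row-wise to a bag of 9 random sub-spot feature vectors; it is not DeepSpot's trained module, only an illustration of why a shared per-element map is permutation-equivariant and a symmetric pooling on top of it is permutation-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared encoder phi: the same linear layer + ReLU is applied to every set element.
W = rng.normal(size=(8, 4))
b = rng.normal(size=4)


def phi(x: np.ndarray) -> np.ndarray:
    """Embed each row (one set element) independently with shared weights."""
    return np.maximum(x @ W + b, 0.0)


# A bag of 9 sub-spot feature vectors (random stand-ins for foundation-model features).
bag = rng.normal(size=(9, 8))
perm = rng.permutation(9)

# Equivariance: permuting the inputs permutes the per-element outputs in the same way.
equivariant = np.allclose(phi(bag[perm]), phi(bag)[perm])

# Invariance: a symmetric pooling (mean or max) over the embeddings ignores the order.
invariant_mean = np.allclose(phi(bag[perm]).mean(axis=0), phi(bag).mean(axis=0))
invariant_max = np.allclose(phi(bag[perm]).max(axis=0), phi(bag).max(axis=0))

print(equivariant, invariant_mean, invariant_max)  # True True True
```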
2.2 DeepSpot improves gene expression prediction
We benchmarked DeepSpot’s performance on multiple spatial transcriptomics datasets sequenced using Visium from 10x Genomics™ to predict the top 5 000 most variable genes as defined by Scanpy with the Seurat v3 flavor [28]. In line with the methodology of [14, 19, 22–25], we adopted Pearson correlation to measure the similarity between the predicted and the ground truth gene expression. We calculated correlations for the top 100 to 5 000 most variable genes (Figure 3A). To avoid hyper-parameter overfitting, we employed nested leave-one-out patient cross-validation, with an internal cross-validation loop for hyperparameter selection. Specifically, for each fold, we trained a model on the training set, selected hyperparameters on the validation set, and evaluated performance on the test set, ensuring that each set contains distinct patients. This process was iterated over all folds, and the resulting median Pearson correlation, along with the standard error, is reported (Figure 3A).
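The nested patient-level splitting can be sketched as follows. This is a minimal sketch assuming only a mapping from slide to patient; the helper name `nested_lopo_splits` and the round-robin assignment of the remaining patients to the 3 inner folds (the inner fold count is given in Section 4.3) are illustrative assumptions, and model training and hyperparameter search are left out. The key property it enforces is that no patient appears in more than one of the training, validation, and test sets.

```python
from typing import Dict, Iterator, List, Set, Tuple

Split = Tuple[List[str], List[str]]  # (training slides, validation slides)


def nested_lopo_splits(
    slide_to_patient: Dict[str, str], n_inner: int = 3
) -> Iterator[Tuple[str, List[Split], List[str]]]:
    """Yield (held-out patient, inner train/validation splits, test slides) for each outer fold."""
    patients = sorted(set(slide_to_patient.values()))

    def slides_of(group: Set[str]) -> List[str]:
        return [s for s, p in slide_to_patient.items() if p in group]

    for test_patient in patients:
        # Outer loop: leave one patient (all of their slides) out for testing.
        remaining = [p for p in patients if p != test_patient]
        inner = []
        for k in range(n_inner):
            # Hypothetical round-robin assignment of the remaining patients to inner folds.
            val_patients = set(remaining[k::n_inner])
            train_patients = set(remaining) - val_patients
            inner.append((slides_of(train_patients), slides_of(val_patients)))
        yield test_patient, inner, slides_of({test_patient})


# Toy example: 6 slides from 5 patients (P1 contributes two slices).
slides = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P3", "s5": "P4", "s6": "P5"}
folds = list(nested_lopo_splits(slides))
print(len(folds), folds[0][2])  # 5 ['s1', 's2']
```

Keeping all slices of a patient together in one split matters here because several datasets contain multiple slices per patient (e.g., 18 slices from 7 patients in the Tumor Profiler dataset).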
A: Benchmark of DeepSpot and previous state-of-the-art methods on four spatial transcriptomics datasets generated using Visium from 10x Genomics™. The y-axis represents the Pearson correlation between the predicted and ground truth gene expression. The x-axis indicates the number N of top highly variable genes (as defined by Scanpy with the Seurat v3 flavor [28]) over which the correlation is computed. Models are ordered based on their relative rank across datasets. B: H&E image and pathology annotation for slice MELIPIT-1-2 from the Tumor Profiler dataset [5]. C: Comparison of gene expression for SOX10 (melanoma marker [29, 30]). Stars denote the levels of statistical significance, with *** indicating p<0.001 as determined by bootstrapping.
DeepSpot consistently achieved higher Pearson correlations between predicted and ground truth spatial gene expression than previous state-of-the-art methods across all datasets (Figure 3A). For example, on the Tumor Profiler dataset [5], which comprises 18 tissue slices from 7 patients with metastatic melanoma, DeepSpot improved the Pearson correlation across the top 500 genes by 29%, from 0.24 (best competitor, MLP) to 0.31 (DeepSpot). This improvement highlights the benefit of combining recent foundation models in pathology with spatial multi-level tissue context. The vision transformer-based models (HisToGene, Hist2ST, and THItoGene) underperformed across all datasets, likely because their architectures learn pathology features from scratch. In contrast, transfer learning from pathology foundation models pre-trained on extensive collections of clinical slides could offer a significant advantage (Figure S1).
To qualitatively illustrate the predicted spatial transcriptomics, we compare the expression of SOX10, a known melanoma marker gene [29, 30], in slice MELIPIT-1-2, for which ground truth pathology annotations are available (Figure 3B, C). The gene expression predictions from BLEEP (ρ = 0.57) and ST-Net (ρ = 0.61) exhibit low spatial resolution, with areas of high expression intensity concentrated at tumor boundaries. In contrast, linear regression (ρ = 0.64) and MLP (ρ = 0.64), which use pathology foundation model features, show greater contrast between normal and tumor tissues, suggesting they distinguish these conditions more effectively. However, their gene expression within the tumor structures appears noisy, lacking a clear pattern and offering little detail about the tumor microenvironment. DeepSpot achieved the highest correlation with the ground truth (ρ = 0.70) and clearly distinguished normal from tumor tissue. It also produced a more consistent expression pattern within the tumor, with the highest expression concentrated in the tumor's core (Figure 3C).
2.3 DeepSpot enables digital spatial transcriptomics analysis
To demonstrate the utility of the digital spatial transcriptomics data, we use the DeepSpot-predicted gene expression for slice MELIPIT-1-2 from the Tumor Profiler dataset to perform a common spatial transcriptomics analysis workflow [31]. First, we apply dimensionality reduction to visualize the data with randomly sampled morphology spots (Figure 4A). We observe an aggregation of tumor spots on the right side, with a gradual transition towards spots of necrosis, stroma, and normal lymphoid tissue on the left. In tumor spots, cells appear markedly larger, with enlarged nuclei; stromal cells have an elongated morphology and are noticeably more scattered; and normal lymphoid cells are generally smaller and more densely packed, in line with the observations in [5].
A: UMAP plot of spatial transcriptomics predictions from DeepSpot for sample MELIPIT-1-2 from the Tumor Profiler dataset with sampled spot images. B: Matrix plot of the log fold change (logFC) of the top 10 marker genes per group derived from predicted spatial transcriptomics for the same sample. C: Gene set enrichment analysis using the top 100 tumor marker genes and the data from the Cancer Cell Line Encyclopedia [41]. D: Pathway analysis based on the predicted spatial transcriptomics using decoupleR [47].
Next, we performed a marker gene analysis using the Wilcoxon test. Among the identified tumor marker genes, several well-known melanoma-related genes (e.g., S100B [32], TYR [33], CSPG4 [34, 35], PMEL [36] and MLPH [37]) were found (Figure 4B). They are involved in melanoma progression and associated with increased invasiveness of melanoma cells [33, 38–40]. To further validate our findings, we performed enrichment analysis using the top 100 tumor marker genes in conjunction with data from the Cancer Cell Line Encyclopedia [41] (Figure 4C). All of the top 15 cell lines were related to skin melanoma, indicating a strong association with tumor biology relevant to these malignancies [42–46]. Additionally, the significant enrichment of specific pathways involved in cell proliferation and survival suggests potential therapeutic targets for further investigation.
Finally, we conducted a pathway analysis (Figure 4D) using decoupleR [47] which revealed increased activity of the MAPK pathway in tumor tissues. This pathway is known for promoting processes such as cell proliferation, invasion, metastasis, migration, survival, and angiogenesis [48–50]. Additionally, consistent with the findings of [5], we observed predominant hypoxia signatures in regions of necrosis and hemorrhage, while JAK-STAT inflammatory signaling was dominant within the cancer microenvironment. Overall, these transcriptomics analyses highlight the accurate gene expression predictions made by DeepSpot and reinforce their interpretation in the context of underlying biological processes.
2.4 DeepSpot overcomes spatial transcriptomics gene sensitivity
Higher resolution in spatial transcriptomics, such as that produced by Visium from 10x Genomics™, often comes with lower gene sensitivity and spot contamination, known issues that result in poor downstream performance and misleading results [51]. These challenges are further compounded by the reduced RNA quality of archival samples. To demonstrate how DeepSpot addresses these issues, we present tissue slice KC2 (Figure 5A), which was excluded from the experiments due to low transcriptomics quality (Figure S2). Specifically, regions annotated as tertiary lymphoid structures (TLS) did not show expression of known TLS marker genes (Figure 5B, red boxes) [52, 53], and, unexpectedly, these genes were detected in other locations. Such a sample would typically be discarded from downstream analysis. However, DeepSpot, trained on kidney and lung samples with TLS, successfully recovered this low-quality sample by transferring the tissue morphology patterns associated with TLS. As a result, DeepSpot accurately predicted gene expression at positions corresponding to these morphological features (Figure 5C). Moreover, DeepSpot's architecture enables gene expression prediction at super-resolution by focusing on a single sub-spot at the center of the spot, further augmenting patient data with fine-grained details at minimal additional cost (Figure 5D). For example, we predicted spatial transcriptomics for the same patient after reducing the spot distance (from 130 to 30 px, roughly 4x) and the spot diameter (from 81 to 27 px, 3x), which enhanced data resolution and improved spatial accuracy. The super-resolution allowed us to pinpoint the exact locations in the H&E image corresponding to the highest gene expression (e.g., LTB, Figure 5D). Specifically, the highest LTB expression occurs in the center of the TLS formations, potentially indicating the TLS germinal centers.
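Predicting at super-resolution mainly requires laying out a denser grid of smaller tiles. The sketch below illustrates this with the pixel values from Figure 5D (spot distance 130 px and diameter 81 px originally, 30 px and 27 px at super-resolution) on a hypothetical 1300 × 1300 px region. It assumes a square grid and square crops for simplicity (Visium spots are circular and arranged on a hexagonal lattice), and the helper names `spot_grid` and `crop_tile` are illustrative, not part of DeepSpot.

```python
import numpy as np


def spot_grid(height: int, width: int, spacing: int, diameter: int) -> np.ndarray:
    """(row, col) centers on a square grid such that every tile lies fully inside the image."""
    half = diameter // 2
    rows = np.arange(half, height - half, spacing)
    cols = np.arange(half, width - half, spacing)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)


def crop_tile(image: np.ndarray, center: np.ndarray, diameter: int) -> np.ndarray:
    """Square crop of side `diameter` around `center` (a stand-in for a circular spot)."""
    r, c = int(center[0]), int(center[1])
    top, left = r - diameter // 2, c - diameter // 2
    return image[top : top + diameter, left : left + diameter]


image = np.zeros((1300, 1300, 3), dtype=np.uint8)  # hypothetical H&E region
original = spot_grid(1300, 1300, spacing=130, diameter=81)
super_res = spot_grid(1300, 1300, spacing=30, diameter=27)
print(len(original), len(super_res))  # 100 1849
```

On this toy region the denser layout yields roughly 18 times as many prediction sites; each small tile would then be embedded and passed through the model in place of the original spot tiles.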
A: H&E image and pathology annotation for slice KC2 from the Lung/Kidney with TLS dataset, zoomed into TLS formations. B: Comparison of gene expression for LTB, CXCL13, MS4A1M, and CCL19 (known TLS markers) between ground truth data from 10x Genomics™, Visium and predictions from DeepSpot. C: Pearson correlation of gene expression for LTB, CXCL13, MS4A1M, CCL19, CORO1A, CD19, SPIB, and BLK across all spots in slice KC2. D: Spatial transcriptomics predictions for gene LTB with original spot distances of 130 px and spot diameter of 81 px and at super-resolution with spot distances of 30 px and spot diameter of 27 px in slice KC2.
2.5 DeepSpot predicts digital spatial transcriptomics for 1 792 TCGA slides
To assess out-of-distribution performance, we validated DeepSpot, trained on melanoma or kidney cancer spatial transcriptomics, using fresh frozen (FF) or formalin-fixed paraffin-embedded (FFPE) slides from 1 792 TCGA samples with corresponding bulk RNA-seq (Figure 6A). In summary, we predicted spatial gene expression for each TCGA slide with melanoma (SKCM) or kidney cancer (KIRC) and then aggregated the expression to generate pseudo-bulk RNA profiles. We compared these pseudo-bulk RNA profiles with the available ground truth bulk RNA-seq (Figure 6B). Notably, DeepSpot accurately predicted gene expression and outperformed previous state-of-the-art models also in the out-of-distribution setting. In melanoma, the DeepSpot pseudo-bulk RNA profiles derived from lower-quality FF slides correlated more closely with the paired bulk RNA from the same tissue piece than those derived from the larger FFPE slides. Although DeepSpot was trained on FFPE melanoma slides, its design, which leverages a pathology foundation model, enabled effective generalization to FF slides, highlighting its potential for reliable gene expression predictions (Figure 6B). Conversely, the results of DeepSpot trained on lower-quality FF slides from the kidney dataset underscore the importance of high-quality training data for learning robust pathology representations. Although the model outperformed previous state-of-the-art models, the higher correlation of FFPE pseudo-bulk over FF pseudo-bulk suggests limitations imposed by the quality of the FF slides in the training data. Moreover, DeepSpot's predictions were more accurate for genes with greater variability in the training data. This highlights the need for larger spatial transcriptomics datasets that encompass a diverse range of genes to train more precise models.
A: Number of spots and samples per dataset. The x-axis represents the datasets, while the y-axis shows the number of spots. The number of samples is displayed above each dataset bar. B: Out-of-distribution performance of models trained on 10x Genomics™ Visium data, with gene expression predicted from TCGA slides. The y-axis represents the Pearson correlation between the pseudo-bulk RNA and ground truth bulk RNA profiles. The x-axis indicates the number N of top highly variable genes (as defined by Scanpy with the Seurat v3 flavor [28] on the training data) over which the correlation is computed. The line type indicates the data type used to generate the predictions (FF or FFPE). C: Survival analysis benchmark. The x-axis represents the concordance index (C-index), while the y-axis represents the data type. D: Tumor type classification benchmark. The x-axis represents the F1 score, while the y-axis represents the slide type. E: TCGA FFPE SKCM examples of spot images for SOX10 (melanoma), CD37 (normal lymphoid), and COL1A1 (stroma). F: TCGA FFPE KIRC examples of spots manually labeled as normal tissue and TLS. G: The DeepSpot-predicted expression of known TLS marker genes in spots manually labeled as Normal or TLS. The y-axis represents the predicted gene expression, standardized per slide. Stars denote the levels of statistical significance, with **** indicating p<0.0001 as determined by the Wilcoxon rank-sum test.
To further demonstrate the quality and robustness of our predicted spatial transcriptomics data, we trained downstream models on the pseudo-bulk RNA for several tasks, including survival analysis (Figure 6C) and tumor type classification (Figure 6D). Notably, the gene expression generated by DeepSpot outperformed previous state-of-the-art methods in survival analysis and served as a better predictor than the ground truth bulk RNA-seq (Figure 6C, S5). For instance, in the SKCM dataset, DeepSpot improved the concordance index by 2%, from 0.677 ± 0.001 (best competitor, ST-Net FFPE) to 0.691 ± 0.001 (DeepSpot FFPE). When compared to the ground truth bulk RNA-seq (0.600 ± 0.001), this represents a 13% improvement. Moreover, DeepSpot's gene expression improved tumor type classification in SKCM, increasing the F1 score by 4%, from 0.818 ± 0.001 (best competitor, ST-Net FFPE) to 0.852 ± 0.001 (DeepSpot FFPE), slightly surpassing the ground truth bulk RNA-seq (0.846 ± 0.001; Figure 6D, S6). Finally, we compared the DeepSpot-predicted expression of known TLS marker genes [52, 53] in FFPE KIRC samples with available manual annotations [54] (normal tissue or TLS, Figure 6F). TLS marker genes were significantly overexpressed in spots annotated as TLS compared to those in normal tissue (Figure 6G), underscoring the robustness of DeepSpot-predicted spot-level gene expression in providing enhanced contextual insights.
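The concordance index used for the survival benchmark measures how often a model ranks patients correctly by risk. As a reminder of what the metric computes, here is a minimal sketch of Harrell's C-index in plain Python. It is not the evaluation code behind Figure 6C (the downstream survival model is not specified here), the function name is hypothetical, and pairs with tied times are simply skipped, a simplification of how full implementations handle ties.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Fraction of comparable patient pairs in which the higher predicted risk fails first."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter follow-up time
        # Comparable only if the earlier time is an observed event; tied times are skipped here.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk[i] > risk[j]:
            concordant += 1.0
        elif risk[i] == risk[j]:
            concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable


# Toy cohort: patient 3 is censored (event = 0), risks decrease with survival time.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values around 0.6 to 0.7 into perspective.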
These experiments resulted in a large multi-modal spatial transcriptomics dataset with over 37 million spots from 1 792 samples with melanoma or kidney cancer. It represents a unique resource that significantly enriches the available spatial transcriptomics data for TCGA samples, providing new insights into the molecular landscapes of cancer tissues. To further illustrate the richness and utility of our dataset compared to the limited information from TCGA bulk RNA-seq, we compared the expression of the melanoma marker SOX10 in both bulk RNA data (Figure S3) and predicted spatial transcriptomics data (Figure S4) for SKCM. SOX10 expression is notably higher on the left side, indicating the presence of tumor spots, and gradually decreases towards the right. Additionally, we provide visual examples of predicted spots expressing SOX10 (melanoma), CD37 (normal lymphoid), and COL1A1 (stroma) (Figure 6E), which further support the correspondence between the morphological data and the predicted transcriptomics. DeepSpot proved to be a robust method for predicting spatial transcriptomics from H&E images, with the ability to generalize effectively to previously unseen and out-of-distribution pathology slides.
3 Discussion
In this work, we propose DeepSpot, a method that leverages pathology foundation models and spatial tissue context to effectively predict spatial transcriptomics from routine histology images. Our results consistently show improved spatial gene predictions over previous state-of-the-art methods across multiple datasets, including melanoma, colon, kidney and lung cancers sequenced using Visium from 10x Genomics™. DeepSpot demonstrated robustness to out-of-distribution H&E images from TCGA, producing accurate gene predictions. This led to the creation of a unique multimodal spatial transcriptomics dataset with over 37 million spots from 1 792 samples with melanoma or kidney cancer. We systematically demonstrated that the gene expression obtained from DeepSpot enables de novo spatial transcriptomics analysis, including clustering, marker gene identification, and pathway discovery.
This improvement in spatial transcriptomics prediction resulted from incorporating multi-level tissue details and spatial neighborhood information along with utilizing pathology foundation models trained on extensive datasets of H&E slides. Our approach addresses the limited resolution of the sequencing technology by representing spots as bags, aggregating local spot information, and pooling global tissue context to improve the accuracy of spatial transcriptomics predictions from H&E images. This proved beneficial across diverse datasets (Figure 3A, S1). In contrast, the linear regression and MLP baselines performed comparably to the previous state-of-the-art models, ST-Net and BLEEP, which use only the target spot as input. Moreover, BLEEP inference requires complete access to the training data for k-nearest neighbor mapping in the latent space, which may become impractical due to increasing data size and various ethical and privacy requirements in clinical settings. Furthermore, the vision transformer models struggled to establish robust correlations between morphology and transcriptomics, possibly due to their inability to capture the spatial tissue environment and their limitations in incorporating morphological features from pre-trained image models.
Several promising paths exist to enhance DeepSpot's gene expression prediction. 1) Currently, we split each spot into a fixed number of non-overlapping sub-spots, but this approach could be improved by using a cell segmentation algorithm to detect cell locations and boundaries, enabling us to divide the spot based on the identified cells. 2) While we utilized general pathology foundation models to extract tile representations, it would be beneficial to use models tailored to particular tissue types or to tasks more relevant to spatial transcriptomics prediction (e.g., HoVer-Net [55] for cell nucleus segmentation and classification). Additionally, we anticipate that advancements in pathology foundation models could further enhance DeepSpot's performance. 3) Currently, all neighboring spots are assigned the same weight; however, as the neighboring radius increases, this assumption may not hold. Therefore, it is important to investigate how to account for this variability, for example by setting fixed radius parameters or learning a neighboring weight function.
In the future, DeepSpot could be applied to accurately predict spatial transcriptomics from H&E images and to generate large-scale multi-modal spatial transcriptomics datasets. This advancement would make spatial transcriptomics more accessible to scientists and healthcare professionals, enhancing its potential for biological discovery and accelerating clinical adoption. To foster these efforts, we have released the code for DeepSpot along with examples demonstrating its usage. Furthermore, we anticipate that the predicted TCGA spatial transcriptomics dataset, containing over 37 million spots, will serve as a valuable resource providing unique insights into the molecular landscapes of cancer tissues. It will also establish a benchmark for evaluating the performance and explainability of spatial transcriptomics models and support new model development. We hope that DeepSpot and this dataset will stimulate further advancements in computational spatial transcriptomics analysis.
4 Methods & Materials
4.1 Transcriptomics preprocessing
We normalized the transcript counts of each spot by the total count over all genes, scaled them to a total of 10 000 counts per spot, and applied a log1p transformation. To identify the most variable genes, we applied the highly_variable_genes function of Scanpy [28] with the Seurat v3 flavor, considering all spots in each dataset and using the tissue slice as the batch key to reduce the impact of slice-specific variability. From this, we selected the top 5 000 most variable genes per dataset, which were used as the prediction targets.
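The per-spot normalization can be written in a few lines of NumPy. This is a minimal sketch on a toy count matrix (spots × genes) with a hypothetical helper name; in Scanpy the same steps correspond to `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`, and gene selection to `sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=5000, batch_key="slice")`, where `"slice"` stands for a hypothetical `obs` column holding the slice identifier. Note that Scanpy's documentation states that the `seurat_v3` flavor expects raw counts, which its `layer` argument can point to.

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 10_000.0) -> np.ndarray:
    """Library-size normalization to `target_sum` counts per spot, followed by log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Spots with zero total counts are left at zero instead of dividing by zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return np.log1p(counts / safe_totals * target_sum)


counts = np.array([[1, 3, 0], [0, 0, 0], [5, 5, 10]])  # toy spots x genes matrix
normalized = normalize_log1p(counts)
print(np.expm1(normalized).sum(axis=1))  # approximately [10000, 0, 10000]
```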
4.1.1 Spatial oversampling with AESTETIK
Due to the skewed nature of the transcriptomics data, where genes might be expressed only in a small number of spots, we perform oversampling on the training data as follows: We utilize AESTETIK [5], a recent deep-learning model for spatial transcriptomics representation learning, to jointly integrate the spatial and transcriptomics modalities and project them into a lower-dimensional space for each slide. We then perform clustering in AESTETIK’s latent space using the Leiden algorithm with a resolution of 1 to obtain spot labels. Each cluster is subsequently oversampled to match the size of the largest cluster using resampling with replacement.
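Given one cluster label per training spot, the balancing step reduces to index resampling. The sketch below assumes the Leiden labels have already been computed (in DeepSpot they come from AESTETIK's latent space; here they are a toy array). The text does not state whether the original spots of each cluster are kept; this sketch keeps them all and only draws the shortfall with replacement, which is one reasonable reading. The helper name is hypothetical.

```python
import numpy as np


def oversample_clusters(labels: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return training indices in which every cluster is topped up to the size of the largest one."""
    rng = np.random.default_rng(seed)
    clusters, sizes = np.unique(labels, return_counts=True)
    target = sizes.max()
    indices = []
    for cluster in clusters:
        members = np.flatnonzero(labels == cluster)
        # Keep all original spots and draw the shortfall with replacement.
        extra = rng.choice(members, size=target - members.size, replace=True)
        indices.append(np.concatenate([members, extra]))
    return np.concatenate(indices)


labels = np.array([0, 0, 0, 0, 1, 1, 2])  # e.g. one Leiden cluster per spot
idx = oversample_clusters(labels)
print(np.bincount(labels[idx]))  # [4 4 4]
```

The returned indices would then select rows of the feature and expression matrices for the training set only, so validation and test spots remain untouched.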
4.2 DeepSpot design
DeepSpot utilizes a deep-learning model based on deep-set neural networks [26]. The ϕspot module consists of a single fully connected layer with dropout regularization and ReLU activation. Its weights are shared across the three submodules—spot, sub-spot, and neighboring spots—enabling the model to perform multiple tasks simultaneously: 1) extracting features specific to each spot, 2) aggregating information from sub-spot features, and 3) maximum pooling features across the neighboring spots. The outputs from these submodules are concatenated and used as inputs to the ρgene module, which also consists of a single fully connected layer with dropout regularization and ReLU activation. The ρgene module predicts gene expression in a multiregression setting. To enhance network stability, we employ an ensemble architecture for both ϕspot and ρgene modules using random LeCun initialization [56]. The ensemble output is obtained by averaging the predictions from an ensemble of 10 ϕspot and 10 ρgene modules. DeepSpot is implemented in Python using PyTorch and PyTorch Lightning.
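The forward pass described above can be sketched in NumPy (the actual implementation uses PyTorch). This is a minimal inference-mode sketch under several assumptions: dropout is omitted, the sub-spot aggregation is taken to be a mean (the text only says "aggregating"), ρgene is reduced to a single linear output layer because the exact placement of its ReLU and dropout is not specified, and all dimensions are toy values. Each ensemble member has its own LeCun-initialized ϕspot and ρgene, and predictions are averaged over 10 members as in the text.

```python
import numpy as np


def lecun_normal(rng: np.random.Generator, fan_in: int, fan_out: int) -> np.ndarray:
    """LeCun normal initialization: zero-mean Gaussian with variance 1 / fan_in."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))


class DeepSetMember:
    """One ensemble member: a shared phi_spot encoder and a rho_gene head (simplified)."""

    def __init__(self, rng: np.random.Generator, d_in: int, d_hidden: int, n_genes: int):
        self.w_phi = lecun_normal(rng, d_in, d_hidden)
        self.b_phi = np.zeros(d_hidden)
        self.w_rho = lecun_normal(rng, 3 * d_hidden, n_genes)
        self.b_rho = np.zeros(n_genes)

    def phi(self, x: np.ndarray) -> np.ndarray:
        # The same weights embed the spot, its sub-spots and its neighboring spots.
        return np.maximum(x @ self.w_phi + self.b_phi, 0.0)

    def __call__(self, spot, sub_spots, neighbours) -> np.ndarray:
        h_spot = self.phi(spot)                       # (H,) target spot
        h_local = self.phi(sub_spots).mean(axis=0)    # (H,) assumed mean over sub-spots
        h_global = self.phi(neighbours).max(axis=0)   # (H,) max pooling over neighbors
        z = np.concatenate([h_spot, h_local, h_global])  # (3H,)
        return z @ self.w_rho + self.b_rho            # (G,) standardized gene predictions


def ensemble_predict(members, spot, sub_spots, neighbours) -> np.ndarray:
    """Average the predictions of all ensemble members."""
    return np.mean([m(spot, sub_spots, neighbours) for m in members], axis=0)


rng = np.random.default_rng(0)
D, H, G = 16, 8, 5  # toy feature, hidden and gene dimensions
members = [DeepSetMember(rng, D, H, G) for _ in range(10)]
spot = rng.normal(size=D)
sub_spots = rng.normal(size=(9, D))   # 3x3 sub-spot features
neighbours = rng.normal(size=(6, D))  # neighboring spot features
pred = ensemble_predict(members, spot, sub_spots, neighbours)
print(pred.shape)  # (5,)
```

Because both bags are reduced with symmetric poolings, reordering the sub-spots or the neighbors leaves the prediction unchanged, which is the permutation invariance the architecture relies on.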
4.2.1 Model input
For each spot tile X with array coordinates $(x_{\text{array}}^{\text{target}}, y_{\text{array}}^{\text{target}})$, we compute 3×3 non-overlapping sub-spot tiles and select the closest neighboring spot tiles within a radius r of the target spot's location in the array, i.e., all spots satisfying $x_{\text{array}}^{\text{target}} - r \le x_{\text{array}} \le x_{\text{array}}^{\text{target}} + r$ and $y_{\text{array}}^{\text{target}} - r \le y_{\text{array}} \le y_{\text{array}}^{\text{target}} + r$. For each tile, we extract spot features using recent pathology foundation models (e.g., UNI [20], Phikon [21], H-optimus-0 [27]), which were pre-trained on extensive H&E datasets, and follow their recommended image preprocessing workflows. For each spot X, we construct three sets: one representing the spot itself, one representing the local tissue structure (3×3 = 9 sub-spots), and one representing the global tissue environment (neighboring spots located within radius r).
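Both input sets can be built with simple array operations. The sketch below selects neighbors with the box condition above on integer array coordinates and splits a tile into 3×3 sub-tiles. Excluding the target spot from its own neighbor set (since it is represented separately) and dropping remainder pixels when the tile size is not divisible by 3 are assumptions; the helper names and the toy 5×5 grid are hypothetical.

```python
from typing import List

import numpy as np


def neighbours_within_radius(array_xy: np.ndarray, target: int, r: int) -> np.ndarray:
    """Indices of spots in the (2r+1) x (2r+1) array box around the target, excluding the target."""
    dx = np.abs(array_xy[:, 0] - array_xy[target, 0])
    dy = np.abs(array_xy[:, 1] - array_xy[target, 1])
    mask = (dx <= r) & (dy <= r)
    mask[target] = False  # assumption: the target spot is represented in its own set
    return np.flatnonzero(mask)


def split_into_subtiles(tile: np.ndarray, n: int = 3) -> List[np.ndarray]:
    """Split an (H, W, C) tile into n x n non-overlapping sub-tiles (remainder pixels dropped)."""
    h, w = tile.shape[0] // n, tile.shape[1] // n
    return [tile[i * h : (i + 1) * h, j * w : (j + 1) * w] for i in range(n) for j in range(n)]


# Toy 5x5 array of spots; index 12 is the center spot at array coordinates (2, 2).
xs, ys = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
nbrs = neighbours_within_radius(coords, target=12, r=1)
subtiles = split_into_subtiles(np.zeros((81, 81, 3)))
print(len(nbrs), len(subtiles), subtiles[0].shape)  # 8 9 (27, 27, 3)
```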
4.2.2 Model output
During training, we standardize each gene (z-score normalization) using the mean and standard deviation computed on the training set. This puts all genes on the same scale and prevents the loss function from being dominated by a small subset of genes (Figure S1). At inference, we invert this transformation to restore the original gene expression ranges described in Section 4.1.
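A sketch of the per-gene standardization and its inverse, assuming expression matrices of shape (n_spots, n_genes). The guard that maps zero standard deviations to 1 is an addition of this sketch to avoid division by zero for genes that are constant in the training set; the function names are hypothetical.

```python
import numpy as np


def fit_gene_scaler(y_train, eps=1e-8):
    """Per-gene mean and standard deviation computed on the training spots only."""
    mean = y_train.mean(axis=0)
    std = y_train.std(axis=0)
    # Constant genes would cause a division by zero; use a scale of 1 instead (assumption).
    return mean, np.where(std < eps, 1.0, std)


def standardize(y, mean, std):
    # Applied to training targets so every gene has zero mean and unit variance.
    return (y - mean) / std


def destandardize(z, mean, std):
    # Inverse transform applied to model outputs at inference.
    return z * std + mean
```

Fitting the statistics on the training split only, and reusing them for validation and test spots, keeps test information out of the preprocessing.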
4.2.3 Training details
The model is optimized using the mean squared error (MSE) loss (Equation 1). It is trained with early stopping on a single GPU with at least 12 GB of memory, using the Adam optimizer [57] with a learning rate of 1e-4, a weight decay of 1e-6, and a batch size of 1024. Computational data analysis was performed at Leonhard Med (https://sis.id.ethz.ch/services/sensitiveresearchdata/), a secure trusted research environment at ETH Zurich.
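Equation 1 is not reproduced in this section. As an assumed illustration only, a standard form of the mean squared error over a batch of N spots and G genes, with $y_{ig}$ the standardized expression of gene g in spot i and $\hat{y}_{ig}$ its prediction, is:

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N G} \sum_{i=1}^{N} \sum_{g=1}^{G} \left( y_{ig} - \hat{y}_{ig} \right)^{2}
```

Because the targets are standardized per gene, each gene contributes to this sum on a comparable scale.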
4.3 Evaluation
We evaluate model performance using the Pearson correlation per gene (Equation 2), employing nested leave-one-patient-out cross-validation with an internal 3-fold cross-validation loop for hyperparameter selection. Specifically, for each fold, we trained a model on the training set, selected hyperparameters on the validation set, and evaluated performance on the test set, ensuring that each set contained distinct patients. This process was repeated across all folds. We then bootstrapped the median Pearson correlation across the test folds 10 000 times and reported the resulting median Pearson correlation along with its standard error. For state-of-the-art methods, we used the hyperparameter values recommended by the authors or discussed in their respective papers. The specific hyperparameter settings are included in the supplementary materials.
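The per-gene correlation and the bootstrap summary can be sketched in NumPy as follows. The sketch assumes observed and predicted expression are (n_spots, n_genes) arrays, and that the bootstrap resamples per-gene correlation values (for example, each gene's median across test folds), reporting the median of the bootstrap medians and their standard deviation as the standard error. The function names are hypothetical.

```python
import numpy as np


def pearson_per_gene(y_true, y_pred):
    """Pearson correlation between observed and predicted expression for every gene (column)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    # Genes with zero variance get NaN instead of a division by zero.
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)


def bootstrap_median(values, n_boot=10_000, rng=None):
    """Median of bootstrap medians of `values` and their standard deviation (standard error)."""
    rng = np.random.default_rng(rng)
    values = np.asarray(values, dtype=np.float64)
    values = values[~np.isnan(values)]  # ignore genes without a defined correlation
    medians = np.array([np.median(rng.choice(values, size=values.size, replace=True))
                        for _ in range(n_boot)])
    return float(np.median(medians)), float(medians.std(ddof=1))
```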
4.3.1 Out-of-distribution evaluation on TCGA
For out-of-distribution evaluation on TCGA, we generated spatial transcriptomics samples from the available TCGA image slides (FFPE and FF) by applying the pre-trained DeepSpot model (trained on the full spatial transcriptomics dataset for the specific cancer type) to predict gene expression. We then computed pseudo-bulk expression by averaging the predicted gene expression across all spots of each slide and measured its correlation with the corresponding TCGA bulk RNA-seq data. While the FF slides and their corresponding bulk RNA profiles came from the exact same tissue piece, the FFPE slides, which cover larger tissue areas, lacked exactly matching bulk RNA profiles. We therefore matched each FFPE slide to a bulk RNA profile based on patient identity and tumor type classification. We applied the pre-trained melanoma model to TCGA skin melanoma samples (TCGA SKCM; n=472 FF, n=276 FFPE) and the pre-trained kidney cancer model to TCGA kidney cancer samples (TCGA KIRC; n=528 FF, n=516 FFPE).
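Pseudo-bulk aggregation and its comparison with bulk RNA-seq can be sketched as below. The text does not specify along which axis the correlation is computed; this sketch assumes a per-gene Pearson correlation across slides, with bulk profiles already aligned to the same slide and gene order. The function names are hypothetical.

```python
import numpy as np


def pseudo_bulk(spot_expression):
    """Average predicted expression over all in-tissue spots of one slide: one value per gene."""
    return np.asarray(spot_expression).mean(axis=0)


def per_gene_correlation_across_slides(predicted_slides, bulk):
    """Pearson correlation per gene between pseudo-bulk and bulk RNA-seq across slides.

    predicted_slides: list of (n_spots_k, n_genes) arrays, one per slide.
    bulk: (n_slides, n_genes) bulk RNA-seq matrix in the same slide and gene order.
    """
    pb = np.stack([pseudo_bulk(s) for s in predicted_slides])
    pb_c = pb - pb.mean(axis=0)
    bulk_c = bulk - bulk.mean(axis=0)
    num = (pb_c * bulk_c).sum(axis=0)
    den = np.sqrt((pb_c ** 2).sum(axis=0) * (bulk_c ** 2).sum(axis=0))
    return num / den
```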
4.3.2 Ablation study
In the ablation study of DeepSpot, we follow the procedure outlined in 4.3, fixing the hyperparameter of interest and evaluating its effect on the area under the Pearson gene correlation curve (Figure S1). We studied the impact of the neighboring spot radius, loss function, image feature extraction model, normalization, and oversampling.
4.4 Downstream applications
4.4.1 Predicted gene expression transformation
We enforce positive gene values by setting all values smaller than 0 to 0 and adding 1. This approach ensures compatibility with downstream tasks without introducing additional computational challenges (e.g., NaN values due to negative scores or floating point overflow) or requiring custom solutions.
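The transformation amounts to a clip followed by a shift; a one-line sketch (the function name is hypothetical):

```python
import numpy as np


def to_positive(pred):
    """Set negative predictions to 0, then add 1 so that every value is >= 1."""
    return np.clip(pred, 0.0, None) + 1.0
```

For example, `[-0.5, 0.0, 2.0]` becomes `[1.0, 1.0, 3.0]`, and a subsequent log transform of the result is finite and non-negative.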
4.4.2 Marker genes
For identifying marker genes, we use the rank_genes_groups function from Scanpy with the Wilcoxon rank-sum test (method "wilcoxon"). We select significant marker genes and rank them by their average log-fold change.
4.4.3 Gene set enrichment analysis
For the gene set enrichment analysis, we utilized the Python interface for EnrichR [58] in conjunction with data from the Cancer Cell Line Encyclopedia [41]. We selected the top 100 tumor marker genes based on their log-fold change.
4.4.4 Pathway analysis
For pathway analysis, we apply the multivariate linear model from the decoupler package [47] to assess regulatory pathway activities, using data from the PROGENy database [59].
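To illustrate what a multivariate linear model estimates, the sketch below re-implements the idea in NumPy rather than calling decoupler: for each sample, expression across genes is regressed on the pathway weight matrix (plus an intercept), and the t-value of each pathway coefficient serves as its activity score. This follows our reading of decoupler's mlm method and is not its source code; in practice the PROGENy weights would be retrieved through decoupler (not shown), and the function name is hypothetical.

```python
import numpy as np


def mlm_activities(expr, weights):
    """Pathway activities as t-values of a per-sample multivariate linear model.

    expr: (n_samples, n_genes) expression; weights: (n_genes, n_pathways) pathway
    weights, with 0 where a gene does not belong to a pathway.
    """
    n_genes = weights.shape[0]
    X = np.hstack([np.ones((n_genes, 1)), weights])  # intercept + pathway weights
    dof = n_genes - X.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ expr.T                     # (1 + n_pathways, n_samples)
    resid = expr.T - X @ coef                         # (n_genes, n_samples)
    sigma2 = (resid ** 2).sum(axis=0) / dof           # residual variance per sample
    se = np.sqrt(np.outer(np.diag(xtx_inv), sigma2))  # standard error of each coefficient
    t = coef / se
    return t[1:].T                                    # drop the intercept: (n_samples, n_pathways)
```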
4.4.5 Survival analysis on TCGA
We used the Cox proportional hazards model with an elastic net penalty [60] for the survival analysis benchmark. We randomly sampled training sets of 75, 100, and 125 patients and used the remaining patients for evaluation. This process was repeated 1 000 times, and we reported the mean concordance index along with its standard error. When multiple samples were available per patient, we selected the sample with the largest tissue area.
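The concordance index used to score the survival models can be sketched as Harrell's C-index. The Cox elastic-net fit itself would come from a survival library (for example, scikit-survival's CoxnetSurvivalAnalysis) and is not shown. The function name is hypothetical, and pairs tied in survival time are skipped, a common simplification.

```python
import numpy as np


def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable patient pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had an event and time[i] < time[j].
    It is concordant when risk[i] > risk[j]; ties in risk count as 0.5.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored patients cannot anchor a comparable pair
        later = time > time[i]  # patients still at risk after patient i's event
        comparable += later.sum()
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return concordant / comparable if comparable else float("nan")
```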
4.4.6 Tumor type classification on TCGA
Following the procedure described in 4.4.5, we implemented logistic regression with an L2 penalty and evaluated the model using the F1 score (Figure S6).
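A scikit-learn sketch of the repeated random-split evaluation for the classifier, reusing the training sizes from 4.4.5. The stratified splits, macro-averaged F1, and reporting the standard error of the mean over repeats are assumptions not stated in the text; `repeated_f1` is a hypothetical name. scikit-learn's LogisticRegression applies an L2 penalty by default.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def repeated_f1(X, y, train_size, n_repeats=1000, seed=0):
    """Mean and standard error of the macro F1 score over repeated random train/test splits."""
    scores = []
    for rep in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed + rep)
        clf = LogisticRegression(max_iter=1000)  # L2 penalty is scikit-learn's default
        clf.fit(X_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(X_te), average="macro"))
    scores = np.asarray(scores)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n_repeats)
```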
4.4.7 Super-resolution gene expression prediction
To obtain super-resolution gene expression predictions, we use the pre-trained DeepSpot model. Rather than feeding all non-overlapping sub-spots into the model, we select only the central sub-spot and adjust the neighboring spot distances accordingly.
4.5 Spatial transcriptomics data
We used spatial transcriptomics datasets generated using Visium from 10x Genomics™ with a capture area of 6.5x6.5mm and a spot diameter of 55 μm.
4.5.1 Tumor Profiler
The Tumor Profiler dataset [5] consists of 18 tissue slices from 7 patients with metastatic melanoma and includes pathology annotations with the following labels: Tumor, Normal lymphoid tissue, Blood-/Necrosis, Stroma, and Pigment. The dataset consists of FFPE sections processed using Visium technology. The H&E slices and corresponding pixel coordinates were scaled to 20x magnification.
4.5.2 Kidney cancer HEST-1K and Colon cancer HEST-1K
The kidney cancer HEST-1K [61] dataset consists of 24 tissue slices from 24 patients with kidney cancer. The colon cancer HEST-1K [62] dataset consists of 8 tissue slices from 4 patients. Both datasets consist of fresh frozen sections processed using Visium technology and were downloaded from HEST-1K [14]. Although the H&E slices were originally available at 40x magnification, we scaled the H&E slices and corresponding pixel coordinates to 20x magnification to ensure compatibility with the TCGA slides, which are a mixture of 20x and 40x.
4.5.3 Kidney and Lung cancer with TLS
The kidney and lung cancer with tertiary lymphoid structures (TLS) dataset consists of 5 μm thick FFPE sections from kidney (3) and lung (5) tumors obtained from the Institute of Pathology at the University Hospital of Zurich and mounted onto Visium slides with the Human Probe Set v1. Samples were stained with hematoxylin and eosin and subsequently processed for sequencing following the manufacturer’s recommendations. After library preparation, the samples were sequenced on an Illumina NovaSeq 6000 and preprocessed with Space Ranger v2.1.0. Patient analyses were conducted according to the Declaration of Helsinki. Ethical approval for performing research on anonymized, archival patient material was obtained from the cantonal ethics commission Zurich (BASEC Nr. 2022-01854 and BASEC-Nr. 2024-01428). Spatial transcriptomics sequencing was performed at the Functional Genomics Center Zurich (FGCZ) of the University of Zurich and ETH Zurich. The Visium spots were manually annotated in the corresponding H&E images by expert researchers (K.S. and S.D.) using the following labels: TLS, Immune, Tumor, Normal, and Unassigned. One lung (LC4) and one kidney (KC2) slice were excluded from the analysis due to a misalignment between the spatial transcriptomics data and the expected morphological features (Figure S2).
4.5.4 TCGA
For out-of-distribution validation and de novo spatial transcriptomics prediction, we used patient metadata, image slides (fresh frozen, FF; formalin-fixed paraffin-embedded, FFPE), and paired bulk RNA-seq data from the TCGA Research Network (https://www.cancer.gov/ccg/research/genome-sequencing/tcga) for samples with skin melanoma (TCGA SKCM) and kidney cancer (TCGA KIRC). When multiple FF sections were available for the same tissue sample, the top section was selected. All slides were scaled to 20x magnification. To distinguish in-tissue spots from background, we computed the mean RGB value of each tile and discarded all tiles with a mean greater than 200 (close to white).
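The background filter and the magnification rescaling of pixel coordinates can be sketched as follows, assuming tiles are RGB arrays with values in 0 to 255 and that each slide's native magnification is known. Both helper names are hypothetical. Tiles with a mean of exactly 200 are kept, matching "greater than 200" in the text.

```python
import numpy as np


def is_tissue(tile, threshold=200):
    """Keep a tile unless its mean RGB value exceeds `threshold` (near-white background)."""
    return bool(np.asarray(tile, dtype=np.float64).mean() <= threshold)


def scale_coordinates(coords, source_mag, target_mag=20):
    """Rescale pixel coordinates from the native magnification (e.g. 40x) to the target one."""
    return np.asarray(coords, dtype=np.float64) * (target_mag / source_mag)
```

For a 40x slide, coordinates are halved, so a spot center at pixel (100, 50) maps to (50, 25) at 20x.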
4.6 Data availability
The kidney cancer HEST-1K and colon cancer HEST-1K datasets were downloaded using the HEST-1K download pipeline outlined here: https://github.com/mahmoodlab/HEST/blob/main/tutorials/1-Downloading-HEST-1k.ipynb, with data available at Hugging Face https://huggingface.co/datasets/MahmoodLab/hes. The Tumor Profiler spatial transcriptomics dataset was obtained from [5]. The lung and kidney cancer with TLS dataset is available at https://zenodo.org/records/14620362.
TCGA spatial transcriptomics samples are available for download on Hugging Face. Access details can be found at https://github.com/ratschlab/DeepSpot.
4.7 Code availability
The open-source implementation of DeepSpot along with a tutorial is available at:
https://github.com/ratschlab/DeepSpot
The Snakemake pipeline for reproducing the results is available at:
Consent for publication
This manuscript has been seen and approved by all listed authors. The figures were created using BioRender.com and exported under a paid subscription.
Funding
We gratefully acknowledge funding from the Tumor Profiler Initiative and the Tumor Profiler Center (to V.H.K., G.R.). The Tumor Profiler study is jointly funded by a public-private partnership involving F. Hoffmann-La Roche Ltd., ETH Zurich, University of Zurich, University Hospital Zurich, and University Hospital Basel. We also acknowledge funding of K.N. by Swiss National Science Foundation (SNSF) grants 220127 (to G.R.) and 201656, ETH core funding (to G.R.), UZH core funding (to V.H.K.), funding by the Promedica Foundation grant F-87701-41-01 (to V.H.K.), SNSF Prima grant PR00P3-201656 (to K.S.) and funding from the Swiss Federal Institutes of Technology strategic focus area of personalized health and related technologies project 2021-367 (to G.R., V.H.K., S.A.).
Conflict of interest/Competing interests
V.H.K. reports being an invited speaker for Sharing Progress in Cancer Care (SPCC) and Indica Labs, serving on the advisory board of Takeda, and having sponsored research agreements with Roche and IAG, all unrelated to the current study. V.H.K. is a participant in a patent application on the assessment of cancer immunotherapy biomarkers by digital pathology, a patent application on multimodal deep learning for the prediction of recurrence risk in cancer patients, and a patent application on predicting the efficacy of cancer treatment using deep learning, all unrelated to the current work. G.R. is a participant in a patent application on matching cells from different measurement modalities, which is not directly related to the current work. Moreover, G.R. is a cofounder of Computomics GmbH, Germany, and one of its shareholders.
TUMOR PROFILER CONSORTIUM
Rudolf Aebersold5, Melike Ak33, Faisal S Al-Quaddoomi12,22, Silvana I Albert10, Jonas Albinus10, Ilaria Alborelli29, Sonali Andani9,22,31,36, Per-Olof Attinger14, Marina Bacac21, Daniel Baumhoer29, Beatrice Beck-Schimmer44, Niko Beerenwinkel7,22, Christian Beisel7, Lara Bernasconi32, Anne Bertolini12,22, Bernd Bodenmiller11,40, Ximena Bonilla9, Lars Bosshard12,22, Byron Calgua29, Ruben Casanova40, Stéphane Chevrier40, Natalia Chicherova12,22, Ricardo Coelho23, Maya D’Costa13, Esther Danenberg42, Natalie R Davidson9, Monica-Andreea Dragan7, Reinhard Dummer33, Stefanie Engler40, Martin Erkens19, Katja Eschbach7, Cinzia Esposito42, André Fedier23, Pedro F Ferreira7, Joanna Ficek-Pascual1,9,16,22,31, Anja L Frei36, Bruno Frey18, Sandra Goetze10, Linda Grob12,22, Gabriele Gut42, Detlef Günther8, Pirmin Haeuptle3, Viola Heinzelmann-Schwarz23,28, Sylvia Herter21, Rene Holtackers42, Tamara Huesser21, Alexander Immer9,17, Anja Irmisch33, Francis Jacob23, Andrea Jacobs40, Tim M Jaeger14, Katharina Jahn7, Alva R James9,22,31, Philip M Jermann29, André Kahles9,22,31, Abdullah Kahraman22,36, Viktor H Koelzer36,41, Werner Kuebler30, Jack Kuipers7,22, Christian P Kunze27, Christian Kurzeder26, Kjong-Van Lehmann2,4,9,15, Mitchell Levesque33, Ulrike Lischetti23, Flavio C Lombardo23, Sebastian Lugert13, Gerd Maass18, Markus G Manz35, Philipp Markolin9, Martin Mehnert10, Julien Mena5, Julian M Metzler34, Nicola Miglino35,41, Emanuela S Milani10, Holger Moch36, Simone Muenst29, Riccardo Murri43, Charlotte KY Ng29,39, Stefan Nicolet29, Marta Nowak36, Monica Nunez Lopez23, Patrick GA Pedrioli6, Lucas Pelkmans42, Salvatore Piscuoglio23,29, Michael Prummer12,22, Laurie Prélot9,22,31, Natalie Rimmer23, Mathilde Ritter23, Christian Rommel19, María L Rosano-González12,22, Gunnar Rätsch1,6,9,22,31, Natascha Santacroce7, Jacobo Sarabia del Castillo42, Ramona Schlenker20, Petra C Schwalie19, Severin Schwan14, Tobias Schär7, Gabriela Senti32, Wenguang Shao10, Franziska Singer12,22, Sujana Sivapatham40, Berend Snijder5,22, Bettina Sobottka36, Vipin T Sreedharan12,22, Stefan Stark9,22,31, Daniel J Stekhoven12,22, Tanmay Tanna7,9, Alexandre PA Theocharides35, Tinu M Thomas9,22,31, Markus Tolnay29, Vinko Tosevski21, Nora C Toussaint12,22, Mustafa A Tuncel7,22, Marina Tusup33, Audrey Van Drogen10, Marcus Vetter25, Tatjana Vlajnic29, Sandra Weber32, Walter P Weber24, Rebekka Wegmann5, Michael Weller38, Fabian Wendt10, Norbert Wey36, Andreas Wicki35,41, Mattheus HE Wildschut5,35, Bernd Wollscheid10, Shuqing Yu12,22, Johanna Ziegler33, Marc Zimmermann9, Martin Zoche36, Gregor Zuend37
1AI Center at ETH Zurich, Andreasstrasse 5, 8092 Zurich, Switzerland, 2Cancer Research Center Cologne-Essen, University Hospital Cologne, Cologne, Germany, 3Cantonal Hospital Baselland, Medical University Clinic, Rheinstrasse 26, 4410 Liestal, Switzerland, 4Center for Integrated Oncology Aachen (CIO-A), Aachen, Germany, 5ETH Zurich, Department of Biology, Institute of Molecular Systems Biology, Otto-Stern-Weg 3, 8093 Zurich, Switzerland, 6ETH Zurich, Department of Biology, Wolfgang-Pauli-Strasse 27, 8093 Zurich, Switzerland, 7ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058 Basel, Switzerland, 8ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 1-5/10, 8093 Zurich, Switzerland, 9ETH Zurich, Department of Computer Science, Institute of Machine Learning, Universitätstrasse 6, 8092 Zurich, Switzerland, 10ETH Zurich, Department of Health Sciences and Technology, Otto-Stern-Weg 3, 8093 Zurich, Switzerland, 11ETH Zurich, Institute of Molecular Health Sciences, Otto-Stern-Weg 7, 8093 Zurich, Switzerland, 12ETH Zurich, NEXUS Personalized Health Technologies, Wagistrasse 18, 8952 Zurich, Switzerland, 13F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland, 14F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland, 15Joint Research Center Computational Biomedicine, University Hospital RWTH Aachen, Aachen, Germany, 16Life Science Zurich Graduate School, Biomedicine PhD Program, Winterthurerstrasse 190, 8057 Zurich, Switzerland, 17Max Planck ETH Center for Learning Systems, 18Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany, 19Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124, 4070 Basel, Switzerland, 20Roche Pharmaceutical Research and Early Development, Roche Innovation Center Munich, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany, 21Roche Pharmaceutical Research and Early Development, Roche Innovation Center Zurich, Wagistrasse 10, 8952 Schlieren, Switzerland, 22SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland, 23University Hospital Basel and University of Basel, Department of Biomedicine, Hebelstrasse 20, 4031 Basel, Switzerland, 24University Hospital Basel and University of Basel, Department of Surgery, Brustzentrum, Spitalstrasse 21, 4031 Basel, Switzerland, 25University Hospital Basel, Brustzentrum & Tumorzentrum, Petersgraben 4, 4031 Basel, Switzerland, 26University Hospital Basel, Brustzentrum, Spitalstrasse 21, 4031 Basel, Switzerland, 27University Hospital Basel, Department of Information- and Communication Technology, Spitalstrasse 26, 4031 Basel, Switzerland, 28University Hospital Basel, Gynecological Cancer Center, Spitalstrasse 21, 4031 Basel, Switzerland, 29University Hospital Basel, Institute of Medical Genetics and Pathology, Schönbeinstrasse 40, 4031 Basel, Switzerland, 30University Hospital Basel, Spitalstrasse 21/Petersgraben 4, 4031 Basel, Switzerland, 31University Hospital Zurich, Biomedical Informatics, Schmelzbergstrasse 26, 8006 Zurich, Switzerland, 32University Hospital Zurich, Clinical Trials Center, Rämistrasse 100, 8091 Zurich, Switzerland, 33University Hospital Zurich, Department of Dermatology, Gloriastrasse 31, 8091 Zurich, Switzerland, 34University Hospital Zurich, Department of Gynecology, Frauenklinikstrasse 10, 8091 Zurich, Switzerland, 35University Hospital Zurich, Department of Medical Oncology and Hematology, Rämistrasse 100, 8091 Zurich, Switzerland, 36University Hospital Zurich, Department of Pathology and Molecular Pathology, Schmelzbergstrasse 12, 8091 Zurich, Switzerland, 37University Hospital Zurich, Rämistrasse 100, 8091 Zurich, Switzerland, 38University Hospital and University of Zurich, Department of Neurology, Frauenklinikstrasse 26, 8091 Zurich, Switzerland, 39University of Bern, Department of BioMedical Research, Murtenstrasse 35, 3008 Bern, Switzerland, 40University of Zurich, Department of Quantitative Biomedicine, Winterthurerstrasse 190, 8057 Zurich, Switzerland, 41University of Zurich, Faculty of Medicine, Zurich, Switzerland, 42University of Zurich, Institute of Molecular Life Sciences, Winterthurerstrasse 190, 8057 Zurich, Switzerland, 43University of Zurich, Services and Support for Science IT, Winterthurerstrasse 190, 8057 Zurich, Switzerland, 44University of Zurich, VP Medicine, Künstlergasse 15, 8001 Zurich, Switzerland
Acknowledgments
This work was supported by the Swiss Federal Institutes of Technology (strategic focus area of personalized health and related technologies; 2021–367). The 10x spatial transcriptomics sequencing of a subset of the Tumor Profiler samples was made possible through a technology access program by 10x Genomics™, with special acknowledgments to Jacob Stern, James Chell, Rudi Schläfli, Laura Lipka, Mario Werner, Nikhil Rao, and Scott Brouilette for their invaluable contributions. The Tumor Profiler study was supported by a public-private partnership involving Roche Holding AG, ETH Zurich, University of Zurich, University Hospital Zurich, and University Hospital Basel.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].