Summary
Background Liquid biopsy based on cell-free DNA (cfDNA) is an established approach in clinical diagnostics. In recent years, a fraction of cfDNA comprising short fragments has been discovered, that is enriched at gene promoters and binding sites of DNA-binding proteins. However, the diagnostic potential of such short double-stranded cell-free DNA (footprint DNA) remains to be fully explored. Therefore, we characterized the clinical utility of footprint DNA in septic patients.
Methods We enriched for footprint DNA based on size selection and subsequent high-throughput DNA sequencing to receive an unbiased, genome-wide picture of the host response to the infection. Footprint DNA occupancies were analyzed for correlation with clinical metrics including urea, hemoglobin, or alanine aminotransferase (ALT). Additionally, footprint DNA markers were benchmarked by read and receiver operating curve (ROC) analysis against procalcitonin (PCT) as an established marker for infection status as well as against clinical parameters for early death prediction.
Findings We found that levels of occupancy of footprint DNA at defined genomic loci semi-quantitatively correlated with physiological markers like ALT or urea from major organ systems including liver or kidney. In a small proof-of-concept cohort, differential signatures of DNA footprints distinguished between patient groups with bacterial and viral infections with an area under the ROC (AUROC) of 1.0, which is considerably better than PCT with an AUROC of 0.75. Likewise, footprint DNA could also predict early death in septic patients with an AUROC of 0.983, compared to the SOFA (Sepsis-related organ failure assessment) score with an AUROC of 0.76.
Interpretation Our findings show that footprint DNA delivers quantitative information on physiology at the DNA level, demonstrating its diagnostic and prognostic potential. Identified footprint biomarker regions could be helpful in the clinical assessment of septic patients and other complex diseases outperforming current state-of-the-art clinical diagnostics.
Funding This study was financed with internal funds from Fraunhofer society.
Introduction
The potential of liquid biopsy in clinical diagnostics has revolutionized the field of clinical diagnostics for multiple indications [1]. Especially cell-free DNA is widely used as a biomarker class for prognosis and diagnosis in non-invasive prenatal testing, cancer, sepsis and organ rejection, among others [2–4]. Exemplarily, several studies underline the potential of cfDNA based biomarkers for clinical sepsis diagnostics [5–7]. For pathogen detection in sepsis, it is well known that the clinical gold standard of culture-based analysis (e.g., blood culture) is associated with considerable limitations [8–10]. Using metagenomic high-throughput sequencing of plasma cfDNA, a six times higher sensitivity for pathogen detection than blood culture can be achieved [11]. More recently, in contrast to regular cfDNA, with an average fragment size of 167 bp, a very small subset of double-stranded cfDNA with a significantly shorter fragment length of 35 - 80 bp has been found to be protected by non-histone DNA-binding proteins, like transcription factors [12,13]. Using ultra-deep high-throughput DNA-sequencing, it has been described that short cfDNA directly maps to genomic regions that are occupied by DNA binding proteins like transcription factors, indicating first potential for clinical application for example in breast cancer diagnostics [12,14]. Alternative approaches to infer the transcription factor occupancy make use of nucleosomal positioning therefore detecting open chromatin, where DNA regulatory binding protein may bind [15]. Quite recently, we have developed a workflow to specifically enrich short double-stranded cfDNA from plasma (footprint DNA) and could show that these fragments accumulate at open chromatin and gene regulatory elements [16]. The signals extracted from the footprint DNA were found to have the clinical potential to distinguish between two different types of gastrointestinal cancer as well as to discriminate sepsis from non-infected postoperative controls and healthy individuals [16].
Sepsis, defined as a “life-threatening organ dysfunction caused by a dysregulated host response to infection” [17], is a major health threat with high mortality rates and a global incidence of estimated 48.9 mio sepsis cases with 11 mio deaths per year [18]. However, precise and reliable biomarkers for sepsis diagnosis and prognosis are still in their infancies. One standard assessment for sepsis diagnosis represents the Sequential (Sepsis-related) Organ Failure Assessment (SOFA) score, which evaluates multiorgan failure in ICU patients [19] and has been described to have predictive value for mortality risk [20,21]. Another important clinical parameter frequently used in sepsis is procalcitonin (PCT) which is an indicator for bacterial sepsis and primarily suggested to guide antibiotic treatment [22–24]. Still, PCT and SOFA, although routinely used in clinical practice, do not represent reliable predictors of infection and outcome, thus indicating that further research is needed to overcome limitations and to establish more precise biomarkers [25,26].
In this study, we evaluated the diagnostic potential of footprint DNA in the context of sepsis. Consequently, we identified footprint DNA target regions that have the potential to semi-quantitatively retrace patient physiology. Based on the host response, footprint DNA allowed to more accurately distinguish between bacterial and viral sepsis and to predict early deaths versus survivors better than established biomarkers, including SOFA and PCT.
Methods
Ethics Approval and patient consent
Septic patient samples from the monocentric clinical study S-097/2013, conducted in the surgical intensive care unit of Heidelberg University Hospital (November 2013 - January 2015, German Clinical Trials Register: DRKS00005463) [11] and from the multicentric study S-084/2017 (March 2019 - August 2020, German Clinical Trials Register: DRKS00011911, ClinicalTrials.gov: NCT03356249) [27] were used, with study and control patients or their legal representatives signing written informed consent. Both studies were conducted in accordance with the Declaration of Helsinki and the professional code of conduct for physicians of the competent state medical association in the current versions, approved by the Institutional Ethics Committee of the Medical Faculty of Heidelberg University. Samples from S-097/2013 were used for correlating clinical metrics with footprint DNA signals [11,28]. For infection type and outcome prediction, septic samples from the study S-084/2017 [27] and postoperative, non-infection controls from S-097/2013 [11,28] were used.
Blood plasma preparation and cell-free DNA isolation
Plasma and cfDNA was isolated as described previously [11]. CfDNA was isolated with the QIAsymphony SP and the QIAsymphony DSP Circulating DNA Kit as well as the QiaCube connect and the QIAamp MinElute ccfDNA kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Eluted cfDNA was quantified with the Qubit dsDNA HS Assay Kit (Thermo Fisher, Waltham, USA) and cfDNA quality was assessed by the Fragment Analyzer High Sensitivity DNA Kit (Agilent, Santa Clara, California, USA).
Library generation and Next-Generation sequencing
Footprint DNA sequencing libraries from isolated cfDNA were prepared as already described in [16], with 15 ng of cfDNA as input and final elution in 20 µL of nuclease-free water. Library generation with the NEXTFLEX Cell free DNA-Seq Kit (V2) (Perkin Elmer, Pittsburgh, USA) was automated using the Biomek FXP (Beckman Coulter, Brea, USA). Library quality was assessed with the Fragment Analyzer High Sensitivity DNA Kit (Agilent, Santa Clara, USA) and concentration was measured by the Qubit dsDNA HS Assay Kit (Thermo Fisher, Waltham, USA). Size selection of cfDNA libraries was performed as described in [16] using BluePippin and 3% agarose cassettes (Sage Science, Beverly, USA) with one minor modification: following the second size selection, samples were purified with 1.8 x the volume of AMPureXP beads (Beckman Coulter, Brea, USA). Size selection performance was evaluated by the Fragment Analyzer High Sensitivity DNA Kit (Agilent, Santa Clara, USA) and by Qubit dsDNA HS Assay Kit (Thermo Fisher, Waltham, USA). After size selection and molarity normalization, sequencing was performed with a NextSeq 2000 (Illumina, San Diego, USA) using 100 bp single read kits with a targeted sequencing depth of 50 million reads per sample.
Raw data processing
After quality control of raw sequencing reads with FastQC (v0.12.1), the following steps were performed to remove sequencing artifacts: 1) Removal of sequencing adapters, terminal polyG sequences (min 10 G’s), and quality trimming (BBTools - bbduk.sh v39.01). 2) Removal of terminal single A introduced by library preparation (BBTools - bbduk.sh v39.01). 3) Size selection of sequenced reads allowing lengths greater than 20 bp and smaller than 60 bp (BBTools - bbduk.sh v39.01). 4) Removal of sequencing reads with dust scores smaller than 7 (prinseq-lite v0.20.4) [29–31]. 5) Processed reads were mapped to the human reference genome GRCh37 using NextGenMap (v0.5.5) with default settings [32]. 6) Mapped reads were deduplicated with samtools rmdup (v1.6), reads in blacklisted regions were removed, and reads with a MAPQ value < 1 were removed with samtools view (v1.6) [33,34]. Resulting reads were converted to BigWig for visualization and other downstream analyses with deeptools (bin size = 10, normalization = counts per million (CPM), bamCoverage v3.5.2) [35].
Peak calling and annotation
For footprint DNA data, peaks were called with MACS2 callpeak (narrow: --nomodel --extsize 32 --call-summits --min-length 30 -q 0.05; v2.2.9.1) [36]. Consensus peaks of a condition were identified with R if narrow peaks were identified in at least fifty percent of samples of a given condition. In addition, consensus peaks less than 31 nucleotides apart were merged. Consensus peaks were annotated to genomic functional elements with bedtools (nuc; v2.30.0) [37] including gene bodies, CpG islands, cis-regulatory elements (CREs) and transcription factor binding sites (TFBS). References for functional elements are provided on github and were originally retrieved from UCSC (CpG islands: UCSC TableBrowser, track name=cpgIslandExt; TFBS: UCSC TableBrowser, track name=encRegTfbsClustered) and ENCODE (CREs: https://doi.org/doi:10.17989%2FENCSR461KLY, https://doi.org/doi:10.17989%2FENCSR471KRT, https://doi.org/doi:10.17989%2FENCSR676GWZ, https://doi.org/doi:10.17989%2FENCSR405FRQ).
Transcription factor motif enrichment analysis
Enriched transcription factor motifs were identified with MEME (AME; --scoring avg -- method fisher; v5.4.1) [38]. Input DNA sequences were retrieved from consensus peaks with bedtools (nuc; v2.30.0) and analyzed for enrichment of motifs in the HOCOMOCOv11_core_HUMAN_mono database [37,39]. DNA sequences from each condition’s consensus peak set served as test or control, e.g., consensus peak DNA sequences from bacterial sepsis as test for consensus peak DNA sequences from viral sepsis as control. Motifs with adjusted p-values < 0.05 were considered significantly enriched.
Identification of differentially occupied genomic regions
Differentially enriched genomic regions were identified from the respective consensus peak sets of two conditions of sepsis outcomes or sepsis infection types. Genomic GC content of the consensus peak set used in a comparison was extracted with bedtools (nuc; v2.30.0), for later normalization of read counts [37]. Raw read counts were obtained with deeptools (multiBamSummary; v3.5.2) [35]. The read count matrix, GC content information and experiment design were used as input for differential enrichment analysis in R with EDASeq and edgeR packages (EDASeq v2.32.0; withinLaneNormalization (y=’GC’, which=’full’), betweenLaneNormalization (which=’full’), glmFit, glmLRT) [40,41]. Genomic regions with an adjusted p-value < 0.05 and a |log2(fold change)| > 0.5 were considered as differentially enriched regions (DERs). In addition, regions were ranked by p-values per fold change direction. From the ranked regions, the top 5 per fold change direction were selected. The top 5 DERs for the early death condition of the outcome comparison also includes regions with an adjusted p-value > 0.05. Normalized read counts from top regions were visualized as heatmaps with gplots (heatmap.2; v3.1.3; log2(normalized read count + 1), z-scaling per region, clustering by regions and samples using Euclidean distance and ward.D2 clustering). Normalized read counts of the top regions per condition were used as input for a principal component analysis (PCA) in R (prcomp; center=TRUE, .scale=TRUE). The first two principal components were evaluated for their discriminative power of the experimental conditions with receiver operating characteristic (ROC) analysis in R (pROC; v1.18.0).
For the infection type comparison, bacterial and viral sepsis samples were separately compared to non-infected control samples. All three comparisons, bacterial vs. viral, bacterial vs. no infection, and viral vs. no infection, yielded each 10 top DERs, in total 30. The GC content, sequencing depth, and region size normalized read counts from these regions were used for PCA.
Correlation analyses between clinical metrics and short cfDNA
For the correlations between footprint DNA and clinical metrics, two patient groups were used. The ’Identification’ cohort (n = 28) included time course samples from six sepsis patients over 21 days, with eight samples missing due to unavailability or exclusion as described earlier [11]. Afterwards, identified correlates were validated with independent samples of varying time points from additional sepsis patients (‘Test’ cohort, n = 34). Correlations were derived from read counts of genomic bins to ensure an analysis that encompasses the entire genome, rather than focusing solely on regions with detectable peaks in the samples. This approach allows the detection of variations across the full dynamic range of read counts, including low or zero values, which might be missed by peak-calling methods. The genome was split in 30,962,143 bins of length 100nt and the short cfDNA reads per bin were counted. Read counts of bins were adjusted for GC content in R with EDASeq (EDASeq v2.32.0; withinLaneNormalization (y=’GC’, which=’full’)) [40]. After GC adjustment, values were transformed to transcripts per million (TPM), to account for the sequencing depth of each sample [42]. Pearson correlations were calculated between normalized read counts of each bin and the following clinical metrics: activated partial thromboplastin time, alanine aminotransferase, albumin, alkaline phosphatase, aspartate aminotransferase, bilirubin, C-reactive protein, creatinine, erythrocytes, hemoglobin, leukocytes, procalcitonin, Quick’s value, thrombocytes, and urea. Correlation values were computed for each patient individually and all samples combined. For the latter , p-values were adjusted for multiple testing using the false discovery rate (FDR) [43]. For metrics with over 50 bins having an FDR < 0.05, the top 50 bins were selected based on the sum of the combined and patient-wise correlation values. The correlates from the identification cohort were evaluated for robustness with the test cohort. This evaluation was based on the correlation value in the test cohort and the mean absolute error (MAE) to a linear model fitted on the identification cohort. Outliers, defined as values outside the identification cohort’s clinical metric data range, were excluded from thew test cohort. Correlates with the strongest correlation and smallest MAE were manually manual inspected to ensure that the correlation was not overestimated by a subset of data points. All correlation analyses were executed with python.
Statistics
Summary statistics for sepsis infection type and sepsis outcome cohorts were calculated in R (tidytlg; v0.1.4). For numerical metrics, mean with standard deviation, median, range, and interquartile range are reported. Categorical metrics are reported as percentages. Numerical metrics were tested for difference of medians using a two-sided Wilcoxon rank-sum test. Categorical metrics were tested for differences in proportions using a two-sided Fisher’s exact test. The discriminatory power of numerical metrics for the patient groups in the two cohorts was evaluated using the area under the receiver operating characteristic (AUROC) curve calculated in R (pROC; v1.18.0). 95 % confidence interval values of AUROC values are reported and were calculated using DeLong’s method. Differences between ROC curves were tested for significance with the bootstrap method of pROC. Optimal classification thresholds are provided that have been calculated using the Youden criterion.
Optimal classification thresholds derived from ROC analyses were used to categorize patients in the outcome cohort into high-risk or low-risk groups for mortality. Subsequently, time-to-event analyses were conducted to evaluate potential differences between these groups. Cox proportional hazards models were fitted using R, and significance of differences was assessed with pairwise_survdiff function from survminer (v0.4.9) and survival (v3.5.7). Hazard ratios were calculated and tested for statistical significance using the Wald test.
Role of the funding source
This study was financed with internal funds from the Fraunhofer society, which had no direct role in the study design, in the collection, analysis, and interpretation of data, in the writing of the manuscript, or in the decision to submit the paper for publication.
Results
Characterization of footprint DNA
Released DNA in the bloodstream gets rapidly degraded by DNA digesting enzymes, like DNase I, generating circulating cell-free DNA (cfDNA). Histones and other DNA binding proteins (DBP), like transcription factors, protect cfDNA from degradation (Figure 1a). Footprint DNA fragments originating from DBPs are generally shorter compared to cfDNA from histones, which facilitates specific enrichment and high-throughput sequencing of these fragments. Sequencing reads of footprint DNA are enriched at regulatory genomic sites like promoters, enhancers, transcription factor binding sites (TFBS), at CpG islands or in proximity to transcription start sites (TSS) [12,44] (Figure 1a, b, Supplement Figure 1 and Supplement Figure 2). An exemplary genomic region with enrichment of footprint DNA reads is shown for the NOCT gene (Figure 1b). Enrichment and sequencing of footprint DNA provides genome-wide information on regulatory interactions between DBPs and the genome, which correlates with the host response of sepsis patients and thus can be used to identify valuable biomarkers. Accordingly, genomic regions with differentially enriched footprint DNA reads, potentially revealing bona fide biomarkers, were inferred from peak calls, while correlations between footprint DNA signals and clinical metrics were identified from genomic partitions (bins) of 100 bp in size (Figure 1b zoom).
a) CfDNA is protected from enzymatic digestion in the bloodstream either by histones (purple) or other DNA-binding proteins (DBP, footprint DNA, green). Isolating cfDNA from the plasma of patients, enriching DBP-bound footprint DNA according to size and sequencing it yields amplified signals at regulatory elements such as promoters or transcription start sites (TSS) of genes. Created in BioRender [Sonntag M., 2025, https://BioRender.com/i06x610]. Exemplary footprint DNA read enrichment at the NOCT gene promoter, enhancer, CpG island, and various TFBS. Either by peak calling (left) or by dividing the human genome into bins, footprint DNA signals can be extracted and analyzed (right, zoom in).
Footprint DNA contains information about sepsis patient physiology
Footprint DNA provides genome-wide information about the interaction of regulatory proteins with the genome. Next, we investigate whether this information can be linked to known physiological functions measured by established clinical parameters. Blood and plasma samples were collected from sepsis patients over a period of 21 days for routine clinical measurements and plasma extraction (Figure 2a). Patient characteristics and physiological parameter are depicted in Supplement Table 1 and Supplement Table 2. Plasma was used for footprint DNA enrichment, sequencing, and extracted signals from the bin approach were evaluated for correlations to routine laboratory parameters. Parameters were evaluated for liver function (alanine aminotransferase (ALT), albumin, alkaline phosphatase (ALP), aspartate aminotransferase (AST), and bilirubin), kidney function (creatinine and urea), immune response (C-reactive protein (CRP), leukocytes, and procalcitonin (PCT)), blood coagulation system (activated partial thromboplastin time (aPTT), Quick’s value and thrombocytes), as well as oxygen transport (erythrocytes, hemoglobin). The correlations between footprint DNA signals and described metrics were tested both patient-centered and between patients (‘identification’ cohort) as well as validated in completely independent samples (‘test’ cohort).
a) Experimental design for the identification and verification of footprint DNA correlates. Routine clinical metrics as well as short cfDNA sequencing data were acquired from six septic patients over 21 days (Identification, n=28 samples, 6 individuals). Independent samples from other septic patients were used for validation (Test, n=34 samples, 19 individuals). Created in BioRender [Sonntag M., 2025, https://BioRender.com/w86w257]. b-e) In the identification cohort, footprint DNA markers from the six patients were correlated patient wise to routine clinical metrics (left three plots) and across all patients (second right plot). In the test cohort, independent samples from sepsis patients were used for validation of identified correlation results. Reported correlations are pearson correlation values (r). Adjusted p values of correlations across patients: urea: r=0.766, p=0.0011; PCT: r=0.806, p=0.0163; hemoglobin: r=0.820, p=0.0424; ALT: r=0.784, p=0.0165. The red lines show the fitted linear function based on the identification cohort from which the MAE is calculated. PCT, procalcitonin; ALT, alanine transaminase: MAE, mean absolute error.
For four physiological parameters, significant footprint DNA correlations were identified, including kidney function (urea; Figure 2b), immune response (PCT; Figure 2c), oxygen transport (hemoglobin; Figure 2d), and liver function (ALT; Figure 2e). Additionally, weaker correlates for additional liver functions (ALP, AST, and bilirubin; Supplement Figure 3a-c) and oxygen transport (erythrocytes; Supplement Figure 3d) were also identified. Patient physiology can change dynamically within a few days, leading to an exponential decrease of PCT in patient S33, for example (Figure 2c). Within the identification cohort for all six patients combined, footprint DNA signals correlate significantly and strongly with clinical parameters including urea (Pearson r: 0.766) or PCT (Pearson r: 0.806) (Figure 2b-e). At the individual patient level, correlations also showed similar trends (Supplement Figure 4, Supplement Figure 7). These correlations were then further validated using samples from an independent test cohort (Supplement Table 3). Although extent of correlations decreased (Pearson r: 0.586 for urea and 0.440 for hemoglobin) and the MAEs of the linear fits increased, a moderate correlation with the clinical parameters was still observed (Figure 2b-e right plot). These results suggest that sequencing footprint DNA can provide semi-quantitative information to describe dynamic physiological changes of clinical metrices, which are used to assess the current status of the patient, e.g., organ function.
Footprint DNA accurately predicts type of infection in sepsis
In clinical sepsis diagnostic, PCT is routinely measured to monitor bacterial sepsis, with potential use in detecting bacterial sepsis. However, PCT not always provides reliable results to differentiate between bacterial and non-bacterial sepsis. Therefore, our cohort was used to determine whether footprint DNA markers are able to differentiate between sepsis infection types and uninfected controls already at the day of admission to the ICU (Figure 3a and Supplement Table 4). Bacterial and viral sepsis groups of the cohort do not differ significantly in any of the measured standard clinical parameters (Supplement Table 5). In clinical routine, procalcitonin (PCT) is often used as an indicator for bacterial sepsis, however, we found PCT values only significantly different between non-infected postoperative controls (POP) and infected patients, but not between bacterial and viral septic patients (Supplement Figure 9). Footprint DNA markers were identified by differential read coverage analysis from the consensus peaks of each condition (Supplement Figure 10). The top five differentially enriched regions (DER) per condition and per comparison (POP vs. viral, POP vs. bacterial, bacterial vs. viral; in total 30 DERs) separate the three infection types by the first and second principal components applying a principal component analysis (PCA) (Figure 3b and Supplement Figure 11). Separation of these groups based on the top 30 DERs from the three comparisons is also accomplished by hierarchical clustering (Supplement Figure 12 and Supplement Figure 13a-c). Differential enrichment analysis for bacterial vs. viral sepsis revealed 220 DER in bacterial and 68 DER in viral sepsis, respectively (Figure 3c). Among significantly enriched bacterial DERs, a distinct differential DNA footprint in the promoter region of the AP1B1 gene could be detected (Supplement Figure 12). Among significant viral DERs, differential footprint DNA signals could be exemplarily detected in the promoter of the RNA Polymerase III subunit B (POLR3B) or the solute carrier family 30 member 10 (SLC30A10), respectively (Supplement Figure 12). Receiver operating characteristics (ROC) analysis was used to compare the discriminatory and predictive power of the top DER with PCT. Results show that footprint DNA is a significantly better predictor of sepsis infection type than PCT (area under the ROC curve (AUROC): footprint DNA=1.000 vs. PCT=0.750; p=0.0354) (Figure 3d). Additionally, differential enrichment of transcription factor binding motifs analysis was performed to identify regulatory proteins that may be involved in the discrimination of the conditions. In total, 37 significantly enriched transcription factor motifs for bacterial and 186 for viral sepsis were detected, respectively (Figure 3e,f). For viral sepsis the most significantly enriched binding motif belongs to SMAD family member 3 (SMAD3), while two Kruppel-like factors (KLF12 and KLF9) were detected among the top 10 most significant motifs in bacterial sepsis (Figure 3e,f).
a) Experimental design: For 23 samples (8 viral sepsis patients, 10 bacterial sepsis, 5 non-infected postoperative controls) plasma was prepared, footprint DNA enriched and sequenced. Created in BioRender [Sonntag M., 2025, https://BioRender.com/d50d259]. b) Principal component analysis based on the top 30 identified differentially enriched regions separates viral from bacterial sepsis and from non-infected postoperative controls. c) Differential enrichment analysis volcano plot for the comparison of viral and bacterial sepsis. Differentially enriched regions are colored according to their respective condition (adjusted p-value ≤ 0.05 and |log2 (fold change)| ≥ 0.5). d) Receiver operating characteristics analysis based on the top 10 identified Liquid Footprinting (LF) biomarkers from c) and for the routine clinical parameter PCT (95% confidence intervals are included for AUROC values; p=0.0354). e) and f) Top 10 most significant differentially enriched transcription factor motifs in bacterial sepsis (e) and in viral sepsis (f).
Footprint DNA accurately predicts early death in sepsis
Prognostic biomarkers for the prediction of outcome in sepsis can be crucial to administer the most suitable treatment options and to reduce adverse outcomes. We therefore defined a cohort of severe sepsis cases (‘outcome’ cohort) including 22 sepsis survivors (survival > 28 days) and 21 early death sepsis cases (death < 3 days). Based on the outcome cohort, we aim to identify footprint DNA biomarkers for identification of early death in sepsis (Figure 4a and Supplement Table 7). The outcome cohort groups differ significantly in eight of the 27 measured clinical metrics, with the SOFA score showing the lowest p value and the highest AUROC of 0.76 (Supplement Table 8). Differential enrichment analysis identified one DER for early death and seven for sepsis recovery, respectively (Figure 4b). Based on the top 5 footprint DNA signals per condition, early death and recovery from sepsis could be separated by the second principal component from PCA (Figure 4c), as well as by hierarchical clustering (Supplement Figure 13d). ROC analysis of the top 10 footprint DNA signals and the most established clinical metric SOFA yielded a significantly better predictive power for the footprint DNA biomarkers (AUROC: footprint DNA=0.983 vs. SOFA=0.760; p=0.002; Figure 4d). Additionally, time-to-event analysis revealed that the low-risk group identified by footprint DNA biomarkers had a significantly lower hazard ratio (HR) for death compared to the group identified by the SOFA score (HR: footprint DNA=0.042 (95 % confidence interval: 0.009-0.186), SOFA=0.364 (95 % confidence interval: 0.148-0.896); p=0.015; Supplement Figure 14). Differential enrichment analysis of transcription factor binding motifs identified 123 significantly enriched transcription factor motifs for early death and two for sepsis recovery, respectively (Figure 4e,f). For recovery in sepsis, the top binding motif reflects the transcription factor BTB domain and CNC homolog 1 (BACH1) (Figure 4e,f).
a) Experimental design: From 43 septic patients samples (22 early death sepsis patients and 21 recovery patients) plasma was obtained, footprint DNA enriched and sequenced. Created in BioRender [Sonntag M., 2025, https://BioRender.com/b36q835]. b) Differential enrichment analysis volcano plot for the comparison of early death and recovery in sepsis. Differentially enriched regions are colored according to their respective condition (adjusted p-value ≤ 0.05 and |log2 (fold change)| ≥ 0.5). c) Principal component analysis based on the top 10 identified differentially enriched regions separates early death from recovery in sepsis. d) Receiver operating characteristics analysis based on the top 10 identified Liquid Footprinting (LF) biomarkers from b) and for the routine clinical metric SOFA (95% confidence intervals are included for AUROC values; p=0.002). e) and f) Top 10 most significant differentially enriched transcription factor motifs in recovery (e) and early death in sepsis (f).
Discussion
Sepsis remains a global health threat, in part due to the lack of specific diagnostic tools to assess the condition of septic patients. Monitoring the host immune response in sepsis offers valuable insights but is currently underutilized in clinical practice [45]. Several approaches aim to utilize host immune response information, for example gene expression data for pathogen detection or miRNA expression data for host response characterization [46,47]. However, dedicated sepsis biomarkers could facilitate early diagnosis, guide anti-infective therapy and track the severity of sepsis over time [48]. Accordingly, our approach aims to identify dedicated sepsis biomarkers based on cell-free DNA that capture the host response on the basis of a non-invasive liquid biopsy.
Footprint DNA sequencing provides genome-wide information on the interaction between regulatory proteins and the genome, hence signals mainly occur at regulatory genomic elements such as promoters or TFBS (Figure 1 and Supplement Figure 1, Supplement Figure 2). Individual footprint signals correlate to clinical, physiological parameters, such as liver (for ALT, r = 0.784) and kidney (for urea, r = 0.766 (Figure 2). These footprint DNA markers capture rapid changes of the clinical metrics on single patient level as well as across all patients (Supplement Figure 4, Supplement Figure 7, Supplement Figure 8), underlining that the dynamic in patient’s physiology can be assessed during disease progression, i.e., during a stay on the ICU. Despite the moderate correlation strength observed in the test cohort, the identified correlates suggest that DNA footprinting provides semi-quantitative data about dynamic changes in the physiological state of a patient. Footprint DNA allows body- and genome-wide the monitoring of different organ systems beyond the hematologic system by capturing the dynamics of clinical metrices. These results also imply that a DNA-based liquid biopsy marker can deliver physiological insights about the patient’s status, similar to gene expression data from sources such as whole blood [49]. While this gene expression analysis only depicts signal mainly from the hematopoietic system, footprint DNA is capable deliver additionally information system-wide, e.g., from different organ systems or tumors. Advanced models that integrate multiple footprint DNA signals might further increase correlation and therefore the prediction of established clinical metrics. However, directly replicating well-established, reliable, and inexpensive clinical metrics offers limited added value. Furthermore, the development of such models is limited by the comparably small cohort size.
Inadequate treatment is associated with increased mortality in septic patients, while adequate and early administration of anti-infective treatment leads to improved survival chances [50,51]. Based on the top 30 DERs, we could distinguish non-infected controls from bacterial and viral sepsis (Figure 3b). and with the top 10 DERs discriminate viral sepsis cases from bacterial cases with an AUROC of 1.000, outperforming all clinical routine biomarkers for infection including PCT (AUROC = 0.750) (Figure 3c,d; Supplement Table 6). Among significant DERs enriched in bacterial sepsis, a footprint signal at the promoter of AP1B1 was detected. AP1B1 is located at the Golgi complex to mediate recruitment and sorting of intracellular processes. It is a known regulator of Stimulator of interferon genes (STING) and thereby the cyclic GMP–AMP synthase (cGAS)-STING pathway. STING functions as a receptor for bacterial cyclic dinucleotides and small molecules that activates immunity during bacterial infection [52]. An enhancer in the AKT serine/threonine kinase 3 (AKT3) gene body was also identified as DER. AKT3 is known to be manipulated by bacterial infection and takes part in immune cell signaling including macrophages [53,54]. For viral sepsis, a significant DER in the promoter of POLR3B was detected. POLR3B is a part of the RNA polymerase III complex and senses common DNA viruses, such as cytomegalovirus, vaccinia, herpes simplex virus-1 and varicella zoster virus. This polymerase detects and transcribes viral genomic regions to generate AU-rich transcripts that bring to the induction of type I interferon [55,56]. Based on the transcription factor motif analysis, transcription factors enriched in either bacterial or viral sepsis were identified (Figure 3e,f). While the transcription factor SMAD3 is known to be activated during viral infection[57], the family of KLF transcription factors are induced during bacterial infection [58,59]. Also, footprint DNA sequencing outperforms common clinical routine metrics that were summarized in a recent review by Ahuja et al. [60]. Exemplarily for PCT, studies are described to reach an AUROC of 0.85 or lower discriminating bacterial sepsis, which is in concordance with our infection cohort for PCT [61]. Other approaches using either miRNA signatures for host response profiling reach AUROC of 0.90 and 0.83 for bacterial or viral vs. non-infection, respectively [62], while integrated host response analysis for microbe detection reaches 0.96 [63]. Current clinical metagenomics detects microbial-derived cfDNA and is therefore limited in their analysis to bacteria and DNA viruses [4,11]. With footprint DNA biomarkers, also RNA-based microorganisms, e.g., RNA viruses, should be detected based on the host response analysis. Overall, our results show that footprint DNA biomarkers have the potential to discriminate sepsis infection types at an early time point solely on the host response, thereby delivering important information for adequate treatment decisions.
In addition, we used ten footprint DNA markers to prognostically differentiate between early death in sepsis and recovery cases on the day of ICU admission (Figure 4a-c). With an AUROC of 0.983, footprint DNA marker’s discriminatory power is significantly better than that of clinical disease severity scoring tools such as the SOFA score (AUROC = 0.760) (Figure 4d, Supplement Table 9). The SOFA score for prediction of early death in critically ill patients had a sensitivity of 80% in a study by Ferreira et al. [64]. Also, other clinical parameters like lactate (AUROC = 0.66), lactate and SOFA (AUROC = 0.679 for initial sampling), presepsin (91.5 % sensitivity 28-day mortality) or a combination of several biomarkers including interleukin 6 and PCT (AUROC = 0.823) did not perform as promising as footprint DNA markers [65–68]. Recent approaches utilizing machine learning models with readily available clinical data were able to reach AUROCs between 0.83 to 0.89 depending on study and used data [69–71]. The top TF motifs enriched in early death cases, include different cell fate determining transcription factors including myocyte enhancer factor 2 (MEF2) and Nanog homeobox (NANOG). However, a direct causality to sepsis severity or survival remains elusive, yet.
Footprint DNA sequencing offers promising results for the quantitative assessment of the dynamic physiological changes and functionality of organ systems and for the prediction of the infection type and outcome of sepsis. At the current stage, the main limitation of this proof-of-concept study is its small sample size of the analyses. Follow up studies are essential to verify the results. The complete procedure for footprint DNA sequencing and quantification has currently a turnaround time of two days, which is valuable time for critically ill patients. Therefore, a future improvement of the approach will involve the quantification of the most promising footprint DNA markers in a targeted approach to shorten the turnaround time for clinical applicability. Nonetheless, footprint DNA sequencing represents a promising new platform to identify liquid biopsy biomarkers, here presented for application in sepsis diagnostics. By combining the evaluation of the host response using footprint DNA with the evaluation of pathogens using clinical metagenomics, a more holistic assessment of sepsis could be achieved in the future. Footprint DNA biomarkers might be valuable not only for sepsis but also in other complex diseases including autoimmune diseases, cancer or other related fields [16].
Contributors
MS and KS conceptualized the presented ideas with input from JM. MS performed all laboratory experiments. JM performed computational and statistical analysis of all data with help of AvH. Further interpretation and analysis of the results was carried out by JM, MS and KS. JM prepared the figures. TB, SOD as well as the GAIN-CARE collaborators recruited the patients’ samples used for this study. MS, JM, and KS wrote and revised the manuscript. KS supervised the project. All authors reviewed and approved the manuscript.
Data sharing statement
The raw high-throughput sequencing data used in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under the accession number PRJNA1134400. Custom data analysis code created for this work is available on GitHub (https://github.com/janmueller17/short_cfDNA_for_sepsis).
Declaration of interests
Kai Sohn is a co-founder of Noscendo GmbH, a diagnostic company that detects pathogens based on Next-generation sequencing. Moreover, Kai Sohn, Mirko Sonntag and Jan Müller filed for a patent regarding the described methods and bioinformatics algorithms for analyzing short double-stranded cfDNA (EP23203060.1).
Data Availability
The raw high-throughput sequencing data used in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under the accession number PRJNA1134400. Custom data analysis code created for this work is available on GitHub (https://github.com/janmueller17/short_cfDNA_for_sepsis).
Supplementary Data
Obtained peaks from footprint DNA signals were mapped against the human genome (a). All annotated peaks were further mapped against annotated regulatory elements including transcription factors bindings sites (TFBS, b), genes (c), cis-regulatory elements (CRE, d) and CpG islands (e).
Obtained peaks from footprint DNA signals were mapped against the human genome (a). All annotated peaks were further mapped against annotated regulatory elements including transcription factors bindings sites (TFBS, b), genes (c), cis-regulatory elements (CRE, d) and CpG islands (e).
Routine clinical metrics as well as short cfDNA sequencing data were acquired from six septic patients over a 21 days timespan (Identification, n=28 samples). Independent samples from other septic patients were used for validation (Test, n=34 samples). In the identification cohort, footprint DNA markers from the six patients were correlated to routine clinical metrics across patients (left plot). In the test cohort, independent samples from sepsis patients were used for validation of identified correlation results (middle plot). Correlation of footprint DNA signals to a) Alkanine phosphatase (ALP), b) Aspartate aminotransferase (AST), c) Bilirubin, d) Erythrocytes. Reported correlations are pearson correlation values (r). Adjusted p values of correlations across patients: ALP: r=0.775, p=0.0011; AST: r=0.844, p=0.0163; bilirubin: r=0.843, p=0.0424; Erythrocytes: r=0.869, p=0.0165. The red lines show the fitted linear function based on the identification cohort from which the MAE is calculated. MAE, mean absolute error.
a-f) Footprint DNA markers were analyzed for correlation to clinical metric PCT for all patients in the physiological cohort identification set. In the left and middle panel, PCT and normalized readcounts for the footprint, respectively, is depicted over time. The correlation of footprint DNA normalized counts and the clinical metric PCT is shown in the right panel. Results for patients are depicted the following: a) Patient S10, b) Patient S33, c) Patient S39, d) Patient S42, e) Patient S43 and f) Patient S49. PCT, procalcitonin.
a-f) Footprint DNA markers were analyzed for correlation to the clinical metric hemoglobin for all patients in the physiological cohort identification set. In the left and middle panel, hemoglobin and normalized readcounts for the footprint, respectively, is depicted over time. The correlation of footprint DNA normalized counts and the clinical metric hemoglobin is shown in the right panel. Results for patients are depicted the following: a) Patient S10, b) Patient S33, c) Patient S39, d) Patient S42, e) Patient S43 and f) Patient S49.
a-f) Footprint DNA markers were analyzed for correlation to clinical metric ALT for all patients in the physiological cohort identification set. In the left and middle panel, ALT and normalized readcounts for the footprint, respectively, is depicted over time. The correlation of footprint DNA normalized counts and the clinical metric ALT is shown in the right panel. Results for patients are depicted the following: a) Patient S10, b) Patient S33, c) Patient S39, d) Patient S42, e) Patient S43 and f) Patient S49. ALT, alanine aminotransferase.
a-f) Normalized readcounts of footprint DNA markers were analyzed for correlation to the clinical metric urea for all patients in the physiological cohort identification set. In the left and middle panel, urea and normalized readcounts for the footprint, respectively, is depicted over time. The correlation of footprint DNA normalized counts and clinical metric urea is shown in the right panel. Results for patients are depicted the following: a) Patient S10, b) Patient S33, c) Patient S39, d) Patient S42, e) Patient S43 and f) Patient S49.
a-d) Normalized readcounts of footprint DNA markers were analyzed for correlation to each clinical metric combined for all patients in the physiological cohort identification set. Results for clinical metrics are depicted the following: a) urea, b) PCT, c) hemoglobin and d) ALT. PCT, procalcitonin; ALT, alanine aminotransferase.
The concentration of Procalcitonin (PCT) was measured for each sample and categorized to non-infection, postoperative controls (blue), viral sepsis (pink) and bacterial sepsis (yellow). Statistical testing was performed using a two-sided Wilcoxon rank-sum test. Resulting p-values are indicated above the brackets for each comparison.
Differentially enriched regions (DERs) were analyzed between two conditions for enriched transcription factor binding sites (log10 Adj. p-value ≤ 0.05 and log2 Fold change ≥ 2). Significant results are colored and depicted in a Volcano plot. a) non-infection controls (blue) were analyzed against bacterial sepsis cases (yellow) b) non-infection controls (blue) were analyzed against bacterial sepsis cases (pink).
Based on the top 5 differential enriched regions (DERs) per condition, principal component analysis (PCA) was performed. The following conditions of the infection type cohort were investigated a) bacterial sepsis against viral sepsis b) non-infection control against bacterial sepsis c) non-infection control against viral sepsis.
Heatmap and hierarchical clustering of differential enrichment analysis of the infection type comparison of non-infected controls (blue), viral (pink) and bacterial sepsis (yellow). Based on the top 5 DER per condition, hierarchical clustering was performed. DER signals are normalized per row according to Z-score and color-coded.
Based on the top 5 DER per condition, hierarchical clustering was performed. DER signals are normalized per row according to Z-score and color-coded. a) Heatmap for DERs for bacterial (yellow) in comparison to viral sepsis (pink). b) Heatmap for DERs for bacterial sepsis (yellow) in comparison to non-infected, postoperative controls (blue). c) Heatmap for DERs for viral sepsis (pink) in comparison to non-infected, postoperative controls (blue). d) Heatmap for DERs for early death (black) in compaison to recovery in sepsis (green).
Based on disease severity scoring with SOFA and footprint DNA signals, 43 patients from the outcome cohort were assigned to the low risk SOFA (SOFA ≤ 6; green dashed line) or high risk SOFA (Sofa > 6; black dashed line) and to the low risk footprint DNA (low risk LF, green line) or the high risk footprint DNA group (high risk LF, black line). Over the course of 28 days, the survival probability is depicted for each group.
Patient metadata summary for sepsis cases over time for each physiological parameter and time point from the validation set (physiological status test validation cohort). NA=not available
Patient descriptive data, pathogen burden, outcome and infection type summary for sepsis cases in infection type cohort. Early death reflects patients that died within two days after ICU admission. Survivors reached the end of the study at 28 days after ICU admission.
Patient metadata summary statistics for bacterial and viral sepsis cases (infection type cohort). SD = Standard deviation, IQ = Inter quartile
Categorical features are marked with an asterisk (‘*’). P values were obtained from two-sided Wilcoxon rank-sum test for numerical features, and Fisher’s exact test for categorical features. Area under ROC curves (‘AUROC’) is reported with 95% confidence interval boundaries (‘AUROC_ci_l’ = lower boundary, ‘AUROC_ci_h’ = higher boundary). From the ROC analysis per metric, the decision threshold (‘Thresh’) that maximizes the sum of specificity (‘Spec’) and sensitivity (‘Sens’) is indicated with the corresponding specificity and sensitivity. ALT=Alanine aminotransferase, AP=Alkaline phosphatase, AST=Aspartate aminotransferase, BMI=Body-Mass-Index, CRP=C-reactive protein, GFR=Glomerular filtration rate, INR=International normalized ratio, PCT=Procalcitonin, SOFA=Sequential organ failure assessment, RRT=Renal replacement therapy.
Early death reflects septic patients that died within two days after ICU admission. Recovery reached the end of the study at 28 days after ICU admission.
Categorical features are marked with an asterisk (‘*’). P values were obtained from two-sided Wilcoxon rank-sum test for numerical features, and Fisher’s exact test for categorical features. Metrics with a p value <= 0.05 are highlighted in bold font. Area under ROC curves (‘AUROC’) is reported with 95% confidence interval boundaries (‘AUROC_ci_l’ = lower boundary, ‘AUROC_ci_h’ = higher boundary). From the ROC analysis per metric, the decision threshold (‘Thresh’) that maximizes the sum of specificity (‘Spec’) and sensitivity (‘Sens’) is indicated with the corresponding specificity and sensitivity. ALT=Alanine aminotransferase, AP=Alkaline phosphatase, AST=Aspartate aminotransferase, BMI=Body-Mass-Index, CRP=C-reactive protein, GFR=Glomerular filtration rate, INR=International normalized ratio, PCT=Procalcitonin, SOFA=Sequential organ failure assessment, RRT=Renal replacement therapy.
Acknowledgements
We want to thank all clinicians, nurses and patients of the participating intensive care units of both clinical studies.
Footnotes
↵¥ GAIN-CARE Collaborators
References
- [1].↵
- [2].↵
- [3].
- [4].↵
- [5].↵
- [6].
- [7].↵
- [8].↵
- [9].
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵
- [63].↵
- [64].↵
- [65].↵
- [66].
- [67].
- [68].↵
- [69].↵
- [70].
- [71].↵