ABSTRACT
Microbial cell-free DNA (mcfDNA) sequencing is an emerging infectious disease diagnostic tool which enables unbiased pathogen detection from plasma. The Karius Test®, a commercial mcfDNA sequencing assay developed by and available since 2017 from Karius, Inc. (Redwood City, CA), detects and quantifies mcfDNA as molecules/μl in plasma. The commercial sample data and results for all tests conducted from April 2018 through mid-September 2021 were evaluated for laboratory performance metrics, reported pathogens, and data from test requisition forms. A total of 18,690 reports were generated from 15,165 patients in a hospital setting among 39 states and the District of Columbia. The median time from sample receipt to reported result was 26 hours (IQR 25–28), and 96% of samples had valid test results. Almost two-thirds (65%) of patients were adults, and 29% at the time of diagnostic testing had ICD10 codes representing a diverse array of clinical scenarios. There were 10,752 (58%) reports that yielded at least one taxon for a total of 22,792 detections spanning 701 unique microbial taxa. The 50 most common taxa detected included 36 bacteria, 9 viruses, and 5 fungi. Opportunistic fungi (374 Aspergillus spp., 258 Pneumocystis jirovecii, 196 Mucorales, and 33 dematiaceous fungi) comprised 861 (4%) of all detections. Additional diagnostically challenging pathogens (247 zoonotic and vector borne pathogens, 144 Mycobacteria, 80 Legionella spp., 78 systemic dimorphic fungi, 69 Nocardia spp., and 57 protozoan parasites) comprised 675 (3%) of all detections. We report the largest cohort of patients tested using plasma mcfDNA sequencing. The wide variety of pathogens detected by plasma mcfDNA sequencing reaffirm our understanding of the ubiquity of some infections while also identifying taxa less commonly detected by conventional methods.
INTRODUCTION
Sequencing microbial cell-free DNA (mcfDNA) in plasma represents integration of progress in genomic sequencing, computation analyses, and recognition of cell-free DNA as a clinically useful blood analyte (1–3). Over the last four decades, PCR-based tests, specifically multiplexed broad syndromic panels, have made welcomed contributions to infectious disease diagnostics but fall short of desired performance including breadth of pathogen detection and require samples of infected tissue or body fluid (4). Broad range PCR testing arguably facilitates considering a wider range of potential pathogens but is still only limited to bacteria and fungi (5). A recent meta-analysis, which included 20 studies that satisfied the Quality Assessment of Diagnostic Accuracy Studies (6) to assess the diagnostic accuracy of next generation sequencing in distinguishing infectious diseases, concluded that this group of technologies demonstrated satisfactory diagnostic performance for infections and yielded an overall detection rate superior to conventional methods (7). Four of the studies included in this review employed plasma mcfDNA sequencing. Moreover, early experience with plasma mcfDNA sequencing suggests this new approach, especially when applied early in a patient’s clinical course and for specific use cases, has potential to improve upon the above-noted shortcomings (8, 9). Plasma mcfDNA sequencing enables unbiased pathogen detection through noninvasive sampling with rapid turnaround, creating opportunities to enhance diagnosis of bloodstream and deep-seated infections (10–12). This is urgently needed particularly among immunocompromised patients who are often the most vulnerable to serious and frequently life-threatening infections.
The Karius Test® is an analytically and clinically validated mcfDNA sequencing test, commercially available for US inpatients since 2017 as a laboratory developed test from Karius, Inc. The test can identify and quantitate molecules/µl (MPM) mcfDNA in plasma from >1,500 bacteria, DNA viruses, fungi, and parasites. The analytical and clinical validation of the test was previously reported (12). Since the time of this study, others have reported how this unbiased test may contribute to the diagnosis and management of life-threatening infections in defined patient populations, specifically immunocompromised patients early in their clinical course, by creating the potential to minimize invasive procedures (13), reducing time to specific etiologic diagnosis of infections compared with standard of care (SOC) microbiological testing (14), and, in some cases, optimizing antimicrobial therapy (15, 16). In contrast, several retrospective, observational reviews of Karius Test utilization concluded that in routine clinical practice the diagnostic and clinical impact of the test was limited, which highlights the need for diagnostic stewardship to optimize implementation and maximize clinical utility in specific patient populations (17–19).
Plasma mcfDNA sequencing for infectious disease diagnosis performance at scale with respect to time to results, quality metrics, positivity rates, and diversity of taxa detected has not been previously reported. Here we review the results for a large commercial laboratory testing cohort of over 18,000 plasma samples from over 15,000 patients in a hospital setting with the primary objective to provide additional insights about the breadth and depth of microbial identifications. In the course of doing so, we also describe the current test performance metrics and characterize clinical use based on the limited available data.
MATERIAL AND METHODS
Commercial laboratory test cohort
The Karius Test results for patients from across the United States were evaluated for reported pathogens and patient data (including basic demographics, ordering clinician, and ICD-10 codes if provided) obtained from the test request forms (TRF) for all samples tested from April 1, 2018 through mid-September, 2021. Laboratory performance metrics were gathered for all samples collected from April 1, 2018 through the end of September, 2021. Diagnosis codes submitted via TRFs were summarized at both the chapter level and Clinical Classifications Software Refined Categories (20, 21). Immunocompromising conditions were then flagged using definitions published by the Agency for Healthcare Research and Quality (22–24).
The Karius Test
Plasma mcfDNA sequencing was performed as previously described (12) in the Karius clinical laboratory, certified under the Clinical Laboratory Improvement Amendments of 1988 and accredited by the College of American Pathologists. Briefly, whole-blood samples were collected in either BD Vacutainer plasma preparation tubes (PPTs) or K2-EDTA tubes. After plasma has been separated from cells, the sample is stable at ambient temperature for 96 hours and at -20°C for 6 months. Upon receipt at Karius, controls for carry-over, sequencing bias, metagenomic sequencing quality, and sample mix-ups were added to the sample. Proprietary chemistries were used to enrich samples for mcfDNA without preselecting pathogens to test. Automated DNA extraction and sequencing library preparation protocols were optimized for high speed and low pathogen bias. Single-end, 76-cycle sequencing was performed on NextSeq 500 instruments (Illumina, San Diego, CA) with an average of >20 million reads/sample. Double-unique dual indexes were used to ensure robust sample demultiplexing. Sequencing data were processed using a proprietary analytical pipeline, and microbial reads were aligned to a database comprising >20,000 curated assemblies from >16,000 species of which >1,500 taxa are reported, including bacteria, DNA viruses, fungi, and parasites (https://kariusdx.com/the-karius-test/pathogen-list/).
Microorganisms present in statistically significant amounts were reported as a concentration of their mcfDNA expressed as MPM, a unique, absolute quantification capability of the Karius Test shown in preliminary work to correlate well with single-analyte quantitative PCR measurements (25, 26). The reports also contained median and range of MPM values observed for each microorganism reported in the last 1,000 specimens, as MPM values from different microbes are not comparable, and a reference interval determined from 675 asymptomatic donors for comparison. We routinely analyzed the raw data for mcfDNA from potential pathogens, including those present at levels below our standard laboratory report thresholds. For this study, we focused on microorganisms identified in statistically significant amounts.
Notable improvements in the test wet bench procedures and analytical pipeline, as may be anticipated, occurred during the study period. We used the operational classes of pathogens described by Relman, Falkow, and Ramakrishnan (27). The operational classes include obligate, commensal, zoonotic, and environmental pathogens.
Data analytics
Data analysis and visualization were conducted using Python v. 3.9.7, pandas v. 1.3.4, matplotlib v. 3.4.3, and seaborn v. 0.11.2 (28–30). Given the taxonomy and nomenclature for some genera continue to evolve, we selected three (Legionella, Nocardia, and Mycobacterium) to examine species detections, especially multiple species co-detections, more closely.
RESULTS
Test cohort
A total of 19,739 samples meeting collection and transport requirements were tested from 16,172 patients in a hospital setting in 39 states and the District of Columbia during the study period. The median time from sample receipt at the Karius laboratory to reported result was 26 hours (IQR 25–28), and 96% of samples had valid test results. A summary of key performance metrics for the test in this production data set are shown in Table 1. These metrics were not significantly different (two-sided t-test p-value >0.01) from those reported for the first 2,000 clinical samples run by the Karius clinical laboratory and reported in the initial validation study (12). Infectious disease and hematology/oncology providers represented most ordering clinicians, 64% (n=9,804) and 14% (n=2,132), respectively, for the 15,424 specimens with a National Provider Identifier indicated. We were able to capture and analyze 18,690 reports from 15,165 patients. Twelve percent (n=1,839) of patients had at least one repeat test during the study interval. Almost two thirds (65%, n=9,798) of patients were adults (i.e., age >18 years). More than a quarter (29%, n=4,423) of patients at the time of diagnostic testing had ICD10 codes representing a diverse array of clinical scenarios indicated in their TRFs (Table 2). Eighteen percent (n=797) of these patients were indicated as immunocompromised (IC); 717 (16%) had fever; and 230 (5%) had sepsis.
Taxa detection and quantification
Of samples yielding a valid result, 7,938 (42%) reported a negative test with no pathogens identified. The remaining 10,752 (58%) Karius Test reports from 8,849 patients had at least one microbe identified (5,531 [30%] only one) representing 701 unique microbial taxa (526 [75%] bacteria, 103 [15%] fungi, 47 [7%] viruses, and 24 [3%] parasites) and a total of 22,792 detections. The overall frequency of detection for each of these groups is shown in Fig. 1, and the number of detected taxa counts per report for all positive reports is shown in Fig. 2. All the quality control metrics were met for taxa quantification in MPM for 9,690 (90%) samples with positive results. A complete list of all taxa reported along with their frequency of detection and median and IQR for the MPM values are given in Supplemental Table 1.
Top 50 reported taxa
The top 50 reported taxa and the median, range, and IQR of MPM for each taxon are shown in Table 3. They included 36 bacteria, 9 viruses, and 5 fungi. Together the top 50 taxa included a broad range of commensal and environmental pathogens and represented 15,692 detections (69% of all detections).
The distribution of these taxa are as follows. There were 11,023 detections of bacteria including 11 anaerobes (2,730, 25%), 8 Streptococcus spp. (1,379, 12%), 4 Enterobacterales (2,031, 18%), 3 Staphylococcus spp. (1,369, 12%)., 2 Rothia spp. (564, 5%), 2 Haemophilus spp. (466, 4%), 2 Enterococcus spp. (1,092, 10%), and 1 each of Acinetobacter haemolyticus (199, 2%), Pseudomonas aeruginosa (817, 7%), Stenotrophomonas maltophilia (169, 1%), and Helicobacter pylori (207, 2%). There were 3,982 viral detections of 9 different viruses that included 1,275 (32%) cytomegalovirus, 902 (23%) Epstein-Barr virus, 479 (12%) herpes simplex virus 1, 468 (12%) human herpes virus 6B, 354 (9%) BK polyoma virus, 171 (4%) human adenovirus C, 135 (3%) torque teno virus (TTV), 100 (3%) human adenovirus B, and 98 (3%) human herpes virus 7. Finally, there were 920 detections of fungi comprising 260 (28%) Candida albicans, 258 (28%) Pneumocystis jirovecii, 186 (20%) Aspergillus fumigatus, 113 (12%) Candida glabrata, and 103 (11%) Candida tropicalis.
Difficult to diagnose uncommon pathogens
The SOC methods for the organisms listed below have considerable shortcomings including, but not limited to, sensitivity and specificity, comprehensiveness, accuracy, time to result, and/or local availability.
Bacteria
The frequency distribution of the number of detections of Legionella-like organisms (n=80) is shown in Fig. 3. Forty-one percent of these detections were the most recognized pathogen, L. pneumophila. Two reports contained co-detections of two different species (L. brunensis, 400 MPM and hackeliae, 270 MPM; and L. feeleii, 78,508 MPM and L. tunisiensis, 76,445 MPM, respectively). Neither L. brunensis nor L. tunisiensis has been associated with human disease (https://specialpathogenslab.com/legionella-species/), and the MPM values for the co-detections in each report were similar.
The frequency distribution of Nocardia spp. detections, n=76, is shown in Fig. 4. Plasma mcfDNA sequencing detected 25 of the approximately 100 validly named species. Of the 8 species reported to be isolated more frequently from patients, 7 were detected (Fig. 4). One species, N. cyriacigeorgica, dominated with 19 (25%) detections. Of the 69 patient reports represented by the Nocardia spp. detections, 12 (17%) reported ≥2 concurrent species (range 2–5). All the co-detections were reported with similar MPM values, and 8 (67%) were co-detections of closely related species (4 N. exalbida/gamkensis, 3 N. elegans/nova/africana, 1 N. kruczakiae/violaceofusca/aobensis) previously reported to be indistinguishable by mcfDNA sequencing (31, 32).
The frequency distribution of the 156 Mycobacterium spp. detections is shown in Fig 5. Plasma mcfDNA sequencing detected 107 (69%) slowly growing mycobacteria (SGM) and 49 (31%) rapidly growing mycobacteria (RGM). The frequency of species distributions of these three genera showed similarities in that several species were predominant, followed by long tails of uncommon to single species detections. We reported ≥2 concurrent species (range 2–6) in 6 (4%) of the 144 reports including Mycobacterium spp. All the co-detections were reported with similar MPM values. Three of the reports contained M. avium complex/chimera and one each M. avium complex/celatum/kyorinense, M. brisbanense/mucogenicum/obuense, and M. chubuense/elephantis/flavescens/goodii/holsaticum /phlei. The co-detections of multiple Legionella, Nocardia, and Mycobacterium spp., respectively, within the same patient are shown in Supplemental Table 2 .
The frequency distribution of the 247 (3% of all bacterial detections) zoonotic and vector borne bacterial detections are shown in Fig. 6. Bartonella henselae predominated with 90 (36%) detections.
Fungi
The frequency distribution of the 632 Candida spp. detections is shown in Fig 7; 374 Aspergillus spp. detections in Fig 8; 196 detections in the order Mucorales in Fig. 9; 78 detections of the systemic dimorphic fungi in Fig. 10; and 33 detections of dematiaceous fungi in Fig. 11. We detected 9 microsporidia, including 5 Enterocytozoon bieneusi and one each of E. cuniculi, E. hellem, Anncaliia algerae, and Vittaforma corneae. In addition, Pneumocystis jirovecii (258 detections) was among the top 50 taxa detected.
Eukaryotic parasites
The frequency distribution of the 57 (89% of 64 parasite detections) protozoa is shown in Fig. 12. Among the protozoan parasite detections, 68% were Toxoplasma gondii, and 14% were pathogenic amoebae. Among the 7 (11%) helminthic parasites, we detected 4 nematodes (all Strongyloides stercoralis), 2 cestodes (both Echinococcus multilocularis), and 1 trematode (Schistosoma mansoni).
DISCUSSION
We report the largest testing cohort of patients in which mcfDNA was identified and quantified. The key test performance metrics in this large cohort mirror what was reported with a much smaller cohort in the initial validation study (10), demonstrating that the Karius Test is robust and can be performed at scale in a clinically relevant time frame. These mcfDNA sequencing data reaffirm the ubiquity of some infections, as commonly expected microbes were detected in most patients while less common microbes were rarely detected. However, notably, the unbiased approach of the plasma mcfDNA sequencing made those “rare” detections possible, whereas conventional diagnostics require targeting specific organisms. Identifying optimal utilization, both clinical indications and timing, of the plasma mcfDNA sequencing in future studies to augment clinical decision-making and integration into current testing algorithms (17, 18, 33–35) will serve to improve the utility of the test. Examples of such studies include two recently completed prospective, observational clinical trials of the use of the Karius Test to diagnose pneumonia in immunocompromised patients (NCT04047719) (36) and infections in stem cell transplant inpatients and outpatients over time (NCT02804464) (26), respectively.
Our finding that 58% of the Karius Test reports identified ≥1 pathogen is substantially lower than the 70–85% reported for the test when applied to well-defined clinical uses (13, 14, 37). These findings support the need for diagnostic stewardship and additional clinical studies demonstrating how yield differs and should be carefully interpreted according to the specific patient population and disease prevalence considered.
The Karius Test quantitates detections in MPM, which can be influenced by several factors specific to the microbial genome—e.g., turnover rate and genome size (12). Confounding patient variables (e.g., infection site, therapeutic interventions, and immune status) may also influence this measure. Generally higher concentrations have been found in definite infections, but since considerable overlap of MPM values exists in unlikely infections and in the asymptomatic cohort, threshold values for definite infections have not been established (12). However, following the decay of mcfDNA quantitatively by serial testing may have important implications for individual patient management in assessing the effectiveness of antimicrobial therapy and other medical or surgical interventions (8, 38, 39), but needs further study.
Notable among the most common detections were the many commensal bacterial and fungal pathogens that cause serious and often invasive infections in patients with relevant risk factors (e.g., immunocompromised). Among the viruses most detected by the Karius Test (e.g., Herpesviridae), all could represent latent infection, reactivation, or active infections depending on the respective clinical conditions; regardless, the detection of these viruses may be of particular concern for immunocompromised patients for whom they may cause considerable morbidity (40).
A key benefit of unbiased mcfDNA sequencing identified in this study is the ability to detect diagnostically challenging microbes such as opportunistic and systemic dimorphic fungi and zoonotic and vector borne pathogens (41). These pathogens often carry a sense of urgency for the management of the individual patient and even for the public’s health, as they may be associated with considerable morbidity and potential mortality. As some are uncommonly expected pathogens, they present a major challenge for clinicians in considering and ordering appropriate SOC testing to capture all possible pathogens. For the laboratories, the microbiologic diagnosis of these infections often represents a major challenge for SOC methods as has been described by others (42, 43).
The absence of certain pathogens such as the emerging, multidrug resistant C. auris among the detections may arguably reflect the relatively few cases occurring during the study period in the geographic locations or patient populations of the test cohort and reflects the rarity of clinical cases in the United States (44). Plasma mcfDNA sequencing has the potential to contribute to the greater understanding of the geographic distribution of microbes and even overall disease surveillance, as demonstrated by the systemic dimorphic fungi detections. At 43 detections (55% of dimorphic fungal identifications), Histoplasma capsulatum predominated, very likely related to its wide geographic range and opportunities for environmental exposure and mirrors what is known about the epidemiology and incidence of systemic dimorphic fungal infections (45). The relative frequency of detections of Coccidioides immitis and posadasii and Blastomyces dermatidis may have been influenced by the geographic bias in this study sample cohort. Of the nine Cryptococcus spp. detected, gattii was detected only once compared with neoformans, reflecting its restricted geographic distribution and the overall rarity of infections in the United States (46).
Among the rarely detected microbes during the study period are some deserving further mention. The detection of Legionella, an obligate pathogen, signals public health concern, whether community or hospital acquired. L. pneumophila serogroup 1 (LP1) is estimated to cause 84% of community acquired Legionnaire’s disease (LD) (47); however, other serogroups of L. pneumophila and other Legionella spp. may cause 60% of hospital acquired LD (48). Urinary antigen detection is the most common diagnostic in the United States and Europe; yet, given its specificity for LP1, as many as 40–50% of patients with non-LP1 legionellosis could be missed (49, 50). Plasma mcfDNA sequencing provides comprehensive testing for Legionella and Legionella-like organisms in one diagnostic test and could thereby expand the known LD epidemiology, particularly in nosocomial cases.
Nocardia have 8 species-specific drug susceptibility patterns (51). While accurate species identification can predict antimicrobial susceptibility patterns, molecular methods are required for accurate identifications but are not widely available. The Karius Test detected in our cohort 7 of the 8 species with recognized susceptibility patterns and can provide results more rapidly than existing approaches to species identification.
Finally, culture for mycobacteria, while a complicated and lengthy process, is considered the gold standard, supplemented by direct detection of M. tuberculosis complex by nucleic acid amplification tests (NAATs) in many laboratories. However, NAATs for the direct detection of nontuberculous mycobacteria are not widely available, and accurate identification to species level from cultured isolates remains challenging for most laboratories. The M. chimaera may be overrepresented in our cohort since the Karius Test was optimized for its detection following reports of infections occurring post-surgeries employing contaminated cardiopulmonary bypass devices (52). However, these detections demonstrate the capability of mcfDNA sequencing to provide comprehensive identification of these important obligate and commensal pathogens directly from plasma and provides increased diagnostic value to the above-mentioned SOC methods (53).
Plasma mcfDNA sequencing offers a non-invasive means of detecting microbial infection and capturing species diversity, potentially revealing new insights on genetic complexity not resolved by current taxonomic classification. Still, our findings highlight the need to expand currently available genomic sequencing libraries as many species across various genera remain undiscovered or undescribed (54, 55). The species co-detections demonstrated among the genera we highlighted, Legionella, Nocardia, and Mycobacteria, specifically those occurring at similar MPM values, suggest detection of a single divergent strain, a species not in the Karius Test database, or, alternatively, true polymicrobial detections. Similar challenges exist for broad range PCR testing (5, 56) and occurred with adoption of proteomic identification by MALDI-TOF mass spectrometry (57).
This study has several limitations preventing us from directly comparing and fully elucidating plasma mcfDNA sequencing impact on patient care. Orthogonal testing data (including type, timing, and results) were not available for comparison with the Karius Test results; nor was information regarding antimicrobial or other therapies or procedures. Also, the clinical context for using the Karius Test was provided in only 29% of patients; that information was voluntarily provided, may have been incomplete, and would reflect the clinician’s perspective at the time they ordered the test vs. the final diagnosis.
However, others have reported potential increased diagnostic yield of plasma mcfDNA sequencing compared with SOC tests in certain clinical scenarios (14–16, 58). Some have noted the test to be commonly applied in managing severely ill, especially IC, patients (59), who are more likely to be infected with unusual, and difficult to diagnose pathogens (60, 61).
Constraints of the underlying data structure prevented analyzing the data from those patients who had repeated testing to determine whether these tests were performed serially to diagnose new suspected infections or to monitor therapy response or conduct pathogen surveillance among immunocompromised patients as suggested by others (8, 26, 62–64). Further, the underlying data structure limited our ability to stratify the results by notable improvements to the bioinformatics pipeline over the course of the study period. Nevertheless, the unique data from this large testing cohort provides a wealth of information to help improve our understanding of infecting pathogens.
Unbiased plasma mcfDNA sequencing can potentially enhance patient outcomes by direct and timely recognition of pathogens in specific clinical scenarios as well as benefit the overall public health by increasing our understanding of the epidemiology of emerging infectious diseases such as monkeypox virus or Borrelia miyamotoi (M.S. Lindner, K. Brick, N. Noll, S.Y. Park, et. al., submitted for publication; L.A. Rubio, A.M. Kjemtrup, G.E. Marx, S. Cronan, et. al., submitted for publication, respectively). Further, this powerful, novel diagnostic tool may facilitate medical advances through recognizing previously unanticipated pathogens, as noted when mcfDNA sequencing was leveraged in a research use only modality to identify porcine cytomegalovirus infection in a patient who had received a genetically modified porcine-to-human cardiac transplant (65). As with any advanced diagnostic tool, careful, timely clinical application with expert guidance and interpretation of its results as well as appropriate diagnostic stewardship will optimize its application to offer even greater clinical impact. In addition, the development of robust clinical outcomes studies to evaluate the clinical impact and cost effectiveness of plasma mcfDNA sequencing for specific clinical indications to guide use remains a top diagnostic stewardship priority.
Data Availability
All data produced in the present work are contained in the manuscript.
ACKNOWLEDGEMENTS
We thank the talented, experienced, and dedicated clinical laboratory operations and analytics teams at Karius for generating the test results and sequencing analyses, respectively, that were foundational to this study. We also thank T. Matthew Hill, PharmD, PhD, formerly with Karius, Inc., and Asim A. Ahmed, MD for their early contributions to the paper. This study was supported by Karius. Nathan Ledeboer, PhD and Kevin Messacar, MD, PhD both serve as uncompensated consultants for Karius on this manuscript. Their relationship with Karius has been reviewed and approved by the Medical College of Wisconsin and the University of Colorado, Children’s Hospital of Colorado, respectively, in accordance with their conflict of interest policies.
Footnotes
One author was added. All text, tables, figures remain the same.