Preparation of robust Synthetic Control samples and their use in a metatranscriptomic clinical test
===================================================================================================

* Ryan Toma
* Lan Hu
* Guru Banavar
* Momchilo Vuyisich

## Abstract

Metatranscriptomics (MT), or RNA sequencing, has the potential to revolutionize the field of molecular diagnostics. Due to the complexity of MT diagnostic models, positive and negative control materials for specific disease indications can be difficult to obtain. Controls must often be sourced directly from patients. This introduces logistical burdens, assay variability, and limits high throughput clinical laboratory operations. To overcome this limitation, we developed a method for generating Synthetic Control (SC) samples, which duplicate the nucleic acid signature of complex clinical specimens and produce the desired test outcome. SCs can be easily and cost-effectively produced in large quantities (>100,000 SCs per amplification cycle), enabling high throughput diagnostic testing. Here, we report the generation of Synthetic Positive Control (SPC) samples. SPCs were validated and implemented in a clinical laboratory. The SPCs produced robust positive signals (average OC risk score of 0.997) and high levels of reproducibility (%CV of 0.2%) in a high throughput automated CLIA laboratory. SCs are a novel and useful method for the generation of high quality controls for MT-based diagnostic tests, and their adoption could herald the widespread use of MT tests in molecular diagnostics.

## Introduction

Metatranscriptomics (MT), or RNA sequencing, is becoming an important analytical method in the field of screening and diagnostics, offering a comprehensive platform for the interrogation of a variety of chronic and communicable diseases 1–4. With its ability to generate large amounts of transcriptomic data, RNA sequencing has become an increasingly popular tool for identifying genetic mutations and variations5, detecting infectious agents1, understanding the role of human gene expression profiles in health and disease6, and elucidating the impact of the various human microbiomes in chronic diseases7–9. As MT technology continues to advance, its potential for improving the accuracy and speed of disease diagnosis is rapidly increasing, making it a potentially revolutionary tool for molecular diagnostics.

MT methodologies have already been used to develop diagnostic tests for a variety of chronic diseases that would not have been possible with prior methodologies. For example, MT has been used for accurate screening of oral and throat cancer 3, type 2 diabetes 2, irritable bowel syndrome 10, and autism spectrum disorder4. Despite the advantages of MT in disease diagnostics there are numerous challenges that must be overcome before these tests can be translated from research into the healthcare system.

Unlike other targeted diagnostic methods such as quantitative PCR or amplicon sequencing that are looking for a few gene mutations, most MT models utilize machine learning to identify hundreds of molecular features associated with an indication 2,3,11. While adequate positive or negative control materials can readily be generated for targeted methodologies by introducing the specific mutations or genes of interest, this is not feasible with the complex signals seen in MT disease models 12–14. Control material is an essential component of any test as it serves as a known reference point for the performance of the test and helps to ensure accuracy and reproducibility. However, obtaining large amounts of high-quality control material for clinical assays can be challenging, particularly for unbiased tests. The shortage of control material negatively affects the clinical validation of tests and the confidence in their results, and is a barrier for the widespread implementation of MT-based diagnostic tests.

Traditional control material for sequencing based assays is typically from patients that are known to be positive or negative for an indication of interest 15,16. However, the collection of these samples can be difficult and the sample is finite, which means that more control material from different patients must continually be obtained. This can result in significant logistical challenges. The need to continually obtain control material from different patients also raises questions about the stability, consistency, and reliability of the control source over time.

To address these challenges, our laboratory has developed Synthetic Controls (SCs) for MT assays. SCs can be made by unbiased amplification of total RNA extracted from a clinical specimen. This process allows for the generation of a highly controlled and standardized sample that mimics the RNA profile of the original sample, thus consistently returning a known test outcome. The use of SCs eliminates the need to continually obtain specimens from patients, as they provide an almost limitless source of synthetic control material.

Herein we present the methods for generating SCs and present data from the implementation of synthetic positive controls (SPC) for an oral and throat cancer screening test in a CLIA laboratory. The results demonstrate robust SPC performance that is clinically useful. We show that SPCs are a viable and effective approach to controlling MT tests and could be adopted as standard practice.

## Methods

### Ethics

This study ([clinicaltrials.gov](http://clinicaltrials.gov) registration number [NCT05451303](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT05451303&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F02%2F2023.04.28.23289273.atom)) was conducted with a protocol and consent forms approved by an Institutional Review Board (IRB) accredited by the United States Health and Human Services. All participants consented to participate in the study.

### SC Preparation and Initial Validation

Two saliva samples from patients with clinically adjudicated oral cancer (OC) were used to generate SPCs. These samples were verified to test positive using CancerDetect, an oral and throat cancer screening test that is a clinically validated and licensed Laboratory-Developed Test (Viome Life Sciences)3. Total nucleic acids were purified, and DNA was degraded using DNase, followed by heat inactivation. Human and microbial ribosomal RNAs were removed using subtractive hybridization. The remaining transcripts were converted to cDNA with 5’ and 3’ PCR primer-annealing adapters appended (figure 1). To ensure that full length transcripts were recovered to mimic the RNA size distribution in the original sample, RNA fragmentation and cDNA size selection were not performed. The methods for SC generation up through the end of cDNA synthesis is identical to our laboratories previously published methods17. The cDNA was purified prior to PCR (AMPure XP Reagent, Beckman Coulter). The cDNA pool was amplified by PCR, where the forward primer contains a T7 promoter sequence while the reverse primer was designed to be as short as possible while maintaining a comparable melting temperature to the forward primer (figure 1). The resulting PCR product (F0) was used for *in vitro* transcription with AmpliScribe T7 High Yield Transcription Kit (Biosearch Technologies, AS3107) (figure 1). RNA from the transcription reaction was Dnase treated, cleaned up (AMPure XP Reagent, Beckman Coulter), and resuspended in nuclease free water.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/05/02/2023.04.28.23289273/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/F1)

Figure 1: Preparation method for Synthetic Controls.

For the test of repeated amplification, the initial PCR product (F0) was purified, quantified, diluted, and re-amplified by PCR (figure 2). Purified dsDNA template (3 micrograms) was used as input for the next PCR reaction. This process was repeated three times (F1-F3). Each resultant F0-F3 dsDNA template was also *in vitro* transcribed as described above. Each SPC sample (150 ng of amplified RNA) was analyzed using the CancerDetect test in triplicate. In addition, 158 samples were collected from the general population and were analyzed through the CancerDetect test as a comparison cohort (all samples had ≥500,000 microbial ESD).

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/05/02/2023.04.28.23289273/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/F2)

Figure 2: Diagram of repeated dsDNA template PCR amplifications.

### Library Preparation

SPC samples were analyzed using the CancerDetect test. The methods for this test have been previously published 18. Briefly, the method includes DNase treatment, non-informative RNA depletion, cDNA synthesis, size selection, and limited cycles of PCR for adding dual unique barcodes to each sample.

### SPC Clinical Validation

SPCs from F1 were generated for use in a clinical laboratory (Viome Life Sciences) using the methods outlined above. For the validation of SPCs, 18 replicates were processed through the CancerDetect test and their OC risk scores were calculated. The OC risk score is the predicted probability returned by the oral and throat cancer classifier, pre-trained and validated as previously described3. SPCs used in the clinical test had to have ≥10,000,000 total single reads, ≥500,000 microbial ESD, ≥1,000 KO richness, and ≥100 species richness to pass QC and be included in downstream analysis. All 18 replicates passed these QC criteria.

### Bioinformatics and Disease Classification

Our laboratory maintains a custom reference catalog which includes 32,599 genomes from NCBI RefSeq release 205 ‘complete genome’ category, 4,644 representative human gut genomes of UHGG19, ribosomal RNA (rRNA) gene sequences, and the human genome GRCh3820. These genomes cover archaea, bacteria, fungi, protozoa, phages, viruses, and the human host. The microbial genomes have 98,527,909 total annotated genes. Our laboratory adopts KEGG Orthology (KO) 21 to annotate the microbial gene functions using eggNOG-mapper22.

The microbiome pipeline maps paired-end reads to this catalog using Centrifuge23 for taxonomy classification (at any taxonomy rank). Reads mapped to the host genome and rRNA sequences are tracked for monitoring but excluded from further analysis. Reads mapped to microbial genomes are processed with an Expectation-Maximization (EM) algorithm24 to estimate the expression level (or activity) in the sample. Respective taxonomy ranks (strains, species, genus, etc.) can be easily aggregated from the genomes. For this study, we use species activity in the downstream analyses. These genome mapped reads are extracted and mapped to only gene or open reading frame (ORF) regions for molecular function or KO annotation and quantification.

We define the number of reads mapped to the microbiome as ‘microbial ESD’ (Effective Sequence Depth) to represent the usable portion of reads in a sample for microbiome. ESD only refers to the reads that have an identical match to the microbial database over contiguous 240 bps, which allows clinically validated, strain level taxonomic classification. Many additional reads align to the microbial ORF (open reading frame) database. The percentage of total reads aligning to our custom reference catalog is referred to as %Microbial ESD. OC risk scores were calculated based on our laboratories previously published machine learned (ML) classifier for oral and throat cancer that requires at least 500,000 microbial ESD to return a valid result3.

## Results

### SPC Metrics

Two patient samples were shown to be positive for Oral or Throat Cancer via the CancerDetect test and the diagnosis was confirmed by clinical examination. The predicted OC risk score from the CancerDetect test was 0.99 for donor SPC\_1 and 0.89 for donor SPC\_2. Each positive sample was then converted into a SPC.

The resulting dsDNA (F0) was used as a template for *in vitro* transcription or purified for repeated amplification and transcription (F1-F3). The average RNA and dsDNA yields of F0-F3 are reported in table 1.

View this table:
[Table 1:](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/T1)

Table 1: Average dsDNA template and RNA yields of F0-F3.

F0 generated sufficient dsDNA template for >150 additional PCR reactions, each of which can generate >100ug of SPC RNA. A single F0 PCR reaction is therefore capable of producing >15mg of F1 SPC RNA. The quantity of SPC RNA produced by the method is sufficient for >100,000 SPC samples, demonstrating the scalability of the method.

SPC samples generated ample sequencing data (table 2). %Microbial ESD was markedly suppressed with increased PCR cycling, indicating that repeated PCR amplifications reduces the amount of usable data. This can be easily overcome with increased sequencing depth.

View this table:
[Table 2:](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/T2)

Table 2: Average sequencing metrics for SPC samples.

### SPC OC Risk Scores vs. Amplification Cycles

Even though the SPC\_1 and SPC\_2 OC risk scores were not significantly different between F0 and F1, F2, and F3 (p>0.05), there is a trend towards a reduction in the risk score with increased amplification cycles. The results indicate that SPCs should undergo a maximum of two additional rounds of PCR (up to F2) to minimize the degradation of the positive signal (figure 3A and 3B). However, this could be different for other tests. Even for this particular test, if the threshold for a positive test result is changed at some point in the future, re-amplification may or may not be appropriate.

![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/05/02/2023.04.28.23289273/F3/graphic-6.medium.gif)

[](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/F3/graphic-6)

![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/05/02/2023.04.28.23289273/F3/graphic-7.medium.gif)

[](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/F3/graphic-7)

![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/05/02/2023.04.28.23289273/F3/graphic-8.medium.gif)

[](http://medrxiv.org/content/early/2023/05/02/2023.04.28.23289273/F3/graphic-8)

Figure 3: SPC performance is affected by repeated PCR amplification and produces robust positive signals.

All SPCs (SPC_1 F0-F3 and SPC_2 F0-F3, combined data) showed significantly increased positive signal compared to the general population (p<0.001, figure 3C). SPCs are effective at producing robust positive signals that are significantly higher than that of the general population even with repeated PCR amplification.

### SPC Implementation in a Clinical Test

SPCs were implemented in a clinical laboratory that performs a high throughput test for oral and throat cancer screening (CancerDetect, Viome Life Sciences). The average SPC OC risk score was 0.997 (standard deviation: 0.002) and the percent coefficient of variation between all replicates was 0.2%. These results demonstrate that SPCs can be utilized in a clinical test with high confidence in the SPC returning the desired outcome (positive).

## Discussion

The utilization of MT in diagnostic tests has the potential to significantly improve healthcare. However, the lack of suitable control materials remains a significant challenge in its clinical adoption. Currently, for MT diagnostic assays with complex molecular features, controls must be obtained from patients known to be positive or negative for an indication. However, this method is often problematic as sample collection can be difficult and the sample amounts are limited. This results in the need to continually obtain more control material from different patients, which can be logistically difficult.

In order to address these issues, our laboratory has developed a new method for producing control materials for MT assays. Synthetic Controls (SCs) can be created by amplifying the total RNA, which includes the diagnostic signatures, from samples that are known to be positive or negative. This allows for a near limitless source of control material that is easy and cost effective to produce, and eliminates the need for continuous collection of control material from patients.

One of the key advantages of SCs is their stability and robustness. Positive SCs (SPCs) were shown to produce positive signals even after multiple rounds of DNA template amplification (figure 3), indicating that the positive signal is stable and that the method does not introduce significant sources of bias. This makes SPCs ideal for use as control materials in MT diagnostic tests. In addition, SPCs are also cost-effective, as they eliminate the need for frequent collection of positive control material from patients and can be produced in large quantities in a controlled laboratory setting. Future research should seek to replicate these findings with negative SCs (SNCs).

We also demonstrate a clinical application of SPCs. SPCs were utilized as positive controls in a clinical test for an oral and throat cancer screening test (CancerDetect by Viome Life Sciences). The results demonstrate that SPCs can be translated into a clinical setting while maintaining their robust performance. Few research methods are ever implemented into clinical laboratories due to issues with high throughput scaling, reproducibility, and cost. SPCs overcome these challenges and have been shown to be a feasible, effective, and relevant control for an actual clinical MT test.

The concept of SC preparation can likely be applied to additional next generation sequencing (NGS) molecular biology diagnostic techniques such as metagenomics and amplicon sequencing. Metagenomics for example has already been used to generate models for a variety of conditions 12,25–28. These NGS assays would greatly benefit from access to improved control materials. Future research should seek to apply the methods in this paper to other fields of NGS diagnostics.

SCs offer a significant advancement in the field of MT diagnostics. The stability, robustness, and cost-effectiveness of SCs make them ideal for use as control materials in MT assays. By utilizing SCs, the field of molecular diagnostics can improve the accuracy and reliability of MT and potentially other NGS tests, thereby making them a mainstay in the modern healthcare system.

## Data Availability

All data produced in the present study are proprietary to Viome Life Sciences but may be available upon reasonable request to the authors

## Author Contributions

MV and RT designed and implemented the study. MV and RT developed the SC preparation method. RT performed the molecular analyses of the SPC samples. GB and LH performed data analysis. All authors contributed to data interpretation. All authors contributed to the writing of the manuscript.

## Financial and competing interests disclosure

All authors are employees of Viome Life Sciences, Inc. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

*   Received April 28, 2023.
*   Revision received April 28, 2023.
*   Accepted May 2, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## Reference

1.  1.Toma, R. et al. Pathogen detection and characterization from throat swabs using unbiased metatranscriptomic analyses. Int J Infect Dis 122, 260–265 (2022).
    
    
2.  2.Shen, N. et al. Gut microbiome activity predicts risk of type 2 diabetes and metformin control in a large human cohort. Medrxiv 2021.08.13.21262051 (2021) doi:10.1101/2021.08.13.21262051.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMS4wOC4xMy4yMTI2MjA1MXYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMDUvMDIvMjAyMy4wNC4yOC4yMzI4OTI3My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

3.  3.Banavar, G. et al. The salivary metatranscriptome as an accurate diagnostic indicator of oral cancer. Npj Genom Medicine 6, 105 (2021).
    
    
4.  4.Hicks, S. D. et al. Validation of a Salivary RNA Test for Childhood Autism Spectrum Disorder. Frontiers Genetics 9, p(2018).
    
    
5.  5.Chang, Y.-S. et al. Metatranscriptomic Analysis of Human Lung Metagenomes from Patients with Lung Cancer. Genes-basel 12, 1458 (2021).
    
    
6.  6.Marques-Coelho, D. et al. Differential transcript usage unravels gene expression alterations in Alzheimer’s disease human brains. Npj Aging Mech Dis 7, 2 (2021).
    
    
7.  7.Vijay, A. & Valdes, A. M. Role of the gut microbiome in chronic diseases: a narrative review. Eur J Clin Nutr 76, 489–501 (2022).
    
    
8.  8.Ojala, T., Kankuri, E. & Kankainen, M. Understanding human health through metatranscriptomics. Trends Mol Med 29, 376–389 (2023).
    
    
9.  9.Jovel, J. et al. Metagenomics Versus Metatranscriptomics of the Murine Gut Microbiome for Assessing Microbial Metabolism During Inflammation. Front Microbiol 13, 829378 (2022).
    
    
10. 10.Jacobs, J. P. et al. Multi-omics profiles of the intestinal microbiome in irritable bowel syndrome and its bowel habit subtypes. Microbiome 11, 5 (2023).
    
    
11. 11.Huang, Q., Zhang, X. & Hu, Z. Application of Artificial Intelligence Modeling Technology Based on Multi-Omics in Noninvasive Diagnosis of Inflammatory Bowel Disease. J Inflamm Res 14, 1933–1943 (2021).
    
    
12. 12.Luthra, R., Chen, H., Roy-Chowdhuri, S. & Singh, R. R. Next-Generation Sequencing in Clinical Molecular Diagnostics of Cancer: Advantages and Challenges. Cancers 7, 2023–2036 (2015).
    
    
13. 13.Audetat, A. et al. Analytic and Clinical Validation of a Pan-Cancer NGS Liquid Biopsy Test for the Detection of Copy Number Amplifications, Fusions and Exon Skipping Variants. Diagnostics 12, 729 (2022).
    
    
14. 14.Cherry, P. D. et al. Abstract 247: Twist pan-cancer synthetic RNA fusion control for assay development. Cancer Res 83, 247–247 (2023).
    
    
15. 15.Singh, A. K. et al. Detecting copy number variation in next generation sequencing data from diagnostic gene panels. Bmc Med Genomics 14, 214 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12920-021-01059-x&link_type=DOI) 

16. 16.Zanetti, A. et al. Setup and Validation of a Targeted Next-Generation Sequencing Approach for the Diagnosis of Lysosomal Storage Disorders. J Mol Diagnostics 22, 488–502 (2020).
    
    
17. 17.Toma, R. et al. A clinically validated human saliva metatranscriptomic test for global systems biology studies. Biotechniques 74, 31–44 (2023).
    
    
18. 18.Hatch, A. et al. A Robust Metatranscriptomic Technology for Population-Scale Studies of Diet, Gut Microbiome, and Human Health. Int J Genomics 2019, 1718741 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1155/2019/1718741&link_type=DOI) 

19. 19.Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39, 105–114 (2021).
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F02%2F2023.04.28.23289273.atom) 

20. 20.Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 27, 849–864 (2017).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjI3LzUvODQ5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMDUvMDIvMjAyMy4wNC4yOC4yMzI4OTI3My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

21. 21.Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/28.1.27&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10592173&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F02%2F2023.04.28.23289273.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000084896300007&link_type=ISI) 

22. 22.Huerta-Cepas, J. et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol Biol Evol 34, 2115–2122 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msx148.&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00040692&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F05%2F02%2F2023.04.28.23289273.atom) 

23. 23.Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26, 1721–1729 (2016).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjEwOiIyNi8xMi8xNzIxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMDUvMDIvMjAyMy4wNC4yOC4yMzI4OTI3My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

24. 24.Henderson, N. C. & Varadhan, R. Damped Anderson Acceleration With Restarts and Monotonicity Control for Accelerating EM and EM-like Algorithms. J Comput Graph Stat 28, 834–846 (2019).
    
    
25. 25.Kim, D. J. et al. Colorectal cancer diagnostic model utilizing metagenomic and metabolomic data of stool microbial extracellular vesicles. Sci Rep-uk 10, 2860 (2020).
    
    
26. 26.Yu, J. et al. Gene mutational analysis by NGS and its clinical significance in patients with myelodysplastic syndrome and acute myeloid leukemia. Exp Hematology Oncol 9, 2 (2020).
    
    
27. 27.Zaidi, A. H. et al. A blood-based circulating microbial metagenomic panel for early diagnosis and prognosis of oesophageal adenocarcinoma. Brit J Cancer 127, 2016–2024 (2022).
    
    
28. 28.Tian, H. et al. Gut Metagenome as a Potential Diagnostic and Predictive Biomarker in Slow Transit Constipation. Frontiers Medicine 8, 777961 (2022).