SLIMM: species level identification of microorganisms from metagenomes

PeerJ


Introduction

Method

Nonredundant Reference genomes database

Read mapping against a database of interest

Collecting coverage information of each reference genome

Discarding unlikely genomes based on coverage landscape

Recalculating reads uniqueness after discarding unlikely genomes

Assigning reads to their LCA and calculating abundances at a given rank

Results and Discussion

Datasets

Performance comparison

Supplemental Information

Precision-Recall Curves: SLIMM vs. Existing Methods for 8 different datasets

True Positive Rate (TPR)/recall drawn against precision. SLIMM achieved the highest performance on all of the datasets, detecting most of the microorganisms in each sample while remaining precise.

DOI: 10.7717/peerj.3138/supp-1
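
As a reading aid for these curves, the following minimal Python sketch shows how precision and recall can be computed for a set of called taxa, and how a curve might be traced by sweeping an abundance cutoff. It is not SLIMM's evaluation code: the function names, the cutoff sweep, and the taxon labels and abundances are hypothetical and chosen only for illustration.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of called taxa against the true set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def pr_curve(predicted_abundance, truth, cutoffs):
    """Compute (cutoff, precision, recall) points.

    Assumption: a taxon counts as called when its predicted abundance
    is at least the cutoff. The article may derive its curves differently.
    """
    points = []
    for cutoff in cutoffs:
        called = {t for t, ab in predicted_abundance.items() if ab >= cutoff}
        points.append((cutoff, *precision_recall(called, truth)))
    return points


# Hypothetical data: taxon labels and abundances are made up.
truth = {"A", "B", "C", "D"}
predicted_abundance = {"A": 0.5, "B": 0.3, "C": 0.15, "X": 0.05}
curve = pr_curve(predicted_abundance, truth, cutoffs=[0.0, 0.1, 0.4])
```

Raising the cutoff removes the low-abundance false positive "X" first, which raises precision, and then starts removing true taxa, which lowers recall. That trade-off is what a precision-recall curve visualizes.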

Precision-Recall Curves of different SLIMM variants for 8 different datasets

True Positive Rate (TPR)/recall drawn against precision. These plots show the accuracy of different SLIMM variants, i.e., SLIMM, SLIMM-DG (with digital normalization), SLIMM-NF (without the filtration step based on coverage landscape), SLIMM-NF-DG (without filtration but with digital normalization) and SLIMM using alignments produced by the read mapper Bowtie2. The comparison is done across 8 different datasets. SLIMM's filtration step produced the highest performance for all of the datasets.

DOI: 10.7717/peerj.3138/supp-2

Violin Plots of the difference between real and predicted abundances: SLIMM vs. Existing Methods

The violin plots show how well the different tools predicted the abundances compared to the actual abundances across eight different datasets. The plots show that SLIMM has the lowest divergence from the actual abundances for most of the samples.

DOI: 10.7717/peerj.3138/supp-3

Violin Plots of the difference between real and predicted abundances: different SLIMM variants

The violin plots show how well the different variants of SLIMM predicted the abundances compared to the actual abundances across eight different datasets.

DOI: 10.7717/peerj.3138/supp-4

Scatter plots showing predicted vs real abundances

Abundances of 8 different samples predicted by different tools compared to the true abundance used for simulation. SLIMM predicted the abundances more accurately than the other tools.

DOI: 10.7717/peerj.3138/supp-5

Scatter plots showing predicted vs real abundances by different SLIMM variants

Abundances of 8 different samples predicted by different flavors of SLIMM compared to the true abundance used for simulation.

DOI: 10.7717/peerj.3138/supp-6

Supplementary data containing details of the datasets used for this study, an accuracy comparison of the different methods per dataset, runtime and memory consumption of each method for the individual datasets, and statistical details (standard deviation, mean, variance, Q1, Q2 (median), Q3) of the differences between real and predicted abundances.

DOI: 10.7717/peerj.3138/supp-7
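
The statistics listed above can be illustrated with a short Python sketch that summarizes the per-taxon differences between predicted and real abundances. This is an assumption-laden illustration, not the authors' analysis script: the function name, the use of sample (rather than population) variance, the inclusive quartile method, and the abundance values are all hypothetical.

```python
import statistics


def summarize_differences(real, predicted):
    """Summarize predicted-minus-real abundance differences per taxon.

    Assumptions: taxa missing from one profile get abundance 0.0, and
    sample variance and standard deviation are used. The supplementary
    tables may have been computed with different conventions.
    """
    taxa = sorted(set(real) | set(predicted))
    diffs = [predicted.get(t, 0.0) - real.get(t, 0.0) for t in taxa]
    q1, q2, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(diffs),
        "stddev": statistics.stdev(diffs),
        "variance": statistics.variance(diffs),
        "q1": q1,
        "q2_median": q2,
        "q3": q3,
    }


# Hypothetical abundance profiles (fractions summing to 1).
real = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
predicted = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
summary = summarize_differences(real, predicted)
```

A summary centered near zero with small spread indicates predictions close to the true abundances, which is the property the violin plots are meant to show.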

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Temesgen Hailemariam Dadi conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Bernhard Y. Renard, Lothar H. Wieler and Torsten Semmler wrote the paper, reviewed drafts of the paper.

Knut Reinert wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Data Availability

The following information was supplied regarding data availability:

SLIMM was developed in C++ with SeqAn Library (Döring et al., 2008). The program is available for free at https://github.com/seqan/slimm.

Funding

This work is supported by the International Max Planck Research School for Computational Biology and Scientific Computing and by the InfectControl 2020 Project (TFP-TV4). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

