PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets
Abstract
PRINSEQ++ is a C++ implementation of the popular software prinseq-lite for quality control and preprocessing of sequencing datasets. PRINSEQ++ can run multi-threaded, which makes it more than 10 times faster than the original version. It can read from, and write to, compressed files, drastically reducing disk usage. PRINSEQ++ can filter, trim and reformat sequences using a variety of options to improve downstream analysis. PRINSEQ++ is freely available on GitHub (https://github.com/Adrian-Cantu/PRINSEQ-plus-plus) and runs on all Unix-like systems.
Cite this as
2019. PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets. PeerJ Preprints 7:e27553v1 https://doi.org/10.7287/peerj.preprints.27553v1
Author comment
This is a preprint submission to PeerJ Preprints.
Supplemental Information
Raw data for timing experiment
Each row is a timing measurement for one combination of input size and number of threads.
Summary statistics used to plot figure 1
Each row gives the mean time, standard deviation, standard error, and 95% confidence interval of the timing measurements for one combination of input size and number of threads. These data are derived from sup_table1.
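The summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' notebook code: the function name `summarize`, the row layout `(input_size, threads, seconds)`, and the sample values are hypothetical. The source also does not say how the confidence interval was computed, so the sketch assumes a normal approximation (half-width = z × standard error); a Student's t interval would be wider for small samples.

```python
import math
import statistics
from collections import defaultdict


def summarize(rows, confidence=0.95):
    """Group timings by (input_size, threads) and summarize each group.

    rows: iterable of (input_size, threads, seconds) tuples (hypothetical layout).
    Returns a dict mapping (input_size, threads) to n, mean, sd, se and the
    confidence-interval half-width ("ci").
    """
    groups = defaultdict(list)
    for size, threads, seconds in rows:
        groups[(size, threads)].append(seconds)

    # Assumption: normal approximation; z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)

    summary = {}
    for key, times in groups.items():
        n = len(times)
        mean = statistics.fmean(times)
        # Sample standard deviation needs at least two measurements.
        sd = statistics.stdev(times) if n > 1 else 0.0
        se = sd / math.sqrt(n)
        summary[key] = {"n": n, "mean": mean, "sd": sd, "se": se, "ci": z * se}
    return summary


# Made-up timings, for illustration only.
rows = [
    ("1M", 1, 10.0), ("1M", 1, 12.0), ("1M", 1, 14.0),
    ("1M", 4, 3.0), ("1M", 4, 5.0),
]
for key, stats in summarize(rows).items():
    print(key, stats)
```

Each group's interval is then mean ± ci, which is the quantity typically drawn as an error bar in a timing plot.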
Code to plot figure 1
Jupyter notebook with the code used to plot figure 1. It is also available in the GitHub repository.
Runtime comparison. The run time of prinseq-lite and PRINSEQ++ was measured on several pairs of FASTQ files of different sizes with equivalent options. PRINSEQ++ was run with different numbers of threads; prinseq-lite was run single-threaded. Mean speedup of PRINSEQ++ ove
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Vito Adrian Cantu conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Jeffrey Sadural conceived and designed the experiments, performed the experiments, authored or reviewed drafts of the paper, approved the final draft.
Robert Edwards conceived and designed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.
Data Deposition
The following information was supplied regarding data availability:
Funding
VAC and JS were supported by NSF grant number MCB-1441985 to RE; Department of Energy Lawrence Livermore National Laboratory grant B618146 to RE; computational resources were supported by NSF grant DBI-0850356 to RE. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.