Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

J Comput Biol

School of Computer Science, University of Windsor, Windsor, Canada.

Published: August 2017

Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, introduces artifacts, so preprocessing of the sequences is mandatory before downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers and using that count as the sequence's score; it also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in k-mers. Based on a z-score threshold, Zseq then sweeps through the sequences again and filters out those with a z-score below the user-defined threshold. The Zseq algorithm provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning them and assembling the transcripts using the reference genome, as well as by de novo assembly. The assembled transcripts show a better ability to discriminate between cancer and normal samples than those from another state-of-the-art method. Moreover, transcripts assembled de novo from reads filtered by Zseq have longer genomic sequences than those from the other tested methods. A way of estimating the cutoff threshold using labeling rules is also introduced, with promising results.
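
The two-pass idea described in the abstract (score each read by its unique k-mers, then filter on a z-score) can be illustrated with a minimal sketch. This is not the published Zseq implementation: the function names, the default k, the GC cutoff, and the rule of simply skipping k-mers that contain an ambiguous base or exceed the GC cutoff are all assumptions made for illustration, since the abstract does not specify how these factors enter the score.

```python
from statistics import mean, pstdev


def kmer_score(read, k=4, gc_cutoff=0.75):
    """Score a read by its number of unique k-mers.

    Assumed rule (not from the paper): k-mers containing 'N' or with a
    GC fraction above gc_cutoff are skipped and do not add to the score.
    """
    seen = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k].upper()
        if "N" in kmer:
            continue  # ambiguous nucleotide: k-mer not counted
        gc_fraction = (kmer.count("G") + kmer.count("C")) / k
        if gc_fraction > gc_cutoff:
            continue  # high-GC k-mer: not counted
        seen.add(kmer)
    return len(seen)


def zseq_like_filter(reads, k=4, z_threshold=0.0, gc_cutoff=0.75):
    """Keep reads whose score z-score is at or above z_threshold."""
    # First pass: one score per read.
    scores = [kmer_score(r, k, gc_cutoff) for r in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return list(reads)  # all scores equal: nothing to separate
    # Second pass: drop reads whose z-score falls below the threshold.
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= z_threshold]


if __name__ == "__main__":
    reads = ["ACGTTGCAAGTC", "AAAAAAAAAAAA", "ACGTNNNNACGT", "TGCATGACTAGA"]
    print(zseq_like_filter(reads))
```

In this toy input, the homopolymer read and the read dominated by ambiguous bases each yield a single unique k-mer, so they score well below the two more complex reads and are removed at a threshold of 0.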

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563921
DOI: http://dx.doi.org/10.1089/cmb.2017.0021
