UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction.

Maria Tsagiopoulou, Maria Christina Maniou, Nikolaos Pechlivanis, Anastasis Togkousidis, Michaela Kotrová, Tobias Hutzenlaub, Ilias Kappas, Anastasia Chatzidimitriou, Fotis Psomopoulos

Front Genet

Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece.

Published: May 2021

A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), random oligonucleotide barcodes added during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, which makes it possible to reduce the bias introduced by this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files: it deduplicates and corrects reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides, as well as the distances between the UMIs and the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with broad applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
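To make the idea of per-UMI consensus building concrete, here is a minimal Python sketch. It is not UMIc's R implementation, and its rules are assumptions for illustration only: reads are grouped by exact UMI match (no correction of UMI errors and no use of UMI-to-sequence distances), all reads in a group are assumed to have equal length, the most frequent base wins at each position, ties are broken by the higher mean Phred quality, and the consensus quality is the mean quality of the supporting bases. The names `group_by_umi`, `consensus` and `deduplicate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A read is (umi, sequence, per-base Phred qualities as ints).
Read = Tuple[str, str, List[int]]


def group_by_umi(reads: List[Read]) -> Dict[str, List[Read]]:
    """Group reads that carry exactly the same UMI (assumption: no UMI error correction)."""
    groups: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        groups[read[0]].append(read)
    return dict(groups)


def consensus(reads: List[Read]) -> Tuple[str, List[int]]:
    """Build one consensus read from a UMI group of equal-length reads.

    At each position the most frequent base wins; ties are broken by the
    higher mean Phred quality. The reported quality is the rounded mean
    quality of the reads supporting the chosen base.
    """
    length = len(reads[0][1])
    seq_out: List[str] = []
    qual_out: List[int] = []
    for i in range(length):
        # Collect the qualities observed for each base at position i.
        support: Dict[str, List[int]] = defaultdict(list)
        for _, seq, qual in reads:
            support[seq[i]].append(qual[i])
        best = max(
            support,
            key=lambda b: (len(support[b]), sum(support[b]) / len(support[b])),
        )
        seq_out.append(best)
        qual_out.append(round(sum(support[best]) / len(support[best])))
    return "".join(seq_out), qual_out


def deduplicate(reads: List[Read]) -> Dict[str, Tuple[str, List[int]]]:
    """Collapse all reads that share a UMI into a single consensus read."""
    return {umi: consensus(group) for umi, group in group_by_umi(reads).items()}


if __name__ == "__main__":
    example = [
        ("AACG", "ACGT", [30, 30, 30, 30]),
        ("AACG", "ACCT", [30, 30, 10, 30]),  # low-quality C at position 2
        ("AACG", "ACGT", [30, 30, 34, 30]),
        ("TTGA", "GGGA", [20, 20, 20, 20]),
    ]
    for umi, (seq, qual) in deduplicate(example).items():
        print(umi, seq, qual)
```

In the example, the three reads tagged `AACG` collapse into `ACGT` with qualities `[30, 30, 32, 30]`: the minority, low-quality `C` at position 2 is outvoted by two `G` calls. A real tool would also need to handle UMI sequencing errors and reads of unequal length, which this sketch deliberately ignores.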

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8193862
DOI: http://dx.doi.org/10.3389/fgene.2021.660366