In silico read normalization using set multi-cover optimization.

Bioinformatics

Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.

Published: October 2018

Motivation: De Bruijn graphs are a common assembly data structure for sequencing datasets. With advances in sequencing technologies, however, assembling high-coverage datasets has become a computational challenge. Read normalization, which removes redundancy from datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee of preserving important k-mers that form connections between regions in the graph.

Results: Here, normalization is phrased as a set multi-cover problem on reads, and a heuristic algorithm, Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values. It was compared to current normalization algorithms and was found to perform better. Normalizing error-corrected data allows for more accurate assemblies than normalizing the uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would otherwise have been missed. Finally, ORNA is a general-purpose normalization algorithm that is fast and significantly reduces datasets, with a loss of assembly quality between 1% and 30% depending on reduction stringency.
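The set multi-cover idea can be illustrated with a minimal Python sketch. It is not ORNA's implementation: the function names, the logarithmic coverage requirement, the `base` parameter and the single greedy pass are all assumptions made for illustration. Each k-mer is assigned a required number of occurrences derived from its abundance in the original data, and a read is kept only if it contains at least one k-mer that has not yet reached its requirement.

```python
import math
from collections import Counter


def kmers(read, k):
    """Return every k-mer (substring of length k) in a read, with repeats."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=5, base=2.0):
    """Greedy set multi-cover sketch (hypothetical, not ORNA's code)."""
    # Pass 1: abundance of every k-mer in the original dataset.
    abundance = Counter(km for read in reads for km in kmers(read, k))

    # Assumed requirement: a k-mer seen n times must be retained
    # ceil(log_base(n)) times, and at least once.
    required = {
        km: max(1, math.ceil(math.log(n, base)))
        for km, n in abundance.items()
    }

    # Pass 2: keep a read only if it still helps cover some k-mer.
    covered = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if any(covered[km] < required[km] for km in read_kmers):
            kept.append(read)
            covered.update(read_kmers)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 20 + ["TTTTGGGGCC"]
    kept = normalize(reads, k=5)
    print(len(reads), "reads reduced to", len(kept))
```

Because a read is discarded only when all of its k-mers are already sufficiently covered, every k-mer of the input survives in the output, while highly redundant reads are dropped. A larger `base` lowers the requirements and so reduces the dataset more aggressively.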

Availability and implementation: ORNA is available at https://github.com/SchulzLab/ORNA.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157080
DOI: http://dx.doi.org/10.1093/bioinformatics/bty307
