Motivation: De Bruijn graphs are a common assembly data structure for sequencing datasets. But with the advances in sequencing technologies, assembling high-coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee of preserving important k-mers that form connections between regions in the graph.
Results: Here, normalization is phrased as a set multi-cover problem on reads, and a heuristic algorithm, Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values. It was compared to current normalization algorithms and was found to perform better. Normalizing error-corrected data allows for more accurate assemblies than normalizing the uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would have been missed otherwise. Finally, ORNA is a general-purpose normalization algorithm that is fast and significantly reduces datasets, with a loss of assembly quality between 1% and 30% depending on reduction stringency.
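The set multi-cover framing can be illustrated with a small greedy sketch. This is not ORNA's implementation: the function names, the `log_base` parameter, and the logarithmic per-k-mer target are assumptions chosen for illustration. Each k-mer is given a coverage target no larger than its original abundance, and a read is kept only if it contains at least one k-mer that has not yet reached its target.

```python
import math
from collections import Counter


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_reads(reads, k, log_base=1.7):
    """Greedy set multi-cover sketch (hypothetical, not ORNA's code).

    Assumption: the target coverage of a k-mer with abundance n is
    ceil(log_base(n)), clamped to the range [1, n].
    """
    # Count how often each k-mer occurs in the full dataset.
    abundance = Counter(km for read in reads for km in kmers(read, k))

    # Per-k-mer coverage target (assumed logarithmic reduction).
    target = {
        km: min(n, max(1, math.ceil(math.log(n, log_base))))
        for km, n in abundance.items()
    }

    covered = Counter()  # coverage of each k-mer among kept reads
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Keep the read if any of its k-mers is still under target.
        if any(covered[km] < target[km] for km in read_kmers):
            kept.append(read)
            covered.update(read_kmers)
    return kept


if __name__ == "__main__":
    reads = ["ACGTAC"] * 10 + ["TTGCAA"]
    kept = normalize_reads(reads, k=3)
    print(len(reads), "reads reduced to", len(kept))
```

Because a read is discarded only when every one of its k-mers has already met its target, each k-mer from the input still appears in the kept reads, so no edge of the De Bruijn graph is lost in this sketch.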
Availability And Implementation: ORNA is available at https://github.com/SchulzLab/ORNA.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157080 | PMC |
http://dx.doi.org/10.1093/bioinformatics/bty307 | DOI Listing |