We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the workflow for pHMM-based annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and is more accurate than all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes.
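To make the frameshift problem concrete, the following minimal Python sketch shows how a single-nucleotide deletion splits a coding region across two reading frames. It is only an illustration of why frameshift-aware alignment is needed, not BATH's algorithm; the toy sequence and the helper names `translate` and `three_frame` are assumptions introduced here.

```python
# Illustration only: how one deleted nucleotide shifts the reading frame.
# This is NOT BATH's algorithm; the sequence and helper names are hypothetical.

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward frame, ignoring a trailing partial codon."""
    end = len(dna) - (len(dna) - frame) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, end, 3))


def three_frame(dna: str) -> list[str]:
    """Translate all three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


original = "ATGGCTAAAGGTGAACTG"          # toy coding sequence
mutated = original[:7] + original[8:]    # delete a single nucleotide

print(translate(original))               # intact peptide in frame 0
for frame, peptide in enumerate(three_frame(mutated)):
    print(frame, peptide)                # peptide is now split across frames
```

In this toy case the start of the peptide remains in frame 0 of the mutated sequence while the end reappears in frame 2, so a fixed-frame translation recovers only part of the protein. A frameshift-aware aligner aims to follow the coding region across such frame changes.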
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10802276 | PMC |
http://dx.doi.org/10.1101/2023.12.31.573773 | DOI Listing |
BMC Genomics
January 2025
State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China.
Background: Populus tomentosa, known as Chinese white poplar, is indigenous to China and distributed across large areas of the country, where it plays multiple important roles in forestry, agriculture, conservation, and urban horticulture. However, limited accessibility to the mitochondrial (mt) genome of P. tomentosa impedes phylogenetic and population genetic analyses and restricts functional gene research in the Salicaceae family.
NAR Genom Bioinform
March 2025
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.
PLoS Pathog
January 2025
School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
Long non-coding RNAs (lncRNAs) are essential components of innate immunity, maintaining the functionality of immune systems that control virus infection. However, how lncRNAs engage immune responses during influenza A virus (IAV) infection remains unclear. Here, we show that lncRNA USP30-AS1 is up-regulated by infection of multiple different IAV subtypes and is required for tuning inflammatory and antiviral response in IAV infection.
Mitochondrial DNA B Resour
December 2024
School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China.
Thunb. (1784) is primarily distributed in eastern Asia, has a total length of 152,778 bp and consists of a large single copy (LSC) region of 84,517 bp, a small single copy (SSC) region of 18,277 bp, and two inverted repeat (IR) regions of 24,992 bp each. The GC content is 37.
Mitochondrial DNA B Resour
December 2024
Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem. Sun Yat-Sen), Nanjing, China.
Hemsl. 1889 is an endemic deciduous shrub in China, belonging to the family Ericaceae. In this study, the first complete chloroplast genome of was assembled and annotated.