nail: software for high-speed, high-sensitivity protein sequence annotation.

bioRxiv

R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA.

Published: January 2024

Unlabelled: "Fast is fine, but accuracy is final." -- Wyatt Earp.

Background: The extreme diversity of newly sequenced organisms and the considerable scale of modern sequence databases lead to a tension between competing needs for sensitivity and speed in sequence annotation, with multiple tools displacing the venerable BLAST software suite on one axis or another. Alignment based on profile hidden Markov models (pHMMs) has demonstrated state-of-the-art sensitivity, while recent algorithmic advances have resulted in hyper-fast annotation tools with sensitivity close to that of BLAST.

Results: Here, we introduce a new tool that bridges the gap between advances in these two directions, reaching speeds comparable to fast annotation methods such as MMseqs2 while retaining most of the sensitivity offered by pHMMs. The tool, called nail, implements a heuristic approximation of the pHMM Forward/Backward (FB) algorithm by identifying a sparse subset of the cells in the FB dynamic programming matrix that contains most of the probability mass. The method produces an accurate approximation of pHMM scores and E-values with high speed and small memory requirements. On a protein benchmark, nail recovers the majority of the recall difference between MMseqs2 and HMMER, running ~26x faster than HMMER3 and only ~2.4x slower than MMseqs2's sensitive variant. nail is released under the open BSD-3-clause license and is available for download at https://github.com/TravisWheelerLab/nail.
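The idea of restricting Forward/Backward to a sparse set of cells can be illustrated with a toy example. The sketch below is not nail's implementation: it replaces the pHMM with a single-state path-sum model, all weights (`GAP`, `diag`) and sequences are invented for illustration, and it selects the sparse cells from a full Forward/Backward pass, which a real tool would need to avoid. It only shows that a Forward pass confined to cells carrying most of the probability mass can closely reproduce the full score.

```python
GAP = 0.05  # assumed toy weight for a one-residue gap move


def diag(x, y, i, j):
    """Assumed toy weight for aligning x[i-1] with y[j-1]."""
    return 0.8 if x[i - 1] == y[j - 1] else 0.02


def forward(x, y, cells=None):
    """Sum path weights from (0, 0); cells outside `cells` count as zero."""
    n, m = len(x), len(y)
    f = {(0, 0): 1.0}
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0):
                continue
            if cells is not None and (i, j) not in cells:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += f.get((i - 1, j - 1), 0.0) * diag(x, y, i, j)
            if i > 0:
                total += f.get((i - 1, j), 0.0) * GAP
            if j > 0:
                total += f.get((i, j - 1), 0.0) * GAP
            f[(i, j)] = total
    return f


def backward(x, y):
    """Sum path weights from each cell to the final cell (n, m)."""
    n, m = len(x), len(y)
    b = {(n, m): 1.0}
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if (i, j) == (n, m):
                continue
            total = 0.0
            if i < n and j < m:
                total += b[(i + 1, j + 1)] * diag(x, y, i + 1, j + 1)
            if i < n:
                total += b[(i + 1, j)] * GAP
            if j < m:
                total += b[(i, j + 1)] * GAP
            b[(i, j)] = total
    return b


def sparse_cells(x, y, eps=1e-4):
    """Keep cells whose share of the total path weight is at least eps."""
    f, b = forward(x, y), backward(x, y)
    total = f[(len(x), len(y))]
    return {c for c in f if f[c] * b[c] / total >= eps}


# Two made-up sequences that differ at one position.
x = "MKTAYIAKQRQISFVK"
y = "MKTAYLAKQRQISFVK"

full = forward(x, y)[(len(x), len(y))]
cells = sparse_cells(x, y)
approx = forward(x, y, cells)[(len(x), len(y))]

print(f"cells kept: {len(cells)} of {(len(x) + 1) * (len(y) + 1)}")
print(f"sparse / full score: {approx / full:.6f}")
```

In this toy setting the sparse score can never exceed the full score, because it sums over a subset of the same paths. The mass lost is bounded by the threshold `eps` times the number of discarded cells.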

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10862755
DOI: http://dx.doi.org/10.1101/2024.01.27.577580


