"Fast is fine, but accuracy is final." -- Wyatt Earp
Background: The extreme diversity of newly sequenced organisms and the considerable scale of modern sequence databases create a tension between the competing needs for sensitivity and speed in sequence annotation, with multiple tools displacing the venerable BLAST software suite on one axis or the other. Alignment based on profile hidden Markov models (pHMMs) has demonstrated state-of-the-art sensitivity, while recent algorithmic advances have produced hyper-fast annotation tools whose sensitivity is close to that of BLAST.
Results: Here, we introduce a new tool that bridges the gap between advances in these two directions, reaching speeds comparable to fast annotation methods such as MMseqs2 while retaining most of the sensitivity offered by pHMMs. The tool, called nail, implements a heuristic approximation of the pHMM Forward/Backward (FB) algorithm by identifying a sparse subset of the cells in the FB dynamic programming matrix that contains most of the probability mass. The method produces an accurate approximation of pHMM scores and E-values with high speed and small memory requirements. On a protein benchmark, nail recovers the majority of the recall difference between MMseqs2 and HMMER, running ~26x faster than HMMER3 and only ~2.4x slower than the sensitive variant of MMseqs2. nail is released under the open BSD-3-Clause license and is available for download at https://github.com/TravisWheelerLab/nail.
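To make the idea of "a sparse subset of cells that holds most of the probability mass" concrete, here is a minimal Python sketch. It is not nail's code and does not use a profile HMM: it runs Forward/Backward on a toy three-state HMM with hypothetical parameters, keeps only the cells whose posterior probability is at least a threshold (an assumed selection rule), and reruns Forward restricted to those cells. Picking cells from a full posterior pass defeats the purpose in practice. nail finds its sparse subset heuristically and avoids filling the whole matrix, but the sketch shows why a well-chosen sparse subset reproduces the full score closely.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v))); returns -inf if every entry is -inf."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward(seq, init, trans, emit, allowed=None):
    """Log-space Forward pass. When `allowed` is a set of (position, state)
    cells, every other cell is pinned to -inf, so only paths that stay
    entirely inside the sparse subset contribute to the total."""
    k = len(init)
    fwd = [[-math.inf] * k for _ in seq]
    for i, sym in enumerate(seq):
        for s in range(k):
            if allowed is not None and (i, s) not in allowed:
                continue  # cell outside the sparse subset stays at -inf
            if i == 0:
                into = math.log(init[s])
            else:
                into = logsumexp([fwd[i - 1][r] + math.log(trans[r][s]) for r in range(k)])
            fwd[i][s] = into + math.log(emit[s][sym])
    return fwd, logsumexp(fwd[-1])

def backward(seq, trans, emit):
    """Log-space Backward pass over the full matrix."""
    k = len(trans)
    bwd = [[0.0] * k for _ in seq]  # log(1) in the last row
    for i in range(len(seq) - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = logsumexp([math.log(trans[s][r]) + math.log(emit[r][seq[i + 1]])
                                   + bwd[i + 1][r] for r in range(k)])
    return bwd

# Hypothetical toy model: 3 sticky states over a 4-letter alphabet.
INIT = [0.8, 0.1, 0.1]
TRANS = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
EMIT = [[0.85, 0.05, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05], [0.05, 0.05, 0.85, 0.05]]
SEQ = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]

def sparse_forward_demo(threshold=0.01):
    fwd, full_logp = forward(SEQ, INIT, TRANS, EMIT)
    bwd = backward(SEQ, TRANS, EMIT)
    # Assumed selection rule (illustration only, not nail's heuristic):
    # keep cells whose posterior probability is at least `threshold`.
    kept, dropped_mass = set(), 0.0
    for i in range(len(SEQ)):
        for s in range(len(INIT)):
            post = math.exp(fwd[i][s] + bwd[i][s] - full_logp)
            if post >= threshold:
                kept.add((i, s))
            else:
                dropped_mass += post
    _, sparse_logp = forward(SEQ, INIT, TRANS, EMIT, allowed=kept)
    return full_logp, sparse_logp, kept, dropped_mass

full_logp, sparse_logp, kept, dropped_mass = sparse_forward_demo()
total = len(SEQ) * len(INIT)
print(f"cells kept: {len(kept)}/{total}")
print(f"score loss: {(full_logp - sparse_logp) / math.log(2):.4f} bits")
```

Because the restricted pass only discards paths, the sparse score is a lower bound on the full Forward score, and by a union bound the lost probability is at most the posterior mass of the dropped cells. If, as assumed here, E-values fall off as 2^(-score) in bits (the exponential tail with λ = ln 2 assumed for HMMER3 Forward scores), a shortfall of Δ bits inflates the E-value by a factor of 2^Δ, so a small score loss translates into a small E-value error.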
| Source | Link |
|---|---|
| PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10862755 |
| DOI | http://dx.doi.org/10.1101/2024.01.27.577580 |