Background: Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.
Results: We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.
Conclusion: HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatics tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
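The idea of representing each family by an ensemble of profile hidden Markov models, rather than a single one, can be illustrated with a minimal sketch. The abstract does not state how scores from the ensemble members are combined, so the rule used here (a family's score is the maximum bit score over its ensemble, and the query is assigned to the highest-scoring family) is an assumption for illustration, not a description of HIPPI's implementation. The function name `assign_family`, the `threshold` parameter, and the family labels are hypothetical; the per-model bit scores are assumed to have been computed beforehand, for example with HMMER.

```python
from typing import Dict, List, Optional, Tuple


def assign_family(
    scores: Dict[str, List[float]],
    threshold: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Assign a query to the family whose best ensemble member scores highest.

    `scores` maps a family label to the bit scores of the query against each
    profile HMM in that family's ensemble. The max-over-ensemble rule is an
    assumption made for this sketch; the source does not specify it.
    Returns (family, score), or None if no family reaches `threshold`.
    """
    best: Optional[Tuple[str, float]] = None
    for family, member_scores in scores.items():
        if not member_scores:
            continue  # a family with no scored models cannot be chosen
        top = max(member_scores)  # best-matching HMM within this ensemble
        if top >= threshold and (best is None or top > best[1]):
            best = (family, top)
    return best


if __name__ == "__main__":
    # Hypothetical bit scores for one query against two family ensembles.
    example = {
        "family_A": [12.5, 40.2, 8.0],
        "family_B": [35.0, 1.0],
    }
    print(assign_family(example))
```

Under this assumed rule, a fragmentary or divergent query only needs to match one member of an ensemble well to be recognized, which is one intuition for why an ensemble might represent a diverse alignment better than a single model.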
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123343 | PMC |
| http://dx.doi.org/10.1186/s12864-016-3097-0 | DOI Listing |
Protein Cell
January 2025
Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China.
Extensive epigenetic reprogramming is involved in memory CD8+ T-cell differentiation. The elaborate epigenetic rewiring underlying the heterogeneous functional states of CD8+ T cells remains hidden. Here, we profile single-cell chromatin accessibility and map enhancer-promoter interactomes to characterize the differentiation trajectory of memory CD8+ T cells.
Brief Bioinform
November 2024
Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China.
Alternative polyadenylation (APA) is an important driver of transcriptome diversity that generates messenger RNA isoforms with distinct 3' ends. The rapid development of single-cell and spatial transcriptomic technologies opened up new opportunities for exploring APA data to discover hidden cell subpopulations invisible in conventional gene expression analysis. However, conventional gene-level analysis tools are not fully applicable to APA data, and commonly used unsupervised dimensionality reduction methods often disregard experimentally derived annotations such as cell type identities.
NPJ Precis Oncol
January 2025
Division of Neonatology and Center for Newborn Care, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China.
Medulloblastoma (MB) is an aggressive pediatric brain tumor with distinct molecular heterogeneity. Identifying subtype-specific signatures within Group 3 and Group 4 remains challenging due to shared cytogenetic alterations and limitations of conventional differential gene expression analysis. To uncover the underlying molecular signatures and hidden regulators, we used the Cavalli transcriptomic profile of 470 Group 3 and Group 4 MB patients to reconstruct subtype-specific regulatory networks.
Antimicrob Agents Chemother
December 2024
Programa de Investigación en Enfermedades Tropicales, Escuela de Medicina Veterinaria, Universidad Nacional, Heredia, Costa Rica.
Brucellosis has therapeutic challenges due to 3%-15% relapses/therapeutic failures (R/TF) after antibiotic treatment. Therefore, determining the antibiotic concentration in tissues, the physiopathological parameters, and the R/TF after treatment is relevant. After exploring different antibiotic quantities, we found that a combined dose of 100 µg/g of doxycycline (for 45 days) and 7.
Front Immunol
January 2025
Mike Petryk School of Dentistry, Faculty of Medicine and Dentistry, College of Health Sciences, University of Alberta, Edmonton, AB, Canada.
Once thought to be in a terminally differentiated state, macrophages are now understood to be highly pliable, attuned and receptive to environmental cues that control and align responses. In development of purpose, the centrality of metabolic pathways has emerged. Thus, macrophage inflammatory or reparative phenotypes are tightly linked to catabolic and anabolic metabolism, with further fine tuning of specific gene expression patterns in specific settings.