Motivation: In the post-genomic era, automatic annotation of protein sequences using computational homology-based methods is highly desirable. However, protein sequences often diverge to an extent where detecting homology and transferring annotation automatically is not straightforward. Sophisticated approaches are needed to detect such distant relationships. We propose a new approach to identify deep evolutionary relationships between proteins that overcomes the shortcomings of available methods.

Results: We have developed a method that identifies remote homologues more effectively from any protein sequence database by using several cascading searches with Hidden Markov Models (C-HMM). We cluster the hits and generate a profile for each hit cluster, which effectively reduces the computation time of the cascaded sequence searches. When applied to diverse protein folds, C-HMM achieved 94, 83 and 40% coverage at the family, superfamily and fold levels, respectively. We compare C-HMM with various remote homology detection methods and discuss the trade-offs between coverage and false positives.
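
The cascading idea in the Results paragraph can be illustrated with a toy sketch. This is not the C-HMM implementation (the published package is written in Java and uses HMM searches); it assumes a hypothetical k-mer Jaccard similarity in place of HMM profile scores, hypothetical thresholds, and greedy clustering with one representative sequence per cluster standing in for a profile built from that cluster. It shows how new hits from one round seed the next round, and how clustering those hits cuts the number of searches.

```python
# Toy cascaded homology search with hit clustering.
# ASSUMPTIONS (hypothetical, not from C-HMM): k-mer Jaccard similarity stands in
# for HMM profile scores; one representative sequence per cluster stands in for
# a profile generated from that cluster; thresholds are illustrative only.


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (hypothetical stand-in for an HMM score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query, database, threshold):
    """Return IDs of database sequences scoring at or above threshold."""
    return {sid for sid, seq in database.items()
            if similarity(query, seq) >= threshold}


def cluster(hit_ids, database, cluster_threshold):
    """Greedy clustering: a hit becomes a new representative only if it is
    not similar enough to any existing representative."""
    reps = []
    for sid in sorted(hit_ids):
        if all(similarity(database[sid], database[r]) < cluster_threshold
               for r in reps):
            reps.append(sid)
    return reps


def cascade(query, database, threshold=0.3, cluster_threshold=0.6, max_rounds=5):
    """Search iteratively, using cluster representatives of each round's new
    hits as the next round's queries. Returns (all hits, number of searches)."""
    found = set()
    queries = [query]
    n_searches = 0
    for _ in range(max_rounds):
        new_hits = set()
        for q in queries:
            new_hits |= search(q, database, threshold)
            n_searches += 1
        new_hits -= found  # keep only sequences not seen in earlier rounds
        if not new_hits:
            break  # converged: nothing new to cascade from
        found |= new_hits
        # One representative per cluster seeds the next round, so near-duplicate
        # hits do not each trigger their own search.
        queries = [database[r] for r in cluster(new_hits, database, cluster_threshold)]
    return found, n_searches


database = {
    "intermediate": "ABCDEFXY",
    "intermediate_variant": "ABCDEFXZ",
    "remote": "CDEFXYZW",
    "unrelated": "MNOPQRST",
}
query = "ABCDEFGH"

direct = search(query, database, threshold=0.3)
hits, n = cascade(query, database)
print(sorted(direct))   # ['intermediate', 'intermediate_variant']
print(sorted(hits), n)  # ['intermediate', 'intermediate_variant', 'remote'] 3
```

A single search from the query reaches only the two intermediate sequences; the remote sequence (similarity 0.2 to the query, below the 0.3 threshold) is picked up in the second round through the intermediate. Because the two intermediates fall into one cluster, the second round runs one search instead of two.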

Availability and implementation: A standalone package implemented in Java, along with detailed documentation, can be downloaded from https://github.com/RSLabNCBS/C-HMM

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: mini@ncbs.res.in.

DOI: http://dx.doi.org/10.1093/bioinformatics/btv538