Motivation: Proteins and DNA are generally represented as sequences of letters. Simplified alphabets, in which two or more letters are represented by the same symbol, have proved useful in several fields of bioinformatics, including searching for patterns that occur at an unexpected rate, studying protein folding, and finding consensus sequences in multiple alignments. The main issue addressed in this paper is whether a general approach can be found that allows an exhaustive analysis of all possible simplified alphabets, using substitution matrices such as PAM and BLOSUM as the scoring measure.
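As a small illustration of what a simplified alphabet does to a sequence, the sketch below collapses each group of amino acids onto a single representative symbol. The grouping and the names `GROUPS`, `build_mapping` and `simplify` are hypothetical, chosen only for this example; they are not taken from the paper.

```python
# Hypothetical 9-symbol grouping of the 20 amino acids (illustration only,
# not a result from the paper).
GROUPS = ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every letter to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        for letter in group:
            mapping[letter] = group[0]
    return mapping


def simplify(sequence, mapping):
    """Rewrite a sequence in the simplified alphabet.

    Letters that are not in the mapping (e.g. 'X') are left unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in sequence)


MAPPING = build_mapping(GROUPS)
print(simplify("MKVLAT", MAPPING))
```

Here M, V and L all collapse onto the symbol L, and T collapses onto S, so sequences that differ only by substitutions within a group become identical.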

Results: The computational approach presented in this paper has led to a program called AlphaSimp (Alphabet Simplifier), which performs an exhaustive analysis of the possible simplified amino acid alphabets using a branch and bound algorithm together with standard or user-defined substitution matrices. The program returns a ranked list of the highest-scoring simplified alphabets. When the simplification is limited and the simplified alphabets retain more than ten symbols, the program completes the analysis in minutes or even seconds on a personal computer. For highly simplified alphabets, however, performance degrades and the analysis can take up to several hours.
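The abstract does not give AlphaSimp's scoring function or bounding rule, so the following is only a minimal sketch of how a branch and bound search over alphabet groupings might look. It assumes that a grouping is scored by summing the substitution scores of every pair of letters merged into the same symbol, uses a toy five-letter score table with hypothetical values, and returns only the single best grouping rather than a ranked list. The names `PAIR_SCORES`, `pair_score` and `best_alphabet` are invented for the example.

```python
# Toy symmetric scores in the style of a substitution matrix.
# The values are hypothetical and cover five letters only.
PAIR_SCORES = {
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("D", "E"): 2,
    ("I", "D"): -3, ("I", "E"): -3, ("L", "D"): -4, ("L", "E"): -3,
    ("V", "D"): -3, ("V", "E"): -2,
}


def pair_score(a, b):
    """Look up a symmetric pair score; unknown pairs score 0."""
    return PAIR_SCORES.get((a, b), PAIR_SCORES.get((b, a), 0))


def best_alphabet(letters, k):
    """Find the highest-scoring partition of `letters` into exactly k groups.

    Assumed score: sum of pair scores over all pairs sharing a group.
    """
    n = len(letters)
    # Most that letter i could add when it joins a group of earlier letters:
    # the sum of its positive scores against all earlier letters.
    optimistic = [
        sum(max(0, pair_score(letters[i], letters[j])) for j in range(i))
        for i in range(n)
    ]
    # suffix[i] is an upper bound on the total gain from letters i..n-1.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + optimistic[i]

    best = {"score": float("-inf"), "groups": None}

    def search(i, groups, score):
        # Feasibility: never exceed k groups, and keep k reachable.
        if len(groups) > k or len(groups) + (n - i) < k:
            return
        # Bound: prune if even the optimistic completion cannot beat the best.
        if score + suffix[i] <= best["score"]:
            return
        if i == n:
            best["score"] = score
            best["groups"] = ["".join(g) for g in groups]
            return
        letter = letters[i]
        # Branch 1: put the letter into each existing group in turn.
        for group in groups:
            gain = sum(pair_score(letter, member) for member in group)
            group.append(letter)
            search(i + 1, groups, score + gain)
            group.pop()
        # Branch 2: open a new group containing only this letter.
        groups.append([letter])
        search(i + 1, groups, score)
        groups.pop()

    search(0, [], 0)
    return best["score"], best["groups"]


print(best_alphabet("ILVDE", 2))
```

Assigning letters one at a time, either to an existing group or to a new one, enumerates each partition exactly once, and the bound discards whole subtrees that cannot improve on the best grouping found so far. The number of partitions still grows very quickly with alphabet size, so the effectiveness of the bound largely determines the running time.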

Availability: AlphaSimp and other accessory programs are available at http://bioinformatics.cribi.unipd.it/alphasimp


Source
http://dx.doi.org/10.1093/bioinformatics/18.8.1102

Similar Publications

BAD2matrix: Phylogenomic matrix concatenation, indel coding, and more.

Appl Plant Sci

September 2024

Lewis B. and Dorothy Cullman Program for Molecular Systematics The New York Botanical Garden, Bronx New York USA.

Premise: Common steps in phylogenomic matrix production include biological sequence concatenation, morphological data concatenation, insertion/deletion (indel) coding, gene content (presence/absence) coding, removing uninformative characters for parsimony analysis, recoding with reduced amino acid alphabets, and occupancy filtering. Existing software does not accomplish these tasks on a phylogenomic scale using a single program.

Methods and Results: BAD2matrix is a Python script that performs the above-mentioned steps in phylogenomic matrix construction for DNA or amino acid sequences as well as morphological data.

Article Synopsis
  • Handwriting recognition systems require both specialized hardware and software, and creating one with sustainable materials poses various challenges.
  • A new flexible and electrically conductive wood-derived hydrogel array is developed as a handwriting input panel, utilizing materials like lignin, polypyrrole, and polyacrylic acid.
  • The system achieves efficient handwritten recognition through a 5×5 signal matrix and simplified algorithms, potentially leading to future applications in wearable technology and healthcare devices.

Construction of amino acids reduced alphabets from molecular descriptors for interpretation of N-carbamylase, luciferase and PI3K mutations.

Biosystems

December 2024

CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia.

The classification of amino acids has proven to be a useful tool for understanding the importance of sequence in protein function. Reduced amino acid alphabets are an example of such classifications. When built from physicochemical, structural and quantum characteristics of the amino acids, they simplify the representation of sequences, which is useful in the modelling, design and understanding of proteins. An objective selection of amino acid properties is therefore important, because the classes formed in a reduced alphabet depend on the descriptors used for classification.


Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information from long reads to improve genotyping accuracy.


Reports an error in "Emotional context and predictability in naturalistic reading aloud" by Jessica M. Alexander and George A. Buzzell (, Advanced Online Publication, Sep 14, 2023, np).
