Motivation: Proteins and DNA are generally represented as sequences of letters. Simplified alphabets, in which two or more letters are represented by the same symbol, have proved useful in several areas of bioinformatics, including searching for patterns that occur at an unexpected rate, studying protein folding and finding consensus sequences in multiple alignments. The main question addressed in this paper is whether a general approach can be found that allows an exhaustive analysis of all possible simplified alphabets, using substitution matrices such as PAM and BLOSUM as the scoring measure.
Results: The computational approach presented in this paper has led to a computer program called AlphaSimp (Alphabet Simplifier) that performs an exhaustive analysis of the possible simplified amino acid alphabets, using a branch and bound algorithm together with standard or user-defined substitution matrices. The program returns a ranked list of the highest-scoring simplified alphabets. When the simplification is limited and the simplified alphabets retain more than ten symbols, the program completes the analysis in minutes or even seconds on a personal computer. For more highly simplified alphabets, however, performance degrades and the analysis can take up to several hours.
Availability: AlphaSimp and other accessory programs are available at http://bioinformatics.cribi.unipd.it/alphasimp
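To make the idea of scoring simplified alphabets concrete, the following minimal Python sketch enumerates every way of merging a small set of residues into k groups and ranks the groupings with a substitution-style score. It is not AlphaSimp's implementation: it uses plain brute-force enumeration rather than branch and bound, the scoring function (sum of pairwise scores within each group) is an assumption made for illustration, and `TOY_SCORES`, `recode`, `partition_score`, `partitions` and `best_alphabet` are hypothetical names with made-up values rather than a real PAM or BLOSUM matrix.

```python
from itertools import combinations

# Toy pairwise scores for five residues. Illustrative values only,
# not a real PAM/BLOSUM matrix (assumption for this sketch).
TOY_SCORES = {
    frozenset("IL"): 2, frozenset("IV"): 3, frozenset("LV"): 1,
    frozenset("DE"): 2,
    frozenset("ID"): -3, frozenset("IE"): -3, frozenset("LD"): -4,
    frozenset("LE"): -3, frozenset("VD"): -3, frozenset("VE"): -2,
}


def recode(sequence, groups):
    """Rewrite a sequence so each letter is replaced by the first letter of its group."""
    mapping = {letter: group[0] for group in groups for letter in group}
    return "".join(mapping.get(ch, ch) for ch in sequence)


def partition_score(groups, scores):
    """Assumed score: sum of pairwise scores over all letter pairs sharing a group."""
    return sum(
        scores[frozenset((a, b))]
        for group in groups
        for a, b in combinations(group, 2)
    )


def partitions(letters, k):
    """Yield every split of `letters` into exactly k non-empty groups."""
    if k < 0 or k > len(letters):
        return
    if not letters:
        yield []  # only reachable when k == 0
        return
    first, rest = letters[0], letters[1:]
    # Case 1: `first` forms a group of its own.
    for p in partitions(rest, k - 1):
        yield [[first]] + p
    # Case 2: `first` joins one of the groups of a k-way split of the rest.
    for p in partitions(rest, k):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]


def best_alphabet(letters, k, scores):
    """Brute-force search for the highest-scoring k-symbol alphabet."""
    return max(partitions(list(letters), k),
               key=lambda groups: partition_score(groups, scores))


best = best_alphabet("ILVDE", 2, TOY_SCORES)
print(best, partition_score(best, TOY_SCORES))
print(recode("DELIVE", best))  # -> DDIIID
```

The number of candidate groupings is a Stirling number of the second kind, which grows very quickly with alphabet size; this is one plausible reason an exhaustive search over the 20 amino acids needs pruning such as branch and bound rather than the plain enumeration sketched here.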
DOI: http://dx.doi.org/10.1093/bioinformatics/18.8.1102
Appl Plant Sci
September 2024
Lewis B. and Dorothy Cullman Program for Molecular Systematics The New York Botanical Garden, Bronx New York USA.
Premise: Common steps in phylogenomic matrix production include biological sequence concatenation, morphological data concatenation, insertion/deletion (indel) coding, gene content (presence/absence) coding, removing uninformative characters for parsimony analysis, recoding with reduced amino acid alphabets, and occupancy filtering. Existing software does not accomplish these tasks on a phylogenomic scale using a single program.
Methods And Results: BAD2matrix is a Python script that performs the above-mentioned steps in phylogenomic matrix construction for DNA or amino acid sequences as well as morphological data.
Adv Sci (Weinh)
December 2024
Department of Applied Physics, Aalto University, Aalto, FI-00076, Finland.
Biosystems
December 2024
CHIMA Grupo de Química Matemática, Universidad de Pamplona, Km 1 Vía Bucaramanga, Pamplona, Colombia.
The classification of amino acids has proven to be a useful tool for understanding the importance of sequence in protein function. Reduced amino acid alphabets are an example of such classifications: when built from physicochemical, structural and quantum characteristics of the amino acids, they simplify the representation of sequences, which is useful in the modelling, design and understanding of proteins. An objective selection of amino acid properties is therefore important, because the classes formed in a reduced alphabet depend on the descriptors used for classification.
Nat Commun
July 2024
Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA.
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information from long reads to improve genotyping accuracy.
Reports an error in "Emotional context and predictability in naturalistic reading aloud" by Jessica M. Alexander and George A. Buzzell (, Advanced Online Publication, Sep 14, 2023, np).