Abstract: The minimum message length (MML) information criterion provides a powerful statistical framework for inductive reasoning from observed data. We apply MML to the problem of protein sequence comparison using finite state models with Dirichlet distributions. The resulting framework allows us to supersede the ad hoc cost functions commonly used in the field by systematically addressing two problems: the arbitrariness of alignment parameters, and the disconnect between substitution scores and gap costs. Furthermore, the framework enables the generation of marginal probability landscapes over all possible alignment hypotheses, which can help users rationalize and assess competing alignment relationships between protein sequences simultaneously, rather than simply reporting a single (best) alignment. We demonstrate the performance of our program on benchmarks containing distantly related protein sequences.
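
As a rough illustration of how a message-length view can replace hand-tuned costs, the sketch below computes the length in bits of encoding a sequence of alignment states with an adaptive code under a symmetric Dirichlet prior. This is a generic Dirichlet-multinomial code, not the authors' implementation or the exact MML estimator. The function name, the three-letter state alphabet (`m`, `i`, `d` for match, insert, delete) and the default `alpha` are assumptions made for the example.

```python
import math
from collections import Counter


def adaptive_code_length_bits(symbols, alphabet, alpha=1.0):
    """Bits needed to encode `symbols` with an adaptive code.

    Each symbol is coded with probability (count + alpha) / (n + K * alpha),
    where n is the number of symbols seen so far and K is the alphabet size.
    The total equals the negative log2 marginal likelihood of the sequence
    under a symmetric Dirichlet(alpha) prior (hypothetical helper, for
    illustration only).
    """
    k = len(alphabet)
    counts = Counter()
    total_bits = 0.0
    for n, s in enumerate(symbols):
        if s not in alphabet:
            raise ValueError(f"symbol {s!r} is not in the alphabet")
        p = (counts[s] + alpha) / (n + k * alpha)
        total_bits -= math.log2(p)
        counts[s] += 1
    return total_bits


# Hypothetical state strings: a run of matches versus a mixed path.
mostly_matches = "mmmmmmmmmm"
mixed_path = "midmidmidm"
bits_skewed = adaptive_code_length_bits(mostly_matches, "mid")
bits_mixed = adaptive_code_length_bits(mixed_path, "mid")
print(f"{bits_skewed:.2f} bits vs {bits_mixed:.2f} bits")
```

Under this code, a state path dominated by one state costs fewer bits than a path that uses all states evenly, so the relative cost of gaps and matches comes from the data and the prior, not from a separately chosen penalty.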

Availability and Implementation: The open-source program supporting this work is available from http://lcb.infotech.monash.edu.au/seqmmligner.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612809
DOI: http://dx.doi.org/10.1093/bioinformatics/btz368
