Background: Classifying proteins with specific functions is important for biological research. Encoding approaches that extract features from protein sequences play an important role in protein classification, and many computational methods (namely classifiers) are applied to protein sequences under various encoding approaches. Commonly, the protein sequences in such studies already carry labels corresponding to different categories of biological function (e.g., bacterial type IV secreted effector or not), which makes protein prediction unrealistic. Protein prediction requires, in advance, a kernel set of protein sequences whose labels have been certified by biological experiments; however, such a set is rarely seen in current research. Therefore, for prediction, unsupervised learning rather than supervised learning (e.g., classification) should be considered. For protein classification, various classifiers can help evaluate the effectiveness of different encoding approaches. In addition, variable selection from an encoded feature representing protein sequences is an important issue that also needs to be considered.
Results: Focusing on the latter problem, we propose a new method for variable selection from an encoded feature representing protein sequences. As a case study, we run experiments to identify bacterial type IV secreted effectors (T4SE) on a benchmark dataset of 1947 protein sequences, comprising 399 T4SE and 1548 non-T4SE. Comparable, quantified results are obtained using only certain components of the encoded feature, i.e., the position-specific scoring matrix, which indicates the effectiveness of our method.
Conclusions: Certain variables, rather than the entire encoded feature they belong to, are effective for discriminating between different types of proteins. In addition, ensemble classifiers with an automatic assignment of different base classifiers achieve better classification results.
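The abstract does not describe the selection procedure itself. Purely as an illustration of what "selecting variables from an encoded feature" can mean, the sketch below ranks the columns of a feature matrix by a Fisher-style separation score and keeps the top k. This score is a stand-in technique chosen for the example, not the method proposed in the paper; the function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Score each column: squared gap between class means over summed class variances."""
    pos = [r for r, y in zip(rows, labels) if y == 1]  # e.g. T4SE
    neg = [r for r, y in zip(rows, labels) if y == 0]  # e.g. non-T4SE
    scores = []
    for j in range(len(rows[0])):
        p = [r[j] for r in pos]
        n = [r[j] for r in neg]
        gap = (mean(p) - mean(n)) ** 2
        spread = pvariance(p) + pvariance(n)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero spread in both classes: perfectly separating if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def select_top_k(rows, labels, k):
    """Return the indices of the k best-scoring columns and the reduced matrix."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[r[j] for j in keep] for r in rows]


# Hypothetical toy data: 4 sequences, 3 encoded variables each (not from the paper).
rows = [
    [0.9, 0.2, 5.0],
    [0.8, 0.7, 5.1],
    [0.1, 0.3, 5.0],
    [0.2, 0.6, 4.9],
]
labels = [1, 1, 0, 0]

scores = fisher_scores(rows, labels)
keep, reduced = select_top_k(rows, labels, k=1)
print(keep)
```

In this toy case only the first column separates the two classes well, so it is the one retained; a classifier would then be trained on the reduced matrix and compared against one trained on all columns.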
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7590791 | PMC |
http://dx.doi.org/10.1186/s12859-020-03826-6 | DOI Listing |