Organism-specific training improves performance of linear B-cell epitope prediction.

Bioinformatics

Department of Computer Science, College of Engineering and Physical Sciences, Aston University, Birmingham B4 7ET, UK.

Published: December 2021

Motivation: In silico identification of linear B-cell epitopes represents an important step in the development of diagnostic tests and vaccine candidates, by providing potential high-probability targets for experimental investigation. Current predictive tools were developed under a generalist approach, training models with heterogeneous datasets to develop predictors that can be deployed for a wide variety of pathogens. However, continuous advances in processing power and the increasing amount of epitope data for a broad range of pathogens indicate that training organism- or taxon-specific models may become a feasible alternative, with unexplored potential gains in predictive performance.

Results: This article shows how organism-specific training of epitope prediction models can yield substantial performance gains across several quality metrics when compared to models trained with heterogeneous and hybrid data, and with a variety of widely used predictors from the literature. These results suggest a promising alternative for the development of custom-tailored predictive models with high predictive power, which can be easily implemented and deployed for the investigation of specific pathogens.

Availability and Implementation: The data underlying this article, as well as the full reproducibility scripts, are available at https://github.com/fcampelo/OrgSpec-paper. The R package that implements the organism-specific pipeline functions is available at https://github.com/fcampelo/epitopes.

Supplementary Information: Supplementary materials are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665745
DOI: http://dx.doi.org/10.1093/bioinformatics/btab536


