Antibodies bind specifically to antigens and are an essential part of the immune system, which makes them powerful tools in research and diagnostics. High-throughput sequencing technologies have enabled comprehensive profiling of the immune repertoire, producing large numbers of antibody sequences that remain to be analyzed. In this study, antibody sequences were downloaded from the IMGT/LIGM-DB and Sequence Read Archive databases. Contributing features from antibody heavy chains were formulated as numerical inputs and fed into an ensemble machine learning classifier to classify the antigen specificity of six classes of antibodies, namely anti-HIV-1, anti-influenza virus, anti-pneumococcal polysaccharide, anti-citrullinated protein, anti-tetanus toxoid and anti-hepatitis B virus. The classifier was validated using cross-validation and a testing dataset. The ensemble classifier achieved a macro-average area under the receiver operating characteristic curve (AUC) of 0.9246 in 10-fold cross-validation and 0.9264 on the testing dataset. Among the contributing features, the complementarity-determining regions accounted for 53.1% of the contribution and the framework regions for 46.9%, and amino acid mutation rates ranked first and second among the top five contributing features. The classifier and insights provided in this study could promote the mechanistic study, isolation and utilization of potential therapeutic antibodies.
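To make the reported metric concrete, the following is a minimal sketch of how a macro-average AUC can be computed for a multi-class classifier. It assumes a one-vs-rest scheme, which the abstract does not specify. The function names (`binary_auc`, `macro_ovr_auc`) and the three-class toy data are hypothetical illustrations, not the study's code or data.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    prob_rows: Sequence[Sequence[float]], y_true: Sequence[int], n_classes: int
) -> float:
    """Unweighted mean of per-class one-vs-rest AUCs (assumed scheme)."""
    per_class: List[float] = []
    for k in range(n_classes):
        scores = [row[k] for row in prob_rows]          # predicted score for class k
        labels = [1 if y == k else 0 for y in y_true]   # class k versus all others
        per_class.append(binary_auc(scores, labels))
    return sum(per_class) / n_classes


# Toy data only: three classes, two samples each (the study has six classes).
toy_y_true = [0, 0, 1, 1, 2, 2]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
toy_macro_auc = macro_ovr_auc(toy_probs, toy_y_true, 3)
```

Because the macro average weights every class equally, a small class such as a rare antigen specificity influences the score as much as a large one. In practice a library routine such as scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average="macro"` computes a comparable quantity.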

Source: http://dx.doi.org/10.1093/bib/bbab516
