Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets that include peptide sequences also present in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples . We propose the , a simple heuristic for training/test splitting, which ensures that test samples exclusively present peptides that do not appear in the training set. We investigate the effect of different training/test splitting techniques on the models' test performance, as well as the effect of training and testing the models using randomly generated mismatched negative samples in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation for why this happens and verify our hypothesis on the dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
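The idea of a split in which test peptides never occur in the training set can be sketched in a few lines. This is a minimal illustration under stated assumptions, not necessarily the authors' exact heuristic: the function name `peptide_disjoint_split`, its parameters, and the `(tcr, peptide, label)` tuple layout are hypothetical, and the CDR3 strings in the toy data are illustrative placeholders rather than measured binders.

```python
import random
from collections import defaultdict


def peptide_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (tcr, peptide, label) samples so no test peptide occurs in training.

    Hypothetical helper: each peptide's samples are assigned to one side only.
    """
    # Group samples by their peptide sequence.
    by_peptide = defaultdict(list)
    for sample in samples:
        by_peptide[sample[1]].append(sample)

    # Shuffle the peptides deterministically, then move whole peptide groups
    # into the test set until it holds roughly test_fraction of all samples.
    peptides = sorted(by_peptide)
    random.Random(seed).shuffle(peptides)

    target = test_fraction * len(samples)
    train, test = [], []
    for peptide in peptides:
        bucket = test if len(test) < target else train
        bucket.extend(by_peptide[peptide])
    return train, test


# Toy data: placeholder CDR3 strings paired with four peptides.
samples = [
    ("CASSAAAAEQYF", "GILGFVFTL", 1),
    ("CASSBBBBEQYF", "GILGFVFTL", 0),
    ("CASSCCCCEQYF", "NLVPMVATV", 1),
    ("CASSDDDDEQYF", "NLVPMVATV", 0),
    ("CASSEEEEEQYF", "GLCTLVAML", 1),
    ("CASSFFFFEQYF", "GLCTLVAML", 0),
    ("CASSGGGGEQYF", "ELAGIGILTV", 1),
    ("CASSHHHHEQYF", "ELAGIGILTV", 0),
]

train, test = peptide_disjoint_split(samples, test_fraction=0.3, seed=1)
train_peptides = {peptide for _, peptide, _ in train}
test_peptides = {peptide for _, peptide, _ in test}
print(sorted(test_peptides))
```

By contrast, a uniformly random split over samples would typically place the same peptide on both sides, which is the situation the abstract identifies as inflating test performance.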
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634250 | PMC |
http://dx.doi.org/10.3389/fimmu.2022.1014256 | DOI Listing |
A reduced bacterial concentration in wastewater is a key indicator of the efficacy of water resource recovery facilities (WRRFs). However, monitoring bacterial concentrations in real time at each stage of the WRRF is challenging, as it requires taking and processing water samples offline. Although a few studies have proposed predicting bacterial concentrations using data-driven models, generalizing these models to unseen data from different WRRFs remains challenging.
Int J Med Inform
December 2024
Chongqing Cancer Multiomics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China.
Background: With advancements in healthcare, traditional VTE risk assessment tools are increasingly insufficient to meet the demands of high-quality care, underscoring the need for innovative and specialized assessment methods.
Objective: Owing to the remarkable success of machine learning in supervised learning and disease prediction, our objective is to develop a reliable and efficient model for assessing VTE risk by leveraging the fundamental data and clinical characteristics of colorectal cancer patients within our medical facility.
Methods: Six commonly used machine learning algorithms were utilized in our study to predict the occurrence of VTE in patients with rectal cancer.
Front Artif Intell
December 2024
School of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK, United States.
The ability to accurately predict the yields of different crop genotypes in response to weather variability is crucial for developing climate-resilient crop cultivars. Genotype-environment interactions introduce large variations in crop-climate responses and are hard to factor into breeding programs. Data-driven approaches, particularly those based on machine learning, can help guide breeding efforts by factoring in genotype-environment interactions when making yield predictions.
PLoS One
December 2024
Digital Environment Research Institute (DERI), Queen Mary University of London, London, United Kingdom.
Deep learning techniques are increasingly being used to classify medical imaging data with high accuracy. Despite this, because training data are often limited, these models can lack the generalizability needed to predict unseen test data produced in different domains with comparable performance. This study focuses on thyroid histopathology image classification and investigates whether a Generative Adversarial Network (GAN), trained with just 156 patient samples, can produce high-quality synthetic images to sufficiently augment training data and improve overall model generalizability.
Int J Med Inform
December 2024
Adelaide Dental School, University of Adelaide, Adelaide, SA5000, Australia; Research and Innovations, Dental Loop Pty Ltd, Adelaide, SA5000, Australia.
Background: The automated segmentation of individual teeth from 3D models of the human dental arch is challenging due to variations in tooth alignment, arch form and overall maxillofacial anatomy. Domain adaptation is a specialised technique in deep learning which allows models to adapt to data from different domains, such as varying tooth and dental arch forms, without requiring human annotations.
Purpose: This study aimed to segment individual teeth from various dental arch morphologies in 3D intraoral scans using domain adaptation.