HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors.

Anja Conev, Romanos Fasoulis, Sarah Hall-Swan, Rodrigo Ferreira, Lydia E Kavraki

iScience

Department of Computer Science, Rice University, Houston, TX, USA.

Published: January 2024

Peptide-HLA (pHLA) binding prediction is essential in screening peptide candidates for personalized peptide vaccines. Machine learning (ML) pHLA binding prediction tools are trained on vast amounts of data and are effective in screening peptide candidates. Most ML models report the ability to generalize to HLA alleles unseen during training ("pan-allele" models). However, the use of datasets with imbalanced allele content raises concerns about biased model performance. First, we examine the data bias of two ML-based pan-allele pHLA binding predictors. We find that the pHLA datasets overrepresent alleles from geographic populations of high-income countries. Second, we show that the identified data bias is perpetuated within ML models, leading to algorithmic bias and subpar performance for alleles expressed in low-income geographic populations. We draw attention to the potential therapeutic consequences of this bias, and we challenge the use of the term "pan-allele" to describe models trained with currently available public datasets.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10770483
DOI: http://dx.doi.org/10.1016/j.isci.2023.108613

