Fixing imbalanced binary classification: An asymmetric Bayesian learning approach.

PLoS One

Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, São Paulo, Brazil.

Published: October 2024

AI Article Synopsis

  • Many statistical and machine learning models for binary data assume that the data is well-balanced, which can lead to inaccurate predictions when it’s not.
  • To solve this issue, researchers suggest using asymmetric link functions in binary regression instead of traditional ones like logit or probit.
  • This study introduces new classification functions based on the Lomax distribution, showing that these models, particularly the reverse power double Lomax, outperform traditional methods in handling imbalanced data, providing clearer differentiation in predictive probabilities.

Article Abstract

Most statistical and machine learning models used for binary data modeling and classification assume that the data are balanced. However, this assumption can lead to poor predictive performance and biased parameter estimates when the data are imbalanced, owing to the threshold selection used for binary classification. To address this challenge, several authors suggest replacing traditional symmetric link functions, such as the logit or probit, with asymmetric link functions in binary regression, aiming to highlight characteristics that aid the classification task. This study therefore introduces new classification functions based on the Lomax distribution and its variations, including power and reverse versions. The proposed Bayesian functions have proven asymmetry and were implemented in Stan within an R workflow. These functions showed promising results in real-world data applications, outperforming classical link functions on predictive metrics. For instance, in the first example, comparing the reverse power double Lomax (RPDLomax) with the logit link showed that, regardless of the data imbalance, the RPDLomax model effectively assigns lower mean posterior predictive probabilities to failure and higher probabilities to success (21.4% and 63.7%, respectively), unlike logistic regression, which does not clearly distinguish between the mean posterior predictive probabilities for these two classes (36.0% and 39.5% for failure and success, respectively). That is, the proposed asymmetric Lomax approach is a competitive alternative to the logistic approach for binary classification on imbalanced tasks.
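The idea behind the abstract can be illustrated with a minimal sketch: build a CDF on the real line from a two-sided (double) Lomax distribution, then skew it with a power transformation so the link approaches 0 and 1 at different rates. The function names, the specific two-sided construction, and all parameter values below are illustrative assumptions, not the exact RPDLomax definitions from the paper.

```python
import numpy as np

def double_lomax_cdf(x, alpha=1.0, lam=1.0):
    """CDF of a symmetric two-sided (double) Lomax distribution on the real
    line. Illustrative construction (a reflected Pareto-II); the paper's own
    definitions may differ."""
    x = np.asarray(x, dtype=float)
    pos = 1.0 - 0.5 * (1.0 + np.maximum(x, 0.0) / lam) ** (-alpha)
    neg = 0.5 * (1.0 - np.minimum(x, 0.0) / lam) ** (-alpha)
    return np.where(x >= 0, pos, neg)

def power_double_lomax_link(eta, gamma=2.0, alpha=1.0, lam=1.0):
    """'Power' variant: raising the baseline CDF to gamma skews the response
    curve, so success and failure probabilities are treated asymmetrically --
    the property the paper exploits for imbalanced data. gamma=1 recovers
    the symmetric baseline."""
    return double_lomax_cdf(eta, alpha, lam) ** gamma

def reverse_power_double_lomax_link(eta, gamma=2.0, alpha=1.0, lam=1.0):
    """'Reverse power' variant: 1 - F(-eta)^gamma, skewed the opposite way."""
    return 1.0 - double_lomax_cdf(-eta, alpha, lam) ** gamma
```

Note the asymmetry at a linear predictor of zero: with gamma = 2 the power link gives 0.25 while the reverse power link gives 0.75, whereas a symmetric link such as the logit would give exactly 0.5 at that point.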


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482710 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0311246 (PLOS)

Publication Analysis

Top Keywords

binary classification (8)
binary data (8)
link functions (8)
posterior predictive (8)
predictive probabilities (8)
classification (6)
data (6)
functions (6)
binary (5)
fixing imbalanced (4)
