Discriminative machine learning for maximal representative subsampling.

Tony Hauptmann, Sophie Fellenz, Laksan Nathan, Oliver Tüscher, Stefan Kramer

Sci Rep

Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.

Published: November 2023

Biased samples in the social sciences can distort research results, prompting new methods based on positive-unlabeled learning to mitigate this bias.
The proposed methods, Maximum Representative Subsampling (MRS) and Soft-MRS, use machine learning classifiers and a representative reference data set to either remove biased samples or adjust their weights.
Based on the experiments, MRS is recommended for downstream classification tasks, while Soft-MRS is recommended when the relative bias of the outcome variable matters. Applications are demonstrated in resilience research and voting behavior analysis.

Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until the biased data set aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences between the biased and representative data sets and to improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in the social sciences, among other fields, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
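
The iterative removal idea behind MRS can be illustrated with a small sketch. This is not the authors' implementation: the logistic-regression classifier, the in-sample AUROC stopping rule, the synthetic one-feature data, and the parameter names (`drop_per_round`, `auc_target`, `max_rounds`) are all assumptions made for illustration. The sketch trains a classifier to tell the biased set from the representative set, drops the biased instances it identifies most confidently (equivalent to a sample weight of 0), and stops once the two sets are hard to tell apart.

```python
import math
import random


def predict(w, b, x):
    """Probability that x comes from the biased set (logistic model)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=150, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def auroc(pos_scores, neg_scores):
    """Pairwise AUROC; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def mrs(biased, representative, drop_per_round=5, auc_target=0.55, max_rounds=100):
    """Return indices of biased instances that are kept, plus the last measured AUROC.

    Assumption: 'aligned' is approximated by the classifier's in-sample AUROC
    falling to auc_target or below (0.5 means indistinguishable).
    """
    keep = list(range(len(biased)))
    auc = 1.0
    for _ in range(max_rounds):
        if len(keep) <= drop_per_round:
            break
        X = [biased[i] for i in keep] + representative
        y = [1] * len(keep) + [0] * len(representative)
        w, b = fit_logistic(X, y)
        scores_b = [predict(w, b, biased[i]) for i in keep]
        scores_r = [predict(w, b, x) for x in representative]
        auc = auroc(scores_b, scores_r)
        if auc <= auc_target:
            break
        # Drop the instances the classifier most confidently calls "biased".
        ranked = sorted(range(len(keep)), key=lambda k: scores_b[k], reverse=True)
        dropped = set(ranked[:drop_per_round])
        keep = [i for k, i in enumerate(keep) if k not in dropped]
    return keep, auc


# Synthetic demo: the biased sample over-represents high feature values.
rng = random.Random(0)
representative = [[rng.gauss(0, 1)] for _ in range(150)]
biased = [[rng.gauss(0, 1)] for _ in range(100)] + [[rng.gauss(2, 1)] for _ in range(60)]
kept, final_auc = mrs(biased, representative)
print(len(biased), "->", len(kept), "instances kept; last AUROC:", round(final_auc, 3))
```

A Soft-MRS-style variant would, under the same assumptions, lower the weights of high-scoring instances gradually instead of setting them to 0. In practice, out-of-sample (for example cross-validated) scores would be a safer basis for removal than the in-sample scores used here.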

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684887
DOI: http://dx.doi.org/10.1038/s41598-023-48177-3
