In this work, we survey a breadth of literature that has revealed the limitations of predominant practices for dataset collection and use in the field of machine learning. We cover studies that critically review the design and development of datasets with a focus on negative societal impacts and poor outcomes for system performance. We also cover approaches to filtering and augmenting data and modeling techniques aimed at mitigating the impact of bias in datasets. Finally, we discuss works that have studied data practices, cultures, and disciplinary norms and discuss implications for the legal, ethical, and functional challenges the field continues to face. Based on these findings, we advocate for the use of both qualitative and quantitative approaches to more carefully document and analyze datasets during the creation and usage phases.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8600147PMC
http://dx.doi.org/10.1016/j.patter.2021.100336DOI Listing

Publication Analysis

Top Keywords

machine learning
8
data discontents
4
discontents survey
4
survey dataset
4
dataset development
4
development machine
4
learning work
4
work survey
4
survey breadth
4
breadth literature
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!