Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science.
View Article and Find Full Text PDFIn this work, we survey a breadth of literature that has revealed the limitations of predominant practices for dataset collection and use in the field of machine learning. We cover studies that critically review the design and development of datasets with a focus on negative societal impacts and poor outcomes for system performance. We also cover approaches to filtering and augmenting data and modeling techniques aimed at mitigating the impact of bias in datasets.
View Article and Find Full Text PDFThe contribution of Black female scholars to our understanding of data and their limits of representation hint at a more empathetic vision for data science that we should all learn from.
View Article and Find Full Text PDFIn data science, there's long been an acknowledgment of the way data can flatten and dehumanize the people they represent. This limitation becomes most obvious when considering the pure inability of such numbers and figures to truly capture the reality of lives lost in this pandemic.
View Article and Find Full Text PDF