The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes some datasets more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study of datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and the actionable insights that would allow data publishers and tool designers to implement functionalities that provably facilitate reuse.
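
As an illustrative sketch only: the "initial model, using deep neural networks" mentioned in the abstract could resemble the minimal feed-forward regressor below, which maps dataset-level reuse features to a GitHub engagement proxy. The number of features, the placeholder data, and the choice of an engagement proxy such as a log-scaled star count are assumptions for demonstration, not the authors' actual pipeline.

# Minimal sketch: predict a reuse proxy (e.g. log star count) from
# dataset-level reuse features. Feature set, shapes, and data are assumed.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 12                          # assumed number of reuse features
X = np.random.rand(1000, n_features)     # placeholder feature matrix
y = np.random.rand(1000)                 # placeholder engagement proxy

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                     # regress the reuse proxy
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)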

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691392
DOI: http://dx.doi.org/10.1016/j.patter.2020.100136

Publication Analysis

Top Keywords

dataset reuse (8)
reuse features (8)
features literature (8)
reuse translating (4)
translating principles (4)
principles practice (4)
practice web (4)
web access (4)
access millions (4)
millions datasets (4)
