Publications by authors named "Andrea Thomer"

Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022.

The purpose of this study was to evaluate, revise, and extend the Informed Consent Ontology (ICO) for expressing clinical permissions, including reuse of residual clinical biospecimens and health data. This study followed a formative evaluation design and used a bottom-up modeling approach. Data were collected from the literature on US federal regulations and a study of clinical consent forms.

Nurse scientists are increasingly interested in conducting secondary research using real world collections of biospecimens and health data. The purposes of this scoping review are to (a) identify federal regulations and norms that bear authority or give guidance over reuse of residual clinical biospecimens and health data, (b) summarize domain experts' interpretations of permissions of such reuse, and (c) summarize key issues for interpreting regulations and norms. Final analysis included 25 manuscripts and 23 regulations and norms.

Background: The lack of machine-interpretable representations of consent permissions precludes development of tools that act upon permissions across information ecosystems, at scale.

Objectives: To report the process, results, and lessons learned while annotating permissions in clinical consent forms.

Methods: We conducted a retrospective analysis of clinical consent forms.

Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important but increasingly computationally expensive step, given the recent surge in DNA sequence data. Much of this sequence data is publicly available but can be extremely fragmentary (i.e.

The widespread use of social media has created a valuable but underused source of data for the environmental sciences. We demonstrate the potential for images posted to the website Twitter to capture variability in vegetation phenology across United States National Parks. We process a subset of images posted to Twitter within eight U.

Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data requires systematic documentation of sampling processes and particular contextual information about the site of data collection.

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions.

Part diary, part scientific record, biological field notebooks often contain details necessary for understanding the location and environmental conditions present during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects and impossible to discover or reuse. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or nonexistent.

Legacy data from natural history collections contain invaluable and irreplaceable information about biodiversity in the recent past, providing a baseline for detecting change and forecasting the future of biodiversity on a human-dominated planet. However, these data are often not available in formats that facilitate use and synthesis. New approaches are needed to enhance the rates of digitization and data quality improvement.
