Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories require that metadata accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time-consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and the sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it can combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: the US-based National Center for Biotechnology Information BioSample and the European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata, coupled with ontology-based mappings, to present users with accurate recommendations when authoring metadata.
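To make the idea of rule-based recommendation concrete, the following is a minimal Python sketch of association rule mining over field-value pairs. It is not the implementation used in the CEDAR Workbench: the toy records, the function names `mine_rules` and `recommend`, and the support and confidence thresholds are all assumptions chosen for illustration, and the ontology alignment and multi-repository aspects described in the abstract are not covered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = [
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "cirrhosis"},
    {"organism": "Mus musculus", "tissue": "brain", "disease": "glioma"},
]


def mine_rules(records, min_support=2, min_confidence=0.5, max_antecedent=2):
    """Mine rules of the form {field=value, ...} -> field=value.

    Support is the number of records containing the whole itemset;
    confidence is support(itemset) / support(antecedent).
    """
    counts = Counter()
    for record in records:
        items = sorted(record.items())
        # Count every itemset of up to max_antecedent + 1 field-value pairs.
        for size in range(1, max_antecedent + 2):
            for itemset in combinations(items, size):
                counts[frozenset(itemset)] += 1

    rules = []
    for itemset, support in counts.items():
        if len(itemset) < 2 or support < min_support:
            continue
        # Try each item in turn as the single-item consequent.
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = support / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


def recommend(rules, entered, target_field):
    """Rank candidate values for target_field given already-entered fields."""
    entered_items = set(entered.items())
    best = {}
    for antecedent, (field, value), support, confidence in rules:
        # A rule applies if it predicts the target field and its antecedent
        # is fully satisfied by what the user has entered so far.
        if field == target_field and antecedent <= entered_items:
            score = (confidence, support)
            if score > best.get(value, (0, 0)):
                best[value] = score
    return sorted(best, key=lambda v: best[v], reverse=True)


rules = mine_rules(RECORDS)
suggestions = recommend(
    rules, {"organism": "Homo sapiens", "tissue": "liver"}, "disease"
)
print(suggestions)
```

In this sketch, values are ranked by confidence first and support second. A production system would likely use a dedicated frequent-itemset algorithm rather than enumerating all combinations, since the enumeration above grows quickly with the number of fields.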

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6866600
DOI: http://dx.doi.org/10.1093/database/baz059
