Experimentally generated biological information must be organized and structured before it becomes meaningful knowledge. However, new information is now published at a rate that manual curation increasingly cannot match. Curation strategies that leverage data mining and text analysis are therefore a promising way to help life science databases cope with the deluge of novel information. In this article, we describe the integration of text mining technologies into the curation pipeline of the RegulonDB database and discuss how the process can enhance curator productivity. Specifically, a named entity recognition approach is used to pre-annotate terms that refer to a set of domain entities potentially relevant to the curation process. The annotated documents are presented to the curator, who can use a custom-designed interface to select sentences containing specific types of entities, thus restricting the amount of text that needs to be inspected. Additionally, a module that computes semantic similarity between sentences across the entire collection of articles to be curated is being integrated into the system. We tested the module using three sets of scientific articles and six domain experts. Together, these improvements are gradually enabling a high-throughput curation process with the same quality as manual curation.
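
The two ideas in the abstract, selecting sentences by pre-annotated entity type and ranking sentences by similarity, can be illustrated with a minimal Python sketch. It is not the RegulonDB implementation: the abstract does not specify the recognizer or the similarity measure, so the sketch assumes a toy dictionary lookup in place of named entity recognition and bag-of-words cosine similarity in place of semantic similarity. The lexicon, entity type names, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon (entity type -> surface terms), for illustration only.
LEXICON = {
    "transcription_factor": {"CRP", "FNR", "ArcA"},
    "gene": {"lacZ", "araC", "sodA"},
    "effect": {"activates", "represses"},
}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens, keeping case."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def annotate(sentence):
    """Return the entity types whose lexicon terms occur in the sentence."""
    tokens = set(tokenize(sentence))
    return {etype for etype, terms in LEXICON.items() if tokens & terms}


def select_sentences(text, required_types):
    """Keep only sentences that mention every requested entity type."""
    required = set(required_types)
    return [s for s in split_sentences(text) if required <= annotate(s)]


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (a lexical stand-in for semantic similarity)."""
    va = Counter(t.lower() for t in tokenize(a))
    vb = Counter(t.lower() for t in tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def most_similar(query, candidates):
    """Return the candidate sentence with the highest similarity to the query."""
    return max(candidates, key=lambda s: cosine_similarity(query, s))
```

For example, `select_sentences(text, ["transcription_factor", "gene"])` returns only the sentences in which both entity types were tagged, which mirrors how filtering by entity type reduces the text a curator has to read. A production system would replace the dictionary lookup and the word-overlap measure with trained components.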


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467564
DOI: http://dx.doi.org/10.1093/database/bax012


