A graph-search framework for associating gene identifiers with documents.

BMC Bioinformatics

Department of Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.

Published: October 2006

Background: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method.
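The soft-dictionary baseline can be illustrated with a short sketch. This is not the paper's implementation. It assumes the dictionary maps each gene identifier to a list of synonyms, and it assumes that "soft" matching means token-level Jaccard similarity at or above a cutoff (the hypothetical `threshold=0.5`). The function names and the identifiers `G1` and `G2` are hypothetical. Each mention returned by an NER system adds its best synonym score to a gene identifier, and identifiers are ranked by total score.

```python
import re
from collections import defaultdict


def tokens(s):
    """Lowercased alphanumeric tokens; hyphens and punctuation act as separators."""
    return frozenset(re.findall(r"[a-z0-9]+", s.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_gene_ids(mentions, synonyms, threshold=0.5):
    """Rank gene IDs for one document (hypothetical soft-dictionary scoring).

    mentions: strings tagged as gene names by an NER system.
    synonyms: dict mapping gene ID -> list of synonym strings.
    """
    scores = defaultdict(float)
    for mention in mentions:
        m = tokens(mention)
        for gene_id, names in synonyms.items():
            # Best approximate match between this mention and any synonym.
            best = max((jaccard(m, tokens(n)) for n in names), default=0.0)
            if best >= threshold:
                scores[gene_id] += best
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


synonyms = {
    "G1": ["dpp", "decapentaplegic"],
    "G2": ["Egfr", "epidermal growth factor receptor"],
}
print(rank_gene_ids(["Dpp", "decapentaplegic", "growth factor receptor"], synonyms))
# [('G1', 2.0), ('G2', 0.75)]
```

The partial mention "growth factor receptor" still reaches `G2` (3 of 4 tokens shared). An exact-match dictionary would miss it.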

Results: We show that NER systems with similar F-measure performance can perform significantly differently when used with a soft dictionary for geneId ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach.
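One way to picture a graph-based combination of NER systems is a lazy random walk over a small graph. In this graph, a document node links to the mentions each NER system found, and mentions link to candidate gene identifiers. The schema below (a `DOC` node, mention nodes, gene nodes), the edge weights, the walk length `steps=3` and the stay probability `stay=0.5` are all assumptions for illustration. The paper's graph also incorporates other sources of information that are not modeled here.

```python
from collections import defaultdict


def build_graph(ner_outputs, matches):
    """Hypothetical graph schema.

    ner_outputs: dict NER system name -> list of mention strings for one document.
    matches: dict mention -> dict gene ID -> soft-dictionary match score.
    """
    edges = defaultdict(lambda: defaultdict(float))
    for mentions in ner_outputs.values():
        for m in mentions:
            # One unit of weight per system that extracted this mention.
            edges["DOC"][("mention", m)] += 1.0
    for m, genes in matches.items():
        for g, s in genes.items():
            edges[("mention", m)][("gene", g)] += s
    return edges


def lazy_walk(edges, start, steps=3, stay=0.5):
    """Distribution of a lazy random walk after `steps` steps from `start`.

    At each step the walker stays put with probability `stay`; otherwise it
    moves to a neighbor with probability proportional to edge weight.
    Nodes with no outgoing edges keep all of their mass.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            out = edges.get(node, {})
            total = sum(out.values())
            if total == 0:
                nxt[node] += p
                continue
            nxt[node] += stay * p
            for nbr, w in out.items():
                nxt[nbr] += (1 - stay) * p * w / total
        dist = nxt
    return dict(dist)


def rank_genes(ner_outputs, matches, steps=3):
    """Rank gene IDs by the walk probability that reaches their nodes."""
    dist = lazy_walk(build_graph(ner_outputs, matches), "DOC", steps)
    genes = [(node[1], p) for node, p in dist.items()
             if isinstance(node, tuple) and node[0] == "gene"]
    return sorted(genes, key=lambda kv: (-kv[1], kv[0]))


ner_outputs = {"system_A": ["dpp", "Egfr"], "system_B": ["dpp"]}
matches = {"dpp": {"G1": 1.0}, "Egfr": {"G2": 1.0}}
print(rank_genes(ner_outputs, matches))  # G1 ranks above G2
```

Here "dpp" is found by both hypothetical systems, so it receives twice the edge weight from the document node. The gene it matches therefore accumulates more probability mass than a gene supported by only one system. This is one intuition for why a combination can beat each of its components.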

Conclusion: The utility of an NER system for geneId finding may not be accurately predicted by its entity-level F1 score, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be built from easily available resources, without resorting to problem-specific, engineered components.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617121
DOI: http://dx.doi.org/10.1186/1471-2105-7-440
