Deriving pathway maps from automated text analysis using a grammar-based approach.

J Bioinform Comput Biol

School of Humanities and Informatics, University of Skövde, Box 408, 541 28 Skövde, Sweden.

Published: April 2006

We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method combines reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.
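
The extraction step summarized above can be illustrated with a small Python sketch. It does not reproduce the authors' Referent Grammar parser or their heuristics: the tree encoding (`Node`, `phrase`), the verb lexicon `VERB_TO_KEGG`, and the negation and hedge cue lists are all hypothetical. The sketch walks a parse tree, keeps subject-verb-object clauses whose verb appears in the lexicon, labels each relation with a KEGG (KGML) relation subtype name such as `activation` or `phosphorylation`, and assigns an epistemic status, under the assumption that negated statements count as counterfactuals and hedged ones as hypotheses.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Hypothetical lexicon: verb stems mapped to KEGG (KGML) relation subtype names.
VERB_TO_KEGG = {
    "activate": "activation",
    "inhibit": "inhibition",
    "phosphorylate": "phosphorylation",
    "bind": "binding/association",
    "induce": "expression",
}
# Hypothetical cue words: negation -> counterfactual, hedging -> hypothesis.
NEGATION_CUES = {"not", "no", "never"}
HEDGE_CUES = {"may", "might", "could", "possibly", "putative"}


@dataclass
class Node:
    label: str                    # category such as "S", "NP", "VP", "V"
    word: Optional[str] = None    # surface word, set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the leaf words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def phrase(label: str, *children: Node) -> Node:
    """Build a non-leaf node (hypothetical helper for hand-written trees)."""
    return Node(label, None, list(children))


@dataclass
class Relation:
    agent: str
    relation: str  # KEGG-style relation subtype name
    target: str
    status: str    # "fact", "counterfactual" or "hypothesis"


def iter_nodes(node: Node) -> Iterator[Node]:
    """Pre-order traversal over every node in the tree."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def verb_stem(word: str) -> Optional[str]:
    """Map a surface verb to a lexicon stem, stripping a third-person -s if needed."""
    w = word.lower()
    if w in VERB_TO_KEGG:
        return w
    if w.endswith("s") and w[:-1] in VERB_TO_KEGG:
        return w[:-1]
    return None


def classify_status(words: List[str]) -> str:
    """Assign an epistemic status from cue words; negation takes precedence."""
    lowered = {w.lower() for w in words}
    if lowered & NEGATION_CUES:
        return "counterfactual"
    if lowered & HEDGE_CUES:
        return "hypothesis"
    return "fact"


def extract_relations(tree: Node) -> List[Relation]:
    """Collect subject-verb-object clauses whose verb is in the lexicon."""
    relations = []
    for clause in iter_nodes(tree):
        if clause.label != "S":
            continue
        subj = next((c for c in clause.children if c.label == "NP"), None)
        vp = next((c for c in clause.children if c.label == "VP"), None)
        if subj is None or vp is None:
            continue
        stems = (verb_stem(c.word) for c in vp.children if c.label == "V" and c.word)
        stem = next((s for s in stems if s is not None), None)
        obj = next((c for c in vp.children if c.label == "NP"), None)
        if stem is None or obj is None:
            continue
        relations.append(Relation(
            agent=" ".join(subj.words()),
            relation=VERB_TO_KEGG[stem],
            target=" ".join(obj.words()),
            status=classify_status(clause.words()),
        ))
    return relations


if __name__ == "__main__":
    # Hand-built tree for "MEK1 phosphorylates ERK2"
    tree = phrase("S", phrase("NP", Node("N", "MEK1")),
                  phrase("VP", Node("V", "phosphorylates"), phrase("NP", Node("N", "ERK2"))))
    for rel in extract_relations(tree):
        print(rel)  # Relation(agent='MEK1', relation='phosphorylation', target='ERK2', status='fact')
```

Because each extracted relation already carries a KEGG subtype name, a list of such relations could be turned into map edges directly. A realistic system would also need to propagate cues from embedding clauses (for example "we suggest that ..."), which this sketch ignores: the embedded clause is extracted, but as a fact.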

DOI: http://dx.doi.org/10.1142/s0219720006002041

Similar Publications

Higher-order transient structures and the principle of dynamic connectivity in membrane signaling.

Proc Natl Acad Sci U S A

January 2025

Laboratory of Molecular Neurobiology and Biophysics, The Rockefeller University, New York, NY 10065.

We examine the role of higher-order transient structures (HOTS) in M2R regulation of GIRK channels. Electron microscopic membrane protein location maps show that both proteins form HOTS that exhibit a statistical bias to be near each other. Theoretical calculations and electrophysiological measurements suggest that channel activity is isolated near larger M2R HOTS.

The aim of this study was to identify key aging-associated target genes in osteoarthritis and to conduct a preliminary exploration of the associated infiltrating immune cells and potential drugs. Differentially expressed senescence-related genes (DESRGs), selected from cellular senescence-related genes (SRGs) and differentially expressed genes (DEGs), were analyzed using Gene Ontology enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and protein-protein interaction networks. Hub genes among the DESRGs were selected based on degree, and diagnostic genes were further screened by gene expression and receiver operating characteristic (ROC) curves.

Crohn's disease (CD) is a chronic inflammatory bowel condition, and research has indicated that colon adenocarcinoma (COAD), one of the most prevalent malignant tumors of the digestive tract, is closely associated with CD. This study employs bioinformatics techniques to uncover potential molecular links between CD and COAD. Two CD-related data series were identified in the Gene Expression Omnibus (GEO) database under specific criteria, and relevant COAD gene data were obtained from The Cancer Genome Atlas (TCGA).

A comprehensive benchmark study of methods for identifying significantly perturbed subnetworks in cancer.

Brief Bioinform

November 2024

Department of Microbiology and Immunology, University at Buffalo, The State University of New York, 955 Main Street, Buffalo, New York, NY 14203, United States.

Network-based methods use protein-protein interaction information to identify significantly perturbed subnetworks in cancer and to propose key molecular pathways. Numerous methods have been developed, but to date, a rigorous benchmark analysis comparing the performance of existing approaches is lacking. In this paper, we propose a novel benchmarking framework using synthetic data and conduct a comprehensive analysis of how well existing methods detect target genes and subnetworks, how well they control false positives, and how they perform in the presence of topological biases at both gene and subnetwork levels.

Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a common metabolism-related multisystem clinical disorder, often with a high rate of comorbid mild cognitive impairment (MCI). Increasing evidence suggests that the amygdala is crucial in cognitive processing during metabolic dysfunction. Nevertheless, the role of the amygdala in the neural mechanisms of MASLD with MCI (MCI_MASLD) remains unclear.
