Extraction of semantic biomedical relations from text using conditional random fields.

BMC Bioinformatics

Siemens AG, Corporate Technology, Information and Communications, Otto-Hahn-Ring 6, 81739 Munich, Germany.

Published: April 2008

Background: The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can be accessed efficiently only by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, the classification of the type of relation is also of great importance and is the focus of this work. In this paper we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition.

Results: We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph.
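To make the idea of a machine-readable gene-disease network concrete, the following minimal sketch serializes a few typed associations as N-Triples lines and counts the associations, genes, and diseases in the same way the network size is reported above. The namespace `http://example.org/`, the relation names, and the example associations are hypothetical placeholders chosen for illustration; they are not taken from the published graph.

```python
from typing import List, Tuple
from urllib.parse import quote

# Hypothetical namespace; the published graph may use different URIs.
BASE = "http://example.org/"

# One association: (gene, relation type, disease). All values are illustrative.
Association = Tuple[str, str, str]


def to_ntriples(assocs: List[Association]) -> List[str]:
    """Serialize each association as one N-Triples line."""
    lines = []
    for gene, relation, disease in assocs:
        subject = f"<{BASE}gene/{quote(gene, safe='')}>"
        predicate = f"<{BASE}relation/{quote(relation, safe='')}>"
        obj = f"<{BASE}disease/{quote(disease, safe='')}>"
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


def network_size(assocs: List[Association]) -> Tuple[int, int, int]:
    """Return (number of associations, distinct genes, distinct diseases)."""
    genes = {gene for gene, _, _ in assocs}
    diseases = {disease for _, _, disease in assocs}
    return len(assocs), len(genes), len(diseases)


# Made-up example data, not results from the paper.
example = [
    ("BRCA1", "genetic_variation", "breast cancer"),
    ("BRCA1", "genetic_variation", "ovarian cancer"),
    ("ERBB2", "altered_expression", "breast cancer"),
]

for line in to_ntriples(example):
    print(line)
print(network_size(example))
```

Encoding the relation type as the predicate keeps each association a single triple, so the three counts can be read directly off the triple list.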

Conclusion: We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining. Current work is focused on improving the accuracy of detection of entities as well as entity boundaries, which will also greatly improve the relation extraction performance.
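One way to picture relation annotation as sequence labeling is the sketch below. It assumes, purely for illustration, that the relation type is encoded in the label assigned to the disease tokens, and it runs Viterbi decoding over a linear-chain model. The label names, trigger words, disease lexicon, and all scores are hand-set, hypothetical values, not the features or learned weights of the model described in this abstract. A trained Conditional Random Field would learn such weights from annotated data.

```python
from typing import Dict, List

# Hypothetical label set: "O" for other tokens, plus one label per relation type.
LABELS = ["O", "DIS_EXPR", "DIS_VAR"]

# Hypothetical lexicons standing in for a rich set of textual features.
TRIGGERS = {"overexpressed": "DIS_EXPR", "mutation": "DIS_VAR"}
DISEASE_WORDS = {"breast", "cancer"}


def emission(tokens: List[str], i: int, label: str) -> float:
    """Hand-set score for assigning `label` to token i (illustrative only)."""
    is_disease = tokens[i].lower() in DISEASE_WORDS
    if label == "O":
        return 0.0 if is_disease else 1.0
    if not is_disease:
        return -1.0
    # A trigger word anywhere in the phrase supports the matching relation type.
    has_trigger = any(TRIGGERS.get(t.lower()) == label for t in tokens)
    return 1.0 + (1.0 if has_trigger else 0.0)


def transition(prev: str, cur: str) -> float:
    """Penalize switching relation type in the middle of a mention."""
    if prev != "O" and cur != "O" and prev != cur:
        return -2.0
    return 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    best: List[Dict[str, float]] = [{lab: emission(tokens, 0, lab) for lab in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transition(p, cur))
            scores[cur] = best[-1][prev] + transition(prev, cur) + emission(tokens, i, cur)
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi("BRCA1 mutation in breast cancer".split()))
print(viterbi("HER2 overexpressed in breast cancer".split()))
```

Because the disease label carries the relation type, a single decoding pass yields both the entity span and the type of its relation, which is the joint behavior the conclusion describes.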


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386138
http://dx.doi.org/10.1186/1471-2105-9-207

