ChemEx: information extraction system for chemical data curation.

BMC Bioinformatics

Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand.

Published: May 2013

Background: Manual curation of chemical data from publications is error-prone and time-consuming, and it makes up-to-date data sets hard to maintain. Automatic information extraction can reduce these problems. Because chemical structures are usually depicted in images, information extraction must combine structure image recognition with text mining.

Results: We have developed ChemEx, a chemical information extraction system that processes both the text and the images in publications. The text annotator extracts compound, organism, and assay entities from text content, while structure image recognition translates raster images of chemical structures into a machine-readable format. A user can view the annotated text along with summarized information on the compounds, the organisms that produce them, and the assay tests.
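
As a rough illustration of the text-annotation step described above, the following minimal Python sketch tags compound, organism, and assay mentions with a simple dictionary lookup and then summarizes them by type. It is not ChemEx's actual implementation: the lexicons, the function names `annotate` and `summarize`, and the example sentence are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons (assumption): the real system's resources
# and recognition methods are not described in this abstract.
LEXICONS = {
    "compound": ["cordycepin", "beauvericin"],
    "organism": ["Cordyceps militaris", "Beauveria bassiana"],
    "assay": ["cytotoxicity", "antimalarial"],
}


def annotate(text):
    """Return (start, end, entity_type, surface) tuples sorted by position."""
    spans = []
    for entity_type, terms in LEXICONS.items():
        for term in terms:
            # Whole-word, case-insensitive match of each lexicon term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append(
                    (match.start(), match.end(), entity_type, match.group(0))
                )
    return sorted(spans)


def summarize(spans):
    """Group the distinct annotated mentions (lowercased) by entity type."""
    summary = defaultdict(set)
    for _, _, entity_type, surface in spans:
        summary[entity_type].add(surface.lower())
    return {etype: sorted(names) for etype, names in summary.items()}


# Invented example sentence, used only to exercise the sketch.
sentence = "Cordycepin from Cordyceps militaris showed antimalarial activity."
spans = annotate(sentence)
summary = summarize(spans)
```

Here `spans` holds the character offsets needed to highlight entities in the annotated text, and `summary` plays the role of the per-document overview of compounds, organisms, and assays. A production system would need far richer recognition than exact dictionary matching.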

Conclusions: ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521388
DOI: http://dx.doi.org/10.1186/1471-2105-13-S17-S9
