Exploiting and integrating rich features for biological literature classification.

BMC Bioinformatics

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

Published: April 2008

Background: Effective features play an important role in automated text classification, which facilitates access to large-scale data. In the biosciences, biological structures and terminology are described by a large number of features, and domain-dependent features can significantly improve classification performance. This paper studies how to select and integrate different types of features effectively to improve biological literature classification.

Results: To classify biological literature efficiently, we propose a novel feature value schema, TF*ML; features ranging from the lower-level, domain-independent "string feature" to the higher-level, domain-dependent "semantic template feature"; and proper integration of these features. Compared with our previous approaches, performance improves by 11.5% in AUC and 8.8% in F-Score, and exceeds the best performance achieved in BioCreAtIvE 2006.

Conclusions: Different types of features possess different discriminative capabilities in literature classification; proper integration of domain-independent and domain-dependent features can significantly improve performance and overcome over-fitting to the data distribution.
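
The Results report gains in AUC and F-Score. As a minimal sketch of how these two metrics are conventionally computed for a binary relevant/irrelevant classifier (this is not the authors' evaluation code; the function names, labels, and scores are illustrative assumptions):

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-score from binary labels (1 = relevant, 0 = irrelevant).

    Hypothetical helper for illustration only.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and classifier scores, not data from the paper.
    labels = [1, 1, 0, 0, 1]
    scores = [0.9, 0.4, 0.5, 0.2, 0.7]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print("AUC:", auc(labels, scores))
    print("F1:", f_score(labels, predictions))
```

AUC depends only on how the classifier ranks documents, while F-Score depends on the decision threshold applied to the scores, which is why the two are usually reported side by side.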

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2349297
DOI: http://dx.doi.org/10.1186/1471-2105-9-S3-S4
