Feature selection techniques for maximum entropy based biomedical named entity recognition.

J Biomed Inform

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal 721 302, India.

Published: October 2009

Named entity recognition is a fundamental task in biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which often have complex structures, making such entities challenging to identify and classify. Machine learning methods such as conditional random fields (CRF), maximum entropy Markov models (MEMM) and support vector machines (SVM) have been widely used to learn to recognize such entities from an annotated corpus. The identification of appropriate feature templates and the selection of important feature values play an important role in the success of these methods. In this paper, we study feature reduction approaches based on word clustering and on feature selection for named entity recognition using a maximum entropy classifier. The identification and selection of features are done largely automatically, without using domain knowledge. The performance of the system is found to be superior to that of existing systems which do not use domain knowledge.
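As a rough illustration of selection-based feature reduction, the sketch below ranks binary token features by information gain and keeps the top k. This is only a minimal sketch under stated assumptions: the abstract does not say which selection criterion the paper uses, so information gain is a stand-in; the toy BIO-tagged tokens, the feature templates, and the function names (`features`, `information_gain`, `select_top_k`) are all hypothetical. The maximum entropy classifier itself is not implemented here; the selected features would be the input to it.

```python
import math
from collections import Counter, defaultdict

# Toy BIO-tagged tokens, invented for illustration (not from the paper's corpus).
DATA = [
    ("IL-2", "B"), ("gene", "I"), ("expression", "O"), ("requires", "O"),
    ("NF-kappaB", "B"), ("activation", "O"), ("in", "O"), ("T", "O"),
    ("cells", "O"), ("p53", "B"), ("protein", "I"), ("binds", "O"), ("DNA", "B"),
]


def features(word):
    """Binary features from the surface form only (no domain knowledge)."""
    feats = {"w=" + word.lower(), "suf2=" + word[-2:].lower()}
    if any(c.isdigit() for c in word):
        feats.add("has_digit")
    if any(c.isupper() for c in word):
        feats.add("has_upper")
    if "-" in word:
        feats.add("has_hyphen")
    return feats


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of label counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def information_gain(data):
    """Return IG(f) = H(Y) - H(Y | f) for every binary feature f seen in data."""
    labels = Counter(y for _, y in data)
    base = entropy(labels)
    with_f = defaultdict(Counter)  # feature -> label counts where it fires
    for word, y in data:
        for f in features(word):
            with_f[f][y] += 1
    n = len(data)
    gains = {}
    for f, present in with_f.items():
        absent = labels - present  # label counts where the feature is off
        n_present = sum(present.values())
        cond = n_present / n * entropy(present)
        if n - n_present:
            cond += (n - n_present) / n * entropy(absent)
        gains[f] = base - cond
    return gains


def select_top_k(data, k):
    """Keep the k features with the highest information gain (ties by name)."""
    gains = information_gain(data)
    return sorted(gains, key=lambda f: (-gains[f], f))[:k]


if __name__ == "__main__":
    for name in select_top_k(DATA, 3):
        print(name)
```

On a real corpus the same ranking would be computed over far more feature templates, and the cutoff k would typically be tuned on held-out data rather than fixed by hand.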


Source
http://dx.doi.org/10.1016/j.jbi.2008.12.012


