Evaluating the effect of unbalanced data in biomedical document classification.

J Integr Bioinform

ESEI, Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain.

Published: September 2011

Document classification has become an active research field, partly because of the growing volume of biomedical information available in digital form that must be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. In this domain, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and its effect on the performance of standard classifiers is considerable. In this paper, we investigate the application of a Bayesian network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether the classification of Medline documents using a BN classifier poses additional challenges when dealing with class-imbalanced prediction. The evaluation involves two methods: under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
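The two balancing strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `undersample` and `cost_sensitive_label`, the class labels, and the cost values are assumptions chosen for illustration. The first function randomly discards documents from larger classes until every class matches the smallest one; the second picks the label with the lower expected misclassification cost, given a classifier's estimated probability that a document is relevant.

```python
import random
from collections import defaultdict


def undersample(documents, labels, seed=0):
    """Random under-sampling: shrink every class to the size of the smallest class.

    Returns a shuffled list of (document, label) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(documents, labels):
        by_class[label].append(doc)

    # Size of the rarest class sets the target for all classes.
    target = min(len(docs) for docs in by_class.values())

    balanced = []
    for label, docs in by_class.items():
        for doc in rng.sample(docs, target):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return balanced


def cost_sensitive_label(p_relevant, cost_fn=5.0, cost_fp=1.0):
    """Cost-sensitive decision for a binary triage task.

    cost_fn: assumed cost of missing a relevant document (false negative).
    cost_fp: assumed cost of flagging an irrelevant document (false positive).
    Predict "relevant" when its expected cost, (1 - p) * cost_fp, does not
    exceed the expected cost of predicting "irrelevant", p * cost_fn.
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return "relevant" if p_relevant >= threshold else "irrelevant"


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    labels = ["relevant"] * 2 + ["irrelevant"] * 8
    print(len(undersample(docs, labels)))   # 2 documents kept per class
    print(cost_sensitive_label(0.2))         # threshold is 1/6 with the default costs
```

Under-sampling changes the training set, whereas the cost-sensitive rule leaves the data untouched and shifts the decision threshold, so the two can be evaluated separately or combined.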

Source
http://dx.doi.org/10.2390/biecoll-jib-2011-177

Similar Publications

The Characterization, Biological Activities, and Potential Applications of the Antimicrobial Peptides Derived from Bacillus spp.: A Comprehensive Review.

Probiotics Antimicrob Proteins

December 2024

Food Nutrition and Health Research Center, School of Advanced Manufacturing, Fuzhou University, Jinjiang, 362200, Fujian, China.

This paper provides a comprehensive review of antimicrobial peptides (AMPs) derived from Bacillus spp. The classification and structure of Bacillus-derived AMPs encompass a diverse range. There are 89 documented Bacillus-derived AMPs, which exhibit varied sources, amino acid sequences, and molecular structures.

Characterizing tick diversity among caprine hosts of Kerala, India: a phylogenetic study.

Mol Biol Rep

December 2024

Teaching Veterinary Clinical Complex, College of Veterinary and Animal Sciences (CVAS), Kerala Veterinary and Animal Sciences University (KVASU), Mannuthy, Thrissur, Kerala, 680651, India.

Background: Ticks are prominent vectors of numerous pathogens that adversely affect human and animal health. Monitoring tick population dynamics is key in developing ideal tick-borne disease surveillance systems and critical vector control programmes. This study aimed to conduct the morphological and molecular characterization of ticks infesting domesticated goats in Kerala, India.

Afaan Oromo is a resource-scarce language with few tools developed for its processing, which poses significant challenges for natural language tasks. Tools designed for English do not work efficiently for Afaan Oromo because of linguistic differences and the lack of well-structured resources. To address this challenge, this work proposes a topic modeling framework for unstructured health-related documents in Afaan Oromo using latent Dirichlet allocation (LDA).

Purpose: The objective of this study was to investigate intra-articular distal radius fractures, aiming to provide a comprehensive analysis of fracture patterns and discuss the corresponding treatment strategies for each pattern.

Methods: Fracture lines from 294 cases of intra-articular distal radius fractures were collected and clustered using K-means and hierarchical clustering algorithms. The demographic data of patients and the clinical treatment outcomes were recorded.

Background: The purpose of this study was to report the clinical and psychological outcomes of using a locking compression plate (LCP) as a sequential external fixator following the distraction phase in the treatment of tibial bone defects caused by fracture-related infection (FRI).

Methods: We retrospectively analyzed the clinical records and consecutive X-ray images of patients with tibial bone defects who were treated with an LCP as a sequential external fixator following the distraction phase, between June 2017 and December 2022. The ASAMI criteria were applied to assess the bone and functional outcomes, and postoperative complications were evaluated by using the Paley classification.
