With the advent and improvement of ontological dictionaries (WordNet, BabelNet), synset-based text representations are gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce the dimensionality of this kind of representation (e.g., the Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches combine semantically related columns by exploiting semantic information extracted from ontological dictionaries. Their main advantage is that they can not only eliminate features but also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and most accurate) techniques in this group use evolutionary algorithms to find how many features can be grouped so as to reduce false positive (FP) and false negative (FN) errors. The main limitation of these evolutionary schemes is the computational cost of the optimization algorithms they rely on. The contribution of this study is a new lossless feature reduction scheme that exploits information from ontological dictionaries and achieves slightly better accuracy (especially in FP errors) than optimization-based approaches while using far fewer computational resources. Instead of relying on computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by checking whether the instances in a dataset (e.g., the training dataset) that contain these synsets are mostly of the same class. The study includes experiments on three datasets and a detailed comparison with two previous optimization-based approaches.
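
The abstract describes the merge criterion only informally. The following is a minimal Python sketch of one possible reading, not the authors' implementation. It makes several assumptions:

- Each instance is a dictionary of synset counts.
- Candidate pairs of semantically related synsets have already been extracted from an ontological dictionary.
- "Instances containing these synsets" means instances containing either synset.
- Merging two columns sums their counts.

The names `reduce_features`, `dominant_class_ratio`, `merge_columns` and `purity_threshold`, the default threshold of 0.9, and the synset IDs are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each instance maps synset IDs to occurrence counts.
Instance = Dict[str, int]


def dominant_class_ratio(
    instances: Sequence[Instance],
    labels: Sequence[str],
    synsets: Sequence[str],
) -> float:
    """Share of the majority class among instances containing any of `synsets`.

    Returns 0.0 when no instance contains any of the synsets.
    """
    classes = Counter(
        label
        for inst, label in zip(instances, labels)
        if any(inst.get(s, 0) > 0 for s in synsets)
    )
    total = sum(classes.values())
    if total == 0:
        return 0.0
    return classes.most_common(1)[0][1] / total


def merge_columns(instances: List[Instance], keep: str, drop: str) -> None:
    """Fold column `drop` into column `keep` by summing counts, in place."""
    for inst in instances:
        count = inst.pop(drop, 0)
        if count:
            inst[keep] = inst.get(keep, 0) + count


def reduce_features(
    instances: List[Instance],
    labels: Sequence[str],
    candidate_pairs: Sequence[Tuple[str, str]],
    purity_threshold: float = 0.9,  # hypothetical default, not from the paper
) -> List[Tuple[str, str]]:
    """Merge each candidate pair whose instances are mostly of one class.

    `candidate_pairs` is assumed to hold semantically related synsets
    (e.g. linked by a hypernym relation) extracted beforehand from an
    ontological dictionary. Returns the (kept, dropped) merges performed.
    """
    folded_into: Dict[str, str] = {}  # dropped synset -> column that absorbed it

    def current_column(synset: str) -> str:
        # Follow earlier merges so a pair refers to columns that still exist.
        while synset in folded_into:
            synset = folded_into[synset]
        return synset

    merged: List[Tuple[str, str]] = []
    for a, b in candidate_pairs:
        keep, drop = current_column(a), current_column(b)
        if keep == drop:
            continue  # both synsets already share a column
        if dominant_class_ratio(instances, labels, (keep, drop)) >= purity_threshold:
            merge_columns(instances, keep, drop)
            folded_into[drop] = keep
            merged.append((keep, drop))
    return merged


if __name__ == "__main__":
    # Toy data with placeholder synset IDs (not real WordNet/BabelNet IDs).
    data = [
        {"syn_cash": 2, "syn_money": 1},
        {"syn_money": 1},
        {"syn_cash": 1, "syn_meeting": 1},
        {"syn_meeting": 2},
    ]
    y = ["spam", "spam", "spam", "ham"]
    pairs = [("syn_money", "syn_cash"), ("syn_meeting", "syn_cash")]
    print(reduce_features(data, y, pairs))
    # → [('syn_money', 'syn_cash')]; the second pair is only 75% spam, below 0.9
```

In this toy run, every instance containing `syn_money` or `syn_cash` is spam, so the two columns are merged. After that merge, the instances touching `syn_meeting` or the merged column are split 3:1 between spam and ham, so no further merge happens.

The sketch uses a single class-purity check per candidate pair. This reflects the abstract's point that no evolutionary search over groupings is needed, and it keeps the cost roughly linear in the number of pairs times the number of instances.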

Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323001 (PMC)
DOI: http://dx.doi.org/10.7717/peerj-cs.2206