Using ontology network structure in text mining.

AMIA Annu Symp Proc

Consortium for Health Informatics Research (CHIR), HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, Tampa, FL.

Published: November 2010

Statistical text mining treats documents as bags of words, focusing on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. This freedom from bias can be an advantage, but it comes at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy: compute graph measures of term importance over an entire ontology, then inject those measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated on a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, and demonstrates consistent improvements in accuracy.
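The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph measures are injected, so the code below assumes the simplest option, multiplying each raw term count by a PageRank score. The toy ontology, the child-to-parent edge direction, the weight of 1.0 for terms outside the ontology, and the names `pagerank`, `weighted_bag_of_words`, and `toy_ontology` are all hypothetical choices made for illustration.

```python
from collections import Counter


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [successor, ...]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing edges) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def weighted_bag_of_words(text, importance):
    """Scale raw term counts by ontology importance.

    Assumption: terms missing from the ontology keep a neutral weight of 1.0.
    """
    counts = Counter(text.lower().split())
    return {term: count * importance.get(term, 1.0) for term, count in counts.items()}


# Hypothetical toy ontology; edges point from a narrower term to a broader one.
toy_ontology = {
    "cigarette": ["tobacco"],
    "nicotine": ["tobacco"],
    "tobacco": ["smoking"],
    "smoker": ["smoking"],
}

ranks = pagerank(toy_ontology)

# Rescale so that a uniform rank equals 1.0, matching the neutral weight above.
importance = {term: score * len(ranks) for term, score in ranks.items()}

features = weighted_bag_of_words("patient denies smoking tobacco", importance)
for term, weight in sorted(features.items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

In this sketch, terms that many other ontology terms point to (here `smoking`) receive a larger weight than terms outside the ontology, so they contribute more to whatever classifier consumes the feature vector. HITS authority scores could be substituted for the PageRank scores in the same way.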

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041319PMC
