Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. This freedom from bias can be an advantage, but it comes at the cost of ignoring potentially valuable knowledge. We investigate a hybrid strategy that computes graph measures of term importance over an entire ontology and injects those measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated on a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, and demonstrates consistent improvements in accuracy.
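The hybrid idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the function names `pagerank` and `weighted_term_frequencies`, the damping factor of 0.85, and the rule used to combine importance with raw counts are all assumptions made for illustration. The sketch runs a plain power-iteration PageRank over a small is-a graph and then scales bag-of-words term counts by the resulting scores.

```python
from collections import Counter


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to the nodes it links to."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def weighted_term_frequencies(tokens, importance):
    """Scale raw term counts by ontology importance (hypothetical weighting rule)."""
    counts = Counter(tokens)
    n = len(importance)
    # Assumption: a term with average importance (1/n) has its count doubled;
    # terms absent from the ontology keep their raw count.
    return {
        term: count * (1.0 + n * importance.get(term, 0.0))
        for term, count in counts.items()
    }


# Toy ontology (hypothetical): edges point from a term to its parent concept.
ontology = {
    "cigarette": ["tobacco"],
    "cigar": ["tobacco"],
    "tobacco": ["substance"],
    "nicotine": ["substance"],
    "substance": [],
}

importance = pagerank(ontology)
document = ["patient", "smokes", "tobacco", "tobacco", "cigarette"]
features = weighted_term_frequencies(document, importance)
```

In this sketch, terms that many other ontology terms point to (here `tobacco` and `substance`) receive higher importance and therefore contribute more to the document's feature vector, while out-of-ontology words such as `patient` are left at their raw frequency. HITS authority scores could be substituted for the PageRank scores in the same weighting step.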
Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041319 (PMC)
Dig Dis Sci
January 2025
Provincial-Level Key Laboratory for Molecular Medicine of Major Diseases and The Prevention and Treatment With Traditional Chinese Medicine Research in Gansu Colleges and University, Gansu University of Chinese Medicine, Lanzhou, China.
Background And Aims: Alcoholic liver disease (ALD) is the leading cause of death among alcohol-related diseases, yet its pathogenesis remains incompletely understood. This article employs data mining methods to conduct an in-depth study of articles on ALD published in the past three decades, aiming to elucidate the pathogenesis of ALD.
Methods: Firstly, articles related to the pathogenesis of ALD were retrieved from the Web of Science (WOS) database.
Sci Rep
January 2025
EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, 11586, Riyadh, Saudi Arabia.
During the Covid-19 pandemic, the widespread use of social media platforms has facilitated the dissemination of information, fake news, and propaganda, while also serving as a vital source of self-reported symptoms related to Covid-19. Existing graph-based models, such as Graph Neural Networks (GNNs), have achieved notable success in Natural Language Processing (NLP). However, utilizing GNN-based models for propaganda detection remains difficult because of the challenges of mining distinct word interactions and storing nonconsecutive and broad contextual data.
Food Sci Nutr
January 2025
Department of Chemistry, Thomas J. R. Faulkner College of Science and Technology, University of Liberia, Monrovia, Montserrado County, Liberia.
Citronellol (CT) is a naturally occurring lipophilic monoterpenoid which has shown anticancer effects in numerous cancerous cell lines. This study was, therefore, designed to examine CT's potential as an anticancer agent against glioblastoma (GBM). Network pharmacology analysis was employed to identify potential anticancer targets of CT.
Ther Adv Drug Saf
January 2025
Department of Pharmacy, Daping Hospital, Army Medical University, No. 10 Changjiang Branch Road, Yuzhong District, Chongqing 400042, China.
Background: Gilteritinib and midostaurin are FLT3 inhibitors that have made significant progress in the treatment of acute myeloid leukemia. However, their real-world safety profile in a large sample population is incomplete.
Objectives: We aimed to provide a pharmacovigilance study of the adverse events (AEs) associated with gilteritinib and midostaurin through the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) database.
Heliyon
January 2025
School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China.
Dynamic functional connectivity (DFC) has shown promise in the diagnosis of Autism Spectrum Disorder (ASD). However, extracting highly discriminative information from the complex DFC matrix remains a challenging task. In this paper, we propose an ASD classification framework, PSA-FCN, which is based on time-aligned DFC and Prob-Sparse Self-Attention, to address this problem.