A topic modeling approach for analyzing and categorizing electronic healthcare documents in Afaan Oromo without label information.

Etana Fikadu Dinsa, Mrinal Das, Teklu Urgessa Abebe

Sci Rep

Department of CSE, Adama Science and Technology University, Oromia, Ethiopia.

Published: December 2024

Afaan Oromo is a resource-scarce language with few tools developed for its processing, which poses significant challenges for natural language tasks. Tools designed for English do not work efficiently for Afaan Oromo because of linguistic differences and a lack of well-structured resources. To address this challenge, this work proposes a topic modeling framework for unstructured health-related documents in Afaan Oromo using the Latent Dirichlet Allocation (LDA) algorithm. None of the collected documents carries label information, which makes it difficult to categorize them or to apply supervised learning methods. We therefore use the LDA model, which discovers the latent topics of the documents without requiring predefined labels. The model takes a word dictionary and extracts hidden topics by evaluating word patterns and distributions across the dataset. It then extracts the most relevant topics for each document and generates a weight for each word per topic. Next, we classify the topics using their representative keywords as input and assign class labels based on human evaluation of topic coherence. This model could be applied to classifying medical documents and to finding the specialists best suited to patients' requests from the obtained information. In our experiments, topic modeling with LDA achieved a promising 79.17% accuracy and a 79.66% F1 score on the test documents of the dataset.
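
The pipeline described in the abstract (word dictionary, latent topics, per-topic word weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which inference method or library was used, so the sketch assumes collapsed Gibbs sampling written in plain Python. The function names (`build_dictionary`, `lda_gibbs`, `top_words`), the hyperparameters `alpha` and `beta`, and the toy English tokens are all hypothetical placeholders, not data from the paper.

```python
import random

def build_dictionary(docs):
    """Map each distinct token to an integer id (the 'word dictionary')."""
    vocab = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(vocab)}

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method)."""
    rng = random.Random(seed)
    word2id = build_dictionary(docs)
    V = len(word2id)
    corpus = [[word2id[w] for w in doc] for doc in docs]

    ndk = [[0] * num_topics for _ in corpus]      # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]    # topic-word counts
    nk = [0] * num_topics                         # tokens assigned per topic
    z = []                                        # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(corpus):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

    # Smoothed estimates: phi[k][w] is the weight of word w in topic k,
    # theta[d][k] is the share of topic k in document d.
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(num_topics)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)]
             for d, doc in enumerate(corpus)]
    return word2id, phi, theta

def top_words(word2id, phi, k, n=3):
    """Return the n highest-weighted (word, weight) pairs of topic k."""
    id2word = {i: w for w, i in word2id.items()}
    ranked = sorted(range(len(phi[k])), key=lambda w: phi[k][w], reverse=True)
    return [(id2word[w], phi[k][w]) for w in ranked[:n]]

# Toy, made-up corpus of already tokenized documents (placeholder tokens).
docs = [
    ["heart", "blood", "pressure", "heart"],
    ["blood", "pressure", "heart", "pulse"],
    ["skin", "rash", "itch", "skin"],
    ["rash", "itch", "skin", "cream"],
]
word2id, phi, theta = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print(k, top_words(word2id, phi, k))
```

The keywords printed for each topic are what a human reviewer would inspect to give the topic a class label, as the abstract describes; that labeling step is manual and is not automated here. A document can then be assigned to its highest-weight topic in `theta`.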

DOI: http://dx.doi.org/10.1038/s41598-024-83743-3
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11686009
