Objectives: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting (feature value representation), text classification, and feature reduction.
Methods: Autopsy reports covering eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six text classification techniques were then applied to these 43 master feature vectors to construct classification models that predict the cause of death. Finally, model performance was evaluated using four measures: overall accuracy, macro precision, macro recall, and macro F-measure.
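The four evaluation measures named above can be illustrated with a minimal sketch. It assumes that macro F-measure is the unweighted mean of the per-class F-measures (the abstract does not say which variant was used), and the function name `macro_scores` and the toy labels in the usage note are hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return overall accuracy and macro-averaged precision, recall, F-measure.

    Assumption: macro F-measure is the mean of per-class F-measures.
    A class with no predictions (or no true instances) scores 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f_measures = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_measures.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f_measure": sum(f_measures) / n,
    }
```

For example, `macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])` averages the two classes equally, so a rare cause of death counts as much as a common one; this is the usual reason for reporting macro measures alongside overall accuracy.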
Results: The experiments showed that unigram features obtained the highest performance compared with bigram, trigram, and hybrid-gram features. Among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar results, and both performed better than binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and the ensemble-voted classifier.
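To make the compared feature schemes concrete, the following is a minimal sketch of n-gram extraction with three of the representation schemes (binary frequency, term frequency, and term frequency with inverse document frequency). It is not the study's implementation: whitespace tokenization, the unsmoothed `log(N / df)` form of inverse document frequency, and the names `ngrams` and `weight_documents` are all assumptions, and the normalized variant is omitted because the abstract does not specify the normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weight_documents(docs, n=1, scheme="tf"):
    """Map each document string to a {feature: weight} dictionary.

    scheme: "binary" (presence), "tf" (raw count), or
            "tfidf" (count * log(N / document frequency); an assumed variant).
    """
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: number of documents containing each feature.
    df = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)

    vectors = []
    for doc in grams:
        counts = Counter(doc)
        if scheme == "binary":
            vec = {g: 1.0 for g in counts}
        elif scheme == "tf":
            vec = {g: float(c) for g, c in counts.items()}
        elif scheme == "tfidf":
            vec = {g: c * math.log(n_docs / df[g]) for g, c in counts.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors
```

With two made-up report fragments such as `"blunt force head injury"` and `"head injury with drowning"`, calling `weight_documents(docs, n=2)` yields bigram features like `"head injury"`, while `scheme="tfidf"` with `n=1` gives `"head"` a weight of zero because it occurs in every document.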
Conclusion: Our results and comparisons hold practical importance and serve as references for future work. Moreover, the comparison outputs can serve as a state-of-the-art baseline against which future proposals for automated text classification are compared.
DOI: http://dx.doi.org/10.1016/j.jflm.2017.07.001
Ann Epidemiol
December 2024
Department of Internal Medicine, University of Botswana, Gaborone, Botswana.
Identifying and monitoring adverse effects (AEs) are integral to ensuring patient safety in clinical trials. Research sponsors and regulatory bodies have put into place a variety of policies and procedures to guide researchers in protecting patient safety during clinical trials. However, it remains unclear how these policies and procedures should be adapted for trials in implementation science.
Microbiome
December 2024
Faculty of Medicine, Human Microbiome Research Program, University of Helsinki, Helsinki, Finland.
Background: Amplicon sequencing of kingdom-specific tags, such as the 16S rRNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi, is widely used for investigating microbial communities. So far, most human studies have focused on bacteria, while studies on host-associated fungi in health and disease have only recently started to accumulate. To enable cost-effective parallel analysis of bacterial and fungal communities in human and environmental samples, we developed a method where 16S rRNA gene and ITS1 amplicons were pooled together for a single Illumina MiSeq or HiSeq run and analysed after primer-based segregation.
Sci Rep
December 2024
University of Central Florida, Orlando, FL, 32816, USA.
Large Language Models (LLMs) have gained significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens.
South Afr J Crit Care
July 2024
Division of Physiotherapy, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.
Background: Growing evidence is describing the long-term morbidity experienced by critical illness survivors, a major contributing factor being impaired physical function. Consensus is yet to be reached on which physical function measures should be included in this population. This review aimed to describe physical functioning measurement instruments used in longitudinal studies of critical illness survivors, based on the International Classification of Function (ICF).
J Am Med Inform Assoc
December 2024
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, United States.
Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.
Materials And Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.