Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

J Forensic Leg Med

Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia; Department of Community Medicine, Shaheed Mohtarma Benazir Bhutto Medical University, Larkana, Pakistan.

Published: July 2018

Objectives: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting (feature value representation), feature reduction, and text classification.

Methods: Autopsy reports covering eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six text classification techniques were then applied to these 43 master feature vectors to construct classification models that predict the cause of death. Finally, model performance was evaluated using four measures: overall accuracy, macro precision, macro recall, and macro F-measure.
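
The four evaluation measures named above can be illustrated with a minimal sketch. It assumes that the macro F-measure is the unweighted mean of the per-class F1 scores; the abstract does not say which macro-F definition the authors used. The function name `macro_metrics` and the example labels are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return overall accuracy plus macro-averaged precision, recall, F-measure.

    Assumption: macro F-measure = unweighted mean of per-class F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f_scores = [], [], []
    for cls in labels:
        prec = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        rec = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f": sum(f_scores) / n,
    }


# Hypothetical cause-of-death labels, for illustration only.
y_true = ["asphyxia", "asphyxia", "trauma", "trauma", "poisoning", "poisoning"]
y_pred = ["asphyxia", "trauma", "trauma", "trauma", "poisoning", "asphyxia"]
scores = macro_metrics(y_true, y_pred)
```

Macro averaging gives each cause of death equal weight regardless of how many reports it has, which matters when the classes are imbalanced.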

Results: Unigram features achieved the highest performance compared with bigram, trigram, and hybrid-gram features. Among feature representation schemes, term frequency and term frequency with inverse document frequency produced similar results, both better than binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers.
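
To make the feature pipeline concrete, the following is a minimal pure-Python sketch of word n-gram extraction, term frequency with inverse document frequency weighting, and chi-square feature scoring. It is not the authors' implementation. It assumes whitespace tokenization, idf = log(N / df), and the common 2x2 term-presence-by-class chi-square statistic, maximized over classes. All function names and the example reports are hypothetical, and the classifier step (for example a support vector machine) is omitted.

```python
import math
from collections import Counter


def ngrams(text, n=1):
    """Split text on whitespace and return its word n-grams (n=1: unigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    total = len(docs)
    return [{t: tf * math.log(total / df[t]) for t, tf in c.items()}
            for c in counts]


def chi_square_scores(docs, labels, n=1):
    """Score each term by its largest chi-square value over all classes.

    For a term and a class: a = in-class docs with the term, b = other docs
    with the term, c = in-class docs without it, d = other docs without it.
    """
    doc_terms = [set(ngrams(doc, n)) for doc in docs]
    vocab = set().union(*doc_terms)
    total = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(doc_terms, labels))
            a = sum(1 for terms, y in pairs if term in terms and y == cls)
            b = sum(1 for terms, y in pairs if term in terms and y != cls)
            c = sum(1 for terms, y in pairs if term not in terms and y == cls)
            d = total - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables (term in all or no documents)
                best = max(best, total * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical miniature "reports", for illustration only.
docs = ["stab wound chest", "stab wound neck",
        "drowning water lungs", "drowning water river"]
labels = ["trauma", "trauma", "drowning", "drowning"]
weights = tf_idf(docs)                      # unigram TF-IDF vectors
chi2 = chi_square_scores(docs, labels)      # per-term chi-square scores
selected = select_top_k(chi2, 4)            # reduced feature set
```

In this toy data, terms that appear in every report of one class and in none of the other (such as "stab" or "drowning") receive the highest chi-square scores, which is the behaviour feature reduction relies on.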

Conclusion: Our results and comparisons hold practical importance and serve as references for future work. Moreover, the comparison outputs can act as state-of-the-art baselines against which future proposals for automated text classification can be compared.

Source
http://dx.doi.org/10.1016/j.jflm.2017.07.001

Similar Publications

Identifying and monitoring adverse effects (AEs) are integral to ensuring patient safety in clinical trials. Research sponsors and regulatory bodies have put into place a variety of policies and procedures to guide researchers in protecting patient safety during clinical trials. However, it remains unclear how these policies and procedures should be adapted for trials in implementation science.

Background: Amplicon sequencing of kingdom-specific tags, such as the 16S rRNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi, is widely used for investigating microbial communities. So far, most human studies have focused on bacteria, while studies on host-associated fungi in health and disease have only recently started to accumulate. To enable cost-effective parallel analysis of bacterial and fungal communities in human and environmental samples, we developed a method where 16S rRNA gene and ITS1 amplicons were pooled together for a single Illumina MiSeq or HiSeq run and analysed after primer-based segregation.

Large Language Models (LLMs) have gained significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens.

Physical function measures in ICU survivors, where to now? A scoping review.

South Afr J Crit Care

July 2024

Division of Physiotherapy, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.

Background: Growing evidence describes the long-term morbidity experienced by critical illness survivors, a major contributing factor being impaired physical function. Consensus is yet to be reached on which physical function measures should be used in this population. This review aimed to describe physical functioning measurement instruments used in longitudinal studies of critical illness survivors, based on the International Classification of Function (ICF).

Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.

Materials And Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.
