Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

Ghulam Mujtaba, Liyana Shuib, Ram Gopal Raj, Retnagowri Rajandram, Khairunisa Shaikh

J Forensic Leg Med

Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia; Department of Community Medicine, Shaheed Mohtarma Benazir Bhutto Medical University, Larkana, Pakistan.

Published: July 2018

Objectives: Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting (feature value representation), feature reduction, and text classification.
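To make the feature extraction and term weighting steps concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the report snippets are invented, the helper names (`ngrams`, `tf_idf`) are hypothetical, and the weighting uses idf = log(N / df), which is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs_features):
    """Weight each feature by tf * log(N / df); an assumed, common variant."""
    n_docs = len(docs_features)
    df = Counter()
    for feats in docs_features:
        df.update(set(feats))  # count each feature once per document
    weighted = []
    for feats in docs_features:
        tf = Counter(feats)  # raw term frequency within the document
        weighted.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weighted

# Toy, invented report snippets (not taken from the study's data).
reports = [
    "blunt force trauma to head",
    "gunshot wound to chest",
    "blunt force injury to chest",
]
tokenized = [r.split() for r in reports]
unigrams = [ngrams(t, 1) for t in tokenized]
bigrams = [ngrams(t, 2) for t in tokenized]
weights = tf_idf(unigrams)
```

In this sketch a term that occurs in every document, such as "to", receives a weight of zero, while a term confined to one report, such as "gunshot", receives the largest weight.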

Methods: Autopsy reports covering eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six text classification techniques were then applied to these 43 master feature vectors to construct classification models that predict the cause of death. Finally, model performance was evaluated using four measures: overall accuracy, macro precision, macro recall, and macro F-measure.
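The four performance measures can be illustrated with a short plain-Python sketch. The abstract does not define the macro F-measure precisely, so this assumes the common convention of averaging per-class F1 scores; the function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Overall accuracy plus macro-averaged precision, recall, and F-measure.

    Assumption: macro F-measure is the unweighted mean of per-class F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "macro_precision": sum(precisions) / n_classes,
        "macro_recall": sum(recalls) / n_classes,
        "macro_f": sum(f1s) / n_classes,
    }

# Toy labels for three invented classes (not the study's eight causes of death).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores = macro_scores(y_true, y_pred)
```

Macro averaging gives every class equal weight regardless of how many reports it contains, which matters when some causes of death are much rarer than others.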

Results: The experiments showed that unigram features achieved the highest performance compared with bigram, trigram, and hybrid-gram features. Among feature representation schemes, term frequency and term frequency with inverse document frequency produced similar results, both better than binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers.
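As a rough illustration of chi-square feature reduction, the sketch below scores each term with the classic 2x2 contingency-table statistic and keeps the top-scoring terms. The abstract does not state which chi-square formulation the study used, so this is an assumption; the data, the function names, and the choice of taking the maximum score across classes are all hypothetical.

```python
def chi_square(term, label, docs, labels):
    """Chi-square statistic for term presence versus membership in one class.

    docs is a list of sets of terms; labels holds the class of each document.
    """
    a = b = c = d = 0
    for feats, y in zip(docs, labels):
        has_term, in_class = term in feats, y == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_top_k(docs, labels, k):
    """Keep the k terms with the highest chi-square score over any class."""
    vocabulary = sorted(set().union(*docs))
    scores = {
        t: max(chi_square(t, y, docs, labels) for y in set(labels))
        for t in vocabulary
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(vocabulary, key=lambda t: (-scores[t], t))[:k]

# Toy, invented documents and class labels (not from the study's data).
docs = [
    {"gunshot", "wound"},
    {"gunshot", "chest"},
    {"blunt", "trauma"},
    {"blunt", "chest"},
]
labels = ["firearm", "firearm", "blunt", "blunt"]
selected = select_top_k(docs, labels, 2)
```

Here "chest" appears equally often in both classes and scores zero, so it is dropped, while "gunshot" and "blunt" separate the classes perfectly and are retained.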

Conclusion: Our results and comparisons hold practical importance and serve as references for future work. Moreover, the comparison outputs will act as state-of-the-art baselines against which future proposals for automated text classification can be compared.

DOI: http://dx.doi.org/10.1016/j.jflm.2017.07.001

Publication Analysis

Top Keywords

text classification

classification techniques

autopsy reports

term frequency

forensic autopsy

classification

predict death

feature

schemes feature

feature extraction

Similar Publications

Conceptualizing patient-level adverse effects in implementation trials.

Ann Epidemiol

December 2024

Department of Internal Medicine, University of Botswana, Gaborone, Botswana.

Charles W Goss, Lindsey M Filiatreau, Lisa R Hirschhorn, Mark D Huffman, Aaloke Mody

Identifying and monitoring adverse effects (AEs) are integral to ensuring patient safety in clinical trials. Research sponsors and regulatory bodies have put into place a variety of policies and procedures to guide researchers in protecting patient safety during clinical trials. However, it remains unclear how these policies and procedures should be adapted for trials in implementation science.

Metagenome-validated combined amplicon sequencing and text mining-based annotations for simultaneous profiling of bacteria and fungi: vaginal microbiota and mycobiota in healthy women.

Microbiome

December 2024

Faculty of Medicine, Human Microbiome Research Program, University of Helsinki, Helsinki, Finland.

Seppo Virtanen, Schahzad Saqib, Tinja Kanerva, Rebecka Ventin-Holmberg, Pekka Nieminen

Background: Amplicon sequencing of kingdom-specific tags, such as the 16S rRNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi, is widely used for investigating microbial communities. So far, most human studies have focused on bacteria, while studies on host-associated fungi in health and disease have only recently started to accumulate. To enable cost-effective parallel analysis of bacterial and fungal communities in human and environmental samples, we developed a method in which 16S rRNA gene and ITS1 amplicons were pooled for a single Illumina MiSeq or HiSeq run and analysed after primer-based segregation.

Parameter-efficient fine-tuning of large language models using semantic knowledge tuning.

Sci Rep

December 2024

University of Central Florida, Orlando, FL, 32816, USA.

Nusrat Jahan Prottasha, Asif Mahmud, Md Shohanur Islam Sobuj, Prakash Bhat, Md Kowsher

Large Language Models (LLMs) have gained significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning use special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens.

Physical function measures in ICU survivors, where to now? A scoping review.

South Afr J Crit Care

July 2024

Division of Physiotherapy, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.

I du Plessis, S D Hanekom, A R Lupton-Smith

Background: Growing evidence is describing the long-term morbidity experienced by critical illness survivors, a major contributing factor being impaired physical function. Consensus is yet to be reached on which physical function measures should be included in this population. This review aimed to describe physical functioning measurement instruments used in longitudinal studies of critical illness survivors, based on the International Classification of Function (ICF).

CARE-SD: classifier-based analysis for recognizing provider stigmatizing and doubt marker labels in electronic health records: model development and validation.

J Am Med Inform Assoc

December 2024

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, United States.

Andrew Walker, Annie Thorne, Sudeshna Das, Jennifer Love, Hannah L F Cooper

Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.

Materials And Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.
