Defining a Preprocessing Pipeline for the MULTI-SITA Project and General Medical Italian Natural Language Data.

Stud Health Technol Inform

Department of Informatics, Bioengineering, Robotics and System Engineering, University of Genoa, Genoa, Italy.

Published: October 2023

The application of Natural Language Processing (NLP) to medical data has revolutionized different aspects of health care. The benefits of this technique extend to several areas, including the implementation of chatbots that can provide medical assistance remotely. Every possible application of NLP depends on one essential first step: the preprocessing of the retrieved corpus. The raw data must be prepared so that they can be used efficiently in further analysis. Considerable progress has been made in this direction for the English language, but for other languages, such as Italian, the state of the art is not as advanced, especially for texts containing technical medical terms. The aim of this work is to identify and develop a preprocessing pipeline suitable for medical data written in Italian. The pipeline was developed in a Python environment, employing the Enchant and nltk modules and Hugging Face's BERT- and BART-based models. It was then tested on real typed conversations between patients and physicians about medical questions. The algorithm was developed within the MULTI-SITA project of the Italian Society of Anti-Infective Therapy (SITA), but its flexible structure can adapt to a wide variety of data.
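As a rough illustration of the kind of steps a preprocessing pipeline for Italian medical text can chain together, here is a minimal standard-library sketch. It is not the authors' implementation. It assumes four stages: Unicode normalization and lowercasing, a regex tokenizer standing in for nltk's tokenizers, removal of a tiny stopword list, and dictionary-based spelling correction where `difflib.get_close_matches` is swapped in for Enchant's suggestions. `STOPWORDS_IT`, `LEXICON_IT`, and every function name are hypothetical, and the BERT/BART-based stages are not represented.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical, tiny Italian stopword list for illustration only.
# "l" covers the elided article in forms such as "l'amoxicillina".
STOPWORDS_IT = {"il", "la", "l", "di", "e", "ho", "un", "una", "per", "da", "mi", "che"}

# Hypothetical mini-lexicon standing in for a spell-checking dictionary
# (the role Enchant plays in the original pipeline); includes medical terms.
LEXICON_IT = {"febbre", "tosse", "antibiotico", "amoxicillina", "giorni",
              "preso", "dolore", "gola", "alta"}


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization (keeps accented vowels) and lowercase."""
    return unicodedata.normalize("NFC", text).lower()


def tokenize(text: str) -> list[str]:
    """Extract runs of Unicode letters; digits and punctuation are dropped."""
    return re.findall(r"[^\W\d_]+", text)


def correct(token: str) -> str:
    """Map an out-of-lexicon token to its closest lexicon entry, if close enough."""
    if token in LEXICON_IT:
        return token
    matches = get_close_matches(token, LEXICON_IT, n=1, cutoff=0.8)
    return matches[0] if matches else token  # leave unknown terms untouched


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, drop stopwords, then spell-correct each token."""
    tokens = tokenize(normalize(text))
    tokens = [t for t in tokens if t not in STOPWORDS_IT]
    return [correct(t) for t in tokens]
```

For example, `preprocess("Ho la febre alta da 3 giorni e ho preso l'amoxicilina")` returns `['febbre', 'alta', 'giorni', 'preso', 'amoxicillina']`: stopwords, the number, and the elided article are dropped, and the two misspellings are mapped to lexicon entries. The 0.8 cutoff leaves tokens with no close match unchanged rather than forcing them onto a dictionary word, which matters for rare technical terms missing from a general-purpose dictionary.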


Source: http://dx.doi.org/10.3233/SHTI230737

