Neural machine translation of clinical texts between long distance languages.

Xabier Soto Olatz Perez-de-Viñaspre Gorka Labaka Maite Oronoz

J Am Med Inform Assoc

Faculty of Informatics, Computer Languages and Systems, Ixa Research Group, University of the Basque Country (UPV/EHU), Donostia, Spain.

Published: December 2019

Objective: To analyze techniques for machine translation of electronic health records (EHRs) between long distance languages, using Basque and Spanish as a reference. We studied distinct configurations of neural machine translation systems and used different methods to overcome the lack of a bilingual corpus of clinical texts or health records in Basque and Spanish.

Materials and Methods: We trained recurrent neural networks on an out-of-domain corpus with different hyperparameter values. Subsequently, we used the optimal configuration to evaluate machine translation of EHR templates between Basque and Spanish, using manual translations of the Basque templates into Spanish as the reference standard. We successively added clinical resources to the training corpus, including a Spanish-Basque dictionary derived from resources built for the machine translation of the Spanish edition of SNOMED CT into Basque, artificial sentences in Spanish and Basque derived from frequently occurring relationships in SNOMED CT, and Spanish monolingual EHRs. In addition to calculating bilingual evaluation understudy (BLEU) scores, we tested performance in the clinical domain by human evaluation.
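The successive corpus construction described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the `Pair` type, and the toy word-substitution "translator" are all assumptions made for the example. In practice `es_to_eu` would be a trained Spanish-to-Basque model. The sketch shows the usual back-translation setup for a Basque-to-Spanish system: real Spanish sentences stay on the target side and are paired with machine-generated Basque sources.

```python
from typing import Callable, List, Tuple

# (Basque source, Spanish target); this type alias is an assumption of the sketch.
Pair = Tuple[str, str]


def back_translate(
    spanish_sentences: List[str],
    es_to_eu: Callable[[str], str],
) -> List[Pair]:
    """Pair each real Spanish sentence with a synthetic Basque source."""
    return [(es_to_eu(es), es) for es in spanish_sentences]


def build_training_corpus(
    out_of_domain: List[Pair],
    dictionary_entries: List[Pair],
    artificial_sentences: List[Pair],
    spanish_ehrs: List[str],
    es_to_eu: Callable[[str], str],
) -> List[Pair]:
    """Concatenate the out-of-domain corpus with the clinical resources."""
    corpus = list(out_of_domain)
    corpus += dictionary_entries        # terminology pairs
    corpus += artificial_sentences      # sentences generated from terminology relations
    corpus += back_translate(spanish_ehrs, es_to_eu)  # synthetic pairs from monolingual EHRs
    return corpus


def toy_es_to_eu(sentence: str) -> str:
    """Hypothetical stand-in for a trained Spanish-to-Basque model."""
    lexicon = {"paciente": "pazientea", "fiebre": "sukarra"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


corpus = build_training_corpus(
    out_of_domain=[("kaixo", "hola")],
    dictionary_entries=[("sukarra", "fiebre")],
    artificial_sentences=[],
    spanish_ehrs=["paciente con fiebre"],
    es_to_eu=toy_es_to_eu,
)
```

Keeping the human-written Spanish on the target side means the decoder still learns from genuine clinical text, even though the Basque source side is noisy.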

Results: We achieved slight improvements over our reference system by tuning some hyperparameters using an out-of-domain bilingual corpus, obtaining 10.67 BLEU points for Basque-to-Spanish clinical domain translation. The inclusion of clinical terminology in Spanish and Basque and the application of the back-translation technique on monolingual EHRs significantly improved the performance, reaching 21.59 BLEU points. This was confirmed by the human evaluation performed by 2 clinicians, who ranked our machine translations close to the human translations.
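For readers unfamiliar with the metric, the following is a minimal sketch of corpus-level BLEU with a single reference per sentence. It assumes whitespace tokenization, n-grams up to length 4, and no smoothing. The abstract does not state which BLEU implementation or tokenization was used, so scores from this sketch are not comparable to the reported figures.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU on a 0-100 scale, one reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: each hypothesis n-gram counts at most as often
            # as it appears in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


score = corpus_bleu(
    ["el paciente tiene fiebre alta"],
    ["el paciente tiene fiebre"],
)
```

On this scale, the reported change from 10.67 to 21.59 is a gain of 10.92 BLEU points.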

Discussion: We showed that, even after optimizing the hyperparameters out-of-domain, including the available clinical-domain resources and applying the described methods was beneficial for our objective, yielding adequate translations of EHR templates.

Conclusion: We have developed a system that can properly translate health record templates from Basque to Spanish without using any bilingual corpus of clinical texts or health records.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647170	PMC
http://dx.doi.org/10.1093/jamia/ocz110	DOI Listing

Publication Analysis

Top Keywords

machine translation

clinical texts

health records

basque spanish

bilingual corpus

corpus clinical

clinical domain

neural machine

clinical

long distance

Similar Publications

FDA reviewed artificial intelligence-enabled products applicable to emergency medicine.

Am J Emerg Med

December 2024

Department of Emergency Medicine, Mayo Clinic, Rochester, MN, USA.

Jacob Morey John Schupbach Derick Jones Laura Walker Rachel Lindor

Objective: To identify and assess artificial intelligence (AI)-enabled products reviewed by the U.S. Food and Drug Administration (FDA) that are potentially applicable to emergency medicine (EM).

Analyzing the TotalSegmentator for facial feature removal in head CT scans.

Radiography (Lond)

January 2025

Department of Radiology, Charité Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health, Berlin, Germany.

M Lindholz R Ruppel S Schulze-Weddige G L Baumgärtner I Schobert

Background: Facial recognition technology in medical imaging, particularly with head scans, poses privacy risks due to identifiable facial features. This study evaluates the use of facial recognition software in identifying facial features from head CT scans and explores a defacing pipeline using TotalSegmentator to reduce re-identification risks while preserving data integrity for research.

Methods: 1404 high-quality renderings from the UCLH EIT Stroke dataset, both with and without defacing, were analysed.

Molecular dynamics simulation based prediction of T-cell epitopes for the production of effector molecules for liver cancer immunotherapy.

PLoS One

January 2025

School of Information and Technology, Wenzhou Business College, Wenzhou, Zhejiang, China.

Sidra Zafar Yuhe Bai Syed Aun Muhammad Jinlei Guo Haris Khurram

Liver cancer is the sixth most frequent malignancy and the fourth major cause of death worldwide. The current treatments are only effective in the early stages of cancer. To overcome the therapeutic challenges and explore immunotherapeutic options, broad-spectrum therapeutic vaccines could have a significant impact.

Digital Health Technology Research Funded by the National Institutes of Health.

JAMA Netw Open

January 2025

National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland.

Pablo Cure Thomas Radman Jaime Mihoko Doyle Audie A Atienza Joshua P Fessel

Importance: Digital health in biomedical research and its expanding list of potential clinical applications are rapidly evolving. A combination of new digital health technologies (DHTs), novel uses of existing DHTs through artificial intelligence- and machine learning-based algorithms, and improved integration and analysis of data from multiple sources has enabled broader use and delivery of these tools for research and health care purposes. The aim of this study was to assess the growth and overall trajectory of DHT funding through a National Institutes of Health (NIH)-wide grant portfolio analysis.

Basic Science and Pathogenesis.

Alzheimers Dement

December 2024

Chambers-Grundy Center for Transformative Neuroscience, Department of Brain Health, School of Integrated Health Sciences, University of Nevada Las Vegas, Las Vegas, NV, USA.

Feixiong Cheng Yuan Hou Pengyue Zhang Fan Fan Jonathan L Haines

Background: Although high-throughput DNA/RNA sequencing technologies have generated massive genetic and genomic data in human disease, translation of these findings into new patient treatments has not materialized for lack of effective approaches, such as Artificial Intelligence (AI) and Machine Learning (ML) tools.

Method: To address this problem, we have used AI/ML approaches, Mendelian randomization (MR), and large-scale patient genetic and functional genomic data to evaluate druggable targets using Alzheimer's disease (AD) as a prototypical example. We utilized the genomic instruments from 9 expression quantitative trait loci (eQTL) and 3 protein quantitative trait loci (pQTL) datasets across five human brain regions from three biobanks.
