An error analysis for image-based multi-modal neural machine translation.

Mach Transl

Huawei Noah's Ark Lab, Hong Kong, Hong Kong.

Published: April 2019

AI Article Synopsis

  • The article presents a quantitative analysis of multi-modal neural machine translation (MNMT) models that incorporate visual features into translation tasks, focusing on how these models are trained with paired sentence-image data.
  • It compares two types of MNMT models: one that uses global image features (single vector for the whole image) and another that uses spatial features (multiple vectors for different parts of the image), analyzing the impact on translation quality.
  • Results show that multi-modal models enhance translation accuracy, particularly with simpler structures using global features, and significantly reduce errors across various types of translations, not just those related to strong visual elements.

Article Abstract

In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, which use global and spatial image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but that almost all kinds of errors decrease when using multi-modal models.
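To make the distinction between the two feature types concrete, the following minimal sketch (an illustration only, not the authors' implementation; the ResNet-50 backbone, input size, and tensor shapes are assumptions) shows how a single global feature vector and a grid of spatial feature vectors are typically extracted from a pre-trained CNN before being fed into an MNMT encoder or decoder.

    import torch
    import torchvision.models as models

    # Assumed image encoder: a pre-trained ResNet-50 with its classifier removed.
    cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    backbone = torch.nn.Sequential(*list(cnn.children())[:-2])

    image = torch.randn(1, 3, 224, 224)          # one preprocessed image (batch of 1)
    with torch.no_grad():
        fmap = backbone(image)                   # conv feature map, shape (1, 2048, 7, 7)

    # Spatial features: multiple vectors, one per image region (7 x 7 = 49 locations).
    spatial_features = fmap.flatten(2).transpose(1, 2)   # shape (1, 49, 2048)

    # Global feature: one vector summarising the entire image (average-pooled).
    global_feature = fmap.mean(dim=(2, 3))               # shape (1, 2048)

In MNMT models of the kind compared here, a global vector is typically used to initialise or augment encoder/decoder states, whereas the grid of spatial vectors is typically attended over, one vector per image region.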


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579783
DOI: http://dx.doi.org/10.1007/s10590-019-09226-9

Publication Analysis

Top Keywords

mnmt models: 16
error analysis: 12
multi-modal neural: 8
neural machine: 8
machine translation: 8
visual features: 8
multi-modal models: 8
models: 7
multi-modal: 5
analysis image-based: 4
