Lip Reading by Alternating between Spatiotemporal and Spatial Convolutions.

Dimitrios Tsourounis, Dimitris Kastaniotis, Spiros Fotopoulos

J Imaging

Department of Physics, University of Patras, 26504 Rio Patra, Greece.

Published: May 2021

Lip reading (LR) is the task of predicting speech using only visual information from the speaker. This work studies, for the first time, the benefits of alternating between spatiotemporal and spatial convolutions for learning effective features from LR sequences. To this end, a new learnable module named ALSOS (Alternating Spatiotemporal and Spatial Convolutions) is introduced into the proposed LR system. The ALSOS module consists of spatiotemporal (3D) and spatial (2D) convolutions together with two conversion components (3D-to-2D and 2D-to-3D), providing a sequence-to-sequence mapping. The LR system places the ALSOS module between ResNet blocks and uses Temporal Convolutional Networks (TCNs) in the backend for classification. The whole framework is composed of feedforward convolutional and residual layers and can be trained end-to-end directly from image sequences for the word-level LR problem. The ALSOS module can capture spatiotemporal dynamics and can be advantageous for LR when combined with the ResNet topology. Experiments with different combinations of ALSOS and ResNet are performed on a Greek-language dataset simulating a medical support application scenario and on the popular large-scale LRW-500 dataset of English words. The results indicate that the proposed ALSOS module can improve the performance of an LR system. Overall, inserting the ALSOS module into the ResNet architecture yielded higher classification accuracy, since it incorporates the contribution of temporal information captured at different spatial scales of the framework.
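
The sequence-to-sequence behaviour of the module described above can be illustrated with a shape-only sketch in plain Python. It is not the authors' implementation: the abstract does not say how the 3D-to-2D and 2D-to-3D conversions work, so the sketch assumes the common choice of folding the time axis into the batch axis and unfolding it again. The function names, kernel sizes, padding, channel count, and clip size are all hypothetical and chosen only for illustration.

```python
# Shape-only sketch of one ALSOS-style pass (3D conv -> 2D conv with conversions).
# ASSUMPTIONS: conversion = folding time into batch; 3x3(x3) kernels, stride 1,
# padding 1; tensor layout (batch, channels, frames, height, width).

def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv3d_shape(shape, out_channels, kernel=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)):
    """Shape after a spatiotemporal (3D) convolution on (B, C, T, H, W)."""
    b, _, t, h, w = shape
    dims = [conv_out(s, k, st, p) for s, k, st, p in zip((t, h, w), kernel, stride, padding)]
    return (b, out_channels, *dims)

def to_2d_shape(shape):
    """Assumed 3D-to-2D conversion: every frame becomes its own batch item."""
    b, c, t, h, w = shape
    return (b * t, c, h, w)

def conv2d_shape(shape, out_channels, kernel=(3, 3), stride=(1, 1), padding=(1, 1)):
    """Shape after a spatial (2D) convolution on (N, C, H, W)."""
    n, _, h, w = shape
    dims = [conv_out(s, k, st, p) for s, k, st, p in zip((h, w), kernel, stride, padding)]
    return (n, out_channels, *dims)

def to_3d_shape(shape, batch):
    """Assumed 2D-to-3D conversion: regroup frames into sequences."""
    n, c, h, w = shape
    if n % batch != 0:
        raise ValueError("frame count is not divisible by the batch size")
    return (batch, c, n // batch, h, w)

def alsos_shape(shape, channels):
    """Track the tensor shape through 3D conv, 3D-to-2D, 2D conv, 2D-to-3D."""
    batch = shape[0]
    x = conv3d_shape(shape, channels)
    x = to_2d_shape(x)
    x = conv2d_shape(x, channels)
    return to_3d_shape(x, batch)

# Hypothetical input: 2 grayscale clips of 29 frames at 88x88 pixels.
clip = (2, 1, 29, 88, 88)
print(alsos_shape(clip, 64))  # (2, 64, 29, 88, 88)
```

Under these assumptions the number of frames is unchanged from input to output, which is what makes the mapping sequence-to-sequence and would allow such a module to sit between 2D ResNet blocks while still passing a full frame sequence to a temporal backend.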

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321361
DOI: http://dx.doi.org/10.3390/jimaging7050091
