IEEE Trans Pattern Anal Mach Intell
March 2023
An integral part of video analysis and surveillance is temporal activity detection, which means to simultaneously recognize and localize activities in long untrimmed videos. Currently, the most effective methods of temporal activity detection are based on deep learning, and they typically perform very well with large scale annotated videos for training. However, these methods are limited in real applications due to the unavailable videos about certain activity classes and the time-consuming data annotation.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
March 2023
The main challenge in the field of unsupervised machine translation (UMT) is to associate source-target sentences in the latent space. As people who speak different languages share biologically similar visual systems, various unsupervised multi-modal machine translation (UMMT) models have been proposed to improve the performances of UMT by employing visual contents in natural images to facilitate alignment. Commonly, relation information is the important semantic in a sentence.
View Article and Find Full Text PDFIEEE Trans Pattern Anal Mach Intell
January 2023
Scene graph is a structured representation of a scene that can clearly express the objects, attributes, and relationships between objects in the scene. As computer vision technology continues to develop, people are no longer satisfied with simply detecting and recognizing objects in images; instead, people look forward to a higher level of understanding and reasoning about visual scenes. For example, given an image, we want to not only detect and recognize objects in the image, but also understand the relationship between objects (visual relationship detection), and generate a text description (image captioning) based on the image content.
View Article and Find Full Text PDFAnnu Int Conf IEEE Eng Med Biol Soc
May 2009
Video surveillance is an alternative approach to staff or self-reporting that has the potential to detect and monitor aggressive behaviors more accurately. In this paper, we propose an automatic algorithm capable of recognizing aggressive behaviors from video records using local binary motion descriptors. The proposed algorithm may increase the accuracy for retrieving aggressive behaviors from video records, and thereby facilitates scientific inquiry into this low frequency but high impact phenomenon that eludes other measurement approaches.
View Article and Find Full Text PDF