This paper presents a comprehensive comparison between Vision Transformers and Convolutional Neural Networks for face recognition related tasks, including extensive experiments on the tasks of face identification and verification. Our study focuses on six state-of-the-art models: EfficientNet, Inception, MobileNet, ResNet, VGG, and Vision Transformers. Our evaluation of these models is based on five diverse datasets: Labeled Faces in the Wild, Real World Occluded Faces, Surveillance Cameras Face, UPM-GTI-Face, and VGG Face 2.
Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. The primary challenge lies in achieving real-time recognition while managing temporal dependencies that can impact performance. Existing methods employ 3D convolutional or Transformer-based architectures with hand skeleton estimation, but both have limitations.
Background: There is widespread agreement amongst clinicians that people with non-specific low back pain (NSLBP) comprise a heterogeneous group and that their management should be individually tailored. One treatment known by its tailored design is the McKenzie method (e.g.
Along with society's development, transportation has become a key factor in human daily life, increasing the number of vehicles on the streets. Consequently, the task of finding free parking slots in metropolitan areas can be dramatically challenging, increasing the chance of getting involved in an accident and the carbon footprint, and negatively affecting the driver's health. Therefore, technological resources to deal with parking management and real-time monitoring have become key players in this scenario to speed up the parking process in urban areas.
This paper proposes a strategy to segment the playing field in soccer images, suitable for integration in many soccer image analysis applications. The combination of a green chromaticity-based analysis and an analysis of the chromatic distortion using full-color information, both at the pixel-level, allows segmenting the green areas of the images. Then, a fully automatic post-processing block at the region-level discards the green areas that do not belong to the playing field.
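As a rough illustration of what a pixel-level green chromaticity test can look like, the following minimal Python sketch marks a pixel as green when G / (R + G + B) exceeds a threshold. The function names and the 0.4 threshold are assumptions made for this example, not values from the paper, and the chromatic distortion analysis and region-level post-processing mentioned in the abstract are not covered.

```python
def green_chromaticity(r, g, b):
    """Return g / (r + g + b), or 0.0 for a black pixel (hypothetical helper)."""
    total = r + g + b
    return g / total if total > 0 else 0.0


def green_mask(image, threshold=0.4):
    """Classify each pixel as green or not.

    image: list of rows, each row a list of (R, G, B) tuples.
    threshold: assumed cut-off on green chromaticity, chosen for illustration.
    """
    return [[green_chromaticity(*px) > threshold for px in row] for row in image]


# Tiny 2x2 example: grass-like pixels on the left, grey and red on the right.
image = [
    [(30, 140, 40), (200, 200, 200)],
    [(20, 90, 25), (180, 30, 30)],
]
mask = green_mask(image)
print(mask)  # [[True, False], [True, False]]
```

Normalizing by the channel sum makes the test less sensitive to overall brightness than a threshold on the raw green channel would be.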
Background: Virtual reality (VR) technologies have been shown to be beneficial in various areas of health care; to date, there are no systematic reviews examining the effectiveness of VR technology for the treatment of spinal pain.
Purpose: To investigate the effectiveness of VR technology in the management of individuals with acute, subacute, and chronic spinal pain.
Methods: Six electronic databases were searched until November 2019.
Visual hand gesture recognition systems are promising technologies for Human Computer Interaction, as they allow a more immersive and intuitive interaction. Most of these systems are based on the analysis of skeleton information, which is in turn inferred from color, depth, or near-infrared imagery. However, the robust extraction of skeleton information from images is only possible for a subset of hand poses, which restricts the range of gestures that can be recognized.
IEEE Trans Image Process
July 2018
The availability of 3D players and displays has increased significantly in recent years. Nonetheless, the amount of 3D content has not grown at a comparable rate. To alleviate this problem, many algorithms for converting images and videos from 2D to 3D have been proposed.
Unmanned Aerial Vehicles (UAVs) are being extensively used nowadays. Therefore, pilots of traditional aerial platforms should adapt their skills to operate them from a Ground Control Station (GCS). Common GCSs provide information in separate screens: one presents the video stream while the other displays information about the mission plan and information coming from other sensors.
There is a huge proliferation of surveillance systems that require strategies for detecting different kinds of stationary foreground objects (e.g., unattended packages or illegally parked vehicles).
Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of these kinds of functions often implies a very high computational cost, which is unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations.
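To make the idea of a piecewise-polynomial approximation concrete, here is a minimal Python sketch of its simplest form: a piecewise-linear lookup table over uniform intervals. The helper names, the choice of `math.atan` as the target function, the range [-4, 4], and the 64 intervals are all assumptions made for illustration and do not reflect the specific method of the paper.

```python
import math


def build_table(f, lo, hi, n):
    """Sample f at n + 1 uniformly spaced nodes on [lo, hi] (hypothetical helper)."""
    step = (hi - lo) / n
    ys = [f(lo + i * step) for i in range(n + 1)]
    return lo, step, ys


def evaluate(table, x):
    """Approximate f(x) by linear interpolation between the two nearest nodes."""
    lo, step, ys = table
    n = len(ys) - 1
    t = (x - lo) / step
    i = min(max(int(t), 0), n - 1)  # clamp the interval index to the table
    frac = t - i
    return ys[i] + frac * (ys[i + 1] - ys[i])


# Precompute once, then evaluate cheaply with one multiply and a few additions.
table = build_table(math.atan, -4.0, 4.0, 64)

# Worst-case absolute error on a dense grid of test points in [-4, 4].
err = max(abs(evaluate(table, k / 100) - math.atan(k / 100))
          for k in range(-400, 401))
print(err)
```

The trade-off is between table size and accuracy: for linear pieces the error shrinks roughly with the square of the interval width, and higher-degree pieces reduce it further at the cost of more arithmetic per evaluation.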
An advanced and user-friendly tool for fast labeling of moving objects captured with surveillance sensors is proposed, which is available to the public. This tool allows the creation of three kinds of labels: moving objects, shadows and occlusions. These labels are created at both the pixel level and object level, which makes them suitable to assess the quality of both moving object detection strategies and tracking algorithms.
We propose a new Bayesian framework for automatically determining the position (location and orientation) of an uncalibrated camera using the observations of moving objects and a schematic map of the passable areas of the environment. Our approach takes advantage of static and dynamic information on the scene structures through prior probability distributions for object dynamics. The proposed approach restricts plausible positions where the sensor can be located while taking into account the inherent ambiguity of the given setting.
Low-cost systems that can obtain a high-quality foreground segmentation almost independently of the existing illumination conditions for indoor environments are very desirable, especially for security and surveillance applications. In this paper, a novel foreground segmentation algorithm that uses only a Kinect depth sensor is proposed to satisfy the aforementioned system characteristics. This is achieved by combining a mixture of Gaussians-based background subtraction algorithm with a new Bayesian network that robustly predicts the foreground/background regions between consecutive time steps.
Electronic devices endowed with camera platforms require new and powerful machine vision applications, which commonly include moving object detection strategies. To obtain high-quality results, the most recent strategies estimate background and foreground models nonparametrically and combine them by means of a Bayesian classifier. However, typical classifiers are limited by the use of constant prior values and do not allow the inclusion of additional spatiodependent prior information.