Publications by authors named "Rita Cucchiara"

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This work extends the VATr [1] Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models.

View Article and Find Full Text PDF

Background And Objective: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. Concerning the prediction of mRNA levels, the available approaches use the sequence upstream and downstream of the Transcription Start Site (TSS) as input to neural networks.

View Article and Find Full Text PDF

Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently-proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed with the integration of external textual memory that could be accessed through -nearest neighbor (NN) searches.

View Article and Find Full Text PDF

According to the latest recommendations, there is no longer place for the use of short-acting beta-2 agonist alone in chronic asthma treatment, due to an increased risk of severe exacerbations and exacerbation-related mortality. Current management of asthma is based on the use of inhaled corticosteroids in combination with formoterol as maintenance and as rescue treatment, thanks to the rapid and prolonged action of formoterol. General practitioners must evaluate, with the patient's collaboration, the treatable factors linked to poor asthma control.

View Article and Find Full Text PDF

Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, large research efforts have been devoted to image captioning, i.e.

View Article and Find Full Text PDF

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the standard de facto for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly.

View Article and Find Full Text PDF

Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis on the use of depth maps, as obtained by active depth sensors and deep neural architectures for the face recognition task.

View Article and Find Full Text PDF

In this article we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Differently from parametric (i.e.

View Article and Find Full Text PDF

Face verification is the task of checking if two provided images contain the face of the same person or not. In this work, we propose a fully-convolutional Siamese architecture to tackle this task, achieving state-of-the-art results on three publicly-released datasets, namely , (HRRFaceD), and . The proposed method takes depth maps as the input, since depth cameras have been proven to be more reliable in different illumination conditions.

View Article and Find Full Text PDF

Depth cameras allow to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make unusable common RGB sensors. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system.

View Article and Find Full Text PDF

Data-driven saliency has recently gained a lot of attention thanks to the use of Convolutional Neural Networks for predicting gaze fixations. In this paper we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a Convolutional LSTM that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map.

View Article and Find Full Text PDF

In this work we aim to predict the driver's focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics.

View Article and Find Full Text PDF

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment.

View Article and Find Full Text PDF

Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and Heat-maps.

View Article and Find Full Text PDF

Augmented user experiences in the cultural heritage domain are in increasing demand by the new digital native tourists of 21st century. In this paper, we propose a novel solution that aims at assisting the visitor during an outdoor tour of a cultural site using the unique first person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve the details by proposing a robust descriptor based on the covariance of local features.

View Article and Find Full Text PDF

There is a large variety of trackers, which have been proposed in the literature during the last two decades with some mixed success. Object tracking in realistic scenarios is a difficult problem, therefore, it remains a most active area of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects.

View Article and Find Full Text PDF

The common paradigm employed for object detection is the sliding window (SW) search. This approach generates grid-distributed patches, at all possible positions and sizes, which are evaluated by a binary classifier: The tradeoff between computational burden and detection accuracy is the real critical point of sliding windows; several methods have been proposed to speed up the search such as adding complementary features. We propose a paradigm that differs from any previous approach since it casts object detection into a statistical-based search using a Monte Carlo sampling for estimating the likelihood density function with Gaussian kernels.

View Article and Find Full Text PDF

In this paper, we define a new paradigm for eight-connection labeling, which employs a general approach to improve neighborhood exploration and minimizes the number of memory accesses. First, we exploit and extend the decision table formalism introducing OR-decision tables, in which multiple alternative actions are managed. An automatic procedure to synthesize the optimal decision tree from the decision table is used, providing the most effective conditions evaluation order.

View Article and Find Full Text PDF

This paper presents a novel and robust approach to consistent labeling for people surveillance in multi-camera systems. A general framework scalable to any number of cameras with overlapped views is devised. An off-line training process automatically computes ground-plane homography and recovers epipolar geometry.

View Article and Find Full Text PDF

In the last decade, computerized tomography (CT) has become the most frequently used imaging modality to obtain a correct pre-operative implant planning. In this work, we present an image analysis and computer vision approach able to identify, from the reconstructed 3D data set, the optimal cutting plane specific to each implant to be planned, in order to obtain the best view of the implant site and to have correct measures. If the patient requires more implants, different cutting planes are automatically identified, and the axial and cross-sectional images can be re-oriented accordingly to each of them.

View Article and Find Full Text PDF

Background: Identification of dark areas inside a melanocytic lesion (ML) is of great importance for melanoma diagnosis, both during clinical examination and employing programs for automated image analysis.

Objective: The aim of our study was to compare two different methods for the automated identification and description of dark areas in epiluminescence microscopy images of MLs and to evaluate their diagnostic capability.

Methods: Two methods for the automated extraction of 'absolute' (ADAs) and 'relative' dark areas (RDAs) and a set of parameters for their description were developed and tested on 339 images of MLs acquired by means of a polarized-light videomicroscope.

View Article and Find Full Text PDF

The aim of this study was to provide mathematical descriptors for the border of pigmented skin lesion images and to assess their efficacy for distinction among different lesion groups. New descriptors such as lesion slope and lesion slope regularity are introduced and mathematically defined. A new algorithm based on the Catmull-Rom spline method and the computation of the gray-level gradient of points extracted by interpolation of normal direction on spline points was employed.

View Article and Find Full Text PDF

A PHP Error was encountered

Severity: Notice

Message: fwrite(): Write of 34 bytes failed with errno=28 No space left on device

Filename: drivers/Session_files_driver.php

Line Number: 272

Backtrace:

A PHP Error was encountered

Severity: Warning

Message: session_write_close(): Failed to write session data using user defined save handler. (session.save_path: /var/lib/php/sessions)

Filename: Unknown

Line Number: 0

Backtrace: