Appearance-based gaze estimation has been widely studied recently, with promising performance. Most appearance-based gaze estimation methods are developed under deterministic frameworks. However, deterministic gaze estimation methods suffer large performance drops on challenging eye images, such as those affected by low resolution, darkness, or partial occlusion. To alleviate this problem, in this article we reformulate the appearance-based gaze estimation problem under a generative framework. Specifically, we propose a variational inference model, the variational gaze estimation network (VGE-Net), which generates multiple gaze maps as complementary candidates, all supervised simultaneously by the ground-truth gaze map. To achieve robust estimation, a regression network predicts a gaze direction from each candidate gaze map, and we adaptively fuse these predictions through a simple attention mechanism. Experiments on three benchmarks (MPIIGaze, EYEDIAP, and Columbia) demonstrate that VGE-Net outperforms state-of-the-art gaze estimation methods, especially on challenging cases. Comprehensive ablation studies also validate the effectiveness of our contributions. The code will be publicly released.
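The fusion step can be illustrated with a minimal sketch. The abstract does not specify the form of the attention mechanism, so the softmax-weighted average below is an assumption, not the VGE-Net implementation. The names `softmax` and `fuse_gaze_candidates`, the (pitch, yaw) representation in radians, and the example scores are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_gaze_candidates(candidate_gazes, attention_scores):
    """Fuse K candidate gaze directions into one estimate.

    candidate_gazes: list of (pitch, yaw) pairs, one per candidate gaze map
                     (assumed representation, in radians).
    attention_scores: one raw score per candidate (assumed to come from a
                      small learned scoring layer; supplied by hand here).
    """
    weights = softmax(attention_scores)
    pitch = sum(w * g[0] for w, g in zip(weights, candidate_gazes))
    yaw = sum(w * g[1] for w, g in zip(weights, candidate_gazes))
    return (pitch, yaw), weights


# Hypothetical example: two candidates agree, one is an outlier with a low score.
candidates = [(0.10, -0.20), (0.12, -0.18), (0.40, 0.30)]
scores = [2.0, 2.0, -3.0]
fused, weights = fuse_gaze_candidates(candidates, scores)
```

Under this assumption, a candidate with a low attention score contributes little to the fused direction, which is one way that several candidates could yield a more robust estimate than a single deterministic prediction.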
DOI: http://dx.doi.org/10.1109/TCYB.2023.3312392