IEEE Trans Image Process
February 2024
Video grounding, the task of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised approaches to video grounding require large amounts of annotated data, which can be expensive and time-consuming to collect. Recently, zero-shot video grounding (ZS-VG) methods have been developed that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models.
Face anti-spoofing (FAS) techniques play an important role in defending face recognition systems against spoofing attacks. Existing FAS methods often require a large amount of annotated spoofing face data to train effective anti-spoofing models. Considering the adversarial nature of spoofing data and its diverse variants, obtaining all the spoofing types in advance is difficult.
Accurate predictions of future pedestrian trajectories could prevent a considerable number of traffic injuries and improve pedestrian safety. Trajectory prediction involves multiple sources of information and real-time interactions, e.g.