Semi-supervised video object segmentation (VOS) aims to predict the segmentation of a target object throughout a video, given a ground-truth segmentation mask for the target in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach to semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: the solution (STM) is non-local, whereas the problem (VOS) is predominantly local. To resolve this mismatch, we propose new VOS networks called the kernelized memory network (KMN) and KMN with multiple kernels. Our networks perform not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in J on the DAVIS 2017 test-dev set). The runtimes of KMN and KMN with multiple kernels on the DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, comparable to the computation time of STM.
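The two matching directions can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, so a 2D Gaussian centred on each memory location's best-matching query pixel is assumed here, and the function name `kernelized_read`, the parameter `sigma`, and the flattened key/value layout are all hypothetical.

```python
import numpy as np

def kernelized_read(mem_key, mem_val, qry_key, height, width, sigma=2.0):
    """Read memory values for every query pixel with a kernel-weighted softmax.

    mem_key: (P, d) keys of all memory pixels (P may span several frames)
    mem_val: (P, c) values of all memory pixels
    qry_key: (Q, d) keys of the query frame, Q == height * width, row-major
    sigma:   assumed Gaussian kernel width in pixels (hypothetical parameter)
    """
    d = mem_key.shape[1]
    # Similarity between every memory pixel and every query pixel: (P, Q).
    corr = mem_key @ qry_key.T / np.sqrt(d)

    # Memory-to-Query matching: each memory pixel picks its best query pixel.
    best_q = corr.argmax(axis=1)                      # (P,)
    best_y, best_x = np.divmod(best_q, width)

    # Assumed Gaussian kernel around that best match, evaluated on the query grid.
    qry_y, qry_x = np.divmod(np.arange(height * width), width)
    dist2 = (qry_y[None, :] - best_y[:, None]) ** 2 \
          + (qry_x[None, :] - best_x[:, None]) ** 2
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))      # (P, Q)

    # Query-to-Memory matching: softmax over memory pixels, damped by the kernel
    # so that spatially implausible matches contribute little.
    logits = corr - corr.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits) * kernel
    weights = weights / (weights.sum(axis=0, keepdims=True) + 1e-12)

    return weights.T @ mem_val                        # (Q, c)
```

With a very large `sigma` the kernel becomes nearly flat and the read reduces to an ordinary non-local softmax over memory, which is the behaviour the abstract attributes to STM; a smaller `sigma` makes the read more local.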
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3163375