On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis.

Med Image Anal

Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany.

Published: May 2024

Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequence modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: https://gitlab.com/nct_tso_public/pitfalls_bn.
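
The batch dependence described above can be illustrated with a minimal sketch. This is not the authors' code from the linked repository: the helper names `batch_norm_train` and `batch_norm_eval`, the scalar "features", and the fixed statistics (mean 0.0, variance 1.0) are all assumptions chosen for illustration. In training mode, the normalized value of one frame changes with the other samples in its batch; with fixed statistics it does not.

```python
from statistics import fmean, pvariance


def batch_norm_train(batch, eps=1e-5):
    """Normalize scalar features with the batch's own mean and variance
    (training-mode behavior: every output depends on all samples in the batch)."""
    mean = fmean(batch)
    var = pvariance(batch, mu=mean)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


def batch_norm_eval(batch, running_mean, running_var, eps=1e-5):
    """Normalize with fixed statistics (inference-mode behavior:
    each sample is processed independently of its batch mates)."""
    return [(x - running_mean) / (running_var + eps) ** 0.5 for x in batch]


# The same hypothetical frame feature placed in two different batches.
frame = 1.0
batch_a = [frame, 0.0, 2.0]
batch_b = [frame, 10.0, 20.0]

# Training mode: the output for `frame` depends on the rest of the batch.
train_a = batch_norm_train(batch_a)[0]
train_b = batch_norm_train(batch_b)[0]

# Fixed statistics (illustrative values): the output for `frame` is identical.
eval_a = batch_norm_eval(batch_a, 0.0, 1.0)[0]
eval_b = batch_norm_eval(batch_b, 0.0, 1.0)[0]

print("train mode:", train_a, train_b)
print("fixed stats identical:", eval_a == eval_b)
```

If a training batch consists of consecutive frames from a single video, the batch statistics are computed over later frames as well, which is one plausible way a model could pick up information about the future in an online task such as anticipation.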

DOI: http://dx.doi.org/10.1016/j.media.2024.103126
