On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis.

Dominik Rivoir, Isabel Funke, Stefanie Speidel

Med Image Anal

Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany.

Published: May 2024

Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequence modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: https://gitlab.com/nct_tso_public/pitfalls_bn.
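The batch dependence described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the helper names (`batch_norm_train`, `per_sample_norm`, `random_vector`), the toy 8-dimensional "frame" features, and the per-sample normalization used as a contrast are all assumptions made for illustration. The sketch normalizes the same feature vector inside two different batches and compares the results.

```python
import math
import random
from statistics import fmean, pvariance

EPS = 1e-5  # small constant for numerical stability (illustrative value)


def batch_norm_train(batch):
    """Training-mode BN on a list of feature vectors: every feature is
    normalized with the mean/variance computed across the whole batch."""
    columns = list(zip(*batch))
    means = [fmean(col) for col in columns]
    stds = [math.sqrt(pvariance(col) + EPS) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in batch]


def per_sample_norm(batch):
    """Contrast case (hypothetical stand-in for a BN-free layer): each
    vector is normalized with its own mean/variance only."""
    out = []
    for row in batch:
        m, s = fmean(row), math.sqrt(pvariance(row) + EPS)
        out.append([(v - m) / s for v in row])
    return out


rng = random.Random(0)


def random_vector(loc, n=8):
    """Toy feature vector drawn from a Gaussian centred at `loc`."""
    return [rng.gauss(loc, 1.0) for _ in range(n)]


# The same "frame" is placed in two batches whose other members differ.
frame = random_vector(0.0)
batch_a = [frame] + [random_vector(0.0) for _ in range(3)]
batch_b = [frame] + [random_vector(5.0) for _ in range(3)]


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


# Gap between the two normalized versions of the identical frame.
bn_gap = max_abs_diff(batch_norm_train(batch_a)[0], batch_norm_train(batch_b)[0])
ps_gap = max_abs_diff(per_sample_norm(batch_a)[0], per_sample_norm(batch_b)[0])
print(f"BN gap: {bn_gap:.3f}, per-sample gap: {ps_gap:.3f}")
```

Under these assumptions, the batch-normalized output for the identical frame changes with its batch companions, while the per-sample variant does not. In video training, where a batch often holds consecutive, highly correlated frames, this coupling is the kind of property the abstract identifies as an obstacle.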

DOI: http://dx.doi.org/10.1016/j.media.2024.103126

Similar Publications

Safety After Dark: A Privacy Compliant and Real-Time Edge Computing Intelligent Video Analytics for Safer Public Transportation.

Sensors (Basel)

December 2024

Australian Urban Research Infrastructure Network (AURIN), University of Melbourne, Melbourne, VIC 3052, Australia.

Johan Barthelemy, Umair Iqbal, Yan Qian, Mehrdad Amirghasemi, Pascal Perez

Public transportation systems play a vital role in modern cities, but they face growing security challenges, particularly related to incidents of violence. Detecting and responding to violence in real time is crucial for ensuring passenger safety and the smooth operation of these transport networks. To address this issue, we propose an advanced artificial intelligence (AI) solution for identifying unsafe behaviours in public transport.

MFF-Net: A Lightweight Multi-Frequency Network for Measuring Heart Rhythm from Facial Videos.

Sensors (Basel)

December 2024

College of Electrical Engineering, Sichuan University, Chengdu 610065, China.

Wenqin Yan, Jialiang Zhuang, Yuheng Chen, Yun Zhang, Xiujuan Zheng

Remote photo-plethysmography (rPPG) is a useful camera-based health monitoring method that can measure the heart rhythm from facial videos. Many well-established deep learning models can provide highly accurate and robust results in measuring heart rate (HR) and heart rate variability (HRV). However, these methods are unable to effectively eliminate illumination variation and motion artifact disturbances, and their substantial computational resource requirements significantly limit their applicability in real-world scenarios.

Keypoints-Based Multi-Cue Feature Fusion Network (MF-Net) for Action Recognition of ADHD Children in TOVA Assessment.

Bioengineering (Basel)

November 2024

College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.

Wanyu Tang, Chao Shi, Yuanyuan Li, Zhonglan Tang, Gang Yang

Attention deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder among children and adolescents. Behavioral detection and analysis play a crucial role in ADHD diagnosis and assessment by objectively quantifying hyperactivity and impulsivity symptoms. Because existing video-based action recognition algorithms focus on object or interpersonal interactions, they may overlook ADHD-specific behaviors.

Detection of Rat Pain-Related Grooming Behaviors Using Multistream Recurrent Convolutional Networks on Day-Long Video Recordings.

Bioengineering (Basel)

November 2024

School of Mechanical and Electrical Engineering, Sanming University, Sanming 365004, China.

Chien-Cheng Lee, Ping-Wing Lui, Wei-Wei Gao, Zhongjian Gao

In experimental pain studies involving animals, subjective pain reports are not feasible. Current methods for detecting pain-related behaviors rely on human observation, which is time-consuming and labor-intensive, particularly for lengthy video recordings. Automating the quantification of these behaviors poses substantial challenges.

A Novel Video Compression Approach Based on Two-Stage Learning.

Entropy (Basel)

December 2024

School of Software Technology, Dalian University of Technology, Dalian 116024, China.

Dan Shao, Ning Wang, Pu Chen, Yu Liu, Lin Lin

In recent years, the rapid growth of video data has posed challenges for storage and transmission. Video compression techniques provide a viable solution to this problem. In this study, we propose a bidirectional coding video compression model named DeepBiVC, which is based on two-stage learning.
