Re-Thinking the Effectiveness of Batch Normalization and Beyond.

IEEE Trans Pattern Anal Mach Intell

Published: January 2024

Batch normalization (BN) is used by default in many modern deep neural networks because it accelerates training convergence and boosts inference performance. Recent studies suggest that the effectiveness of BN is due to the Lipschitzness of the loss and gradient, rather than the reduction of internal covariate shift. However, questions remain about whether Lipschitzness is sufficient to explain the effectiveness of BN and whether vanilla BN can be further improved. To answer these questions, we first prove that when stochastic gradient descent (SGD) is applied to a general non-convex problem, three effects make convergence faster and better: (i) reduction of the gradient Lipschitz constant, (ii) reduction of the expectation of the square of the stochastic gradient, and (iii) reduction of the variance of the stochastic gradient. We demonstrate that vanilla BN induces the three effects above, rather than Lipschitzness, only when combined with ReLU; with other nonlinearities such as Sigmoid, Tanh, and SELU, vanilla BN degrades convergence performance. To improve vanilla BN, we propose a new normalization approach, dubbed complete batch normalization (CBN), which changes the placement of normalization and modifies the structure of vanilla BN based on the theory. We prove that CBN elicits all three effects above, regardless of the nonlinear activation used. Extensive experiments on the benchmark datasets CIFAR10, CIFAR100, and ILSVRC2012 validate that CBN makes training converge faster and that the training loss converges to a smaller local minimum than with vanilla BN. Moreover, CBN helps networks with various nonlinear activations (Sigmoid, Tanh, ReLU, SELU, and Swish) consistently achieve higher test accuracy. Specifically, with CBN, the classification accuracies of networks with Sigmoid, Tanh, and SELU are boosted by more than 15.0%, 4.5%, and 4.0% on average, respectively, which makes them comparable to the performance of networks with ReLU.
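The three quantities named in the abstract can be made concrete on a toy problem. The sketch below is not the paper's experiment and involves neither BN nor CBN: it assumes a hypothetical one-dimensional least-squares loss, and names such as `gradient_statistics` are illustrative. For this quadratic loss the gradient Lipschitz constant (i) is simply the mean of x squared, while (ii) and (iii) are estimated by repeatedly drawing minibatches.

```python
import random


def per_sample_grad(w, x, y):
    # Gradient of 0.5 * (w * x - y) ** 2 with respect to the scalar weight w.
    return (w * x - y) * x


def minibatch_grad(w, batch):
    # Stochastic gradient: average of per-sample gradients over one minibatch.
    return sum(per_sample_grad(w, x, y) for x, y in batch) / len(batch)


def gradient_statistics(w, data, batch_size, n_draws, rng):
    # Empirical mean, second moment (effect ii), and variance (effect iii)
    # of the stochastic gradient at a fixed weight w.
    grads = [minibatch_grad(w, rng.sample(data, batch_size))
             for _ in range(n_draws)]
    mean = sum(grads) / n_draws
    second_moment = sum(g * g for g in grads) / n_draws
    variance = sum((g - mean) ** 2 for g in grads) / n_draws
    return mean, second_moment, variance


rng = random.Random(0)

# Assumed toy data: y = 2x plus small Gaussian noise.
data = []
for _ in range(512):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

# Effect (i): the full-batch gradient is linear in w with slope mean(x^2),
# so that slope is the gradient Lipschitz constant of this toy loss.
lipschitz = sum(x * x for x, _ in data) / len(data)

stats_small = gradient_statistics(0.5, data, 8, 2000, rng)
stats_large = gradient_statistics(0.5, data, 64, 2000, rng)

print("gradient Lipschitz constant:", lipschitz)
print("batch  8 (mean, E[g^2], Var[g]):", stats_small)
print("batch 64 (mean, E[g^2], Var[g]):", stats_large)
```

The second moment always decomposes as the squared mean plus the variance, so (ii) and (iii) are linked: lowering the variance while the mean gradient stays fixed also lowers the expected squared gradient. In this sketch the larger batch size plays the role of the variance-reducing intervention; the abstract attributes an analogous reduction to normalization, which is not modeled here.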

Source
http://dx.doi.org/10.1109/TPAMI.2023.3319005

