It is important to understand how dropout, a popular regularization method, helps neural network training reach a solution that generalizes well. In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments. Additionally, we numerically study two implications of the implicit regularization, which intuitively rationalize why dropout helps generalization. First, we find that the input weights of hidden neurons tend to condense on isolated orientations when trained with dropout. Condensation is a feature of the non-linear learning process that makes the network less complex. Second, we find that training with dropout leads the neural network to a flatter minimum than standard gradient descent training, and that the implicit regularization is the key to finding flat solutions. Although our theory mainly focuses on dropout used in the last hidden layer, our experiments apply to general dropout in training neural networks. This work points out a distinct characteristic of dropout compared with stochastic gradient descent and serves as an important basis for fully understanding dropout.
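To give a feel for why dropout on the last hidden layer can act as a regularizer, the following minimal sketch checks a standard textbook identity numerically. It is not the derivation from the paper. For a linear output layer with inverted dropout, the mask-averaged squared loss equals the plain squared loss plus a penalty proportional to `p / (1 - p)`. The network sizes, the `tanh` activation, the random data, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def hidden(x, W, b):
    # Last hidden layer activations (tanh is an assumed choice).
    return np.tanh(x @ W + b)

def expected_dropout_loss(h, a, y, p):
    # Closed-form mask average of the squared loss under inverted dropout:
    # E[(f_drop - y)^2] = (f - y)^2 + p/(1-p) * sum_j (a_j * h_j)^2
    f = h @ a
    penalty = (p / (1.0 - p)) * np.sum((h * a) ** 2, axis=1)
    return np.mean((f - y) ** 2 + penalty)

def monte_carlo_dropout_loss(h, a, y, p, n_samples, rng):
    # Empirical mask average: drop each hidden unit with probability p
    # and rescale the kept units by 1/(1-p) (inverted dropout).
    total = 0.0
    for _ in range(n_samples):
        mask = rng.random(h.shape) >= p
        f_drop = (h * mask / (1.0 - p)) @ a
        total += np.mean((f_drop - y) ** 2)
    return total / n_samples

# Hypothetical toy setup: 64 samples, 3 inputs, 16 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
W = rng.normal(size=(3, 16))
b = rng.normal(size=16)
a = rng.normal(size=16)
y = rng.normal(size=64)

h = hidden(x, W, b)
p = 0.2
plain = np.mean((h @ a - y) ** 2)          # loss without dropout
closed = expected_dropout_loss(h, a, y, p)  # loss plus variance penalty
mc = monte_carlo_dropout_loss(h, a, y, p, 5000, rng)

print(f"plain loss: {plain:.4f}")
print(f"closed-form expected dropout loss: {closed:.4f}")
print(f"Monte Carlo dropout loss: {mc:.4f}")
```

The gap between `closed` and `plain` is the extra penalty that dropout noise adds on average. It grows with the magnitude of the output weights times the hidden activations, which is one intuitive reading of how dropout biases training toward simpler solutions.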
Source (DOI): http://dx.doi.org/10.1109/TPAMI.2024.3357172
J Pers Soc Psychol
January 2025
Department of Experimental-Clinical and Health Psychology, Ghent University.
Human likes and dislikes can be established or changed in numerous ways. Three of the most well-studied procedures involve exposing people to regularities in the environment (evaluative conditioning, approach-avoidance, mere exposure), to verbal information about upcoming regularities (evaluative conditioning, approach-avoidance, or mere exposure information), or to verbal information about the evaluative properties of an attitude object (persuasive messages). In the present study, we investigated the relation between, on the one hand, different types of experiment-related beliefs (regularity, influence, and hypothesis awareness) and demand reactions (demand compliance and reactance) and, on the other hand, evaluative learning about novel food brands (Experiments 1 and 2) and well-known food brands (Experiment 2) via persuasive messages, experienced regularities, and verbal information about regularities.
BMC Public Health
January 2025
Grounded Research Hub, Rotherham Doncaster and South Humber NHS Foundation Trust, Doncaster, DN4 8QN, UK.
Background: Households in areas of socio-economic deprivation are more likely to consume diets low in fruit and vegetables. Fresh Street is a place-based fruit and vegetable voucher scheme with vouchers redeemable with local independent (non-supermarket) vendors. Paper vouchers are offered to all households in a geographical area regardless of household type, size, or income with no requirement to demonstrate need.
ACS Nano
January 2025
Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, Institute of Respiratory Health, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China.
Sci Rep
December 2024
Center for Research on Microgrids (UPC CROM), Department of Electronic Engineering, Technical University of Catalonia, 08019, Barcelona, Spain.
With rising demand for electricity, integrating renewable energy sources into power networks has become a key challenge. The fast incorporation of clean energy sources, particularly solar and wind power, into the existing power grid in the last several years has raised a major problem in controlling and managing the power grid due to the intermittent nature of these sources. Therefore, to ensure safe integration of renewable energy sources (RES) that provides high-quality power at a fair price, and to keep electrical systems secure and reliable, a precise one-day-ahead forecast of solar irradiation and wind speed is essential for a stable and safe hybrid energy system.
Untrained networks inspired by deep image priors have shown promising capabilities in recovering high-quality images from noisy or partial measurements. Their success is widely attributed to implicit regularization due to the spectral bias of suitable network architectures. However, the application of such network-based priors often entails superfluous architectural decisions, risks of overfitting, and lengthy optimization processes, all of which hinder their practicality.