Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models.

Phys Rev Res

Department of Physics, Boston University, Boston, Massachusetts 02215, USA.

Published: March 2022

The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., the number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities that strike a balance between bias and variance. Modern deep learning methods flout this dogma, achieving state-of-the-art performance using "over-parameterized models" in which the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in over-parameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural networks with nonlinear data distributions), allowing us to disentangle properties stemming from the model architecture and random sampling of data. In both models, increasing the number of fit parameters leads to a phase transition where the training error goes to zero and the test error diverges as a result of the variance (while the bias remains finite). Beyond this threshold, the test error of the two-layer neural network decreases due to a monotonic decrease in both the bias and the variance, in contrast with the classical bias-variance trade-off. We also show that, in contrast with classical intuition, over-parameterized models can overfit even in the absence of noise and exhibit bias even if the student and teacher models match. We synthesize these results to construct a holistic understanding of generalization error and the bias-variance trade-off in over-parameterized models and relate our results to random matrix theory.
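The behavior described in the abstract can be explored numerically with a small Monte Carlo sketch. The setup below is an illustrative assumption, not the paper's analytic calculation: a linear teacher with Gaussian features, a student that fits only the first `n_used` of `n_total` teacher features, and a minimum-norm least-squares fit obtained with the pseudoinverse. The function name `bias_variance` and all parameter values are hypothetical.

```python
import numpy as np


def bias_variance(n_train, n_used, n_total=60, noise_std=0.5,
                  n_trials=200, n_test=500, seed=0):
    """Monte Carlo estimate of squared bias and variance (illustrative only).

    Assumed setup: the teacher is y = x . beta + noise with Gaussian x,
    and the student fits the first `n_used` features by minimum-norm
    least squares.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=n_total) / np.sqrt(n_total)  # teacher weights
    x_test = rng.normal(size=(n_test, n_total))
    y_clean = x_test @ beta                              # noiseless test labels

    preds = np.empty((n_trials, n_test))
    for t in range(n_trials):
        # Draw a fresh training set for every trial.
        x = rng.normal(size=(n_train, n_total))
        y = x @ beta + noise_std * rng.normal(size=n_train)
        # pinv gives ordinary least squares when n_used < n_train and the
        # minimum-norm interpolating solution when n_used > n_train.
        w = np.linalg.pinv(x[:, :n_used]) @ y
        preds[t] = x_test[:, :n_used] @ w

    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - y_clean) ** 2)  # averaged over test points
    variance = np.mean(preds.var(axis=0))          # spread across training sets
    return bias_sq, variance


if __name__ == "__main__":
    for n_used in (5, 10, 20, 30, 60):
        b, v = bias_variance(n_train=20, n_used=n_used)
        print(f"n_used={n_used:3d}  bias^2={b:.3f}  variance={v:.3f}")
```

In this sketch the variance estimate should be largest when `n_used` equals `n_train`, where the fit first interpolates the training data, and smaller on either side. With `n_used = n_total` the student has the same form as the teacher, yet the estimated squared bias stays above zero because the minimum-norm solution shrinks the weights when there are fewer training points than parameters.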

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879296
DOI: http://dx.doi.org/10.1103/physrevresearch.4.013201

Similar Publications

Prediction of time-dependent bearing capacity of concrete pile in cohesive soil using optimized relevance vector machine and long short-term memory models.

Sci Rep

December 2024

Department of Geosciences, Geotechnology, and Materials Engineering for Resources, Graduate School of International Resource Sciences, Akita University, Akita, Japan.

The present investigation employs relevance vector machine (RVM) and long short-term memory (LSTM) models to predict the time-dependent bearing capacity of concrete piles. Each RVM model (SRVM) is configured with one of the following kernel functions: linear, polynomial, Gaussian, sigmoid, Laplacian, or exponential. Each SRVM model is then optimized with a genetic algorithm (GA_SRVM) and a particle swarm optimization algorithm (PSO_RVM).

Medical datasets are vital for advancing Artificial Intelligence (AI) in healthcare. Yet biases in these datasets on which deep-learning models are trained can compromise reliability. This study investigates biases stemming from dataset-creation practices.

Feasibility evaluation of big data algorithms for establishing serum protein electrophoresis reference intervals using Hoffmann and refineR methods.

Clin Chim Acta

December 2024

Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China; State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Science, Beijing 100730, China.

Background: Serum protein electrophoresis (SPE) is essential for diagnosing monoclonal gammopathies and a variety of other diseases. Despite its importance, there is a scarcity of SPE parameter reference intervals (RIs) derived from large datasets. This study seeks to fill this gap by establishing sex-specific RIs using Hoffmann and refineR algorithms and assessing the feasibility of these methods.

Background: Fasciolosis is a prevalent disease that significantly impairs the health and productivity of cattle and causes significant economic damage. Beyond the individually available studies with varying prevalence rates, there are no pooled national prevalence studies on bovine fasciolosis. Therefore, the current study aims to determine the pooled prevalence and economic significance of fasciolosis among cattle in Ethiopia.

Purpose: We performed a systematic review and meta-analysis to examine the associations between telomere length and telomerase activity in subjects with and without metabolic syndrome (MetS).

Methods: The meta-analysis protocol was registered in the PROSPERO database. The PubMed, Embase, Cochrane Library, and LILACS databases were searched for studies reporting telomere length or telomerase activity in adult men and non-pregnant women with and without MetS.
