Deep neural networks have an inbuilt Occam's razor.

Nature Communications

Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK.

Published: January 2025

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components for supervised learning, we apply a Bayesian picture based on the functions expressed by a DNN. The prior over functions is determined by the network architecture, which we vary by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. Combining this with the prior yields an accurate prediction for the posterior, measured for DNNs trained with stochastic gradient descent. This analysis shows that structured data, together with a specific Occam's razor-like inductive bias towards (Kolmogorov) simple functions that exactly counteracts the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
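
As a rough illustration of the Bayesian picture described in the abstract (not the authors' code), the sketch below enumerates all Boolean functions on three inputs, assigns each a simplicity prior P(f) ∝ 2^(-K(f)) with a crude Lempel-Ziv phrase count standing in for Kolmogorov complexity K, uses a zero-training-error indicator as the likelihood, and normalises the product into a posterior. The target function, training split, and complexity proxy are all hypothetical choices made for this example; the point is only that simple functions consistent with the data come to dominate the posterior, counteracting the fact that most functions are complex.

```python
# Minimal sketch of a simplicity-biased Bayesian posterior over Boolean
# functions. All concrete choices (target, train split, LZ proxy) are
# illustrative assumptions, not taken from the paper.
import itertools

def lz_complexity(s: str) -> int:
    """Crude Lempel-Ziv-style phrase count as a Kolmogorov-complexity proxy."""
    phrases, i = set(), 0
    while i < len(s):
        j = i + 1
        while s[i:j] in phrases and j <= len(s):
            j += 1
        phrases.add(s[i:j])
        i = j
    return len(phrases)

n = 3                                        # number of Boolean inputs
inputs = list(itertools.product([0, 1], repeat=n))
train = inputs[:4]                           # hypothetical training set
target = {x: x[0] ^ x[1] for x in inputs}    # hypothetical target function

# Enumerate all 2^(2^n) Boolean functions as output bit-strings and weight
# each by (simplicity prior) * (zero-training-error likelihood).
posterior, Z = {}, 0.0
for bits in itertools.product([0, 1], repeat=len(inputs)):
    f = dict(zip(inputs, bits))
    prior = 2.0 ** (-lz_complexity("".join(map(str, bits))))
    likelihood = float(all(f[x] == target[x] for x in train))
    w = prior * likelihood
    if w > 0:
        posterior[bits] = w
        Z += w

# The posterior concentrates on the (Kolmogorov-)simple functions that fit
# the data, illustrating the Occam's-razor-like bias.
for bits, w in sorted(posterior.items(), key=lambda kv: -kv[1])[:5]:
    s = "".join(map(str, bits))
    print(f"f={s}  K_LZ={lz_complexity(s)}  P(f|D)={w / Z:.3f}")
```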

DOI: http://dx.doi.org/10.1038/s41467-024-54813-x
