We study the binary and continuous negative-margin perceptrons as simple nonconvex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide.
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for nonconvex optimization.
The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities.
The differing ability of polypeptide conformations to act as the native state of proteins has long been rationalized in terms of differing kinetic accessibility or thermodynamic stability. Building on the successful applications of physical concepts and sampling algorithms recently introduced in the study of disordered systems, in particular artificial neural networks, we quantitatively explore how well a quantity known as the local entropy describes the native state of model proteins. In lattice models and all-atom representations of proteins, we are able to efficiently sample high local entropy states and to provide a proof of concept of enhanced stability and folding rate.
Proc Natl Acad Sci U S A
January 2020
Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How these 2 features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question.
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights.
Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale's principle and asymmetry of synaptic connections, locality of the weight update computations.
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses.
Proc Natl Acad Sci U S A
February 2018
Quantum annealers aim at solving nonconvex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists of designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and adding a controllable quantum transverse field to generate tunneling processes. A key challenge is to identify classes of nonconvex optimization problems for which quantum annealing remains efficient while thermal annealing fails.
Background: Distinct RNA species may compete for binding to microRNAs (miRNAs). This competition creates an indirect interaction between miRNA targets, which behave as miRNA sponges and eventually influence each other's expression levels. Theoretical predictions suggest that not only the mean expression levels of targets but also the fluctuations around the means are coupled through miRNAs.
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space.
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses.
We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically.
Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model simplicity and the locality of the synaptic update rules come at the cost of a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms.
Systems biology aims at creating mathematical models, i.e., computational reconstructions of biological systems and processes that will result in a new level of understanding: the elucidation of the basic and presumably conserved "design" and "engineering" principles of biomolecular systems.
We study several Bayesian inference problems for irreversible stochastic epidemic models on networks from a statistical physics viewpoint. We derive equations which allow us to accurately compute the posterior distribution of the time evolution of the state of each node given some observations. Unlike most existing methods, we allow very general observation models, including unobserved nodes, state observations made at different or unknown times, and observations of infection times, possibly mixed together.
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.
We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability.
Advances in experimental techniques resulted in abundant genomic, transcriptomic, epigenomic, and proteomic data that have the potential to reveal critical drivers of human diseases. Complementary algorithmic developments enable researchers to map these data onto protein-protein interaction networks and infer which signaling pathways are perturbed by a disease. Despite this progress, integrating data across different biological samples or patients remains a substantial challenge because samples from the same disease can be extremely heterogeneous.
Background: Ten-Eleven Translocation (TET) proteins mediate the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC). Tet1 is expressed at high levels in mouse embryonic stem cells (ESCs), where it mediates the induction of 5hmC decoration on gene-regulatory elements. While the function of Tet1 is known, the mechanisms of its specificity remain unclear.
The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (i.e.
Simple models of irreversible dynamical processes such as bootstrap percolation have been successfully applied to describe cascade processes in a large variety of different contexts. However, the problem of analyzing nontypical trajectories, which can be crucial for the understanding of out-of-equilibrium phenomena, is still considered to be intractable in most cases. Here we introduce an efficient method to find and analyze optimized trajectories of cascade processes.
MicroRNAs (miRNAs) are small RNA molecules, about 22 nucleotides long, which post-transcriptionally regulate their target messenger RNAs (mRNAs). They play key roles in gene regulatory networks, ranging from signaling pathways to tissue morphogenesis, and their aberrant behavior is often associated with the development of various diseases. Recently it has been experimentally shown that the way miRNAs interact with their targets can be described in terms of a titration mechanism.
Competitive endogenous (ce)RNAs cross-regulate each other through sequestration of shared microRNAs and form complex regulatory networks based on their microRNA signature. However, the molecular requirements for ceRNA cross-regulation and the extent of ceRNA networks remain unknown. Here, we present a mathematical mass-action model to determine the optimal conditions for ceRNA activity in silico.