Theoretical and empirical evidence indicates a positive correlation between the flatness of the loss landscape around minima and generalization. However, most current approaches to finding flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to steer the optimizer toward flat regions, an approach that has negligible computational cost and does not compromise training stability, convergence, or efficiency. We focus on nonlinear, loss-dependent reshaping functions, grounded in theoretical insights, to reshape the loss landscape. To design these functions, we first identify where and how they should be applied. Using recently developed tools in stochastic optimization, our theoretical analysis shows that steepening the low-loss landscape increases the rate of escape from sharp minima, while flattening the high- and ultralow-loss landscapes enhances training stability and optimization performance, respectively. Simulations and experiments show that the carefully designed reshaping functions not only steer optimizers toward flat minima and improve generalization performance but also stabilize training, aid optimization, and maintain efficiency. We evaluate our approach on image classification, adversarial robustness, and natural language processing (NLP) tasks, where it achieves significant improvements in generalization performance with negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
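To make "steepening" and "flattening" concrete: if training minimizes a reshaped objective f(L(w)) instead of L(w), the chain rule gives ∇f(L(w)) = f'(L(w)) ∇L(w), so a loss-dependent reshaping acts as a loss-dependent rescaling of the gradient. Where f'(L) > 1 the landscape is effectively steepened, and where f'(L) < 1 it is flattened; an increasing f leaves the location of the minima unchanged. The sketch below is a hypothetical, piecewise-linear f that follows the region pattern described above (flatten ultralow loss, steepen low loss, flatten high loss). The class `PiecewiseReshaper`, the helper `reshaped_grad`, the thresholds, and the slopes are all illustrative assumptions, not the reshaping functions from the paper or the LLR repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiecewiseReshaper:
    """Hypothetical monotone, piecewise-linear reshaping function f(L).

    Slopes f'(L) < 1 flatten the landscape and slopes f'(L) > 1 steepen it.
    All thresholds and slopes are illustrative assumptions, not LLR's values.
    """
    ultra_low: float = 0.05   # assumed boundary of the "ultralow-loss" region
    high: float = 2.0         # assumed boundary of the "high-loss" region
    slope_ultra: float = 0.5  # flatten the ultralow-loss region
    slope_low: float = 2.0    # steepen the low-loss region
    slope_high: float = 0.5   # flatten the high-loss region

    def slope(self, loss: float) -> float:
        """Return f'(L), the factor that multiplies the original gradient."""
        if loss < self.ultra_low:
            return self.slope_ultra
        if loss < self.high:
            return self.slope_low
        return self.slope_high

    def __call__(self, loss: float) -> float:
        """Return f(L); pieces are joined so f is continuous and increasing."""
        if loss < self.ultra_low:
            return self.slope_ultra * loss
        f_a = self.slope_ultra * self.ultra_low  # value of f at the first boundary
        if loss < self.high:
            return f_a + self.slope_low * (loss - self.ultra_low)
        f_b = f_a + self.slope_low * (self.high - self.ultra_low)  # value at the second boundary
        return f_b + self.slope_high * (loss - self.high)


def reshaped_grad(w: float, reshaper: PiecewiseReshaper) -> float:
    """Gradient of f(L(w)) for the toy loss L(w) = w**2, via the chain rule."""
    loss = w * w
    return reshaper.slope(loss) * 2.0 * w


if __name__ == "__main__":
    f = PiecewiseReshaper()
    for w in (0.1, 1.0, 3.0):
        # Compare the plain gradient 2w with the reshaped gradient f'(L) * 2w.
        print(f"w={w}: loss={w * w:.2f}, plain grad={2 * w:.2f}, "
              f"reshaped grad={reshaped_grad(w, f):.2f}")
```

With these made-up settings, the gradient at loss 1.0 is doubled (steepened) while the gradients at losses 0.01 and 9.0 are halved (flattened). A piecewise-linear f keeps the mechanism easy to read; a practical implementation would more likely use a smooth function and apply it to the minibatch loss before backpropagation.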

Source: http://dx.doi.org/10.1109/TNNLS.2024.3462516
