AI Article Synopsis

Article Abstract

In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and backward propagation. In this way, throughout training, the optimizer always uses the gradients w.r.t. the future weights to update the DNN parameters, achieving better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results with five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
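The weight-prediction idea described in the abstract can be illustrated with a minimal sketch (not the authors' code; function names and hyperparameters are assumptions for illustration), using SGD with momentum as the base optimizer: the future weights are obtained by replaying the optimizer's own update rule, and the gradient is then taken with respect to those predicted weights.

```python
import numpy as np

def predict_weights(w, velocity, grad, lr=0.1, mu=0.9, steps=1):
    """Predict the weights `steps` mini-batches ahead by replaying the
    SGD-with-momentum update rule with the most recent gradient."""
    w_hat, v_hat = w.copy(), velocity.copy()
    for _ in range(steps):
        v_hat = mu * v_hat + grad        # momentum accumulation
        w_hat = w_hat - lr * v_hat       # weight update
    return w_hat

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
g = w.copy()                             # gradient at the current weights

# Forward pass and backpropagation would use the predicted weights:
w_future = predict_weights(w, v, g)
g_future = w_future                      # gradient w.r.t. the future weights

# The actual parameter update then consumes the "future" gradient.
v = 0.9 * v + g_future
w = w - 0.1 * v
```

The key design point is that the stored parameters `w` are only ever updated with gradients evaluated at the predicted weights, which is what the paper argues improves convergence relative to the plain optimizer.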

Source
http://dx.doi.org/10.1109/TPAMI.2024.3387399

Publication Analysis

Top Keywords

gradient-based optimizers (12)
weight prediction (12)
dnn models (12)
convergence generalization (8)
future weights (8)
optimizers (5)
training (5)
xgrad (4)
xgrad boosting (4)
gradient-based (4)

Similar Publications

Accurate and effective wind energy forecasting that can be easily incorporated into smart networks is important, as appropriate planning and generation predictions are necessary for these infrastructures. Wind energy production, however, is inherently unstable and unpredictable.

Heterogeneous graph contrastive learning with gradient balance for drug repositioning.

Brief Bioinform

November 2024

Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China.

Drug repositioning, which involves identifying new therapeutic indications for approved drugs, is pivotal in accelerating drug discovery. Recently, to mitigate the effect of label sparsity on inferring potential drug-disease associations (DDAs), graph contrastive learning (GCL) has emerged as a promising paradigm that supplements high-quality self-supervised signals by designing auxiliary tasks and then transfers shareable knowledge to the main task, i.e.

BSDR: A Data-Efficient Deep Learning-Based Hyperspectral Band Selection Algorithm Using Discrete Relaxation.

Sensors (Basel)

December 2024

Wimmera Catchment Management Authority, 24 Darlot St, Horsham, VIC 3400, Australia.

Hyperspectral band selection algorithms are crucial for processing high-dimensional data, which enables dimensionality reduction, improves data analysis, and enhances computational efficiency. Among these, attention-based algorithms have gained prominence by ranking bands based on their discriminative capability. However, they require a large number of model parameters, which increases the need for extensive training data.

Automated shape and thickness optimization for non-matching isogeometric shells using free-form deformation.

Eng Comput

March 2024

Department of Mechanical and Aerospace Engineering, University of California San Diego, 9500 Gilman Drive, Mail Code 0411, La Jolla, CA 92093 USA.

Isogeometric analysis (IGA) has emerged as a promising approach in the field of structural optimization, benefiting from the seamless integration between the computer-aided design (CAD) geometry and the analysis model by employing non-uniform rational B-splines (NURBS) as basis functions. However, structural optimization for real-world CAD geometries consisting of multiple non-matching NURBS patches remains a challenging task. In this work, we propose a unified formulation for shape and thickness optimization of separately parametrized shell structures by adopting the free-form deformation (FFD) technique, so that continuity with respect to design variables is preserved at patch intersections during optimization.

Optimal wideband digital fractional-order differentiators using gradient based optimizer.

PeerJ Comput Sci

October 2024

College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.

In this paper, we propose a novel optimization approach to designing wideband infinite impulse response (IIR) digital fractional-order differentiators (DFODs) with improved accuracy in low frequency bands. In the new method, the objective function is formulated as an optimization problem with two tuning parameters that control the error distribution over frequencies, and the gradient-based optimizer (GBO) is effectively employed on the proposed objective function.
