In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to improve their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training, the optimizer always uses the gradients w.r.t. the future weights to update the DNN parameters, achieving better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet highly effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results covering five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models.
DOI: http://dx.doi.org/10.1109/TPAMI.2024.3387399
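The abstract describes weight prediction only at a high level, so the following is a minimal, hypothetical sketch of what a single XGrad-style training step could look like when the base optimizer is SGD with momentum. The one-step momentum look-ahead used as the prediction rule, the function name xgrad_sgd_step, and its arguments are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch: one training step with weight prediction on top of
# SGD + momentum. The prediction rule (a single momentum look-ahead) is an
# assumption; the paper's exact formula is not given in the abstract.
import torch

def xgrad_sgd_step(model, loss_fn, batch, state, lr=0.01, momentum=0.9):
    params = [p for p in model.parameters() if p.requires_grad]
    bufs = state.setdefault("bufs", [torch.zeros_like(p) for p in params])

    # 1) Predict the future weights from the optimizer's own update rule and
    #    temporarily load them into the model (caching the current weights).
    cached = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p, buf in zip(params, bufs):
            p.add_(buf, alpha=-lr * momentum)  # assumed look-ahead prediction

    # 2) Run the forward and backward passes with the predicted weights, so
    #    the gradients are taken w.r.t. the future weights.
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    model.zero_grad()
    loss.backward()

    # 3) Restore the cached weights and apply the usual momentum update using
    #    the gradients computed at the predicted weights.
    with torch.no_grad():
        for p, w0, buf in zip(params, cached, bufs):
            g = p.grad
            p.copy_(w0)
            buf.mul_(momentum).add_(g)
            p.add_(buf, alpha=-lr)
    return loss.item()
```

Here `state` is a dictionary the caller keeps across steps so the momentum buffers persist between mini-batches.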
Heliyon
December 2024
Department of Mechatronics, Aliko Dangote University of Science and Technology, Kano, Nigeria.
Accurate and effective wind energy forecasting that can be easily incorporated into smart networks is important, since such infrastructures require appropriate planning and reliable energy generation predictions. Wind energy production, however, is inherently unstable and unpredictable.
Brief Bioinform
November 2024
Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China.
Drug repositioning, which involves identifying new therapeutic indications for approved drugs, is pivotal in accelerating drug discovery. Recently, to mitigate the effect of label sparsity on inferring potential drug-disease associations (DDAs), graph contrastive learning (GCL) has emerged as a promising paradigm that supplements high-quality self-supervised signals by designing auxiliary tasks and then transferring shareable knowledge to the main task, i.e., DDA prediction.
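The snippet names graph contrastive learning only as a general paradigm. As an illustration of the kind of auxiliary self-supervised signal it refers to, here is a minimal sketch of an InfoNCE-style contrastive loss between two augmented views of node embeddings (e.g., drug and disease nodes); the function name and the symmetric formulation are assumptions, not the paper's specific auxiliary task.

```python
# Minimal, hypothetical InfoNCE-style contrastive loss between two augmented
# views of the same graph nodes, as used in generic graph contrastive
# learning; not the paper's specific auxiliary task.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """z1, z2: (num_nodes, dim) embeddings of the same nodes under two views."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise cosine similarities
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    # Symmetric loss: each view predicts its counterpart in the other view.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```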
Sensors (Basel)
December 2024
Wimmera Catchment Management Authority, 24 Darlot St, Horsham, VIC 3400, Australia.
Hyperspectral band selection algorithms are crucial for processing high-dimensional data, which enables dimensionality reduction, improves data analysis, and enhances computational efficiency. Among these, attention-based algorithms have gained prominence by ranking bands based on their discriminative capability. However, they require a large number of model parameters, which increases the need for extensive training data.
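As a rough illustration of the attention-based ranking idea described above (not the paper's specific algorithm), the sketch below scores each spectral band with a small learned attention module and keeps the top-k bands; the module name, its architecture, and the choice of k are hypothetical.

```python
# Hypothetical sketch of attention-based band scoring: a tiny network assigns
# one weight per spectral band, and the highest-weighted bands are selected.
import torch
import torch.nn as nn

class BandAttention(nn.Module):
    def __init__(self, num_bands, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(num_bands, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bands))

    def forward(self, x):
        # x: (batch, num_bands) mean spectra of pixels or patches.
        weights = torch.softmax(self.score(x), dim=1)  # per-band attention
        return x * weights, weights                    # reweighted input

def select_bands(weights, k=30):
    # Rank bands by their average attention weight and keep the top-k.
    return torch.topk(weights.mean(dim=0), k).indices
```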
Eng Comput
March 2024
Department of Mechanical and Aerospace Engineering, University of California San Diego, 9500 Gilman Drive, Mail Code 0411, La Jolla, CA 92093 USA.
Isogeometric analysis (IGA) has emerged as a promising approach in the field of structural optimization, benefiting from the seamless integration between the computer-aided design (CAD) geometry and the analysis model by employing non-uniform rational B-splines (NURBS) as basis functions. However, structural optimization for real-world CAD geometries consisting of multiple non-matching NURBS patches remains a challenging task. In this work, we propose a unified formulation for shape and thickness optimization of separately parametrized shell structures by adopting the free-form deformation (FFD) technique, so that continuity with respect to design variables is preserved at patch intersections during optimization.
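The free-form deformation (FFD) technique mentioned above can be illustrated with a minimal sketch: geometry points (for example, the control points of several non-matching NURBS patches) are embedded in a trivariate Bernstein lattice, and moving the lattice's control points (the design variables) deforms all embedded points smoothly and consistently. The code below is a generic, unoptimized FFD evaluation, not the authors' shell-optimization implementation.

```python
# Minimal, hypothetical trivariate free-form deformation (FFD) evaluation.
import numpy as np
from scipy.special import comb

def bernstein(n, i, t):
    # i-th Bernstein basis polynomial of degree n evaluated at t in [0, 1].
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def ffd(points, lattice):
    """points: (num_pts, 3) coordinates in [0,1]^3 local lattice space.
    lattice: (l+1, m+1, n+1, 3) grid of control points (design variables)."""
    l, m, n = (d - 1 for d in lattice.shape[:3])
    out = np.zeros_like(points, dtype=float)
    for idx, (s, t, u) in enumerate(points):
        for i in range(l + 1):
            for j in range(m + 1):
                for k in range(n + 1):
                    w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                    out[idx] += w * lattice[i, j, k]
    return out
```

Because every embedded point is a smooth function of the lattice control points, points shared by adjacent patches move identically, which is how continuity with respect to the design variables can be preserved at patch intersections.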
PeerJ Comput Sci
October 2024
College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
In this paper, we propose a novel optimization approach to designing wideband infinite impulse response (IIR) digital fractional-order differentiators (DFODs) with improved accuracy at low frequency bands. In the new method, the objective function is formulated as an optimization problem with two tuning parameters that control the error distribution over frequencies. The gradient-based optimizer (GBO) is then applied to the proposed objective function.
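The abstract does not spell out the objective function, so the following is a hypothetical sketch of a weighted frequency-domain error for an IIR fractional-order differentiator, with two tuning parameters (here called p and q) that shift the error distribution toward low frequencies. The weighting form, the parameter names, and the coefficient layout are assumptions rather than the paper's exact formulation.

```python
# Hypothetical weighted frequency-domain objective for an IIR digital
# fractional-order differentiator; p and q are assumed tuning parameters.
import numpy as np
from scipy.signal import freqz

def dfod_objective(coeffs, alpha=0.5, n_taps=4, p=1.0, q=2.0, n_freq=512):
    """coeffs: concatenated numerator (first n_taps) and denominator coefficients."""
    b, a = coeffs[:n_taps], coeffs[n_taps:]
    w, h = freqz(b, a, worN=n_freq)   # digital frequency response on [0, pi)
    w, h = w[1:], h[1:]               # skip w = 0 to avoid a singular weight
    h_ideal = (1j * w) ** alpha       # ideal fractional differentiator response
    weight = (np.pi / w) ** p         # emphasize low frequencies (assumed form)
    err = np.abs(h - h_ideal) ** q
    return np.sum(weight * err)
```

An optimizer such as the GBO mentioned above would then minimize `dfod_objective` over the filter coefficients.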