AI Article Synopsis

  • Large-scale pretrained language models (PLMs) have driven advances in natural language processing (NLP), but fine-tuning them often leads to overfitting and poor generalization, because the models are highly complex and downstream training data are limited.
  • To address this, the authors introduce a fine-tuning framework called layerwise noise stability regularization (LNSR), which perturbs the model's input with noise and regularizes the output of each layer.
  • Experimental results show that LNSR outperforms several existing techniques (L2-SP, Mixout, FreeLB, and SMART), is also effective on more challenging question-answering tasks, and improves the model's ability to generalize across domains.

Article Abstract

The advent of large-scale pretrained language models (PLMs) has contributed greatly to the progress in natural language processing (NLP). Despite their recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with standard Gaussian or in-manifold noise in the representation space and regularizes each layer's output of the language model. We provide theoretical and experimental analyses to prove the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as L2 norm and start point (L2-SP), Mixout, FreeLB, and smoothness inducing adversarial regularization and Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, as in prior work, we further evaluate its effectiveness on more challenging question-answering (QA) tasks. These tasks present a higher level of difficulty, and they provide a larger number of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method can improve the domain generalization ability of language models.
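
The core idea in the abstract (perturb the input with Gaussian noise, then penalize how much each layer's output moves) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tanh network, the squared L2 distance, the equal weighting of layers, and the names `noise_stability_penalty`, `sigma`, and the regularization weight are all assumptions made for illustration. The in-manifold noise variant mentioned in the abstract is not covered.

```python
import math
import random


def dense_tanh(weights, bias, x):
    """One toy layer: tanh(W x + b), with W given as a list of rows."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def forward_all_layers(layers, x):
    """Return the output of every layer, not just the last one."""
    outputs = []
    h = x
    for weights, bias in layers:
        h = dense_tanh(weights, bias, h)
        outputs.append(h)
    return outputs


def noise_stability_penalty(layers, x, sigma, rng):
    """Sum over layers of the squared L2 gap between clean and noisy outputs.

    Assumption: the distance measure and equal layer weights are illustrative.
    """
    # Perturb the input with Gaussian noise scaled by sigma (hypothetical knob).
    noisy_x = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
    clean = forward_all_layers(layers, x)
    noisy = forward_all_layers(layers, noisy_x)
    return sum(
        sum((c - n) ** 2 for c, n in zip(clean_h, noisy_h))
        for clean_h, noisy_h in zip(clean, noisy)
    )


def make_layers(sizes, rng):
    """Random toy weights; sizes = [input_dim, hidden_1, ..., hidden_k]."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)
        ]
        bias = [0.0] * n_out
        layers.append((weights, bias))
    return layers


rng = random.Random(0)
layers = make_layers([4, 5, 3], rng)
x = [0.2, -0.1, 0.4, 0.3]

task_loss = 0.0  # placeholder for the downstream task loss
penalty = noise_stability_penalty(layers, x, sigma=0.1, rng=rng)
total_loss = task_loss + 1.0 * penalty  # 1.0 is a hypothetical regularization weight
```

In a real fine-tuning loop the penalty would be added to the task loss and minimized jointly, so that small input perturbations produce small changes at every layer; with `sigma=0` the penalty is exactly zero because the clean and noisy passes coincide.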

Source
http://dx.doi.org/10.1109/TNNLS.2023.3330926

Publication Analysis

Top Keywords (keyword: count)

proposed method: 12
pretrained language: 8
language model: 8
noise stability: 8
stability regularization: 8
language models: 8
effectiveness method: 8
method: 6
language: 5
improving pretrained: 4
