Severity: Warning
Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests
Filename: helpers/my_audit_helper.php
Line Number: 176
Backtrace:
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML
File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global
File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword
File: /var/www/html/index.php
Line: 316
Function: require_once
Background: In the domain of natural language processing (NLP), the development and success of advanced language models are predominantly anchored in the richness of available linguistic resources. Languages such as Azerbaijani, which is classified as a low-resource, often face challenges arising from limited labeled datasets, consequently hindering effective model training.
Methodology: The primary objective of this study was to enhance the effectiveness and generalization capabilities of news text classification models using text augmentation techniques. In this study, we solve the problem of working with low-resource languages using translations using the Facebook mBart50 model, as well as the Google Translate API and a combination of mBart50 and Google Translate thus expanding the capabilities when working with text.
Results: The experimental outcomes reveal a promising uptick in classification performance when models are trained on the augmented dataset compared with their counterparts using the original data. This investigation underscores the immense potential of combined data augmentation strategies to bolster the NLP capabilities of underrepresented languages. As a result of our research, we have published our labeled text classification dataset and pre-trained RoBERTa model for the Azerbaijani language.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11041965 | PMC |
http://dx.doi.org/10.7717/peerj-cs.1974 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!