Dataset construction method of cross-lingual summarization based on filtering and text augmentation.

PeerJ Comput Sci

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China.

Published: March 2023

Existing cross-lingual summarization (CLS) datasets suffer from inconsistent sample quality and small scale. To address these problems, we propose a method for building CLS datasets that jointly supervises quality and scale. For quality supervision, the method applies a multi-strategy filtering algorithm that removes low-quality monolingual summarization (MS) samples at both the character level and the semantic level, thereby improving the quality of the MS dataset. For scale supervision, the method applies a text augmentation algorithm based on a pretrained model to enlarge the CLS dataset while maintaining quality. We used this method to build an English-Chinese CLS dataset and evaluated it with a data quality evaluation framework. The evaluation results show that the dataset is both high in quality and large in size. These outcomes suggest that the proposed method can improve quality and scale together, yielding a high-quality, large-scale CLS dataset at lower cost.
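
The abstract does not spell out the individual filtering strategies, so the following is only a minimal sketch of what a two-stage, character-then-semantic filter over (document, summary) pairs could look like. All function names and thresholds here are hypothetical, and the bag-of-words cosine similarity is a simple stand-in for whatever semantic measure the authors actually use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_level_ok(document, summary, min_summary_tokens=3, max_ratio=0.5):
    """Character-level check (assumed rule): the summary must not be
    trivially short, nor too long relative to its source document."""
    doc_tokens = tokenize(document)
    sum_tokens = tokenize(summary)
    if len(sum_tokens) < min_summary_tokens or not doc_tokens:
        return False
    return len(sum_tokens) / len(doc_tokens) <= max_ratio


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity; a stand-in for a semantic score."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def semantic_ok(document, summary, min_similarity=0.2):
    """Semantic-level check (assumed rule): the summary must overlap
    sufficiently in content with its source document."""
    return cosine_similarity(document, summary) >= min_similarity


def filter_samples(samples):
    """Keep only the pairs that pass both filtering stages."""
    return [
        (doc, summ)
        for doc, summ in samples
        if char_level_ok(doc, summ) and semantic_ok(doc, summ)
    ]


DOC = (
    "The central bank raised interest rates by half a point on Tuesday "
    "to curb rising inflation across the country."
)
samples = [
    (DOC, "Central bank raised interest rates to curb inflation."),  # plausible
    (DOC, "Rates up."),  # too short for the character-level check
    (DOC, "Local team wins football championship final today."),  # off-topic
]
kept = filter_samples(samples)
```

In this sketch only the first pair survives: the second fails the length check and the third shares no vocabulary with the document. A real pipeline would likely replace the bag-of-words score with embeddings from a pretrained model and tune the thresholds on held-out data.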

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280405
DOI: http://dx.doi.org/10.7717/peerj-cs.1299

Publication Analysis

Top Keywords (keyword: occurrences)

cls datasets: 12
cross-lingual summarization: 8
text augmentation: 8
quality: 8
quality scale: 8
supervision method: 8
method adopts: 8
cls dataset: 8
method: 6
dataset: 5

Similar Publications

Wavelet Texture Descriptor for Steel Surface Defect Classification.

Materials (Basel)

November 2024

Department of Computer Science and Information Technologies, University of Kasdi Merbah, Ouargla 30000, Algeria.

The accurate and efficient classification of steel surface defects is critical for ensuring product quality and minimizing production costs. This paper proposes a novel method based on wavelet transform and texture descriptors for the robust and precise classification of steel surface defects. By leveraging the multiscale analysis capabilities of wavelet transforms, our method extracts both broad and fine-grained textural features.

We propose wake-sleep consolidated learning (WSCL), a learning strategy leveraging complementary learning system (CLS) theory and the wake-sleep phases of the human brain to improve the performance of deep neural networks (DNNs) for visual classification tasks in continual learning (CL) settings. Our method learns continually via the synchronization between distinct wake and sleep phases. During the wake phase, the model is exposed to sensory input and adapts its representations, ensuring stability through a dynamic parameter freezing mechanism and storing episodic memories in a short-term temporary memory (similar to what happens in the hippocampus).

In recent years, significant progress has been made in facial expression recognition methods. However, tasks related to facial expression recognition in real environments still require further research. This paper proposes a tri-cross-attention transformer with a multi-feature fusion network (TriCAFFNet) to improve facial expression recognition performance under challenging conditions.

Emotion recognition of EEG signals based on contrastive learning graph convolutional model.

J Neural Eng

August 2024

College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Jiangsu 210023, People's Republic of China.

Electroencephalogram (EEG) signals offer invaluable insights into the complexities of emotion generation within the brain. Yet, the variability in EEG signals across individuals presents a formidable obstacle for empirical implementations. Our research addresses these challenges innovatively, focusing on the commonalities within distinct subjects' EEG data.
