Cross-lingual plagiarism detection (CLPD) is a challenging problem in natural language processing. Cross-lingual plagiarism occurs when a text is translated from another language and reused without proper acknowledgment. Most existing methods give good results for monolingual plagiarism detection, but their performance on CLPD is very limited, because it is difficult to represent text from two different languages in a common semantic space. In this article, a novel Siamese architecture-based model is proposed to detect cross-lingual plagiarism in English-Hindi language pairs. The proposed model combines a convolutional neural network (CNN) and a bidirectional long short-term memory (Bi-LSTM) network to learn the semantic similarity between cross-lingual sentences in English-Hindi language pairs. Within the model, the CNN learns the local context of words, whereas the Bi-LSTM learns the global context of sentences in both the forward and backward directions. The proposed model is evaluated on a benchmark data set, the Microsoft paraphrase corpus, converted into English-Hindi language pairs. It outperforms the other models, achieving weighted average precision, recall, and F1-measure scores of 67%, 72%, and 67%, respectively. The experimental results show the effectiveness of the proposed model over the baseline models, which is attributed to its ability to represent cross-lingual text efficiently.
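The reported scores are weighted averages, which can look inconsistent at first: an F1 of 67% is not the harmonic mean of 67% precision and 72% recall. The sketch below shows one common way such scores are computed, assuming the usual support-weighted definition (per-class scores averaged with weights proportional to class frequency). The function name `weighted_prf` and the toy labels are illustrative assumptions and are not taken from the article.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return support-weighted precision, recall, and F1 (assumed definition)."""
    support = Counter(y_true)          # number of true examples per class
    total = len(y_true)
    w_prec = w_rec = w_f1 = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / total   # class frequency in the gold labels
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return w_prec, w_rec, w_f1


# Toy example (made-up labels): 1 = plagiarized pair, 0 = non-plagiarized pair.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0]
precision, recall, f1 = weighted_prf(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this definition, the weighted F1 is an average of per-class F1 scores rather than a function of the two averaged values, so it may fall below both of them, as it does on the toy labels. Weighted recall also coincides with overall accuracy.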

Source
http://dx.doi.org/10.1089/big.2020.0243
