Semantic Textual Similarity

1. Introduction:
Semantic textual similarity (STS) measures the degree of semantic equivalence between two pieces of text. Measuring STS accurately matters for the quality of many natural language processing tasks, such as information retrieval, word sense disambiguation, text summarization, machine translation quality evaluation, and plagiarism detection. Although previous research on STS models has achieved remarkable results, only a few studies have addressed Vietnamese, and the resulting tools are of limited quality. Developing and applying STS algorithms for low-resource languages such as Vietnamese is not as simple as for resource-rich languages.

2. Research:
We focus on two tasks: Vietnamese semantic textual similarity and Vietnamese-English cross-language semantic textual similarity. Our proposed solution is a hybrid model that combines traditional methods (n-gram-based, vector-based, semantic-based, fuzzy-based, and graph-based) with deep learning methods such as CNNs, RNNs, and graph CNNs.
We also construct large and reliable corpora for Vietnamese semantic textual similarity and Vietnamese-English cross-language semantic textual similarity.
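
To make the idea of combining similarity signals concrete, the following is a minimal Python sketch of two of the traditional method families named above: an n-gram-based score (Jaccard overlap of character trigrams) and a vector-based score (cosine similarity of bag-of-words counts). It is an illustration only, not the group's actual model. The function names, the trigram size, the whitespace tokenization, and the fixed 0.5 weight are all assumptions; a real hybrid model would more likely learn how to combine such features, together with deep learning components, and Vietnamese text would normally need proper word segmentation.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Strings shorter than n contribute themselves as a single gram.
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def ngram_jaccard(a, b, n=3):
    """n-gram-based similarity: Jaccard overlap of character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty texts are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def cosine_bow(a, b):
    """Vector-based similarity: cosine of bag-of-words count vectors.

    Assumption: tokens are split on whitespace, which is only a rough
    stand-in for Vietnamese word segmentation.
    """
    vec_a, vec_b = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(a, b, weight=0.5):
    """Hypothetical combination: fixed weighted average of the two scores."""
    return weight * ngram_jaccard(a, b) + (1 - weight) * cosine_bow(a, b)


if __name__ == "__main__":
    s1 = "Tôi thích đọc sách"
    s2 = "Tôi thích đọc báo"
    print(round(hybrid_score(s1, s2), 3))
```

Both scores lie in the range [0, 1], so their weighted average does too, which makes the result easy to compare against human similarity judgments once those are rescaled to the same range.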

References:
[1] Nguyen Le Thanh and Dien Dinh (2017). English-Vietnamese cross-language paraphrase identification method. The Eighth International Symposium on Information and Communication Technology (SoICT ’17), December 07-08, 2017, Nha Trang, Vietnam, © 2017 ACM.
[2] Nguyen Le Thanh, Toan Nguyen Xuan and Dien Dinh (2016). Vietnamese plagiarism detection method. The Seventh International Symposium on Information and Communication Technology (SoICT ’16), December 08-09, 2016, Ho Chi Minh City, Vietnam, © 2016 ACM.