Text Summarization

1. Introduction

With the rapid development of the World Wide Web, a huge amount of information is available online. This has led to the problem of information overload and to the need for automatic summarization systems. The last decade has seen a growing trend towards automatic summarization, not only in academia but also in industry: Yahoo and Google acquired Summly and Wavii, respectively, two start-up companies working on news summarization.
There is little research on Vietnamese text summarization. Most of it uses the extraction approach, which selects a subset of existing words, phrases, or sentences from the original text to form the summary. The goal of the CLC summarization team is to create a summary that is closer to what a human might generate, one that may contain words not explicitly present in the original.
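To make the extraction approach concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and returns the top-scoring sentences in their original order. It is an illustration only, not the method used in the cited papers. The function names, the regex-based sentence splitter, and the frequency score are all assumptions made for this example, and real Vietnamese text would likely need a proper word segmenter.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Word frequencies over the whole input text.
    freq = Counter(w for tokens in tokenized for w in tokens)

    def score(tokens):
        # Average word frequency; empty sentences score zero.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original order
    return [sentences[i] for i in chosen]


text = ("Floods hit the city. The floods damaged many roads in the city. "
        "Officials met on Tuesday.")
summary = extractive_summary(text, k=1)
```

Every sentence in the output is copied verbatim from the input. That is the limitation an abstractive system tries to overcome.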

2. Research
We are focusing on three sub-problems of automatic text summarization: multi-document summarization, sentence fusion, and sentence compression.

  • A multi-document summarization system generates a summary from many documents on the same topic or the same event [2, 4].
  • Sentence fusion is a method that generates a short, single-sentence summary from a group of similar sentences [1].
  • Sentence compression aims to remove unnecessary words/phrases from a sentence while keeping the sentence grammatically correct [3].
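
As an illustration of sentence fusion, the sketch below builds a simple word graph from a group of similar sentences and reads off the cheapest path from a start marker to an end marker. It is a simplified, hypothetical version of the general word-graph idea, not the re-ranking method of [1]. Several things here are assumptions made for the example: merging identical lowercase words into one node, the edge cost of 1 divided by the bigram count, whitespace tokenization, and the names `build_word_graph` and `fuse`.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"  # artificial boundary markers (assumption)


def build_word_graph(sentences):
    """Map each word to its successors; frequent transitions get lower cost."""
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), c in counts.items():
        graph[a][b] = 1.0 / c
    return graph


def fuse(sentences):
    """Return the words on the cheapest START-to-END path (Dijkstra's algorithm)."""
    graph = build_word_graph(sentences)
    heap = [(0.0, [START])]
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return " ".join(path[1:-1])
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in settled:
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return ""


group = [
    "heavy rain flooded hanoi streets on monday",
    "heavy rain flooded the capital on monday",
    "storms flooded hanoi streets",
]
fused = fuse(group)
```

With this input the cheapest path combines the opening of the first two sentences with the ending of the third, so the result is a sentence that appears in none of the inputs. A practical system would also enforce a minimum path length and check grammaticality, both of which this sketch omits.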

References
[1] An-Vinh Luong, Nhi-Thao Tran, Van-Giau Ung and Minh-Quoc Nghiem (2015). Word Graph-Based Multi-Sentence Compression: Re-ranking Candidates Using Frequent Words. In: The Seventh International Conference on Knowledge and Systems Engineering (KSE2015), Ho Chi Minh City, Vietnam, in press.
[2] Van-Giau Ung, An-Vinh Luong, Nhi-Thao Tran and Minh-Quoc Nghiem (2015). Combination of Features for Vietnamese News Multi-Document Summarization. In: The Seventh International Conference on Knowledge and Systems Engineering (KSE2015), Ho Chi Minh City, Vietnam, in press.
[3] Nhi-Thao Tran, Van-Giau Ung, An-Vinh Luong, Minh-Quoc Nghiem and Ngan Nguyen (2015). Improving Vietnamese Sentence Compression by Segmenting Meaning Chunks. In: The Seventh International Conference on Knowledge and Systems Engineering (KSE2015), Ho Chi Minh City, Vietnam, in press.
[4] Hy Nguyen, Tung Le, Viet-Thang Luong, Minh-Quoc Nghiem, and Dien Dinh. The Combination of Similarity Measures for Extractive Summarization. The Seventh International Symposium on Information and Communication Technology (SoICT ’16), December 08-09, 2016, Hochiminh City, Vietnam, © 2016 ACM. ISBN 978-1-4503-4815-7/16/12.