Research Trends

 

  1. Automatic Processing of Sino–Nôm Texts:

    The Nôm script was created by the Vietnamese people on the basis of Chinese characters around the 10th century and remained in use until the 19th century. Over nearly a thousand years, numerous works on history, literature, medicine, agriculture, geography, and other fields were written in Sino–Nôm. Most of these materials, however, have not yet been “translated” into the modern Vietnamese script (Quốc ngữ), and even today's most advanced AI systems still cannot translate Nôm characters.
    Our center therefore focuses on researching and developing an automatic system to “translate” Sino–Nôm documents into the national script (Quốc ngữ). This “translation” system comprises multiple sub-tasks: image classification, optical character recognition (OCR), text classification, transliteration (rendering the characters in their Sino-Vietnamese readings), and semantic translation (interpretation) into the contemporary language.
    The system lets users read, understand, and explore the vast body of valuable knowledge written in Sino–Nôm and passed down by our ancestors, now accessible in the modern national script. It is currently available at https://kimhannom.clc.hcmus.edu.vn or https://kimhannom.fit.hcmus.edu.vn, and through the KimHanNom app on mobile devices.
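
    The sub-tasks listed above can be pictured as a chain of stages, each one enriching the same record. The following is only a minimal sketch under stated assumptions, not the KimHanNom implementation: the `Page` record, every function name, the four-character lookup table, and the "Hán" label are hypothetical placeholders, and the OCR and translation stages return canned values rather than running real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy lookup table standing in for a real transliteration model (illustrative only).
SINO_VIETNAMESE = {"南": "nam", "國": "quốc", "山": "sơn", "河": "hà"}


@dataclass
class Page:
    """Hypothetical record passed from stage to stage."""
    image_id: str
    characters: str = ""   # filled in by OCR
    script: str = ""       # filled in by text classification
    reading: str = ""      # filled in by transliteration
    meaning: str = ""      # filled in by semantic translation
    steps: List[str] = field(default_factory=list)


def classify_image(page: Page) -> Page:
    # Placeholder: a real stage would decide what kind of image this is.
    page.steps.append("image classification")
    return page


def recognize(page: Page) -> Page:
    # Placeholder OCR: returns a canned string instead of reading pixels.
    page.characters = "南國山河"
    page.steps.append("OCR")
    return page


def classify_text(page: Page) -> Page:
    # Placeholder: the label set (e.g. Hán vs. Nôm) is an assumption.
    page.script = "Hán"
    page.steps.append("text classification")
    return page


def transliterate(page: Page) -> Page:
    # Character-by-character lookup; unknown characters are kept unchanged.
    page.reading = " ".join(SINO_VIETNAMESE.get(ch, ch) for ch in page.characters)
    page.steps.append("transliteration")
    return page


def translate(page: Page) -> Page:
    # Placeholder: a real stage would produce a modern-language rendering.
    page.meaning = "<semantic translation of: " + page.reading + ">"
    page.steps.append("semantic translation")
    return page


# Stage order follows the order in which the sub-tasks are listed in the text.
PIPELINE: List[Callable[[Page], Page]] = [
    classify_image, recognize, classify_text, transliterate, translate,
]


def run(image_id: str) -> Page:
    """Pass one page through every stage in order."""
    page = Page(image_id=image_id)
    for stage in PIPELINE:
        page = stage(page)
    return page


if __name__ == "__main__":
    result = run("page-001")
    print(result.reading)
    print(result.steps)
```

    Keeping each sub-task as a separate function with the same signature means a stage can be replaced or evaluated on its own without touching the others.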

    [Figure: Diagram v2.1]

  2. Dictionaries: Vietnamese-Vietnamese, English-Vietnamese, Vietnamese-English, French-Vietnamese, Vietnamese-French, Chinese-Vietnamese, Vietnamese-Chinese, Japanese-Vietnamese, Vietnamese-Japanese, Korean-Vietnamese, Vietnamese-Korean, German-Vietnamese, Vietnamese-German, Russian-Vietnamese, Vietnamese-Russian; covering general, professional, practical, and etymological dictionaries, as well as a Vietnamese WordNet
  3. Corpora:
    • Monolingual: VCor, VTB
    • Bilingual: English-Vietnamese (EVC), French-Vietnamese (FVC), Korean-Vietnamese (KVC), Lao-Vietnamese (LVC), Vietnamese-Chinese (VCC), Basic Travel Expression Corpus (BTEC)
  4. Tools: Sentence Segmentation (SS), Word Segmentation (WS), POS-Tagger, Chunker, Named Entity Recognition (NER), Parser, Dependency relation, Semantic Tagger
  5. Text Processing: Text Classifier, Text Similarity, Spelling Checker, Grammar Checker, Text Readability, Stylometry, Sentiment analysis
  6. Text Application: