Research Trends

AUTOMATIC PROCESSING OF SINO–NÔM TEXTS:

The Nôm script was created by the Vietnamese people based on Chinese characters around the 10th century and was used until the 19th century. Over the course of nearly a thousand years, numerous works in history, literature, medicine, agriculture, geography, and other fields were written in Sino–Nôm. Most of these materials, however, have not yet been “translated” into the modern Vietnamese script (Quốc ngữ). Even the most advanced AI systems in the world today are still unable to translate Nôm characters.
Therefore, our center focuses on researching and developing an automatic system to “translate” Sino–Nôm documents into National Script. This “translation” system involves multiple sub-tasks, including image classification, image recognition (OCR), text classification, transliteration (also known as phonetic translation into Sino-Vietnamese readings), and semantic translation (interpretation) into the contemporary language.
The system enables users to read, understand, and explore the vast body of valuable knowledge written in Sino–Nôm and passed down by our ancestors, now accessible through the modern National script. The system is currently available at: https://kimhannom.clc.hcmus.edu.vn or https://kimhannom.fit.hcmus.edu.vn and the KimHanNom app on mobile devices.
Dictionaries: Vietnamese-Vietnamese, English-Vietnamese, Vietnamese-English, French-Vietnamese, Vietnamese-French, Chinese-Vietnamese, Vietnamese-Chinese, Japanese-Vietnamese, Vietnamese-Japanese, Korean-Vietnamese, Vietnamese-Korean, German-Vietnamese, Vietnamese-German, Russian-Vietnamese, Vietnamese-Russian; general, professional, practical, etymology, Vietnamese WordNet
Corpora:
- Mono-lingual: VCor, VTB
- Bi-lingual: English-Vietnamese (EVC), French-Vietnamese (FVC), Korean-Vietnamese (KVC), Lao-Vietnamese (LVC), Vietnamese-Chinese (VCC), Basic Travel Expression Corpus (BTEC)
Tools: Sentence Segmentation (SS), Word Segmentation (WS), POS-Tagger, Chunker, Named Entity Recognition (NER), Parser, Dependency relation, Semantic Tagger
Text Processing: Text Classifier, Text Similarity, Spelling Checker, Grammar Checker, Text Readability, Stylometry, Sentiment analysis
Text Application:
- Text Translation, Text Summarization, Text/Opinion mining,
- Language: Contrastive Linguistics; teaching Vietnamese for foreigners, teaching foreign languages for Vietnamese
- Applications for the Blinds: Braille Translator, Talking Dictionaries, English learning for blind children

CLC

Research Trends

Contact