Stylometry

1. Introduction
As technology develops, opportunities to acquire knowledge keep growing, and an abundance of text appears on the Internet every day. In recent years, many researchers have studied such text to produce statistical results about individual writing styles.

Stylometry is the quantitative analysis of text that aims to capture the essence of a particular author’s writing style [1], for instance the use of symbols and digits, vocabulary richness, etc.
Stylometry has a diverse range of applications, such as resolving disputed authorship, forensic research, and inferring information about an author. The best-known example is a study by David Robinson, in which he identified which tweets on US President Donald Trump's Twitter account were written by Trump himself and which were written by his staff.
The main subtasks of stylometry fall into five areas:

  • Authorship attribution
  • Authorship verification
  • Authorship profiling
  • Stylochronometry
  • Adversarial stylometry

Stylometry is a relatively new research area, and it has not yet received enough attention from researchers. We have therefore begun several studies that experiment with statistical methods at various levels, for example the character level, the lexical level, and the syntactic level.
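
To make the idea of character-level and lexical-level statistics concrete, here is a minimal sketch in Python. It is only an illustration and not the method used in our studies: the function name stylometric_features, the particular features chosen (digit ratio, punctuation ratio, average word length, type-token ratio, hapax ratio), and the simple regular-expression tokenizer are all assumptions made for this example.

```python
import re
import string
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few simple style features (hypothetical helper for illustration)."""
    n_chars = len(text)
    # Lowercased word tokens; \w+ is Unicode-aware in Python 3 and also matches digits.
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    counts = Counter(words)

    def ratio(numerator: float, denominator: int) -> float:
        # Return 0.0 for empty input instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        # Character level: share of digits and of ASCII punctuation marks.
        "digit_ratio": ratio(sum(ch.isdigit() for ch in text), n_chars),
        "punct_ratio": ratio(sum(ch in string.punctuation for ch in text), n_chars),
        # Lexical level: average token length and two vocabulary-richness measures.
        "avg_word_length": ratio(sum(len(w) for w in words), n_words),
        "type_token_ratio": ratio(len(counts), n_words),
        "hapax_ratio": ratio(sum(1 for c in counts.values() if c == 1), n_words),
    }


sample = "The cat sat. The cat ran 2 miles!"
features = stylometric_features(sample)
print(features)
```

The type-token ratio (distinct words divided by total words) is one common proxy for vocabulary richness; it depends on text length, so comparisons are only meaningful between samples of similar size. Syntactic-level features would need a part-of-speech tagger or parser and are left out of this sketch.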

2. Research
At present, we are working on three main subtasks of stylometry: authorship attribution, authorship verification, and authorship profiling.

Regarding authorship profiling, we conducted a study on the differences between men's and women's writing [1], a topic that has attracted considerable attention worldwide.

References
[1] Mouton de Gruyter, Corpus Linguistics, Chapter 50
[2] Nhung Nguyen Tuyet, Duc Do Tran Anh, Dien Dinh, “Độ đo phong cách của văn bản tiếng Việt và ứng dụng” [Style measures for Vietnamese text and their applications], 2017