Abstract
We apply a statistical method, information-based energy, to quantify informative symbolic sequences. To apply the method to literary texts, we assume that words with different occurrence frequencies occupy different energy levels, and that the energy-occurrence frequency distribution obeys a Boltzmann distribution. The temperature in this Boltzmann distribution can serve as an indicator of the author's writing capacity, understood as a repertory of thoughts. The relative temperature of a text is obtained by comparing the energy-occurrence frequency distribution of words from that text with the distribution of words from all texts by the same author. The information-based energy of a text is then defined by combining its relative temperature with its Shannon entropy, taken as a measure of text complexity; it can be viewed as a quantitative evaluation of the author's writing performance. We demonstrate the method on two authors, Shakespeare in English and Jin Yong in Chinese, and find that their well-known works are associated with higher information-based energies. The method can be used in linguistics to measure the creativity level of a writer's work, and it can also quantify symbolic sequences in other systems.
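As a rough illustration of the text-complexity ingredient only, the Shannon entropy of a text's word distribution might be computed as in the sketch below. The whitespace tokenization, the base-2 logarithm (entropy in bits), and the function name `shannon_entropy` are assumptions made for this example. The abstract does not specify how words are assigned to energy levels or how the temperature is fitted, so those steps are not sketched here.

```python
import math
from collections import Counter


def shannon_entropy(words):
    """Shannon entropy of the empirical word distribution.

    Assumption: base-2 logarithm, so the result is in bits.
    """
    counts = Counter(words)
    total = sum(counts.values())
    # H = -sum_i p_i * log2(p_i), with p_i = count_i / total
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: naive lowercase + whitespace tokenization, for illustration only.
text = "to be or not to be"
tokens = text.lower().split()
entropy_bits = shannon_entropy(tokens)
print(f"{entropy_bits:.4f} bits")
```

A text that repeats a single word has zero entropy under this measure, while a text with a more even spread over many distinct words scores higher.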
Original language | English |
---|---|
Pages (from-to) | 783-789 |
Number of pages | 7 |
Journal | Physica A: Statistical Mechanics and its Applications |
Volume | 468 |
State | Published - 15 Feb 2017 |
Keywords
- Boltzmann distribution
- Linguistic analysis
- Shannon entropy
- Thermodynamics and statistical mechanics