Bayesian Analysis of the Heterogeneity of Literary Style

Authors:
Puig, Xavier
Font, Marti
Ginebra, Josep
Resource type:
Journal article
Publication date:
2016
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa (Spanish)
OAI Identifier:
oai:repositorio.unal.edu.co:unal/66510
Online access:
https://repositorio.unal.edu.co/handle/unal/66510
http://bdigital.unal.edu.co/67538/
Keywords:
51 Matemáticas / Mathematics
31 Colecciones de estadística general / Statistics
Authorship
Cluster analysis
Multinomial distribution
Análisis de conglomerados
Atribución
Distribución multinomial
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
Description
Summary: We propose a statistical analysis of the heterogeneity of literary style in a set of texts that uses several stylometric characteristics simultaneously, such as word length and the frequency of function words. The data set consists of several tables with the same number of rows, where the i-th row of every table corresponds to the i-th text. The proposed analysis clusters the rows of all these tables simultaneously into groups with homogeneous style, based on a finite mixture of sets of multinomial models, one set for each table. Unlike the usual heuristic cluster analysis approaches, our method naturally incorporates text size, the discrete nature of the data, and the dependence between categories into the analysis. The model is checked and chosen with the help of posterior predictive checks, together with closed-form expressions for the posterior probability that each of the models considered is appropriate. This is illustrated through an analysis of the heterogeneity in Shakespeare’s plays, and by revisiting the authorship attribution problem of Tirant lo Blanc.
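As a rough illustration of the clustering idea in the summary, the following Python sketch fits a finite mixture of sets of multinomials with a Gibbs sampler. It is not the authors' implementation. The Dirichlet priors, the fixed number of clusters, the burn-in rule and the function names `gibbs_mixture`, `log_membership` and `sample_dirichlet` are all assumptions made for illustration, and the posterior predictive checks and closed-form model probabilities mentioned in the summary are not covered. Each text contributes raw counts from every table, so longer texts carry more evidence in their allocation, and the categories within a table are modelled jointly rather than as independent features.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def log_membership(counts_i, weights, thetas, k):
    """Unnormalized log-probability that a text belongs to cluster k.

    counts_i[t] is the count vector of the text in table t, and thetas[k][t]
    is cluster k's multinomial probability vector for table t. The multinomial
    coefficients do not depend on k, so they are dropped.
    """
    lp = math.log(weights[k])
    for t, y in enumerate(counts_i):
        lp += sum(c * math.log(p) for c, p in zip(y, thetas[k][t]) if c > 0)
    return lp


def gibbs_mixture(tables, n_clusters, n_iter=200, prior=1.0, seed=0):
    """Hypothetical Gibbs sampler for a finite mixture of sets of multinomials.

    tables[t][i] is the count vector of text i in table t (all tables share rows).
    Assumes symmetric Dirichlet(prior) priors on the weights and on every theta.
    Returns, for each text, the fraction of post-burn-in sweeps spent in each cluster.
    """
    rng = random.Random(seed)
    n_texts = len(tables[0])
    n_cats = [len(table[0]) for table in tables]
    z = [rng.randrange(n_clusters) for _ in range(n_texts)]  # random initial allocation
    freq = [[0] * n_clusters for _ in range(n_texts)]
    burn_in = n_iter // 2
    for it in range(n_iter):
        # 1) Update the mixture weights given the current allocations.
        sizes = [z.count(k) for k in range(n_clusters)]
        weights = sample_dirichlet([prior + s for s in sizes], rng)
        # 2) Update each cluster's multinomial parameters, one vector per table.
        thetas = []
        for k in range(n_clusters):
            per_table = []
            for t, table in enumerate(tables):
                totals = [prior] * n_cats[t]
                for i in range(n_texts):
                    if z[i] == k:
                        for j, c in enumerate(table[i]):
                            totals[j] += c
                per_table.append(sample_dirichlet(totals, rng))
            thetas.append(per_table)
        # 3) Reallocate each text, combining the evidence from all tables at once.
        for i in range(n_texts):
            counts_i = [table[i] for table in tables]
            logs = [log_membership(counts_i, weights, thetas, k) for k in range(n_clusters)]
            m = max(logs)
            probs = [math.exp(lg - m) for lg in logs]  # subtract max for numerical stability
            z[i] = rng.choices(range(n_clusters), weights=probs)[0]
            if it >= burn_in:
                freq[i][z[i]] += 1
    kept = n_iter - burn_in
    return [[f / kept for f in row] for row in freq]
```

For example, with two hypothetical tables of counts for six texts (three word-length classes and two function-word categories), `gibbs_mixture(tables, n_clusters=2)` returns, for each text, the share of retained sweeps it spent in each cluster. Cluster labels are arbitrary, so only the co-membership of texts is meaningful.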