An evaluation measurement in automatic text classification for authorship attribution

Authors:
Rico Sulayes, Antonio
Resource type:
Publication date:
2016
Institution:
Universidad Santo Tomás
Repository:
Repositorio Institucional USTA
Language:
spa
OAI Identifier:
oai:repository.usta.edu.co:11634/4947
Online access:
http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
Keywords:
classification systems
evaluation measurements
authorship attribution
sistemas de clasificación
medidas de evaluación
atribución de autoría
sistemas de classificação
medidas de avaliação
atribuição de autoria
Rights
License
Copyright 2016 Ingenio Magno
Description
Summary: In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, the research literature has used various measurements to evaluate classification systems. As this article discusses, some of these measurements differ diametrically. For research purposes, the evaluation of an automatic text classification system, such as one used for authorship attribution, may report a number of different performance measurements. However, some of the previously used figures are either too optimistic or lack generalizability. In addition to these issues, law-oriented research has pointed out the importance of having an error rate for the legal admissibility not only of this type of text classification task but of any piece of potential evidence in general. Given these circumstances, this paper proposes the use of a single measurement in authorship attribution and discusses the implications of using this figure instead of others presented by researchers. It also explains the importance of presenting this measurement along with other relevant experimental settings, such as the number of categories (authors, in this context). The discussion is supported by a set of authorship attribution experiments that use data from users of crime-related social media.
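The summary does not name the measurement the paper proposes, so the following is only an illustrative sketch of why an error rate is hard to interpret without the number of authors. It assumes a plain misclassification rate and a uniform random-guess baseline; the function names and the toy labels are hypothetical and are not taken from the article.

```python
def error_rate(true_authors, predicted_authors):
    """Fraction of documents assigned to the wrong author."""
    if len(true_authors) != len(predicted_authors):
        raise ValueError("label lists must have the same length")
    if not true_authors:
        raise ValueError("at least one document is required")
    wrong = sum(t != p for t, p in zip(true_authors, predicted_authors))
    return wrong / len(true_authors)


def chance_error_rate(num_authors):
    """Expected error rate of uniform random guessing among num_authors."""
    if num_authors < 1:
        raise ValueError("num_authors must be at least 1")
    return 1 - 1 / num_authors


# Hypothetical labels, invented for illustration only (not the paper's data).
true_authors = ["A", "B", "C", "A", "B", "C", "A", "B"]
predicted_authors = ["A", "B", "A", "A", "C", "C", "A", "B"]

observed = error_rate(true_authors, predicted_authors)
baseline = chance_error_rate(len(set(true_authors)))
print(f"error rate: {observed:.3f}, chance baseline: {baseline:.3f}")
```

Under these assumptions, the same observed error rate sits much closer to the chance baseline when there are two authors than when there are many, which is one reason to report the number of categories alongside the figure.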