Classification of authors for an automatic recommendation process for criminal responsibility

Authors:
Viloria, Amelec
Pineda Lezama, Omar Bonerge
Chang, Eduardo
Resource type:
Journal article
Publication date:
2020
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
English
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/7692
Online access:
https://hdl.handle.net/11323/7692
https://doi.org/10.1016/j.procs.2020.07.098
https://repositorio.cuc.edu.co/
Keywords:
Authorship attribution
Classification features
Noise resistant algorithms
Feature reduction
Rights
openAccess
License
CC0 1.0 Universal
Description
Summary: One problem in classification tasks is handling the features that characterize the classes. When the feature list is long, one can either use an algorithm that is resistant to noise from irrelevant features or reduce the number of features. Authorship attribution, the task of assigning an anonymous text to one subject from a list of candidate authors, has been widely addressed as an automatic text classification task. In this setting, n-grams can produce long feature lists even from small corpora. Despite this, there is little research examining the effects of using noise-resistant algorithms, reducing features, or combining both options. This paper addresses that gap using contributions to discussion forums related to organized crime. The results show that the evaluated classifiers generally benefit from feature reduction, and that, thanks to such reduction, even classical algorithms outperform state-of-the-art classifiers considered highly noise resistant.
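The summary describes n-grams producing a long feature list that can be reduced before a classifier is applied. The following is a minimal Python sketch of that general idea, not the authors' implementation: the character trigrams, the frequency-based top-k reduction, the nearest-centroid classifier, the helper names (char_ngrams, reduce_features, vectorize, fit_centroids, predict), and the toy two-author corpus are all illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def reduce_features(docs: list[Counter], k: int) -> list[str]:
    """Assumed reduction: keep the k n-grams with the highest total count.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals: Counter = Counter()
    for doc in docs:
        totals.update(doc)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def vectorize(doc: Counter, vocab: list[str]) -> list[float]:
    """Relative frequency of each kept n-gram, normalized over the kept n-grams."""
    total = sum(doc[g] for g in vocab) or 1  # avoid division by zero
    return [doc[g] / total for g in vocab]


def fit_centroids(vectors: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the training vectors of each author (nearest-centroid training)."""
    groups: dict[str, list[list[float]]] = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids: dict[str, list[float]], vec: list[float]) -> str:
    """Return the author whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )


# Hypothetical toy corpus with two authors (not data from the paper).
train_texts = [
    "the cat sat on the mat",
    "the cat ate the rat",
    "zzz buzz fizz jazz",
    "fizz buzz zzz",
]
train_labels = ["A", "A", "B", "B"]

train_counts = [char_ngrams(t) for t in train_texts]
full_vocab_size = len(set().union(*train_counts))  # every distinct trigram
vocab = reduce_features(train_counts, k=10)  # reduced feature list

centroids = fit_centroids([vectorize(c, vocab) for c in train_counts], train_labels)
prediction = predict(centroids, vectorize(char_ngrams("the rat sat on the cat"), vocab))
print(full_vocab_size, len(vocab), prediction)
```

Even four short sentences yield dozens of distinct trigrams, while the reduced list keeps only ten. Frequency thresholding is just one possible reduction strategy; the summary does not state which reduction method or classifiers the paper evaluates.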