An evaluation measurement in automatic text classification for authorship attribution

In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements to evaluate classification systems have been used in the research literature. As will be discussed in this article, some of these measurements may...


Authors:
Rico Sulayes, Antonio
Resource type:
Publication date:
2016
Institution:
Universidad Santo Tomás
Repository:
Repositorio Institucional USTA
Language:
spa
OAI Identifier:
oai:repository.usta.edu.co:11634/4947
Online access:
http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
Keywords:
classification systems
evaluation measurements
authorship attribution
sistemas de clasificación
medidas de evaluación
atribución de autoría
sistemas de classificação
medidas de avaliação
atribuição de autoria
Rights
License
Copyright 2016 Ingenio Magno
id SANTTOMAS2_631639298f77a5a37d9bdf427b54cb83
oai_identifier_str oai:repository.usta.edu.co:11634/4947
network_acronym_str SANTTOMAS2
network_name_str Repositorio Institucional USTA
repository_id_str
dc.title.spa.fl_str_mv An evaluation measurement in automatic text classification for authorship attribution
dc.title.alternative.eng.fl_str_mv An evaluation measurement in automatic text classification for authorship attribution
dc.title.alternative.por.fl_str_mv Medida de avaliação na classificação automática de texto para atribuição de autoria
dc.creator.fl_str_mv Rico Sulayes, Antonio
dc.contributor.author.spa.fl_str_mv Rico Sulayes, Antonio
dc.subject.proposal.eng.fl_str_mv classification systems
evaluation measurements
authorship attribution
dc.subject.proposal.spa.fl_str_mv sistemas de clasificación
medidas de evaluación
atribución de autoría
dc.subject.proposal.por.fl_str_mv sistemas de classificação
medidas de avaliação
atribuição de autoria
description In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements have been used in the research literature to evaluate classification systems. As will be discussed in this article, some of these measurements may differ diametrically. For research purposes, the evaluation of an automatic text classification system, such as one used for authorship attribution, may report a number of different performance measurements. However, some of the previously used figures are either too optimistic or lack generalizability. In addition to these issues, law-oriented research has pointed out the importance of having an error rate for the legal admissibility not only of this type of text classification task but of any piece of potential evidence in general. Given these circumstances, this paper proposes the use of a single measurement in authorship attribution and discusses the implications of using this figure instead of others presented by researchers. It also explains the importance of presenting this measurement alongside other relevant experimental settings, such as the number of categories (or authors, in this context). The discussion is supported by a set of authorship attribution experiments that use data from users of crime-related social media.
publishDate 2016
dc.date.issued.spa.fl_str_mv 2016-07-07
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.drive.none.fl_str_mv info:eu-repo/semantics/article
dc.identifier.spa.fl_str_mv http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
url http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.spa.fl_str_mv http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093/1059
dc.relation.citationissue.spa.fl_str_mv Ingenio Magno; Vol. 6 (2015): Ingenio Magno Vol. 6-2; 62-74
2422-2399
2145-9282
dc.rights.spa.fl_str_mv Copyright 2016 Ingenio Magno
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Santo Tomás Seccional Tunja
institution Universidad Santo Tomás
repository.name.fl_str_mv Repositorio Universidad Santo Tomás
repository.mail.fl_str_mv noreply@usta.edu.co
_version_ 1782026344461238272
spelling 11634/4947 oai:repository.usta.edu.co:11634/4947 2023-07-14 16:33:12.169 metadata only access