An evaluation measurement in automatic text classification for authorship attribution
In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements to evaluate classification systems have been used in the research literature. As will be discussed in this article, some of these measurements may...
- Authors:
- Rico Sulayes, Antonio
- Resource type:
- Publication date:
- 2016
- Institution:
- Universidad Santo Tomás
- Repository:
- Repositorio Institucional USTA
- Language:
- spa
- OAI Identifier:
- oai:repository.usta.edu.co:11634/4947
- Keywords:
- classification systems
evaluation measurements
authorship attribution
sistemas de clasificación
medidas de evaluación
atribución de autoría
sistemas de classificação
medidas de avaliação
atribuição de autoria
- Rights
- License
- Copyright 2016 Ingenio Magno
id | SANTTOMAS2_631639298f77a5a37d9bdf427b54cb83 |
---|---|
oai_identifier_str | oai:repository.usta.edu.co:11634/4947 |
network_acronym_str | SANTTOMAS2 |
network_name_str | Repositorio Institucional USTA |
repository_id_str | |
dc.title.spa.fl_str_mv | An evaluation measurement in automatic text classification for authorship attribution |
dc.title.alternative.eng.fl_str_mv | An evaluation measurement in automatic text classification for authorship attribution |
dc.title.alternative.por.fl_str_mv | Medida de avaliação na classificação automática de texto para atribuição de autoria |
title | An evaluation measurement in automatic text classification for authorship attribution |
spellingShingle | An evaluation measurement in automatic text classification for authorship attribution classification systems evaluation measurements authorship attribution sistemas de clasificación medidas de evaluación atribución de autoría sistemas de classificação medidas de avaliação atribuição de autoria |
title_short | An evaluation measurement in automatic text classification for authorship attribution |
title_full | An evaluation measurement in automatic text classification for authorship attribution |
title_fullStr | An evaluation measurement in automatic text classification for authorship attribution |
title_full_unstemmed | An evaluation measurement in automatic text classification for authorship attribution |
title_sort | An evaluation measurement in automatic text classification for authorship attribution |
dc.creator.fl_str_mv | Rico Sulayes, Antonio |
dc.contributor.author.spa.fl_str_mv | Rico Sulayes, Antonio |
dc.subject.proposal.eng.fl_str_mv | classification systems evaluation measurements authorship attribution |
topic | classification systems evaluation measurements authorship attribution sistemas de clasificación medidas de evaluación atribución de autoría sistemas de classificação medidas de avaliação atribuição de autoria |
dc.subject.proposal.spa.fl_str_mv | sistemas de clasificación medidas de evaluación atribución de autoría |
dc.subject.proposal.por.fl_str_mv | sistemas de classificação medidas de avaliação atribuição de autoria |
description | In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements to evaluate classification systems have been used in the research literature. As will be discussed in this article, some of these measurements may differ diametrically. For research purposes, the evaluation of an automatic text classification system, such as the one that may be used for authorship attribution, may report a number of different performance measurements. However, some of the previously used figures are either too optimistic or lack generalizability. In addition to these issues, law-oriented research has pointed out the importance of having an error rate for the legal admissibility not only of this type of text classification task but of any piece of potential evidence in general. Considering the circumstances, the use of a single measurement in authorship attribution is proposed in this paper. Also, the implications of using this figure instead of others presented by researchers are discussed. At the same time, the importance of presenting this measurement along with other relevant experimental settings, such as the number of categories (or authors in this context), is explained. The discussion is supported with the presentation of a set of authorship attribution experiments that utilize data from users of crime-related social media. |
publishDate | 2016 |
dc.date.issued.spa.fl_str_mv | 2016-07-07 |
dc.type.coarversion.fl_str_mv | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv | http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.drive.none.fl_str_mv | info:eu-repo/semantics/article |
dc.identifier.spa.fl_str_mv | http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093 |
url | http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093 |
dc.language.iso.spa.fl_str_mv | spa |
language | spa |
dc.relation.spa.fl_str_mv | http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093/1059 |
dc.relation.citationissue.spa.fl_str_mv | Ingenio Magno; Vol. 6 (2015): Ingenio Magno Vol. 6-2; 62-74 2422-2399 2145-9282 |
dc.rights.spa.fl_str_mv | Copyright 2016 Ingenio Magno |
dc.rights.coar.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
rights_invalid_str_mv | Copyright 2016 Ingenio Magno http://purl.org/coar/access_right/c_abf2 |
dc.format.mimetype.spa.fl_str_mv | application/pdf |
dc.publisher.spa.fl_str_mv | Universidad Santo Tomás Seccional Tunja |
institution | Universidad Santo Tomás |
repository.name.fl_str_mv | Repositorio Universidad Santo Tomás |
repository.mail.fl_str_mv | noreply@usta.edu.co |
_version_ | 1782026344461238272 |
spelling | Rico Sulayes, Antonio · 2016-07-07 · http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093 · In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements to evaluate classification systems have been used in the research literature. As will be discussed in this article, some of these measurements may differ diametrically. For research purposes, the evaluation of an automatic text classification system, such as the one that may be used for authorship attribution, may report a number of different performance measurements. However, some of the previously used figures are either too optimistic or lack generalizability. In addition to these issues, law-oriented research has pointed out the importance of having an error rate for the legal admissibility not only of this type of text classification task but of any piece of potential evidence in general. Considering the circumstances, the use of a single measurement in authorship attribution is proposed in this paper. Also, the implications of using this figure instead of others presented by researchers are discussed. At the same time, the importance of presenting this measurement along with other relevant experimental settings, such as the number of categories (or authors in this context), is explained. The discussion is supported with the presentation of a set of authorship attribution experiments that utilize data from users of crime-related social media. · Na atribuição de autoria, uma tarefa que consiste na atribuição correta de um documento anônimo a um autor que faz parte de um conjunto de indivíduos, diversas medidas para a avaliação de sistemas de classificação têm sido usadas pelos pesquisadores da área. Conforme argumentado neste artigo, algumas destas medidas são diametralmente opostas. Para fins de investigação, a avaliação de um sistema de classificação automática de textos, como o utilizado na atribuição de autoria, pode reportar várias medidas diferentes sobre o desempenho do sistema, porém, algumas das figuras utilizadas anteriormente são muito otimistas ou pouco generalizáveis. Além destes problemas, a pesquisa no âmbito legal tem enfatizado a importância de se ter uma taxa de erro para a aceitabilidade judicial não só deste tipo de tarefa de classificação de texto, mas qualquer evidência em geral. Por tudo o que foi citado anteriormente, este artigo propõe o uso de uma medida única na atribuição de autoria. Também são debatidas as implicações associadas à utilização desta medida acima das demais apresentadas por alguns pesquisadores. Além disso, se expõe a importância de apresentar esta medida em combinação com outras condições experimentais relevantes, tais como o número de categorias (ou autores neste contexto). A discussão baseia-se na apresentação de uma série de experimentos de atribuição de autoria que utilizam os textos dos usuários de redes sociais relacionadas com o crime. · application/pdf · spa · Universidad Santo Tomás Seccional Tunja · http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093/1059 · Ingenio Magno; Vol. 6 (2015): Ingenio Magno Vol. 6-2; 62-74 2422-2399 2145-9282 · Copyright 2016 Ingenio Magno · http://purl.org/coar/access_right/c_abf2 · An evaluation measurement in automatic text classification for authorship attribution · Medida de avaliação na classificação automática de texto para atribuição de autoria · info:eu-repo/semantics/article · http://purl.org/coar/version/c_970fb48d4fbd8a85 · http://purl.org/coar/resource_type/c_2df8fbb1 · classification systems · evaluation measurements · authorship attribution · sistemas de clasificación · medidas de evaluación · atribución de autoría · sistemas de classificação · medidas de avaliação · atribuição de autoria · 11634/4947 · oai:repository.usta.edu.co:11634/4947 · 2023-07-14 16:33:12.169 · metadata only access · Repositorio Universidad Santo Tomás · noreply@usta.edu.co |