An evaluation measurement in automatic text classification for authorship attribution

In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, various measurements to evaluate classification systems have been used in the research literature. As will be discussed in this article, some of these measurements may...


Authors: Rico Sulayes, Antonio

Resource type:

Publication date: 2016

Institution: Universidad Santo Tomás

Repository: Repositorio Institucional USTA

Language: spa

id	SANTTOMAS2_631639298f77a5a37d9bdf427b54cb83
oai_identifier_str	oai:repository.usta.edu.co:11634/4947
network_acronym_str	SANTTOMAS2
network_name_str	Repositorio Institucional USTA
repository_id_str
dc.title.spa.fl_str_mv	An evaluation measurement in automatic text classification for authorship attribution
dc.title.alternative.eng.fl_str_mv	An evaluation measurement in automatic text classification for authorship attribution
dc.title.alternative.por.fl_str_mv	Medida de avaliação na classificação automática de texto para atribuição de autoria
dc.creator.fl_str_mv	Rico Sulayes, Antonio
dc.contributor.author.spa.fl_str_mv	Rico Sulayes, Antonio
dc.subject.proposal.eng.fl_str_mv	classification systems; evaluation measurements; authorship attribution
topic	classification systems; evaluation measurements; authorship attribution; sistemas de clasificación; medidas de evaluación; atribución de autoría; sistemas de classificação; medidas de avaliação; atribuição de autoria
dc.subject.proposal.spa.fl_str_mv	sistemas de clasificación; medidas de evaluación; atribución de autoría
dc.subject.proposal.por.fl_str_mv	sistemas de classificação; medidas de avaliação; atribuição de autoria
description	In authorship attribution, the task of correctly assigning an anonymized document to an author within a predefined set of subjects, the research literature has used various measurements to evaluate classification systems. As this article discusses, some of these measurements may differ diametrically. For research purposes, the evaluation of an automatic text classification system, such as one used for authorship attribution, may report a number of different performance measurements. However, some of the previously used figures are either too optimistic or lack generalizability. In addition to these issues, law-oriented research has pointed out the importance of having an error rate for the legal admissibility not only of this type of text classification task but of any piece of potential evidence in general. Given these circumstances, this paper proposes the use of a single measurement in authorship attribution. It also discusses the implications of using this figure instead of others presented by researchers. At the same time, it explains the importance of presenting this measurement alongside other relevant experimental settings, such as the number of categories (or authors, in this context). The discussion is supported by a set of authorship attribution experiments that use data from users of crime-related social media.
publishDate	2016
dc.date.issued.spa.fl_str_mv	2016-07-07
dc.type.coarversion.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv	http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.drive.none.fl_str_mv	info:eu-repo/semantics/article
dc.identifier.spa.fl_str_mv	http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
url	http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.spa.fl_str_mv	http://revistas.ustatunja.edu.co/index.php/ingeniomagno/article/view/1093/1059
dc.relation.citationissue.spa.fl_str_mv	Ingenio Magno; Vol. 6 (2015): Ingenio Magno Vol. 6-2; 62-74; 2422-2399; 2145-9282
dc.rights.spa.fl_str_mv	Copyright 2016 Ingenio Magno
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv	application/pdf
dc.publisher.spa.fl_str_mv	Universidad Santo Tomás Seccional Tunja
institution	Universidad Santo Tomás
repository.name.fl_str_mv	Repositorio Universidad Santo Tomás
repository.mail.fl_str_mv	noreply@usta.edu.co
_version_	1782026344461238272
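The description in this record argues that a reported performance figure should come with an error rate and with the number of categories (authors). The abstract does not name the single measurement the article proposes, so the following is only a minimal sketch, under that caveat, of why the number of authors matters when reading a figure: it computes plain accuracy, its complement as an error rate, and two simple baselines. The function names and the label lists are hypothetical and are not taken from the article.

```python
from collections import Counter


def accuracy(true_authors, predicted_authors):
    """Fraction of documents attributed to their true author."""
    if not true_authors or len(true_authors) != len(predicted_authors):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_authors, predicted_authors))
    return correct / len(true_authors)


def error_rate(true_authors, predicted_authors):
    """Complement of accuracy: fraction of documents attributed wrongly."""
    return 1.0 - accuracy(true_authors, predicted_authors)


def chance_accuracy(n_authors):
    """Expected accuracy of uniform random guessing among n_authors."""
    if n_authors < 1:
        raise ValueError("n_authors must be at least 1")
    return 1.0 / n_authors


def majority_baseline(true_authors):
    """Accuracy of always predicting the most frequent author."""
    return max(Counter(true_authors).values()) / len(true_authors)


# Hypothetical labels for eight documents by four authors (illustration only).
true_authors = ["a", "a", "b", "b", "c", "c", "d", "d"]
predicted_authors = ["a", "a", "b", "c", "c", "c", "d", "a"]
n_authors = len(set(true_authors))

print("authors:", n_authors)
print("accuracy:", accuracy(true_authors, predicted_authors))
print("error rate:", error_rate(true_authors, predicted_authors))
print("chance accuracy:", chance_accuracy(n_authors))
print("majority baseline:", majority_baseline(true_authors))
```

The same accuracy value means something different for two candidate authors than for twenty, because the chance baseline shrinks as the set of authors grows; reporting the number of categories next to the figure lets a reader make that comparison.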
