Automatic authorship analysis using Deep neural networks

Authorship analysis studies the characteristics that distinguish how different people write. Writing style can be extracted in several ways, such as bag-of-words strategies or handcrafted features. However, with the growth of the Internet, we have witnessed an increase in the...


Authors: Sierra Loaiza, Sebastian Ernesto

Resource type:

Publication date: 2018

Institution: Universidad Nacional de Colombia

Repository: Universidad Nacional de Colombia

Language: spa

id	UNACIONAL2_260825b80962f5df92f9d838df4431b2
oai_identifier_str	oai:repositorio.unal.edu.co:unal/76747
network_acronym_str	UNACIONAL2
network_name_str	Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv	Automatic authorship analysis using Deep neural networks
title	Automatic authorship analysis using Deep neural networks
spellingShingle	Automatic authorship analysis using Deep neural networks Machine learning Supervised Learning Representation Learning Automatic Authorship Analysis Authorship Attribution Author Profiling Multimodal Author Profiling
title_short	Automatic authorship analysis using Deep neural networks
title_full	Automatic authorship analysis using Deep neural networks
title_fullStr	Automatic authorship analysis using Deep neural networks
title_full_unstemmed	Automatic authorship analysis using Deep neural networks
title_sort	Automatic authorship analysis using Deep neural networks
dc.creator.fl_str_mv	Sierra Loaiza, Sebastian Ernesto
dc.contributor.author.spa.fl_str_mv	Sierra Loaiza, Sebastian Ernesto
dc.contributor.spa.fl_str_mv	González Osorio, Fabio Augusto
dc.subject.proposal.spa.fl_str_mv	Machine learning; Supervised Learning; Representation Learning; Automatic Authorship Analysis; Authorship Attribution; Author Profiling; Multimodal Author Profiling
topic	Machine learning; Supervised Learning; Representation Learning; Automatic Authorship Analysis; Authorship Attribution; Author Profiling; Multimodal Author Profiling
description	Authorship analysis studies the characteristics that distinguish how different people write. Writing style can be extracted in several ways, such as bag-of-words strategies or handcrafted features. However, with the growth of the Internet, we have witnessed an increase in the amount of user-generated data in social networks such as Facebook and Twitter. There is an increasing need for automatic methods capable of analyzing the style of a document for tasks such as determining the age of the author, determining the gender of the author, or determining the authorship of a document given a set of possible authors. These tasks are better known as author profiling and authorship attribution. Although capturing the style of an author can be challenging, in this thesis we explore representation learning strategies in order to take advantage of the large amount of data generated by social media. We learned representations of the text inputs that capture patterns distinctive of a single author (authorship attribution) or of a social group of authors (author profiling). The proposed methods were compared on different publicly available social media datasets. Both author profiling and authorship attribution are addressed using representation learning techniques such as convolutional neural networks and gated multimodal units. Our unimodal author profiling approach was submitted to the author profiling shared task of PAN, the laboratory on digital forensics and stylometry. For authorship attribution, we proposed a convolutional neural network that takes character n-grams as input. We found that our approach outperformed standard attribution methods as well as word-based convolutional neural networks.
For the author profiling task, we proposed a convolutional neural network for unimodal author profiling and adapted a gated multimodal unit for multimodal author profiling. User-generated content is multimodal: the social group of an author can be determined not only from his/her written texts but also from the images that the user shares across social networks. Gated multimodal units outperformed the standard information fusion strategies, early and late fusion.
publishDate	2018
dc.date.issued.spa.fl_str_mv	2018-08-31
dc.date.accessioned.spa.fl_str_mv	2020-03-30T06:28:02Z
dc.date.available.spa.fl_str_mv	2020-03-30T06:28:02Z
dc.type.spa.fl_str_mv	Master's thesis
dc.type.driver.spa.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv	info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv	Text
dc.type.redcol.spa.fl_str_mv	http://purl.org/redcol/resource_type/TM
status_str	acceptedVersion
dc.identifier.uri.none.fl_str_mv	https://repositorio.unal.edu.co/handle/unal/76747
dc.identifier.eprints.spa.fl_str_mv	http://bdigital.unal.edu.co/73495/
url	https://repositorio.unal.edu.co/handle/unal/76747 http://bdigital.unal.edu.co/73495/
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.spa.fl_str_mv	http://www.ingenieria.unal.edu.co/mindlab/
dc.relation.ispartof.spa.fl_str_mv	Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas
dc.relation.haspart.spa.fl_str_mv	0 Generalidades / Computer science, information and general works; 62 Ingeniería y operaciones afines / Engineering
dc.relation.references.spa.fl_str_mv	Sierra Loaiza, Sebastian Ernesto (2018) Automatic authorship analysis using Deep neural networks. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv	All rights reserved - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv	Attribution-NonCommercial 4.0 International
dc.rights.uri.spa.fl_str_mv	http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial 4.0 International; All rights reserved - Universidad Nacional de Colombia; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
dc.format.mimetype.spa.fl_str_mv	application/pdf
institution	Universidad Nacional de Colombia
bitstream.url.fl_str_mv	https://repositorio.unal.edu.co/bitstream/unal/76747/1/SebastianSierraLoaiza.2019.pdf https://repositorio.unal.edu.co/bitstream/unal/76747/2/SebastianSierraLoaiza.2019.pdf.jpg
bitstream.checksum.fl_str_mv	a3018b7db3fcc9c64f8ca0b05546038f a1e38afe0c92ba62db3a77f2eb880981
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv	repositorio_nal@unal.edu.co
_version_	1814089658115555328
