Automatic authorship analysis using Deep neural networks
Authorship analysis studies the characteristics that distinguish how two different people write. Writing style can be extracted in several ways, such as bag-of-words strategies or handcrafted features. However, with the growth of the Internet, we have witnessed an increase in the...
- Authors:
- Sierra Loaiza, Sebastian Ernesto
- Resource type:
- Publication date:
- 2018
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/76747
- Online access:
- https://repositorio.unal.edu.co/handle/unal/76747
http://bdigital.unal.edu.co/73495/
- Keywords:
- Machine learning
Supervised Learning
Representation Learning
Automatic Authorship Analysis
Authorship Attribution
Author Profiling
Multimodal Author Profiling
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
id |
UNACIONAL2_260825b80962f5df92f9d838df4431b2 |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/76747 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Automatic authorship analysis using Deep neural networks |
title |
Automatic authorship analysis using Deep neural networks |
dc.creator.fl_str_mv |
Sierra Loaiza, Sebastian Ernesto |
dc.contributor.author.spa.fl_str_mv |
Sierra Loaiza, Sebastian Ernesto |
dc.contributor.spa.fl_str_mv |
González Osorio, Fabio Augusto |
dc.subject.proposal.spa.fl_str_mv |
Machine learning Supervised Learning Representation Learning Automatic Authorship Analysis Authorship Attribution Author Profiling Multimodal Author Profiling |
topic |
Machine learning Supervised Learning Representation Learning Automatic Authorship Analysis Authorship Attribution Author Profiling Multimodal Author Profiling |
description |
Authorship analysis studies the characteristics that distinguish how two different people write. Writing style can be extracted in several ways, such as bag-of-words strategies or handcrafted features. However, with the growth of the Internet, we have witnessed an increase in the amount of user-generated data in social networks like Facebook or Twitter. There is a growing need for automatic methods capable of analyzing the style of a document for tasks such as determining the age of the author, determining the gender of the author, or determining the authorship of the document given a set of possible authors. These tasks are better known as author profiling and authorship attribution. Although capturing the style of an author can be challenging, in this thesis we explore representation learning strategies in order to take advantage of the large amount of data generated by social media. We learned representations of the text inputs that capture patterns specific to a single author (authorship attribution) or to a social group of authors (author profiling). The proposed methods were compared on several publicly available social media datasets. Both author profiling and authorship attribution are addressed using representation learning techniques such as convolutional neural networks and gated multimodal units. Our unimodal author profiling approach was submitted to the profiling shared task of the laboratory on digital forensics and stylometry (PAN). For authorship attribution, we proposed a convolutional neural network that takes character n-grams as input. We found that our approach outperformed standard attribution-based methods as well as word-based convolutional neural networks. For the author profiling task, we proposed one convolutional neural network for unimodal author profiling and adapted a gated multimodal unit for multimodal author profiling. User-generated content is multimodal: the social group of an author can be determined not only from his or her written texts but also from the images the user shared across social networks. Gated multimodal units outperformed the standard information fusion strategies, early and late fusion. |
publishDate |
2018 |
dc.date.issued.spa.fl_str_mv |
2018-08-31 |
dc.date.accessioned.spa.fl_str_mv |
2020-03-30T06:28:02Z |
dc.date.available.spa.fl_str_mv |
2020-03-30T06:28:02Z |
dc.type.spa.fl_str_mv |
Master's thesis (Trabajo de grado - Maestría) |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TM |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/76747 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/73495/ |
url |
https://repositorio.unal.edu.co/handle/unal/76747 http://bdigital.unal.edu.co/73495/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.spa.fl_str_mv |
http://www.ingenieria.unal.edu.co/mindlab/ |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas |
dc.relation.haspart.spa.fl_str_mv |
0 Generalidades / Computer science, information and general works 62 Ingeniería y operaciones afines / Engineering |
dc.relation.references.spa.fl_str_mv |
Sierra Loaiza, Sebastian Ernesto (2018) Automatic authorship analysis using Deep neural networks. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Atribución-NoComercial 4.0 Internacional Derechos reservados - Universidad Nacional de Colombia http://creativecommons.org/licenses/by-nc/4.0/ http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/76747/1/SebastianSierraLoaiza.2019.pdf https://repositorio.unal.edu.co/bitstream/unal/76747/2/SebastianSierraLoaiza.2019.pdf.jpg |
bitstream.checksum.fl_str_mv |
a3018b7db3fcc9c64f8ca0b05546038f a1e38afe0c92ba62db3a77f2eb880981 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089658115555328 |
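The abstract in this record reports that gated multimodal units outperformed early and late fusion for multimodal author profiling. As a rough illustration of the idea, the sketch below implements the commonly published bimodal form of a gated multimodal unit in plain Python: each modality is projected through a tanh layer, and a sigmoid gate computed from both inputs decides, per hidden dimension, how much each modality contributes. This is an assumption-laden sketch, not the thesis's adapted model: the function names, the feature vectors, the hidden size, and the random weights are all hypothetical, bias terms are omitted, and nothing here is trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu(x_text, x_image, W_text, W_image, W_gate):
    """Bimodal gated multimodal unit (sketch, no bias terms).

    h_text  = tanh(W_text  @ x_text)
    h_image = tanh(W_image @ x_image)
    z       = sigmoid(W_gate @ [x_text; x_image])
    h       = z * h_text + (1 - z) * h_image
    """
    h_text = [math.tanh(a) for a in matvec(W_text, x_text)]
    h_image = [math.tanh(a) for a in matvec(W_image, x_image)]
    # The gate sees the concatenation of both raw inputs.
    z = [sigmoid(a) for a in matvec(W_gate, x_text + x_image)]
    h = [zi * ht + (1.0 - zi) * hi for zi, ht, hi in zip(z, h_text, h_image)]
    return h, z


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x_text = [0.2, -0.1, 0.7, 0.4]   # made-up text features
x_image = [0.9, 0.3, -0.5]       # made-up image features
hidden = 2

W_text = random_matrix(hidden, len(x_text), rng)
W_image = random_matrix(hidden, len(x_image), rng)
W_gate = random_matrix(hidden, len(x_text) + len(x_image), rng)

fused, gate = gmu(x_text, x_image, W_text, W_image, W_gate)
print("fused representation:", fused)
print("gate (weight given to text):", gate)
```

By contrast, early fusion would concatenate the two feature vectors before a single model, and late fusion would combine the predictions of two separate models; the gate lets the weighting between modalities vary per example and per hidden dimension.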