A Sentiment Analysis Model of Spanish Tweets. Case Study: Colombia 2014 Presidential Election
Abstract. What people say on social media has turned into a rich source of information to understand social behavior. Sentiment analysis of Twitter data has been widely used to capture trends in public opinion regarding important events such as political elections. However, current research in socia...
- Authors:
- Cerón-Guzmán, Jhon Adrián
- Resource type:
- Publication date:
- 2016
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/56482
- Online access:
- https://repositorio.unal.edu.co/handle/unal/56482
http://bdigital.unal.edu.co/52257/
- Keywords:
- 0 Generalidades / Computer science, information and general works
38 Comercio, comunicaciones, transporte / Commerce, communications and transportation
46 Lenguas española y portuguesa / Specific languages
Social media
Twitter
Spanish tweets
Spammer detection
Lexical normalization
Sentiment analysis
Voting intention inference
Politics
Presidential election
Colombia
tweets en español
Detección de spammers
Normalización léxica
Análisis de sentimientos
Inferencia de intención de votación
Política
Elecciones presidenciales
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
id |
UNACIONAL2_7f1f983b56c373b864b3a462dc92ac03 |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/56482 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
A Sentiment Analysis Model of Spanish Tweets. Case Study: Colombia 2014 Presidential Election |
dc.creator.fl_str_mv |
Cerón-Guzmán, Jhon Adrián |
dc.contributor.advisor.spa.fl_str_mv |
León-Guzmán, Elizabeth (Thesis advisor) |
dc.contributor.author.spa.fl_str_mv |
Cerón-Guzmán, Jhon Adrián |
dc.subject.ddc.spa.fl_str_mv |
0 Generalidades / Computer science, information and general works 38 Comercio, comunicaciones, transporte / Commerce, communications and transportation 46 Lenguas española y portuguesa / Specific languages |
dc.subject.proposal.spa.fl_str_mv |
Social media Spanish tweets Spammer detection Lexical normalization Sentiment analysis Voting intention inference Politics Presidential election Colombia tweets en español Detección de spammers Normalización léxica Análisis de sentimientos Inferencia de intención de votación, Política Elecciones presidenciales |
description |
Abstract. What people say on social media has become a rich source of information for understanding social behavior. Sentiment analysis of Twitter data has been widely used to capture trends in public opinion regarding important events such as political elections. However, current research on social media analysis in political domains faces two major problems: the sentiment analysis methods implemented are often too simple, and most studies have assumed that all users and their tweets are trustworthy. This thesis addresses these problems in order to achieve more reliable public opinion measurements. The 2014 Colombian presidential election was chosen as the case study. First, research on social spammer detection on Twitter was carried out, following machine learning approaches to distinguish spammer accounts from non-spammer ones. Because of the brevity of tweets and the widespread use of mobile devices, Twitter is also a rich source of noisy data containing many non-standard word forms. Since sentiment analysis exploits large amounts of user-generated text, its performance may drop significantly if several lexical variation phenomena are not dealt with. For that reason, a lexical normalization system for Spanish tweets was developed to improve the quality of natural language analysis, using finite-state transducers and statistical language modeling. Lastly, a sentiment analysis system for Spanish tweets was developed by implementing a supervised classification approach. The system was applied to the Colombian election to infer voting intention. Experimental results highlight the importance of denoising Twitter data to achieve more reliable public opinion measurements. In addition, the results show the potential of social media analysis to infer vote share, obtaining the lowest mean absolute error and correctly ranking the highest-polling candidates in the first-round election. However, such a method cannot be put forward as a substitute for traditional polling. |
publishDate |
2016 |
dc.date.issued.spa.fl_str_mv |
2016-05-25 |
dc.date.accessioned.spa.fl_str_mv |
2019-07-02T11:53:15Z |
dc.date.available.spa.fl_str_mv |
2019-07-02T11:53:15Z |
dc.type.spa.fl_str_mv |
Degree thesis - Master's |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TM |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/56482 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/52257/ |
url |
https://repositorio.unal.edu.co/handle/unal/56482 http://bdigital.unal.edu.co/52257/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial |
dc.relation.references.spa.fl_str_mv |
Cerón-Guzmán, Jhon Adrián (2016) A Sentiment Analysis Model of Spanish Tweets. Case Study: Colombia 2014 Presidential Election. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/56482/1/johnadrianceronguzman_2016.pdf https://repositorio.unal.edu.co/bitstream/unal/56482/2/johnadrianceronguzman_2016.pdf.jpg |
bitstream.checksum.fl_str_mv |
df3b7496b1fb8aed388fb00932d0710e e66bade35c26b267b675b5bf1ab73262 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089515867832320 |
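
The abstract evaluates vote-share inference by mean absolute error (MAE) and by whether the highest-polling candidates are ranked correctly. The following is a minimal sketch of those two checks, not the thesis implementation: the function names, the candidate labels, and every number below are made up for illustration, and shares are assumed to be percentages.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the candidates in `actual`.

    A candidate missing from `predicted` is treated as a 0.0 share (an assumption).
    """
    if not actual:
        raise ValueError("actual must contain at least one candidate")
    gaps = [abs(predicted.get(name, 0.0) - share) for name, share in actual.items()]
    return sum(gaps) / len(gaps)


def ranking(shares):
    """Candidate names ordered from highest to lowest share."""
    return sorted(shares, key=shares.get, reverse=True)


# Hypothetical values for illustration only; these are NOT results from the thesis.
predicted = {"Candidate A": 30.0, "Candidate B": 27.0, "Candidate C": 18.0}
actual = {"Candidate A": 29.0, "Candidate B": 26.0, "Candidate C": 15.0}

mae = mean_absolute_error(predicted, actual)
top_two_match = ranking(predicted)[:2] == ranking(actual)[:2]

print(f"MAE: {mae:.2f} percentage points")
print(f"Top two candidates ranked correctly: {top_two_match}")
```

A lower MAE means the inferred shares sit closer to the official results on average; the ranking check is separate, since a method can rank the leaders correctly while still missing their shares by several points.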