Method for collecting relevant topics from Twitter supported by Big Data

Authors:
Silva D, Jesus G
Senior Naveda, Alexa
Gamboa Suarez, Ramiro
Hernández Palma, Hugo
Niebles Nuñez, William
Resource type:
Journal article
Publication date:
2020
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
English
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/5961
Online access:
http://hdl.handle.net/11323/5961
https://repositorio.cuc.edu.co/
Keywords:
Collection methods
Big Data
Twitter
Rights:
Open access
License:
CC0 1.0 Universal
Description:
Information and data generation in virtual environments is growing rapidly because of microblogging sites such as Twitter, a social network that produces an average of 8,000 tweets per second and up to 550 million tweets per day. As a result, Twitter and many other social networks are overloaded with content, and the large number of tweets on different issues makes it difficult for users to identify information topics. To address the uncertainty this causes for users, this study proposes a method for inferring the most representative topics within a one-day period by selecting the profiles of users who are experts in sports and politics. A topic's relevance is calculated from the number of times it was mentioned by these experts in their timelines. The experiment used a dataset extracted from Twitter containing 10,750 tweets related to sports and 8,758 tweets related to politics. All tweets were taken from the timelines of users selected by the researchers, who considered these users experts in their respective subjects based on the content of their tweets. The results show that careful selection of users, together with the relevance index implemented for the topics, can help users find important topics in both sports and politics more easily.
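The description states only that a topic's relevance is computed from how many times expert accounts mention it in their timelines over one day. The Python sketch below illustrates that idea under explicit assumptions that do not come from the source: hashtags stand in for topics, the expert set is supplied by hand, and the index is a raw count of expert tweets mentioning each topic. The names `Tweet`, `extract_topics`, `topic_relevance` and `top_topics` are hypothetical; this is not the authors' implementation.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical tweet record: the field names are illustrative, not the paper's schema.
@dataclass
class Tweet:
    user: str
    created_at: datetime
    text: str


# Assumption: hashtags stand in for "topics"; the paper may identify topics differently.
HASHTAG = re.compile(r"#(\w+)")


def extract_topics(text: str) -> set[str]:
    """Return the lowercased hashtags in a tweet, counting each at most once per tweet."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def topic_relevance(tweets, experts, day_start, window=timedelta(days=1)):
    """Count, per topic, the expert tweets that mention it in [day_start, day_start + window)."""
    day_end = day_start + window
    counts = Counter()
    for tw in tweets:
        # Only tweets from the hand-picked expert accounts inside the window are counted.
        if tw.user in experts and day_start <= tw.created_at < day_end:
            counts.update(extract_topics(tw.text))
    return counts


def top_topics(tweets, experts, day_start, k=3):
    """Rank topics by expert mention count; ties are broken alphabetically."""
    counts = topic_relevance(tweets, experts, day_start)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    day = datetime(2019, 5, 1)  # arbitrary example date, not from the paper's dataset
    tweets = [
        Tweet("expert_a", day + timedelta(hours=2), "Great match! #Final #Goal"),
        Tweet("expert_b", day + timedelta(hours=5), "What a #final tonight"),
        Tweet("random_user", day + timedelta(hours=6), "#Final #Giveaway"),  # not an expert
        Tweet("expert_a", day + timedelta(days=2), "#Transfer rumours"),  # outside the window
    ]
    print(top_topics(tweets, {"expert_a", "expert_b"}, day))
    # [('final', 2), ('goal', 1)]
```

Counting each hashtag once per tweet and using a half-open one-day window are choices made for this sketch; a weighted or normalized index would be an equally plausible reading of the description.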
Date added to repository:
2020-01-30T13:48:37Z
Version:
Accepted version
ISSN:
1742-6588
1742-6596
DOI:
10.1088/1742-6596/1432/1/012094
Published in:
Journal of Physics: Conference Series
Full text (PDF):
https://repositorio.cuc.edu.co/bitstream/11323/5961/1/Method%20for%20Collecting%20Relevant%20Topics%20from%20Twitter%20supported%20by%20Big.pdf
https://repositorio.cuc.edu.co/bitstream/11323/5961/6/Method%20for%20Collecting%20Relevant%20Topics%20from%20Twitter%20supported%20by%20Big.pdf