An information retrieval strategy for large multimodal data collections involving source code and natural language
Source code repositories store data about software products, including the evolution of the source code, requirements, bug reports, and communication between developers. These repositories have grown rapidly in recent years, and with them the need to extract information fr...
- Authors:
- Baquero Vargas, Juan Felipe
- Resource type:
- Publication date:
- 2019
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/76556
- Online access:
- https://repositorio.unal.edu.co/handle/unal/76556
- http://bdigital.unal.edu.co/73062/
- Keywords:
- Stack Overflow
- source code analysis
- Duplication detection
- Predicting programming language
- Análisis de código fuente
- Detección de duplicados
- Predecir el lenguaje de programación
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
| id | UNACIONAL2_58476dc6cf5ecfffe596250f7fa4e6c7 |
|---|---|
| oai_identifier_str | oai:repositorio.unal.edu.co:unal/76556 |
| network_acronym_str | UNACIONAL2 |
| network_name_str | Universidad Nacional de Colombia |
| repository_id_str | |
| dc.title.spa.fl_str_mv | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| title | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| spellingShingle | An information retrieval strategy for large multimodal data collections involving source code and natural language Stack Overflow source code analysis Duplication detection Predicting programming language Análisis de código fuente Detección de duplicados Predecir el lenguaje de programación |
| title_short | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| title_full | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| title_fullStr | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| title_full_unstemmed | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| title_sort | An information retrieval strategy for large multimodal data collections involving source code and natural language |
| dc.creator.fl_str_mv | Baquero Vargas, Juan Felipe |
| dc.contributor.author.spa.fl_str_mv | Baquero Vargas, Juan Felipe |
| dc.contributor.spa.fl_str_mv | González Osorio, Fabio Augusto; Restrepo Calle, Felipe |
| dc.subject.proposal.spa.fl_str_mv | Stack Overflow; source code analysis; Duplication detection; Predicting programming language; Análisis de código fuente; Detección de duplicados; Predecir el lenguaje de programación |
| topic | Stack Overflow; source code analysis; Duplication detection; Predicting programming language; Análisis de código fuente; Detección de duplicados; Predecir el lenguaje de programación |
| description | Source code repositories store data about software products, including the evolution of the source code, requirements, bug reports, and communication between developers. These repositories have grown rapidly in recent years, and with them the need to extract information from them. One repository that is growing in both usage and content is Stack Overflow (SO), one of the largest question-answering sites, used by thousands of developers every day. On SO, developers can ask any question related to a programming issue, and other users answer it. The result is a repository that contains both source code and natural language, with thousands of samples, and that offers the possibility of combining both sources to extract useful information that is not apparent at first glance. In this thesis, we explore how to represent source code and natural language and how to combine these representations. We address the tasks of understanding how SO users talk about a programming language and of measuring how similar programming languages are to one another based on how users talk about them. Finally, we provide tools for building an information retrieval strategy by identifying duplicate posts. |
| publishDate | 2019 |
| dc.date.issued.spa.fl_str_mv | 2019-07-03 |
| dc.date.accessioned.spa.fl_str_mv | 2020-03-30T06:22:16Z |
| dc.date.available.spa.fl_str_mv | 2020-03-30T06:22:16Z |
| dc.type.spa.fl_str_mv | Master's thesis |
| dc.type.driver.spa.fl_str_mv | info:eu-repo/semantics/masterThesis |
| dc.type.version.spa.fl_str_mv | info:eu-repo/semantics/acceptedVersion |
| dc.type.content.spa.fl_str_mv | Text |
| dc.type.redcol.spa.fl_str_mv | http://purl.org/redcol/resource_type/TM |
| status_str | acceptedVersion |
| dc.identifier.uri.none.fl_str_mv | https://repositorio.unal.edu.co/handle/unal/76556 |
| dc.identifier.eprints.spa.fl_str_mv | http://bdigital.unal.edu.co/73062/ |
| url | https://repositorio.unal.edu.co/handle/unal/76556 http://bdigital.unal.edu.co/73062/ |
| dc.language.iso.spa.fl_str_mv | spa |
| language | spa |
| dc.relation.ispartof.spa.fl_str_mv | Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas; Ingeniería de Sistemas |
| dc.relation.haspart.spa.fl_str_mv | 0 Generalidades / Computer science, information and general works; 6 Tecnología (ciencias aplicadas) / Technology; 62 Ingeniería y operaciones afines / Engineering |
| dc.relation.references.spa.fl_str_mv | Baquero Vargas, Juan Felipe (2019) An information retrieval strategy for large multimodal data collections involving source code and natural language. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá. |
| dc.rights.spa.fl_str_mv | All rights reserved - Universidad Nacional de Colombia |
| dc.rights.coar.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.rights.license.spa.fl_str_mv | Attribution-NonCommercial 4.0 International |
| dc.rights.uri.spa.fl_str_mv | http://creativecommons.org/licenses/by-nc/4.0/ |
| dc.rights.accessrights.spa.fl_str_mv | info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | Attribution-NonCommercial 4.0 International; All rights reserved - Universidad Nacional de Colombia; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2 |
| eu_rights_str_mv | openAccess |
| dc.format.mimetype.spa.fl_str_mv | application/pdf |
| institution | Universidad Nacional de Colombia |
| bitstream.url.fl_str_mv | https://repositorio.unal.edu.co/bitstream/unal/76556/1/Tesis_Maestra_JFBV__Universidad_Nacional_de_Colombia.pdf https://repositorio.unal.edu.co/bitstream/unal/76556/2/Tesis_Maestra_JFBV__Universidad_Nacional_de_Colombia.pdf.jpg |
| bitstream.checksum.fl_str_mv | b4d7f67ed4300ad6e047fbdbc37a1542 4d9ee26fac36bc8b0a9dfe2f9119d586 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 |
| repository.name.fl_str_mv | Repositorio Institucional Universidad Nacional de Colombia |
| repository.mail.fl_str_mv | repositorio_nal@unal.edu.co |
| _version_ | 1814089676563152896 |