A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis
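Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that relevant libraries can be easily retrieved by developers from large repositories. State-of-the-art approaches rely on the semantics reflected by identifiers and comments in the source code of the libraries in order to determine their category. However, these approaches fail when the source code of the libraries is not available. In this document, we describe a novel approach for the automatic categorization of Java libraries, which needs only the bytecode of a library in order to determine its category. We show that the approach, based on Dirichlet Process Clustering with automatic labeling, is able to successfully categorize libraries from the Apache Foundation Repository.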
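The thesis abstract names Dirichlet Process Clustering as the basis of the approach. This record does not describe the features extracted from bytecode or the inference procedure, so the sketch below is not the author's implementation. It only illustrates, under that caveat, why such clustering is called non-parametric: in the Chinese restaurant process, the prior over partitions that underlies Dirichlet process mixtures, the number of clusters is not fixed in advance but grows with the data. The function name `crp_assignments` and the parameters `alpha` and `seed` are hypothetical names chosen for this illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative only: this samples from the prior and looks at no data,
    so it is not a clustering of real libraries.
    alpha > 0 is the concentration parameter; larger values favor more clusters.
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = cluster label of item i
    counts = []       # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    labels = crp_assignments(200, alpha=2.0, seed=42)
    print(len(set(labels)), "clusters for", len(labels), "items")
```

In a full Dirichlet process mixture, these prior weights would be combined with a likelihood over each item's features before an assignment is made; that step is omitted here because the record gives no details about it.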
- Authors: Escobar Avila, Javier Ricardo
- Resource type: Master's thesis
- Publication date: 2015
- Institution: Universidad Nacional de Colombia
- Repository: Universidad Nacional de Colombia
- Language: spa
- OAI Identifier: oai:repositorio.unal.edu.co:unal/54862
- Online access: https://repositorio.unal.edu.co/handle/unal/54862 ; http://bdigital.unal.edu.co/50071/
- Keywords: 0 Generalities / Computer science, information and general works; 62 Engineering and allied operations / Engineering; Software categorization; Bytecode; Non-parametric clustering; Automatic labeling
- Rights: openAccess
- License: Attribution-NonCommercial 4.0 International
id | UNACIONAL2_ab13f13737d1e56623a3ac6db53da586 |
---|---|
oai_identifier_str | oai:repositorio.unal.edu.co:unal/54862 |
network_acronym_str | UNACIONAL2 |
network_name_str | Universidad Nacional de Colombia |
repository_id_str | |
dc.title.spa.fl_str_mv | A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis |
dc.creator.fl_str_mv | Escobar Avila, Javier Ricardo |
dc.contributor.advisor.spa.fl_str_mv | Aponte Melo, Jairo Hernán (Thesis advisor) |
dc.contributor.author.spa.fl_str_mv | Escobar Avila, Javier Ricardo |
dc.contributor.spa.fl_str_mv | Linares Vásquez, Mario |
dc.subject.ddc.spa.fl_str_mv | 0 Generalities / Computer science, information and general works; 62 Engineering and allied operations / Engineering |
dc.subject.proposal.spa.fl_str_mv | Software categorization; Bytecode; Non-parametric clustering; Automatic labeling |
description | Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that relevant libraries can be easily retrieved by developers from large repositories. State-of-the-art approaches rely on the semantics reflected by identifiers and comments in the source code of the libraries in order to determine their category. However, these approaches fail when the source code of the libraries is not available. In this document, we describe a novel approach for the automatic categorization of Java libraries, which needs only the bytecode of a library in order to determine its category. We show that the approach, based on Dirichlet Process Clustering with automatic labeling, is able to successfully categorize libraries from the Apache Foundation Repository. |
publishDate | 2015 |
dc.date.issued.spa.fl_str_mv | 2015-05-11 |
dc.date.accessioned.spa.fl_str_mv | 2019-07-02T11:12:31Z |
dc.date.available.spa.fl_str_mv | 2019-07-02T11:12:31Z |
dc.type.spa.fl_str_mv | Master's thesis |
dc.type.driver.spa.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.type.version.spa.fl_str_mv | info:eu-repo/semantics/acceptedVersion |
dc.type.content.spa.fl_str_mv | Text |
dc.type.redcol.spa.fl_str_mv | http://purl.org/redcol/resource_type/TM |
dc.identifier.uri.none.fl_str_mv | https://repositorio.unal.edu.co/handle/unal/54862 |
dc.identifier.eprints.spa.fl_str_mv | http://bdigital.unal.edu.co/50071/ |
dc.language.iso.spa.fl_str_mv | spa |
dc.relation.ispartof.spa.fl_str_mv | Universidad Nacional de Colombia, Bogotá Campus, Faculty of Engineering, Department of Systems and Industrial Engineering, Systems Engineering |
dc.relation.references.spa.fl_str_mv | Escobar Avila, Javier Ricardo (2015) A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis. Master's thesis, Universidad Nacional de Colombia. |
dc.rights.spa.fl_str_mv | All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv | Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv | http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.mimetype.spa.fl_str_mv | application/pdf |
institution | Universidad Nacional de Colombia |
bitstream.url.fl_str_mv | https://repositorio.unal.edu.co/bitstream/unal/54862/1/1070596644.2015.pdf https://repositorio.unal.edu.co/bitstream/unal/54862/2/1070596644.2015.pdf.jpg |
bitstream.checksum.fl_str_mv | 9e5137940b1750b64c378ad1eb9c27ed 09dff8d656db075e93b9c5d59c8b836d |
bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 |
repository.name.fl_str_mv | Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv | repositorio_nal@unal.edu.co |
_version_ | 1814089779193577472 |