A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis

Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that relevant libraries can be easily retrieved by developers from large repositories. State of the art app...

Authors:
Escobar Avila, Javier Ricardo
Resource type:
Publication date:
2015
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/54862
Online access:
https://repositorio.unal.edu.co/handle/unal/54862
http://bdigital.unal.edu.co/50071/
Keywords:
0 Generalidades / Computer science, information and general works
62 Ingeniería y operaciones afines / Engineering
Software categorization
Categorización de software
Bytecode
Non-parametric clustering
Automatic labeling
Clustering no paramétrico
Etiquetado automático
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
id UNACIONAL2_ab13f13737d1e56623a3ac6db53da586
oai_identifier_str oai:repositorio.unal.edu.co:unal/54862
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis
dc.creator.fl_str_mv Escobar Avila, Javier Ricardo
dc.contributor.advisor.spa.fl_str_mv Aponte Melo, Jairo Hernán (Thesis advisor)
dc.contributor.author.spa.fl_str_mv Escobar Avila, Javier Ricardo
dc.contributor.spa.fl_str_mv Linares Vásquez, Mario
dc.subject.ddc.spa.fl_str_mv 0 Generalidades / Computer science, information and general works
62 Ingeniería y operaciones afines / Engineering
dc.subject.proposal.spa.fl_str_mv Software categorization
Categorización de software
Bytecode
Non-parametric clustering
Automatic labeling
Clustering no paramétrico
Etiquetado automático
description Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that developers can easily retrieve relevant libraries from large repositories. State-of-the-art approaches rely on the semantics reflected by identifiers and comments in the source code of the libraries to determine their category. However, these approaches fail when the source code of the libraries is not available. In this document, we describe a novel approach for the automatic categorization of Java libraries, which needs only the bytecode of a library to determine its category. We show that the approach, based on Dirichlet Process Clustering with automatic labeling, is able to successfully categorize libraries from the Apache Foundation Repository.
publishDate 2015
dc.date.issued.spa.fl_str_mv 2015-05-11
dc.date.accessioned.spa.fl_str_mv 2019-07-02T11:12:31Z
dc.date.available.spa.fl_str_mv 2019-07-02T11:12:31Z
dc.type.spa.fl_str_mv Degree thesis - Master's
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/54862
dc.identifier.eprints.spa.fl_str_mv http://bdigital.unal.edu.co/50071/
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.ispartof.spa.fl_str_mv Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas
Ingeniería de Sistemas
dc.relation.references.spa.fl_str_mv Escobar Avila, Javier Ricardo (2015) A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis. Master's thesis, Universidad Nacional de Colombia.
dc.rights.spa.fl_str_mv All rights reserved - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Attribution-NonCommercial 4.0 International
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/54862/1/1070596644.2015.pdf
https://repositorio.unal.edu.co/bitstream/unal/54862/2/1070596644.2015.pdf.jpg
bitstream.checksum.fl_str_mv 9e5137940b1750b64c378ad1eb9c27ed
09dff8d656db075e93b9c5d59c8b836d
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co