A model for automatic categorization of software applications using non-parametric clustering and bytecode analysis
- Authors:
- Escobar Avila, Javier Ricardo
- Resource type:
- Publication date:
- 2015
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/54862
- Online access:
- https://repositorio.unal.edu.co/handle/unal/54862
http://bdigital.unal.edu.co/50071/
- Keywords:
- 0 Generalities / Computer science, information and general works
62 Engineering and allied operations / Engineering
Software categorization
Bytecode
Non-parametric clustering
Automatic labeling
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
Summary: Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correct category assignment is essential so that developers can easily retrieve relevant libraries from large repositories. State-of-the-art approaches determine a library's category from the semantics reflected by identifiers and comments in its source code. However, these approaches fail when the source code of the library is not available. In this document, we describe a novel approach for the automatic categorization of Java libraries that needs only the bytecode of a library to determine its category. We show that the approach, based on Dirichlet Process Clustering with automatic labeling, is able to successfully categorize libraries from the Apache Foundation Repository.