Automatic multi-label categorization of Java applications using Dependency graphs

Authors: Vargas Baldrich, Santiago

Resource type:

Publication date: 2015

Institution: Universidad Nacional de Colombia

Repository: Universidad Nacional de Colombia

Language: spa

Description
Summary: Automatic approaches for categorizing software repositories are gaining acceptance because they reduce manual effort and can produce high-quality results. Most existing approaches rely heavily on supervised machine learning (which requires a set of predefined categories to be used as training data) and use source code, comments, API calls, and other sources to obtain information about the projects to be categorized. We consider that existing approaches have two weaknesses that can have major implications for the categorization results and that have not been addressed together: the assumption of unrestricted access to source code and the use of predefined sets of categories. Therefore, we present Sally, a novel, unsupervised, multi-label automatic categorization model that obtains meaningful categories without depending on access to source code or on predefined categories, by leveraging information obtained from the projects in the categorization corpus and the dependency relations between them. We performed two experiments in which we compared Sally to the categorization strategies of two widely used websites and to MUDABlue, a categorization model proposed by Kawaguchi et al. that we consider a good baseline. Additionally, we assessed the proposed model through a survey of 14 developers with a wide range of programming experience, and we developed a web application to make the model available to potential users.
