A Genetic Clustering Algorithm for Automatic Text Summarization

Abstract. Automatic text summarization has become a relevant topic due to the information overload. This automatization aims to help humans and machines to deal with the vast amount of text data (structured and un-structured) offered on the web and deep web. In this research a novel approach for aut...

Full description

Autores:
Suaréz Benjumea, Sebastian
Tipo de recurso:
Fecha de publicación:
2016
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/57548
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/57548
http://bdigital.unal.edu.co/53848/
Palabra clave:
0 Generalidades / Computer science, information and general works
62 Ingeniería y operaciones afines / Engineering
Text mining
Genetic algorithm
Clustering algorithm
Automatic text summarization
Single document automatic text summarization
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
Description
Summary:Abstract. Automatic text summarization has become a relevant topic due to the information overload. This automatization aims to help humans and machines to deal with the vast amount of text data (structured and un-structured) offered on the web and deep web. In this research a novel approach for automatic extractive text summarization called SENCLUS is presented. Using a genetic clustering algorithm, SENCLUS clusters the sentences as close representation of the text topics using a fitness function based on redundancy and coverage, and applies a scoring function to select the most relevant sentences of each topic to be part of the extractive summary. The approach was validated using the DUC2002 data set and ROUGE summary quality measures. The results shows that the approach is representative against the state of the art methods for extractive automatic text summarization.