Anomalies detection for big data

The development of the digital age has resulted in a considerable increase in data volumes. These large volumes of data have been called big data since they exceed the processing capacity of conventional database systems. Several sectors consider various opportunities and applications in the detecti...


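Authors:
Torres-Domínguez, Omar
Sabater-Fernández, Samuel
Bravo-Ilisatigui, Lisandra
Martin-Rodríguez, Diana
García-Borroto, Milton
Resource type:
article
Publication date: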
2019
Institution:
Universidad Pedagógica y Tecnológica de Colombia
Repository:
RiUPTC: Repositorio Institucional UPTC
Language:
spa
OAI Identifier:
oai:repositorio.uptc.edu.co:001/14232
Online access:
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793
https://repositorio.uptc.edu.co/handle/001/14232
Keywords:
big data
data mining
detecting anomalies
MapReduce
Rights
openAccess
License
http://purl.org/coar/access_right/c_abf2
id REPOUPTC2_a3ee2260ec69c9a0ad59d7c418c49bed
oai_identifier_str oai:repositorio.uptc.edu.co:001/14232
network_acronym_str REPOUPTC2
network_name_str RiUPTC: Repositorio Institucional UPTC
repository_id_str
dc.title.en-US.fl_str_mv Anomalies detection for big data
dc.title.es-ES.fl_str_mv Detección de anomalías en grandes volúmenes de datos
title Anomalies detection for big data
dc.subject.en-US.fl_str_mv big data
data mining
detecting anomalies
MapReduce
dc.subject.es-ES.fl_str_mv big data
detección de anomalías
MapReduce
minería de datos
description The development of the digital age has brought a considerable increase in data volumes. These large volumes of data are called big data because they exceed the processing capacity of conventional database systems. Several sectors see opportunities and applications for anomaly detection in big data problems. Data mining techniques can be very useful for this type of analysis because they extract patterns and relationships from large amounts of data. Processing and analyzing these data volumes requires tools capable of handling them, such as Apache Spark and Hadoop. These tools do not include specific algorithms for detecting anomalies. The general objective of the work is to develop a new neighborhood-based algorithm for anomaly detection in big data problems. Based on a comparative study, the KNNW algorithm was selected for its results as the basis for a big data variant. The big data algorithm was implemented in Apache Spark using the MapReduce parallel programming paradigm. Different experiments were then performed to analyze the behavior of the algorithm under different configurations. The experiments compared execution times and the quality of the results between the sequential variant and the big data variant. The big data variant obtained better results, with a significant difference. As a result, the big data variant, KNNW-BigData, can process large volumes of data. Keywords: big data; data mining; detecting anomalies; MapReduce.
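The description names KNNW and MapReduce but does not spell out either step, so the following is only a minimal sketch under stated assumptions, not the authors' KNNW-BigData implementation. It assumes KNNW refers to the kNN-weight outlier score (the sum of a point's distances to its k nearest neighbours) and imitates the map and reduce phases in plain Python rather than in Apache Spark. The function names `local_knn`, `merge_knn`, and `knn_weight_scores`, the round-robin partitioning, and the sample points are all hypothetical.

```python
import heapq
import math
from functools import reduce


def local_knn(queries, partition, k):
    """Map step: for each query point, keep its k smallest distances
    to the points stored in a single partition."""
    result = {}
    for qi, q in queries:
        # Skip the query point itself when it lives in this partition.
        dists = [math.dist(q, p) for pi, p in partition if pi != qi]
        result[qi] = heapq.nsmallest(k, dists)
    return result


def merge_knn(a, b, k):
    """Reduce step: merge two partial results, keeping the k smallest
    distances seen so far for every query point."""
    return {qi: heapq.nsmallest(k, a[qi] + b[qi]) for qi in a}


def knn_weight_scores(points, k, n_partitions):
    """Assumed kNN-weight score: sum of distances to the k nearest neighbours."""
    indexed = list(enumerate(points))
    # Hypothetical round-robin split standing in for distributed partitions.
    partitions = [indexed[i::n_partitions] for i in range(n_partitions)]
    partials = [local_knn(indexed, part, k) for part in partitions]  # map
    merged = reduce(lambda a, b: merge_knn(a, b, k), partials)       # reduce
    return [sum(merged[i]) for i in range(len(points))]


# Hypothetical sample: four clustered points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_weight_scores(points, k=2, n_partitions=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Because each partition contributes only its k smallest distances per point, the merged result matches what a sequential scan would produce, whatever the number of partitions. In this sketch the isolated point at (10.0, 10.0) receives the highest score.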
publishDate 2019
dc.date.accessioned.none.fl_str_mv 2024-07-05T19:11:49Z
dc.date.available.none.fl_str_mv 2024-07-05T19:11:49Z
dc.date.none.fl_str_mv 2019-01-10
dc.type.en-US.fl_str_mv research
dc.type.es-ES.fl_str_mv investigación
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.coarversion.spa.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
status_str publishedVersion
dc.identifier.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793
10.19053/01211129.v28.n50.2019.8793
dc.identifier.uri.none.fl_str_mv https://repositorio.uptc.edu.co/handle/001/14232
dc.language.none.fl_str_mv spa
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.none.fl_str_mv https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7288
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7504
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7533
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
application/xml
dc.coverage.en-US.fl_str_mv N.A.
dc.coverage.es-ES.fl_str_mv N.A.
dc.publisher.en-US.fl_str_mv Universidad Pedagógica y Tecnológica de Colombia
dc.source.en-US.fl_str_mv Revista Facultad de Ingeniería; Vol. 28 No. 50 (2019); 62-76
dc.source.es-ES.fl_str_mv Revista Facultad de Ingeniería; Vol. 28 Núm. 50 (2019); 62-76
dc.source.none.fl_str_mv 2357-5328
0121-1129
institution Universidad Pedagógica y Tecnológica de Colombia
repository.name.fl_str_mv Repositorio Institucional UPTC
repository.mail.fl_str_mv repositorio.uptc@uptc.edu.co
_version_ 1839633767269924864
spelling Torres-Domínguez, Omar; Sabater-Fernández, Samuel; Bravo-Ilisatigui, Lisandra; Martin-Rodríguez, Diana; García-Borroto, Milton
Anomalies detection for big data (Detección de anomalías en grandes volúmenes de datos)
Revista Facultad de Ingeniería; Vol. 28 No. 50 (2019); 62-76
001/14232
2025-07-18 11:53:14.263
metadata.only