Anomalies detection for big data

The development of the digital age has resulted in a considerable increase in data volumes. These large volumes of data have been called big data since they exceed the processing capacity of conventional database systems. Several sectors consider various opportunities and applications in the detecti...

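Authors: Torres-Domínguez, Omar; Sabater-Fernández, Samuel; Bravo-Ilisatigui, Lisandra; Martin-Rodríguez, Diana; García-Borroto, Milton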

Resource type: Research article

Publication date: 2019

Institution: Universidad Pedagógica y Tecnológica de Colombia

Repository: RiUPTC: Repositorio Institucional UPTC

Language: spa

id	REPOUPTC2_a3ee2260ec69c9a0ad59d7c418c49bed
oai_identifier_str	oai:repositorio.uptc.edu.co:001/14232
network_acronym_str	REPOUPTC2
network_name_str	RiUPTC: Repositorio Institucional UPTC
repository_id_str
dc.title.en-US.fl_str_mv	Anomalies detection for big data
dc.title.es-ES.fl_str_mv	Detección de anomalías en grandes volúmenes de datos
title	Anomalies detection for big data
spellingShingle	Anomalies detection for big data big data data mining detecting anomalies MapReduce big data detección de anomalías MapReduce minería de datos
title_short	Anomalies detection for big data
title_full	Anomalies detection for big data
title_fullStr	Anomalies detection for big data
title_full_unstemmed	Anomalies detection for big data
title_sort	Anomalies detection for big data
dc.subject.en-US.fl_str_mv	big data; data mining; detecting anomalies; MapReduce
topic	big data; data mining; detecting anomalies; MapReduce; big data; detección de anomalías; MapReduce; minería de datos
dc.subject.es-ES.fl_str_mv	big data; detección de anomalías; MapReduce; minería de datos
description	The development of the digital age has brought a considerable increase in data volumes. These large volumes of data are called big data because they exceed the processing capacity of conventional database systems. Several sectors see opportunities and applications for anomaly detection in big data problems. Data mining techniques can be very useful for this type of analysis because they extract patterns and relationships from large amounts of data. Processing and analyzing these data volumes requires tools capable of handling them, such as Apache Spark and Hadoop. These tools do not include specific algorithms for detecting anomalies. The general objective of this work is to develop a new neighborhood-based algorithm for anomaly detection in big data problems. Based on a comparative study, the KNNW algorithm was selected for its results as the basis for a big data variant. The big data algorithm was implemented in Apache Spark using the MapReduce parallel programming paradigm. Different experiments were then performed to analyze the behavior of the algorithm under different configurations. The experiments compared execution times and result quality between the sequential variant and the big data variant. The big data variant obtained better results, with a significant difference. As a result, the big data variant, KNNW-BigData, can process large volumes of data. Keywords: big data; data mining; detecting anomalies; MapReduce.
publishDate	2019
dc.date.accessioned.none.fl_str_mv	2024-07-05T19:11:49Z
dc.date.available.none.fl_str_mv	2024-07-05T19:11:49Z
dc.date.none.fl_str_mv	2019-01-10
dc.type.none.fl_str_mv	Journal article
dc.type.coar.none.fl_str_mv	http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.driver.none.fl_str_mv	info:eu-repo/semantics/article
dc.type.version.spa.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.content.none.fl_str_mv	Text
dc.type.redcol.none.fl_str_mv	https://purl.org/redcol/resource_type/ART
dc.type.coarversion.spa.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a85
format	http://purl.org/coar/resource_type/c_2df8fbb1
status_str	publishedVersion
dc.identifier.none.fl_str_mv	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793 10.19053/01211129.v28.n50.2019.8793
dc.identifier.uri.none.fl_str_mv	https://repositorio.uptc.edu.co/handle/001/14232
url	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793 https://repositorio.uptc.edu.co/handle/001/14232
identifier_str_mv	10.19053/01211129.v28.n50.2019.8793
dc.language.none.fl_str_mv	spa
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.none.fl_str_mv	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7288 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7504 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/8793/7533
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv	http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv	application/pdf application/xml
dc.coverage.en-US.fl_str_mv	N.A.
dc.coverage.es-ES.fl_str_mv	N.A.
dc.publisher.en-US.fl_str_mv	Universidad Pedagógica y Tecnológica de Colombia
dc.source.en-US.fl_str_mv	Revista Facultad de Ingeniería; Vol. 28 No. 50 (2019); 62-76
dc.source.es-ES.fl_str_mv	Revista Facultad de Ingeniería; Vol. 28 Núm. 50 (2019); 62-76
dc.source.none.fl_str_mv	2357-5328 0121-1129
institution	Universidad Pedagógica y Tecnológica de Colombia
repository.name.fl_str_mv	Repositorio Institucional UPTC
repository.mail.fl_str_mv	repositorio.uptc@uptc.edu.co
_version_	1858227311128608768
spelling	Torres-Domínguez, Omar; Sabater-Fernández, Samuel; Bravo-Ilisatigui, Lisandra; Martin-Rodríguez, Diana; García-Borroto, Milton; Publication; 001/14232; oai:repositorio.uptc.edu.co:001/14232; 2025-10-30 19:45:13.131; metadata.only; https://repositorio.uptc.edu.co
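
The description field in this record mentions a neighborhood-based detector (KNNW) implemented with the MapReduce paradigm, but the record itself gives no algorithmic detail. The following is only a minimal sketch of how such a computation could be organized, assuming that KNNW denotes the k-nearest-neighbor weight score (the sum of a point's distances to its k nearest neighbors, with larger sums indicating more anomalous points). The function names, the round-robin partitioning, and the sample points are hypothetical and are not taken from the article; plain Python stands in for Apache Spark.

```python
import heapq
import math
from functools import reduce


def map_partition(queries, partition, k):
    """Map step: for every query point, keep the k smallest distances
    to the points held in a single partition (skipping the point itself)."""
    partial = {}
    for qi, q in queries:
        dists = [math.dist(q, p) for pi, p in partition if pi != qi]
        partial[qi] = heapq.nsmallest(k, dists)
    return partial


def reduce_partials(a, b, k):
    """Reduce step: merge two partial results, keeping the k smallest
    distances per query point."""
    return {qi: heapq.nsmallest(k, a[qi] + b[qi]) for qi in a}


def knnw_scores(points, k, n_partitions):
    """Return {point index: sum of distances to its k nearest neighbors}."""
    indexed = list(enumerate(points))
    # Hypothetical round-robin split; a real cluster would distribute the data.
    partitions = [indexed[i::n_partitions] for i in range(n_partitions)]
    partials = [map_partition(indexed, part, k) for part in partitions]
    merged = reduce(lambda a, b: reduce_partials(a, b, k), partials)
    return {qi: sum(dists) for qi, dists in merged.items()}


# Illustrative data: four points on a unit square and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knnw_scores(points, k=2, n_partitions=2)
outlier = max(scores, key=scores.get)  # index of the highest-scoring point
```

Because each partition only reports its k smallest distances per point, the merged result matches what a single-machine pass over all the data would produce, while the partial results that must be combined stay small.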
