Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks

Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immed...

Full description

Autores:: Castrillo Velilla, Eduar Moisés

Tipo de recurso:

Fecha de publicación:: 2018

Institución:: Universidad Nacional de Colombia

Repositorio:: Universidad Nacional de Colombia

Idioma:: spa

id	UNACIONAL2_75850e7985bbe24a1cd765f53b2d9767
oai_identifier_str	oai:repositorio.unal.edu.co:unal/68757
network_acronym_str	UNACIONAL2
network_name_str	Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
spellingShingle	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks 0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos
title_short	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_full	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_fullStr	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_full_unstemmed	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_sort	Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
dc.creator.fl_str_mv	Castrillo Velilla, Eduar Moisés
dc.contributor.author.spa.fl_str_mv	Castrillo Velilla, Eduar Moisés
dc.contributor.spa.fl_str_mv	León Gumán, Elizabeth Gómez Perdomo, Jonatan
dc.subject.ddc.spa.fl_str_mv	0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering
topic	0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos
dc.subject.proposal.spa.fl_str_mv	Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos
description	Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.
publishDate	2018
dc.date.issued.spa.fl_str_mv	2018-09-10
dc.date.accessioned.spa.fl_str_mv	2019-07-03T07:42:23Z
dc.date.available.spa.fl_str_mv	2019-07-03T07:42:23Z
dc.type.spa.fl_str_mv	Trabajo de grado - Maestría
dc.type.driver.spa.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv	info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv	Text
dc.type.redcol.spa.fl_str_mv	http://purl.org/redcol/resource_type/TM
status_str	acceptedVersion
dc.identifier.uri.none.fl_str_mv	https://repositorio.unal.edu.co/handle/unal/68757
dc.identifier.eprints.spa.fl_str_mv	http://bdigital.unal.edu.co/69933/
url	https://repositorio.unal.edu.co/handle/unal/68757 http://bdigital.unal.edu.co/69933/
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.ispartof.spa.fl_str_mv	Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas
dc.relation.references.spa.fl_str_mv	Castrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv	Derechos reservados - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv	Atribución-NoComercial 4.0 Internacional
dc.rights.uri.spa.fl_str_mv	http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Atribución-NoComercial 4.0 Internacional Derechos reservados - Universidad Nacional de Colombia http://creativecommons.org/licenses/by-nc/4.0/ http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
dc.format.mimetype.spa.fl_str_mv	application/pdf
institution	Universidad Nacional de Colombia
bitstream.url.fl_str_mv	https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpg
bitstream.checksum.fl_str_mv	12bd71c8938385e6c692f61f05b81a66 bf322ed65de30fd79479a56cc09afd8c
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv	repositorio_nal@unal.edu.co
_version_	1814089889719779328
spelling	Atribución-NoComercial 4.0 InternacionalDerechos reservados - Universidad Nacional de Colombiahttp://creativecommons.org/licenses/by-nc/4.0/info:eu-repo/semantics/openAccesshttp://purl.org/coar/access_right/c_abf2León Gumán, ElizabethGómez Perdomo, JonatanCastrillo Velilla, Eduar Moisés6a13e419-72f3-44e4-9c56-02008faff64a3002019-07-03T07:42:23Z2019-07-03T07:42:23Z2018-09-10https://repositorio.unal.edu.co/handle/unal/68757http://bdigital.unal.edu.co/69933/Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.En esta tesis se proponen varios algoritmos para computar eficientemente estructura de comunidad de alta calidad en redes complejas de gran escala. Primero, se propone una nueva medida que determina la similitud estructural en un grafo mediante la difusión y captura de información mas allá de la vecindad inmediata de los nodos conectados que están siendo analizados. Esta nueva similitud está modelada como una función iterada que puede ser calculada por iteración a punto fijo en complejidad de tiempo y memoria super-lineal, por lo tanto puede utilizarse para analizar grafos de gran escala. Para mostrar las ventajas de la similitud estructural propuesta, se ha reemplazado la similitud estructural local utilizada en el algoritmo SCAN, con la similitud estructural dinámica, mejorando así la calidad de la estructura de comunidad detectada y también reduciendo la sensibilidad al parámetro $\epsilon$. Segundo, se propone un algoritmo heurístico novedoso para detección de comunidades jerárquicas multi-escala que está inspirado en una técnica de agrupamiento jerárquica aglomerante. Este algoritmo utiliza la similitud estructural dinámica en un algoritmo heurístico jerárquico algomerante, que no une solamente las comunidades con máxima similitud tal como en la técnica jerárquica clásica, sino que une cualquier comunidad que no cumple una definición de comunidad pasada como parametro, con sus comunidades vecinas con las cuales presenta mayor similitud. El algoritmo computa la similitud entre las comunidades a la vez que verifica si cumplen la definición de comunidad pasada como parámetro. Esto es hecho en tiempo lineal en términos del número de comunidades en la iteración. Ya que una red compleja es un grafo disperso, esta aproximación presenta una complejidad de tiempo super-lineal en el caso promedio con respecto al tamaño del grafo entrada, por lo tanto puede ser aplicada en redes complejas de gran escala. Tercero, se propone un algoritmo novedoso para detectar estructura de comunidad superpuesta, tanto difusa como nítida. Este algoritmo utiliza la estructura de comunidad disyunta generada por el algoritmo heurístico propuesto anteriormente. Se proponen tres componentes principales para computar la estructura de comunidad superpuesta. i) Una función de conectividad que cuantifica la densidad de conexiones de un vertice hacia una comunidad disyunta, y su computación está basada en los valores de la similitud estructural dinámica. ii) Una definición de comunidad llamada Comunidad $\epsilon$-Central que incrementa la probabilidad de detectar comunidades superpuestas preliminares en la estructura de comunidad disyunta. iii) Una función de probabilidad que computa la estructura de comunidad difusa a partir de la estructura de comunidad disyunta. Ya que este algoritmo presenta la misma complejidad computacional que el algoritmo original, entonces sigue siendo aplicable a redes complejas de gran escala. Finalmente, una experimentación extensiva ha sido desarrollada con el fin de probar las propiedades, eficacia y eficiencia de los algoritmos propuestos, y para compararlos con el estado del arte. Los resultados experimentales muestran que los algoritmos propuestos proveen un mejor balance entre calidad de la estructura de comunidad detectada, eficiencia de computación y facilidad de uso, comparados con el estado del arte.Maestríaapplication/pdfspaUniversidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de SistemasIngeniería de SistemasCastrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.0 Generalidades / Computer science, information and general works51 Matemáticas / Mathematics6 Tecnología (ciencias aplicadas) / Technology62 Ingeniería y operaciones afines / EngineeringAlgorithmComplex networksCommunity detectionDynamic structural similarityLarge-scale networksGraph clusteringAlgoritmoRedes complejasDetección de comunidadesSimilitud estructural dinámicaRedes de gran escalaAgrupamiento en grafosAgglomerative hierarchical clustering algorithm for community detection in large-scale complex networksTrabajo de grado - Maestríainfo:eu-repo/semantics/masterThesisinfo:eu-repo/semantics/acceptedVersionTexthttp://purl.org/redcol/resource_type/TMORIGINAL1085099275_2018.pdfapplication/pdf5443423https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf12bd71c8938385e6c692f61f05b81a66MD51THUMBNAIL1085099275_2018.pdf.jpg1085099275_2018.pdf.jpgGenerated Thumbnailimage/jpeg4400https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpgbf322ed65de30fd79479a56cc09afd8cMD52unal/68757oai:repositorio.unal.edu.co:unal/687572024-05-28 23:09:11.921Repositorio Institucional Universidad Nacional de Colombiarepositorio_nal@unal.edu.co

Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks

Publicaciones similares