Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks

Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immed...

Full description

Autores:
Castrillo Velilla, Eduar Moisés
Tipo de recurso:
Fecha de publicación:
2018
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/68757
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/68757
http://bdigital.unal.edu.co/69933/
Palabra clave:
0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
id UNACIONAL2_75850e7985bbe24a1cd765f53b2d9767
oai_identifier_str oai:repositorio.unal.edu.co:unal/68757
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
spellingShingle Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
title_short Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_full Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_fullStr Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_full_unstemmed Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
title_sort Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
dc.creator.fl_str_mv Castrillo Velilla, Eduar Moisés
dc.contributor.author.spa.fl_str_mv Castrillo Velilla, Eduar Moisés
dc.contributor.spa.fl_str_mv León Gumán, Elizabeth
Gómez Perdomo, Jonatan
dc.subject.ddc.spa.fl_str_mv 0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
topic 0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
dc.subject.proposal.spa.fl_str_mv Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
description Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.
publishDate 2018
dc.date.issued.spa.fl_str_mv 2018-09-10
dc.date.accessioned.spa.fl_str_mv 2019-07-03T07:42:23Z
dc.date.available.spa.fl_str_mv 2019-07-03T07:42:23Z
dc.type.spa.fl_str_mv Trabajo de grado - Maestría
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/68757
dc.identifier.eprints.spa.fl_str_mv http://bdigital.unal.edu.co/69933/
url https://repositorio.unal.edu.co/handle/unal/68757
http://bdigital.unal.edu.co/69933/
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.ispartof.spa.fl_str_mv Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas
Ingeniería de Sistemas
dc.relation.references.spa.fl_str_mv Castrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv Derechos reservados - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Atribución-NoComercial 4.0 Internacional
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
rights_invalid_str_mv Atribución-NoComercial 4.0 Internacional
Derechos reservados - Universidad Nacional de Colombia
http://creativecommons.org/licenses/by-nc/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf
https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpg
bitstream.checksum.fl_str_mv 12bd71c8938385e6c692f61f05b81a66
bf322ed65de30fd79479a56cc09afd8c
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co
_version_ 1814089889719779328
spelling Atribución-NoComercial 4.0 InternacionalDerechos reservados - Universidad Nacional de Colombiahttp://creativecommons.org/licenses/by-nc/4.0/info:eu-repo/semantics/openAccesshttp://purl.org/coar/access_right/c_abf2León Gumán, ElizabethGómez Perdomo, JonatanCastrillo Velilla, Eduar Moisés6a13e419-72f3-44e4-9c56-02008faff64a3002019-07-03T07:42:23Z2019-07-03T07:42:23Z2018-09-10https://repositorio.unal.edu.co/handle/unal/68757http://bdigital.unal.edu.co/69933/Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.En esta tesis se proponen varios algoritmos para computar eficientemente estructura de comunidad de alta calidad en redes complejas de gran escala. Primero, se propone una nueva medida que determina la similitud estructural en un grafo mediante la difusión y captura de información mas allá de la vecindad inmediata de los nodos conectados que están siendo analizados. Esta nueva similitud está modelada como una función iterada que puede ser calculada por iteración a punto fijo en complejidad de tiempo y memoria super-lineal, por lo tanto puede utilizarse para analizar grafos de gran escala. Para mostrar las ventajas de la similitud estructural propuesta, se ha reemplazado la similitud estructural local utilizada en el algoritmo SCAN, con la similitud estructural dinámica, mejorando así la calidad de la estructura de comunidad detectada y también reduciendo la sensibilidad al parámetro $\epsilon$. Segundo, se propone un algoritmo heurístico novedoso para detección de comunidades jerárquicas multi-escala que está inspirado en una técnica de agrupamiento jerárquica aglomerante. Este algoritmo utiliza la similitud estructural dinámica en un algoritmo heurístico jerárquico algomerante, que no une solamente las comunidades con máxima similitud tal como en la técnica jerárquica clásica, sino que une cualquier comunidad que no cumple una definición de comunidad pasada como parametro, con sus comunidades vecinas con las cuales presenta mayor similitud. El algoritmo computa la similitud entre las comunidades a la vez que verifica si cumplen la definición de comunidad pasada como parámetro. Esto es hecho en tiempo lineal en términos del número de comunidades en la iteración. Ya que una red compleja es un grafo disperso, esta aproximación presenta una complejidad de tiempo super-lineal en el caso promedio con respecto al tamaño del grafo entrada, por lo tanto puede ser aplicada en redes complejas de gran escala. Tercero, se propone un algoritmo novedoso para detectar estructura de comunidad superpuesta, tanto difusa como nítida. Este algoritmo utiliza la estructura de comunidad disyunta generada por el algoritmo heurístico propuesto anteriormente. Se proponen tres componentes principales para computar la estructura de comunidad superpuesta. i) Una función de conectividad que cuantifica la densidad de conexiones de un vertice hacia una comunidad disyunta, y su computación está basada en los valores de la similitud estructural dinámica. ii) Una definición de comunidad llamada Comunidad $\epsilon$-Central que incrementa la probabilidad de detectar comunidades superpuestas preliminares en la estructura de comunidad disyunta. iii) Una función de probabilidad que computa la estructura de comunidad difusa a partir de la estructura de comunidad disyunta. Ya que este algoritmo presenta la misma complejidad computacional que el algoritmo original, entonces sigue siendo aplicable a redes complejas de gran escala. Finalmente, una experimentación extensiva ha sido desarrollada con el fin de probar las propiedades, eficacia y eficiencia de los algoritmos propuestos, y para compararlos con el estado del arte. Los resultados experimentales muestran que los algoritmos propuestos proveen un mejor balance entre calidad de la estructura de comunidad detectada, eficiencia de computación y facilidad de uso, comparados con el estado del arte.Maestríaapplication/pdfspaUniversidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de SistemasIngeniería de SistemasCastrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.0 Generalidades / Computer science, information and general works51 Matemáticas / Mathematics6 Tecnología (ciencias aplicadas) / Technology62 Ingeniería y operaciones afines / EngineeringAlgorithmComplex networksCommunity detectionDynamic structural similarityLarge-scale networksGraph clusteringAlgoritmoRedes complejasDetección de comunidadesSimilitud estructural dinámicaRedes de gran escalaAgrupamiento en grafosAgglomerative hierarchical clustering algorithm for community detection in large-scale complex networksTrabajo de grado - Maestríainfo:eu-repo/semantics/masterThesisinfo:eu-repo/semantics/acceptedVersionTexthttp://purl.org/redcol/resource_type/TMORIGINAL1085099275_2018.pdfapplication/pdf5443423https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf12bd71c8938385e6c692f61f05b81a66MD51THUMBNAIL1085099275_2018.pdf.jpg1085099275_2018.pdf.jpgGenerated Thumbnailimage/jpeg4400https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpgbf322ed65de30fd79479a56cc09afd8cMD52unal/68757oai:repositorio.unal.edu.co:unal/687572024-05-28 23:09:11.921Repositorio Institucional Universidad Nacional de Colombiarepositorio_nal@unal.edu.co