Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks

Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immed...

Full description

Autores:
Castrillo Velilla, Eduar Moisés
Tipo de recurso:
Fecha de publicación:
2018
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/68757
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/68757
http://bdigital.unal.edu.co/69933/
Palabra clave:
0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
Description
Summary:Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.