Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks
Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immed...
- Autores:
-
Castrillo Velilla, Eduar Moisés
- Tipo de recurso:
- Fecha de publicación:
- 2018
- Institución:
- Universidad Nacional de Colombia
- Repositorio:
- Universidad Nacional de Colombia
- Idioma:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/68757
- Acceso en línea:
- https://repositorio.unal.edu.co/handle/unal/68757
http://bdigital.unal.edu.co/69933/
- Palabra clave:
- 0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Algorithm
Complex networks
Community detection
Dynamic structural similarity
Large-scale networks
Graph clustering
Algoritmo
Redes complejas
Detección de comunidades
Similitud estructural dinámica
Redes de gran escala
Agrupamiento en grafos
- Rights
- openAccess
- License
- Atribución-NoComercial 4.0 Internacional
id |
UNACIONAL2_75850e7985bbe24a1cd765f53b2d9767 |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/68757 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
title |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
spellingShingle |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks 0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos |
title_short |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
title_full |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
title_fullStr |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
title_full_unstemmed |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
title_sort |
Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks |
dc.creator.fl_str_mv |
Castrillo Velilla, Eduar Moisés |
dc.contributor.author.spa.fl_str_mv |
Castrillo Velilla, Eduar Moisés |
dc.contributor.spa.fl_str_mv |
León Gumán, Elizabeth Gómez Perdomo, Jonatan |
dc.subject.ddc.spa.fl_str_mv |
0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering |
topic |
0 Generalidades / Computer science, information and general works 51 Matemáticas / Mathematics 6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos |
dc.subject.proposal.spa.fl_str_mv |
Algorithm Complex networks Community detection Dynamic structural similarity Large-scale networks Graph clustering Algoritmo Redes complejas Detección de comunidades Similitud estructural dinámica Redes de gran escala Agrupamiento en grafos |
description |
Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art. |
publishDate |
2018 |
dc.date.issued.spa.fl_str_mv |
2018-09-10 |
dc.date.accessioned.spa.fl_str_mv |
2019-07-03T07:42:23Z |
dc.date.available.spa.fl_str_mv |
2019-07-03T07:42:23Z |
dc.type.spa.fl_str_mv |
Trabajo de grado - Maestría |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TM |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/68757 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/69933/ |
url |
https://repositorio.unal.edu.co/handle/unal/68757 http://bdigital.unal.edu.co/69933/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas |
dc.relation.references.spa.fl_str_mv |
Castrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
Derechos reservados - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Atribución-NoComercial 4.0 Internacional |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Atribución-NoComercial 4.0 Internacional Derechos reservados - Universidad Nacional de Colombia http://creativecommons.org/licenses/by-nc/4.0/ http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpg |
bitstream.checksum.fl_str_mv |
12bd71c8938385e6c692f61f05b81a66 bf322ed65de30fd79479a56cc09afd8c |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089889719779328 |
spelling |
Atribución-NoComercial 4.0 InternacionalDerechos reservados - Universidad Nacional de Colombiahttp://creativecommons.org/licenses/by-nc/4.0/info:eu-repo/semantics/openAccesshttp://purl.org/coar/access_right/c_abf2León Gumán, ElizabethGómez Perdomo, JonatanCastrillo Velilla, Eduar Moisés6a13e419-72f3-44e4-9c56-02008faff64a3002019-07-03T07:42:23Z2019-07-03T07:42:23Z2018-09-10https://repositorio.unal.edu.co/handle/unal/68757http://bdigital.unal.edu.co/69933/Abstract: In this thesis several algorithms are proposed to compute efficiently high quality community structure in large-scale complex networks. First, a novel similarity measure that determines the structural similarity in a graph by dynamically diffusing and capturing information beyond the immediate neighborhood of connected nodes. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$. Second, a novel fast heuristic algorithm for multi-scale and hierarchical community detection inspired on an agglomerative hierarchical clustering technique. This algorithm uses the Dynamic Structural Similarity in a heuristic agglomerative hierarchical algorithm, that does not merge only clusters with maximal similarity as in the classical hierarchical approach, but merges any cluster that does not meet a community definition passed by parameter with its most similar adjacent clusters. The algorithm computes the similarity between clusters at the same time is checking if each cluster meets the specified community definition. It is done in linear time complexity in terms of the number of cluster in the iteration. Since a complex network is a sparse graph, this approach has a super linear time complexity with respect to the size of the input in the average case scenario, making it suitable to be applied on large-scale complex networks. Third, an efficient algorithm to detect fuzzy and crisp overlapping community structure. This algorithm leverages the disjoint community structure generated by the heuristic algorithm proposed above. Three core elements have been proposed to compute the overlapping community structure: \emph{i)} A connectivity function that quantifies the density of connections of a node towards a disjoint community, that relies its computation on the Dynamic Structural Similarity measure. \emph{ii)} An $\epsilon$-Core community definition that increases the probability of identifying in-between communities in the disjoint community structure. \emph{iii)} A membership function to compute the soft partition from the core disjoint communities. Because this algorithm keeps the same computational complexity of the original disjoint algorithm, it is still applicable to large-scale graphs. Finally, an extensive experimentation is performed in order to test the properties, efficiency and efficacy of the proposed algorithms and to compare them with the state-of-the-art. The experimental results show that the proposed algorithms provide better trade-off among the quality of the detected community structure, computational complexity and usability, compared to the state-of-the-art.En esta tesis se proponen varios algoritmos para computar eficientemente estructura de comunidad de alta calidad en redes complejas de gran escala. Primero, se propone una nueva medida que determina la similitud estructural en un grafo mediante la difusión y captura de información mas allá de la vecindad inmediata de los nodos conectados que están siendo analizados. Esta nueva similitud está modelada como una función iterada que puede ser calculada por iteración a punto fijo en complejidad de tiempo y memoria super-lineal, por lo tanto puede utilizarse para analizar grafos de gran escala. Para mostrar las ventajas de la similitud estructural propuesta, se ha reemplazado la similitud estructural local utilizada en el algoritmo SCAN, con la similitud estructural dinámica, mejorando así la calidad de la estructura de comunidad detectada y también reduciendo la sensibilidad al parámetro $\epsilon$. Segundo, se propone un algoritmo heurístico novedoso para detección de comunidades jerárquicas multi-escala que está inspirado en una técnica de agrupamiento jerárquica aglomerante. Este algoritmo utiliza la similitud estructural dinámica en un algoritmo heurístico jerárquico algomerante, que no une solamente las comunidades con máxima similitud tal como en la técnica jerárquica clásica, sino que une cualquier comunidad que no cumple una definición de comunidad pasada como parametro, con sus comunidades vecinas con las cuales presenta mayor similitud. El algoritmo computa la similitud entre las comunidades a la vez que verifica si cumplen la definición de comunidad pasada como parámetro. Esto es hecho en tiempo lineal en términos del número de comunidades en la iteración. Ya que una red compleja es un grafo disperso, esta aproximación presenta una complejidad de tiempo super-lineal en el caso promedio con respecto al tamaño del grafo entrada, por lo tanto puede ser aplicada en redes complejas de gran escala. Tercero, se propone un algoritmo novedoso para detectar estructura de comunidad superpuesta, tanto difusa como nítida. Este algoritmo utiliza la estructura de comunidad disyunta generada por el algoritmo heurístico propuesto anteriormente. Se proponen tres componentes principales para computar la estructura de comunidad superpuesta. i) Una función de conectividad que cuantifica la densidad de conexiones de un vertice hacia una comunidad disyunta, y su computación está basada en los valores de la similitud estructural dinámica. ii) Una definición de comunidad llamada Comunidad $\epsilon$-Central que incrementa la probabilidad de detectar comunidades superpuestas preliminares en la estructura de comunidad disyunta. iii) Una función de probabilidad que computa la estructura de comunidad difusa a partir de la estructura de comunidad disyunta. Ya que este algoritmo presenta la misma complejidad computacional que el algoritmo original, entonces sigue siendo aplicable a redes complejas de gran escala. Finalmente, una experimentación extensiva ha sido desarrollada con el fin de probar las propiedades, eficacia y eficiencia de los algoritmos propuestos, y para compararlos con el estado del arte. Los resultados experimentales muestran que los algoritmos propuestos proveen un mejor balance entre calidad de la estructura de comunidad detectada, eficiencia de computación y facilidad de uso, comparados con el estado del arte.Maestríaapplication/pdfspaUniversidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de SistemasIngeniería de SistemasCastrillo Velilla, Eduar Moisés (2018) Agglomerative hierarchical clustering algorithm for community detection in large-scale complex networks. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.0 Generalidades / Computer science, information and general works51 Matemáticas / Mathematics6 Tecnología (ciencias aplicadas) / Technology62 Ingeniería y operaciones afines / EngineeringAlgorithmComplex networksCommunity detectionDynamic structural similarityLarge-scale networksGraph clusteringAlgoritmoRedes complejasDetección de comunidadesSimilitud estructural dinámicaRedes de gran escalaAgrupamiento en grafosAgglomerative hierarchical clustering algorithm for community detection in large-scale complex networksTrabajo de grado - Maestríainfo:eu-repo/semantics/masterThesisinfo:eu-repo/semantics/acceptedVersionTexthttp://purl.org/redcol/resource_type/TMORIGINAL1085099275_2018.pdfapplication/pdf5443423https://repositorio.unal.edu.co/bitstream/unal/68757/1/1085099275_2018.pdf12bd71c8938385e6c692f61f05b81a66MD51THUMBNAIL1085099275_2018.pdf.jpg1085099275_2018.pdf.jpgGenerated Thumbnailimage/jpeg4400https://repositorio.unal.edu.co/bitstream/unal/68757/2/1085099275_2018.pdf.jpgbf322ed65de30fd79479a56cc09afd8cMD52unal/68757oai:repositorio.unal.edu.co:unal/687572024-05-28 23:09:11.921Repositorio Institucional Universidad Nacional de Colombiarepositorio_nal@unal.edu.co |