Self-healing model for distributed environments based on artificial life techniques
Distributed systems are a fundamental part of cloud-based systems such as Google and of cyber-physical systems such as smart cities and electric grids. Achieving robustness and providing failure detection and recovery in distributed systems is a difficult problem because they are subject to loc...
- Authors:
- Rodríguez Portela, Arles Ernesto
- Resource type:
- Doctoral thesis
- Publication date:
- 2018
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/68730
- Online access:
- https://repositorio.unal.edu.co/handle/unal/68730
http://bdigital.unal.edu.co/69892/
- Keywords:
- 6 Tecnología (ciencias aplicadas) / Technology
62 Ingeniería y operaciones afines / Engineering
Self-healing
Distributed systems
Multi-agent systems
Animal motion
Local interactions
Complex networks
Self-organisation
Auto-recuperación
Sistemas distribuidos
Sistemas multi-agente
Movimiento animal
Interacciones locales
Redes complejas
Auto-organización
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
id |
UNACIONAL2_51284732af2fdc6b7fdf67b3d73be8cd |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/68730 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Self-healing model for distributed environments based on artificial life techniques |
dc.creator.fl_str_mv |
Rodríguez Portela, Arles Ernesto |
dc.contributor.advisor.spa.fl_str_mv |
Diaconescu, Ada (Thesis advisor) |
dc.contributor.author.spa.fl_str_mv |
Rodríguez Portela, Arles Ernesto |
dc.contributor.spa.fl_str_mv |
Gómez Perdomo, Jonatan |
dc.subject.ddc.spa.fl_str_mv |
6 Tecnología (ciencias aplicadas) / Technology 62 Ingeniería y operaciones afines / Engineering |
dc.subject.proposal.spa.fl_str_mv |
Self-healing Distributed systems Multi-agent systems Animal motion Local interactions Complex networks Self-organisation Auto-recuperación Sistemas distribuidos Sistemas multi-agente Movimiento animal Interacciones locales Redes complejas Auto-organización |
description |
Distributed systems are a fundamental part of cloud-based systems such as Google and of cyber-physical systems such as smart cities and electric grids. Achieving robustness and providing failure detection and recovery in distributed systems is a difficult problem because they are subject to local conditions and can fail unexpectedly. The main goal of this research is to define algorithms that achieve robustness and self-healing, addressing the problem of failure detection and recovery in distributed systems. The research integrates several approaches inspired by nature. First, it improves robustness for distributed data-collection tasks performed by failure-prone mobile agents, using techniques inspired by animal foraging and swarm intelligence. Results show that agents are able to collect and replicate data from the entire target space despite agent failures. Then, the performance and robustness of the pheromone-based algorithm and of random exploration are studied for data collection in complex networks with different topologies (Lattice, Small-world, Community and Scale-free). Experimental results show that network topology affects data collection and synchronisation, and that the proposed pheromone-based approach can improve performance and success rates across most networks. Next, a replication-based self-healing mechanism is proposed. It uses communication time-outs to detect agent failure and learns those time-outs automatically to minimise false positives. Finally, a model to self-heal the structure of a complex network after node failures is proposed. This model differs from existing approaches in that it creates replicas of failing nodes and their links instead of rewiring the network to recover its functionality. Experimental results show that node failures can be recovered if nodes know the topology. However, in some cases the topology is unknown or changes dynamically.
To solve this problem, the data-collection strategies studied previously are applied to synchronise the network topology. Results show the benefits of this approach relative to a reference multicast-based solution: with mobile agents, a good part of the network is maintained with lower overhead, measured in number of messages, than with multicast. Additionally, the strategy for replicating failing mobile agents is extended to deal with node failures, making it possible for agents to synchronise the topology data and enabling nodes that hold this information to recover failed agents and neighbouring nodes at the same time. The results provide key information that may help in designing distributed systems for applications such as sensor networks, swarm robotics, server clusters, clouds and the Internet of Things (IoT). |
publishDate |
2018 |
dc.date.issued.spa.fl_str_mv |
2018 |
dc.date.accessioned.spa.fl_str_mv |
2019-07-03T07:37:43Z |
dc.date.available.spa.fl_str_mv |
2019-07-03T07:37:43Z |
dc.type.spa.fl_str_mv |
Doctoral thesis |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_db06 |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TD |
format |
http://purl.org/coar/resource_type/c_db06 |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/68730 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/69892/ |
url |
https://repositorio.unal.edu.co/handle/unal/68730 http://bdigital.unal.edu.co/69892/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas |
dc.relation.references.spa.fl_str_mv |
Rodríguez Portela, Arles Ernesto (2018) Self-healing model for distributed environments based on artificial life techniques. Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/68730/1/80849599.2018.pdf https://repositorio.unal.edu.co/bitstream/unal/68730/2/80849599.2018.pdf.jpg |
bitstream.checksum.fl_str_mv |
c4d80d4c1054cf99dcb80c9d9bbe3f03 b283341752b79156a3e6437b86c4e0b7 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089944921014272 |
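The description above mentions a replication approach that uses communication time-outs to detect agent failure and learns those time-outs to minimise false positives. This record does not say how the time-outs are learned, so the following is only a minimal sketch under stated assumptions: it substitutes a smoothed mean-plus-deviation estimator in the style of TCP's retransmission timer, and the class name, method names and constants are all hypothetical, not taken from the thesis.

```python
class AdaptiveTimeout:
    """Hypothetical helper: learns a failure-detection time-out from observed delays.

    Assumption: a smoothed mean plus k smoothed deviations stands in for
    whatever learning rule the thesis actually uses.
    """

    def __init__(self, initial_timeout=1.0, alpha=0.125, beta=0.25, k=4.0):
        self.mean = None          # smoothed delay, unset until the first sample
        self.dev = 0.0            # smoothed absolute deviation of the delay
        self.timeout = initial_timeout
        self.alpha = alpha        # weight of a new sample in the mean
        self.beta = beta          # weight of a new sample in the deviation
        self.k = k                # safety margin, in deviations

    def observe(self, delay):
        """Update the estimate with one measured communication delay."""
        if self.mean is None:
            self.mean = delay
            self.dev = delay / 2.0
        else:
            # Update the deviation against the previous mean, then the mean.
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(delay - self.mean)
            self.mean = (1 - self.alpha) * self.mean + self.alpha * delay
        self.timeout = self.mean + self.k * self.dev

    def is_suspected_failed(self, silence):
        """Suspect a failure once an agent has been silent longer than the time-out."""
        return silence > self.timeout
```

A wide margin while delays are still uncertain keeps slow but healthy agents from being replicated needlessly; as delays prove stable, the time-out tightens and real failures are detected sooner.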