Tackling latency via replication in distributed systems

Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and...


Authors:
Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G.
Resource type:
Book part
Publication date:
2016
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/28507
Online access:
https://doi.org/10.1145/2851553.2851562
https://repository.urosario.edu.co/handle/10336/28507
Keywords:
Latency-tolerance
Fault-tolerance
Matrix-analytic methods
Response time distribution
Distributed system
Rights
License
Restricted (access limited to specific groups)
id EDOCUR2_28274b1f52ff6ed3e6d2e8149bff9cef
oai_identifier_str oai:repository.urosario.edu.co:10336/28507
network_acronym_str EDOCUR2
network_name_str Repositorio EdocUR - U. Rosario
repository_id_str
author Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G.
dc.title.spa.fl_str_mv Tackling latency via replication in distributed systems
dc.title.TranslatedTitle.spa.fl_str_mv Abordar la latencia mediante la replicación en sistemas distribuidos
title Tackling latency via replication in distributed systems
dc.subject.keyword.spa.fl_str_mv Latency-tolerance
Fault-tolerance
Matrix-analytic methods
Response time distribution
Distributed system
description Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and use the first successful result returned, canceling all outstanding replicas. This scheme has been studied recently, but mostly for systems with a single central queue, while server farms exploit distributed resources for scalability and robustness. We develop an approximate stochastic model to determine the response-time distribution in a system with distributed queues, and compare its performance against its centralized counterpart. Validation against simulation indicates that our model is accurate for not only the mean response time but also its percentiles, which are particularly relevant for deadline-driven applications. Further, we show that in the distributed set-up, replication with canceling has the potential to reduce response times, even at relatively high utilization. We also find that it offers response times close to those of the centralized system, especially at medium-to-high request reliability. These findings support the use of replication with canceling as an effective mechanism for both fault- and delay-tolerance.
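The scheme named in the description, replication with canceling, can be illustrated with a minimal sketch. This is not the authors' model or implementation: the helper names `replica`, `first_success` and `demo`, the fixed delays, and the use of Python's asyncio are all assumptions made for illustration only.

```python
import asyncio


async def replica(replica_id: int, delay: float, succeeds: bool) -> str:
    """Hypothetical replica: waits `delay` seconds, then succeeds or fails."""
    await asyncio.sleep(delay)
    if not succeeds:
        raise RuntimeError(f"replica {replica_id} failed")
    return f"result from replica {replica_id}"


async def first_success(coros):
    """Run all replicas concurrently, return the first successful result,
    and cancel every replica that is still outstanding."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Cancel outstanding replicas and wait for them to finish unwinding.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


def demo() -> str:
    # Replica 0 fails early, replica 1 succeeds next, replica 2 is canceled.
    return asyncio.run(
        first_success(
            [
                replica(0, 0.05, succeeds=False),
                replica(1, 0.10, succeeds=True),
                replica(2, 0.50, succeeds=True),
            ]
        )
    )


if __name__ == "__main__":
    print(demo())
```

In this sketch a failed replica is simply ignored and the caller keeps waiting for the next one, which is one way to read "first successful result"; the queueing behaviour analysed in the paper is not modelled here.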
publishDate 2016
dc.date.created.spa.fl_str_mv 2016-03
dc.date.accessioned.none.fl_str_mv 2020-08-28T15:49:15Z
dc.date.available.none.fl_str_mv 2020-08-28T15:49:15Z
dc.type.eng.fl_str_mv bookPart
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_3248
dc.type.spa.spa.fl_str_mv Parte de libro
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1145/2851553.2851562
dc.identifier.issn.none.fl_str_mv ISBN: 978-1-4503-4080-9
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/28507
url https://doi.org/10.1145/2851553.2851562
https://repository.urosario.edu.co/handle/10336/28507
identifier_str_mv ISBN: 978-1-4503-4080-9
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.citationTitle.none.fl_str_mv ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016
dc.relation.ispartof.spa.fl_str_mv ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering
ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016; ISBN: 978-1-4503-4080-9; pp. 197-208
dc.relation.uri.spa.fl_str_mv https://dl.acm.org/doi/abs/10.1145/2851553.2851562
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv Restricted (access limited to specific groups)
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Association for Computing Machinery
dc.source.spa.fl_str_mv ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering
ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016
institution Universidad del Rosario
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co
_version_ 1814167652402200576