Tackling latency via replication in distributed systems

Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and...

Authors: Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G

Resource type: Book part

Publication date: 2016

Institution: Universidad del Rosario

Repository: Repositorio EdocUR - U. Rosario

Language: eng

id	EDOCUR2_28274b1f52ff6ed3e6d2e8149bff9cef
oai_identifier_str	oai:repository.urosario.edu.co:10336/28507
network_acronym_str	EDOCUR2
network_name_str	Repositorio EdocUR - U. Rosario
repository_id_str
spelling	Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G. Tackling latency via replication in distributed systems. 2016-03. Repositorio institucional EdocUR, https://repository.urosario.edu.co
dc.title.spa.fl_str_mv	Tackling latency via replication in distributed systems
dc.title.TranslatedTitle.spa.fl_str_mv	Abordar la latencia mediante la replicación en sistemas distribuidos
title	Tackling latency via replication in distributed systems
spellingShingle	Tackling latency via replication in distributed systems; Latency-tolerance; Fault-tolerance; Matrix-analytic methods; Response time distribution; Distributed system
dc.subject.keyword.spa.fl_str_mv	Latency-tolerance; Fault-tolerance; Matrix-analytic methods; Response time distribution; Distributed system
topic	Latency-tolerance; Fault-tolerance; Matrix-analytic methods; Response time distribution; Distributed system
description	Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and use the first successful result returned, canceling all outstanding replicas. This scheme has been studied recently, but mostly for systems with a single central queue, while server farms exploit distributed resources for scalability and robustness. We develop an approximate stochastic model to determine the response-time distribution in a system with distributed queues, and compare its performance against its centralized counterpart. Validation against simulation indicates that our model is accurate for not only the mean response time but also its percentiles, which are particularly relevant for deadline-driven applications. Further, we show that in the distributed set-up, replication with canceling has the potential to reduce response times, even at relatively high utilization. We also find that it offers response times close to those of the centralized system, especially at medium-to-high request reliability. These findings support the use of replication with canceling as an effective mechanism for both fault- and delay-tolerance.
publishDate	2016
dc.date.created.spa.fl_str_mv	2016-03
dc.date.accessioned.none.fl_str_mv	2020-08-28T15:49:15Z
dc.date.available.none.fl_str_mv	2020-08-28T15:49:15Z
dc.type.eng.fl_str_mv	bookPart
dc.type.coarversion.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv	http://purl.org/coar/resource_type/c_3248
dc.type.spa.spa.fl_str_mv	Book part
dc.identifier.doi.none.fl_str_mv	https://doi.org/10.1145/2851553.2851562
dc.identifier.issn.none.fl_str_mv	ISBN: 978-1-4503-4080-9
dc.identifier.uri.none.fl_str_mv	https://repository.urosario.edu.co/handle/10336/28507
url	https://doi.org/10.1145/2851553.2851562 https://repository.urosario.edu.co/handle/10336/28507
identifier_str_mv	ISBN: 978-1-4503-4080-9
dc.language.iso.spa.fl_str_mv	eng
language	eng
dc.relation.citationTitle.none.fl_str_mv	ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE'16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016
dc.relation.ispartof.spa.fl_str_mv	ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE'16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, ISBN: 978-1-4503-4080-9 (March 2016); pp. 197-208
dc.relation.uri.spa.fl_str_mv	https://dl.acm.org/doi/abs/10.1145/2851553.2851562
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv	Restricted (access for specific groups)
rights_invalid_str_mv	Restricted (access for specific groups) http://purl.org/coar/access_right/c_16ec
dc.format.mimetype.none.fl_str_mv	application/pdf
dc.publisher.spa.fl_str_mv	Association for Computing Machinery
dc.source.spa.fl_str_mv	ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE'16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016
institution	Universidad del Rosario
dc.source.instname.none.fl_str_mv	instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv	reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv	Repositorio institucional EdocUR
repository.mail.fl_str_mv	edocur@urosario.edu.co
_version_	1837007768728371200
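
The description field above defines replication with canceling as issuing several concurrent replicas of a request, using the first successful result, and canceling the replicas still outstanding. The following is a minimal sketch of that request-handling scheme only, not of the paper's stochastic model. It assumes Python's `asyncio`; the names `replica` and `first_success` and the fixed delays are hypothetical and chosen purely for illustration.

```python
import asyncio


async def replica(replica_id: int, delay: float, succeeds: bool) -> str:
    """Hypothetical replica: waits `delay` seconds, then succeeds or fails."""
    await asyncio.sleep(delay)
    if not succeeds:
        raise RuntimeError(f"replica {replica_id} failed")
    return f"result from replica {replica_id}"


async def first_success(coros) -> str:
    """Run all replicas concurrently, return the first successful result,
    and cancel every replica that is still outstanding."""
    pending = {asyncio.create_task(c) for c in coros}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # A failed replica is skipped; we keep waiting for the others.
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Cancel whatever is still running and wait for the cancellations.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


if __name__ == "__main__":
    # Assumed scenario: one fast failure, one success, one slow success.
    replicas = [
        replica(0, 0.01, False),
        replica(1, 0.05, True),
        replica(2, 0.50, True),
    ]
    print(asyncio.run(first_success(replicas)))
```

In this sketch a failed replica does not end the request: the caller keeps waiting until some replica succeeds or all have failed, which is one way to read the combined fault- and delay-tolerance goal stated in the description.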
