Tackling latency via replication in distributed systems
Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and...
- Authors:
- Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G
- Resource type:
- Book part
- Publication date:
- 2016
- Institution:
- Universidad del Rosario
- Repository:
- Repositorio EdocUR - U. Rosario
- Language:
- eng
- OAI Identifier:
- oai:repository.urosario.edu.co:10336/28507
- Online access:
- https://doi.org/10.1145/2851553.2851562
https://repository.urosario.edu.co/handle/10336/28507
- Keywords:
- Latency-tolerance
Fault-tolerance
Matrix-analytic methods
Response time distribution
Distributed system
- Rights
- License
- Restricted (access limited to specific groups)
id |
EDOCUR2_28274b1f52ff6ed3e6d2e8149bff9cef |
---|---|
oai_identifier_str |
oai:repository.urosario.edu.co:10336/28507 |
network_acronym_str |
EDOCUR2 |
network_name_str |
Repositorio EdocUR - U. Rosario |
repository_id_str |
|
authors |
Qiu, Zhan; Pérez, Juan F.; Harrison, Peter G |
dc.title.spa.fl_str_mv |
Tackling latency via replication in distributed systems |
dc.title.TranslatedTitle.spa.fl_str_mv |
Abordar la latencia mediante la replicación en sistemas distribuidos |
title |
Tackling latency via replication in distributed systems |
dc.subject.keyword.spa.fl_str_mv |
Latency-tolerance Fault-tolerance Matrix-analytic methods Response time distribution Distributed system |
description |
Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling – i.e. initiate multiple concurrent replicas of a request and use the first successful result returned, canceling all outstanding replicas. This scheme has been studied recently, but mostly for systems with a single central queue, while server farms exploit distributed resources for scalability and robustness. We develop an approximate stochastic model to determine the response-time distribution in a system with distributed queues, and compare its performance against its centralized counterpart. Validation against simulation indicates that our model is accurate for not only the mean response time but also its percentiles, which are particularly relevant for deadline-driven applications. Further, we show that in the distributed set-up, replication with canceling has the potential to reduce response times, even at relatively high utilization. We also find that it offers response times close to those of the centralized system, especially at medium-to-high request reliability. These findings support the use of replication with canceling as an effective mechanism for both fault- and delay-tolerance. |
publishDate |
2016 |
dc.date.created.spa.fl_str_mv |
2016-03 |
dc.date.accessioned.none.fl_str_mv |
2020-08-28T15:49:15Z |
dc.date.available.none.fl_str_mv |
2020-08-28T15:49:15Z |
dc.type.eng.fl_str_mv |
bookPart |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_3248 |
dc.type.spa.spa.fl_str_mv |
Book part |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1145/2851553.2851562 |
dc.identifier.issn.none.fl_str_mv |
ISBN: 978-1-4503-4080-9 |
dc.identifier.uri.none.fl_str_mv |
https://repository.urosario.edu.co/handle/10336/28507 |
dc.language.iso.spa.fl_str_mv |
eng |
language |
eng |
dc.relation.citationTitle.none.fl_str_mv |
ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016 |
dc.relation.ispartof.spa.fl_str_mv |
ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, ISBN: 978-1-4503-4080-9 (March 2016); pp. 197-208 |
dc.relation.uri.spa.fl_str_mv |
https://dl.acm.org/doi/abs/10.1145/2851553.2851562 |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_16ec |
dc.rights.acceso.spa.fl_str_mv |
Restricted (access limited to specific groups) |
rights_invalid_str_mv |
Restricted (access limited to specific groups) http://purl.org/coar/access_right/c_16ec |
dc.format.mimetype.none.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
Association for Computing Machinery |
dc.source.spa.fl_str_mv |
ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering; ICPE '16: ACM/SPEC International Conference on Performance Engineering, Delft, The Netherlands, March 2016 |
institution |
Universidad del Rosario |
dc.source.instname.none.fl_str_mv |
instname:Universidad del Rosario |
dc.source.reponame.none.fl_str_mv |
reponame:Repositorio Institucional EdocUR |
repository.name.fl_str_mv |
Repositorio institucional EdocUR |
repository.mail.fl_str_mv |
edocur@urosario.edu.co |
_version_ |
1814167652402200576 |
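
The abstract in this record describes replication with canceling: start several concurrent replicas of a request, use the first successful result, and cancel the replicas still outstanding. The Python sketch below illustrates only that client-side mechanism. It is not the paper's stochastic model, and the names `replica` and `replicate_with_canceling`, the exponential service time, and the `fail_prob` parameter are all hypothetical choices made for illustration.

```python
import asyncio
import random


async def replica(server_id: int, fail_prob: float, rng: random.Random) -> str:
    """Hypothetical replica: wait a random service time, then succeed or fail."""
    # Assumed service time: exponential with a 10 ms mean (illustrative only).
    await asyncio.sleep(rng.expovariate(100.0))
    if rng.random() < fail_prob:
        raise RuntimeError(f"replica on server {server_id} failed")
    return f"result from server {server_id}"


async def replicate_with_canceling(n_replicas: int, fail_prob: float, seed: int = 0) -> str:
    """Run n_replicas concurrently; return the first success and cancel the rest."""
    rng = random.Random(seed)
    pending = {
        asyncio.create_task(replica(i, fail_prob, rng)) for i in range(n_replicas)
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            # Inspect every finished replica; failed ones are simply skipped.
            successes = [t.result() for t in done if t.exception() is None]
            if successes:
                return successes[0]
        raise RuntimeError("all replicas failed")
    finally:
        # Cancel whatever is still outstanding and wait for the cancellations.
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(replicate_with_canceling(n_replicas=3, fail_prob=0.2)))
```

A failed replica does not end the request here: the loop keeps waiting until some replica succeeds or every replica has failed, which is how replication serves fault tolerance as well as latency reduction.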