Evaluating the effectiveness of replication for tail-tolerance

Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time dist...

Authors:
Qiu, Zhan
Pérez, Juan F.
Resource type:
Book part
Publication date:
2015
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/28509
Online access:
https://doi.org/10.1109/CCGrid.2015.22
https://repository.urosario.edu.co/handle/10336/28509
Keywords:
Computing clusters
Concurrent replication
Computer system
Rights
License
Restricted (access for specific groups)
id EDOCUR2_1257504e5b0d7b2cb077d09c7afa5c44
oai_identifier_str oai:repository.urosario.edu.co:10336/28509
network_acronym_str EDOCUR2
network_name_str Repositorio EdocUR - U. Rosario
repository_id_str
spelling e1d48e1f-f195-4e4b-9c1b-f62c730d04e5-180035202600
2020-08-28T15:49:15Z
2020-08-28T15:49:15Z
2015-05
Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with canceling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and canceling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.
application/pdf
https://doi.org/10.1109/CCGrid.2015.22
ISBN: 978-1-4799-8006-2
https://repository.urosario.edu.co/handle/10336/28509
eng
IEEE
452
443
CCGrid '15: 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Shenzhen China (May, 2015)
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, ISBN: 978-1-4799-8006-2 (May 2015); pp. 443-452
https://dl.acm.org/doi/abs/10.1109/CCGrid.2015.22
Restringido (Acceso a grupos específicos)
http://purl.org/coar/access_right/c_16ec
CCGrid '15: 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Shenzhen China (May, 2015)
instname:Universidad del Rosario
reponame:Repositorio Institucional EdocUR
Computing clusters
Concurrent replication
Computer system
Evaluating the effectiveness of replication for tail-tolerance
Evaluar la efectividad de la replicación para la tolerancia a la cola
bookPart
Parte de libro
http://purl.org/coar/version/c_970fb48d4fbd8a85
http://purl.org/coar/resource_type/c_3248
Qiu, Zhan
Pérez, Juan F.
10336/28509
oai:repository.urosario.edu.co:10336/28509
2021-06-03 00:49:51.087
https://repository.urosario.edu.co
Repositorio institucional EdocUR
edocur@urosario.edu.co
dc.title.spa.fl_str_mv Evaluating the effectiveness of replication for tail-tolerance
dc.title.TranslatedTitle.spa.fl_str_mv Evaluar la efectividad de la replicación para la tolerancia a la cola
title Evaluating the effectiveness of replication for tail-tolerance
spellingShingle Evaluating the effectiveness of replication for tail-tolerance
Computing clusters
Concurrent replication
Computer system
title_short Evaluating the effectiveness of replication for tail-tolerance
title_full Evaluating the effectiveness of replication for tail-tolerance
title_fullStr Evaluating the effectiveness of replication for tail-tolerance
title_full_unstemmed Evaluating the effectiveness of replication for tail-tolerance
title_sort Evaluating the effectiveness of replication for tail-tolerance
dc.subject.keyword.spa.fl_str_mv Computing clusters
Concurrent replication
Computer system
topic Computing clusters
Concurrent replication
Computer system
description Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with canceling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and canceling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.
publishDate 2015
dc.date.created.spa.fl_str_mv 2015-05
dc.date.accessioned.none.fl_str_mv 2020-08-28T15:49:15Z
dc.date.available.none.fl_str_mv 2020-08-28T15:49:15Z
dc.type.eng.fl_str_mv bookPart
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_3248
dc.type.spa.spa.fl_str_mv Parte de libro
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1109/CCGrid.2015.22
dc.identifier.issn.none.fl_str_mv ISBN: 978-1-4799-8006-2
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/28509
url https://doi.org/10.1109/CCGrid.2015.22
https://repository.urosario.edu.co/handle/10336/28509
identifier_str_mv ISBN: 978-1-4799-8006-2
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.citationEndPage.none.fl_str_mv 452
dc.relation.citationStartPage.none.fl_str_mv 443
dc.relation.citationTitle.none.fl_str_mv CCGrid '15: 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Shenzhen China (May, 2015)
dc.relation.ispartof.spa.fl_str_mv Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, ISBN: 978-1-4799-8006-2 (May 2015); pp. 443-452
dc.relation.uri.spa.fl_str_mv https://dl.acm.org/doi/abs/10.1109/CCGrid.2015.22
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv Restringido (Acceso a grupos específicos)
rights_invalid_str_mv Restringido (Acceso a grupos específicos)
http://purl.org/coar/access_right/c_16ec
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv IEEE
dc.source.spa.fl_str_mv CCGrid '15: 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Shenzhen China (May, 2015)
institution Universidad del Rosario
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co
_version_ 1808390618329645056
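
The description in this record summarizes concurrent replication with canceling: a request and its replicas are processed concurrently, the result comes from the first replica that completes, and all other replicas are canceled. The following is a minimal Python sketch of that pattern only, not the authors' stochastic model or implementation. The names `process_replica` and `replicate_with_canceling` are hypothetical, and the processing times are assumed values supplied by the caller, with a sleep standing in for real work.

```python
import asyncio


async def process_replica(request_id, replica_id, processing_time):
    """Hypothetical replica: sleeping stands in for the actual processing work."""
    await asyncio.sleep(processing_time)
    return request_id, replica_id


async def replicate_with_canceling(request_id, processing_times):
    """Run one replica per entry in processing_times and keep the first result.

    Returns (result_of_first_replica, number_of_replicas_canceled).
    """
    tasks = [
        asyncio.create_task(process_replica(request_id, i, t))
        for i, t in enumerate(processing_times)
    ]
    # Wait only until the first replica completes.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel every replica that is still running.
    for task in pending:
        task.cancel()
    # Let the cancellations settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    canceled = sum(task.cancelled() for task in pending)
    first = next(iter(done))
    return first.result(), canceled


if __name__ == "__main__":
    # Three replicas with assumed processing times in seconds; replica 1 is fastest.
    print(asyncio.run(replicate_with_canceling(7, [0.3, 0.05, 0.2])))
```

The sketch leaves out queueing, cluster utilization, and the arrival process, which the description says strongly affect how much replication helps, so it illustrates the mechanism rather than its performance.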