Enhancing reliability and response times via replication in computing clusters
Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of addi...
- Authors:
- Resource type:
- Publication date:
- 2015
- Institution:
- Universidad del Rosario
- Repository:
- Repositorio EdocUR - U. Rosario
- Language:
- eng
- OAI Identifier:
- oai:repository.urosario.edu.co:10336/28504
- Online access:
- https://doi.org/10.1109/INFOCOM.2015.7218512
https://repository.urosario.edu.co/handle/10336/28504
- Keywords:
- Servers
Time factors
Reliability
Computational modeling
Conferences
Computers
Switches
- Rights
- License
- Restricted (access for specific groups)
id |
EDOCUR2_38c54bd73fb21e00a6e741d17625b34e |
---|---|
oai_identifier_str |
oai:repository.urosario.edu.co:10336/28504 |
network_acronym_str |
EDOCUR2 |
network_name_str |
Repositorio EdocUR - U. Rosario |
repository_id_str |
|
spelling |
e1d48e1f-f195-4e4b-9c1b-f62c730d04e5; 80035202600; 2020-08-28T15:49:14Z; 2020-08-28T15:49:14Z; 2015-08-24; Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of additional delays in job response times, or unnecessarily increasing resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach where jobs and their replicas are processed concurrently, and the successful completion of either triggers the removals of its replica. We propose a stochastic model to study how this approach affects the cluster service level objectives (SLOs), particularly the offered response time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the regions of the utilization where introducing replication with canceling effectively reduces the response times. Moreover, we show how this model can support resource provisioning decisions with reliability and response time guarantees.; application/pdf; https://doi.org/10.1109/INFOCOM.2015.7218512; EISBN: 978-1-4799-8381-0; https://repository.urosario.edu.co/handle/10336/28504; eng; IEEE; 1363; 1355; 2015 IEEE Conference on Computer Communications (INFOCOM); IEEE Conference on Computer Communications (INFOCOM), EISBN: 978-1-4799-8381-0 (2015); pp. 1355-1363; https://ieeexplore.ieee.org/abstract/document/7218512; Restricted (access for specific groups); http://purl.org/coar/access_right/c_16ec; 2015 IEEE Conference on Computer Communications (INFOCOM); instname:Universidad del Rosario; reponame:Repositorio Institucional EdocUR; Servers; Time factors; Reliability; Computational modeling; Conferences; Computers; Switches; Enhancing reliability and response times via replication in computing clusters; Mejora de la confiabilidad y los tiempos de respuesta mediante la replicación en clústeres informáticos; bookPart; Book part; http://purl.org/coar/version/c_970fb48d4fbd8a85; http://purl.org/coar/resource_type/c_3248; Qiu, Zhan.; Pérez, Juan F.; 10336/28504; oai:repository.urosario.edu.co:10336/28504; 2021-09-23 00:51:24.057; https://repository.urosario.edu.co; Repositorio institucional EdocUR; edocur@urosario.edu.co |
dc.title.spa.fl_str_mv |
Enhancing reliability and response times via replication in computing clusters |
dc.title.TranslatedTitle.spa.fl_str_mv |
Mejora de la confiabilidad y los tiempos de respuesta mediante la replicación en clústeres informáticos |
title |
Enhancing reliability and response times via replication in computing clusters |
spellingShingle |
Enhancing reliability and response times via replication in computing clusters Servers Time factors Reliability Computational modeling Conferences Computers Switches |
title_short |
Enhancing reliability and response times via replication in computing clusters |
title_full |
Enhancing reliability and response times via replication in computing clusters |
title_fullStr |
Enhancing reliability and response times via replication in computing clusters |
title_full_unstemmed |
Enhancing reliability and response times via replication in computing clusters |
title_sort |
Enhancing reliability and response times via replication in computing clusters |
dc.subject.keyword.spa.fl_str_mv |
Servers Time factors Reliability Computational modeling Conferences Computers Switches |
topic |
Servers Time factors Reliability Computational modeling Conferences Computers Switches |
description |
Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of additional delays in job response times, or unnecessarily increasing resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach where jobs and their replicas are processed concurrently, and the successful completion of either triggers the removals of its replica. We propose a stochastic model to study how this approach affects the cluster service level objectives (SLOs), particularly the offered response time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the regions of the utilization where introducing replication with canceling effectively reduces the response times. Moreover, we show how this model can support resource provisioning decisions with reliability and response time guarantees. |
publishDate |
2015 |
dc.date.created.spa.fl_str_mv |
2015-08-24 |
dc.date.accessioned.none.fl_str_mv |
2020-08-28T15:49:14Z |
dc.date.available.none.fl_str_mv |
2020-08-28T15:49:14Z |
dc.type.eng.fl_str_mv |
bookPart |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_3248 |
dc.type.spa.spa.fl_str_mv |
Book part |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1109/INFOCOM.2015.7218512 |
dc.identifier.issn.none.fl_str_mv |
EISBN: 978-1-4799-8381-0 |
dc.identifier.uri.none.fl_str_mv |
https://repository.urosario.edu.co/handle/10336/28504 |
url |
https://doi.org/10.1109/INFOCOM.2015.7218512 https://repository.urosario.edu.co/handle/10336/28504 |
identifier_str_mv |
EISBN: 978-1-4799-8381-0 |
dc.language.iso.spa.fl_str_mv |
eng |
language |
eng |
dc.relation.citationEndPage.none.fl_str_mv |
1363 |
dc.relation.citationStartPage.none.fl_str_mv |
1355 |
dc.relation.citationTitle.none.fl_str_mv |
2015 IEEE Conference on Computer Communications (INFOCOM) |
dc.relation.ispartof.spa.fl_str_mv |
IEEE Conference on Computer Communications (INFOCOM), EISBN: 978-1-4799-8381-0 (2015); pp. 1355-1363 |
dc.relation.uri.spa.fl_str_mv |
https://ieeexplore.ieee.org/abstract/document/7218512 |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_16ec |
dc.rights.acceso.spa.fl_str_mv |
Restricted (access for specific groups) |
rights_invalid_str_mv |
Restricted (access for specific groups) http://purl.org/coar/access_right/c_16ec |
dc.format.mimetype.none.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
IEEE |
dc.source.spa.fl_str_mv |
2015 IEEE Conference on Computer Communications (INFOCOM) |
institution |
Universidad del Rosario |
dc.source.instname.none.fl_str_mv |
instname:Universidad del Rosario |
dc.source.reponame.none.fl_str_mv |
reponame:Repositorio Institucional EdocUR |
repository.name.fl_str_mv |
Repositorio institucional EdocUR |
repository.mail.fl_str_mv |
edocur@urosario.edu.co |
_version_ |
1814167478436102144 |