Assessing the impact of concurrent replication with canceling in Parallel Jobs

Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks will suffer a fail...

Authors:
Qiu, Zhan; Pérez, Juan F.
Resource type:
Book part
Publication date:
2015
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
English
OAI Identifier:
oai:repository.urosario.edu.co:10336/28505
Online access:
https://doi.org/10.1109/MASCOTS.2014.13
https://repository.urosario.edu.co/handle/10336/28505
Keywords:
Reliability
Time factors
Computational modeling
Numerical models
Vectors
Generators
Equations
Rights
License
Restricted (access limited to specific groups)
author Qiu, Zhan
Pérez, Juan F.
dc.title.spa.fl_str_mv Assessing the impact of concurrent replication with canceling in Parallel Jobs
dc.title.TranslatedTitle.spa.fl_str_mv Evaluación del impacto de la replicación concurrente con la cancelación en trabajos paralelos
description Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks will suffer a failure if any of its tasks fail, requiring reprocessing and additional delays. In this paper, we explore the effect that the replication of parallel jobs has on the job reliability and response time, as well as on resource utilization. The replication mechanism consists of concurrently processing replicas, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any remaining replica in process. We propose a stochastic model that explicitly considers parallel job processing, replication at both the job and the task level, and handles general arrival processes. We develop a numerically efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase the job reliability but also have the potential to reduce the response times.
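The replication-with-canceling mechanism summarized in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' stochastic model: it assumes that every replica fails independently with a fixed probability, that service times are exponentially distributed, that all tasks start together, and that there is no arrival process or queueing. The function names `job_reliability` and `simulate_task_replication` and their parameters are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def job_reliability(n_tasks: int, p_fail: float, replicas: int, level: str) -> float:
    """Probability that a parallel job of n_tasks tasks completes without failure.

    Simplifying assumption (not the paper's model): every task replica fails
    independently with the same probability p_fail.
    """
    single_run = (1 - p_fail) ** n_tasks  # all tasks of one job copy succeed
    if level == "none":
        return single_run
    if level == "task":
        # Task-level replication: a task succeeds if any of its replicas succeeds.
        return (1 - p_fail ** replicas) ** n_tasks
    if level == "job":
        # Job-level replication: the job succeeds if any whole job copy succeeds.
        return 1 - (1 - single_run) ** replicas
    raise ValueError(f"unknown replication level: {level!r}")


def simulate_task_replication(n_tasks: int, replicas: int, rate: float,
                              trials: int, seed: int = 0):
    """Monte Carlo estimate of response time and resource use with canceling.

    Hypothetical assumptions, for illustration only: replica service times are
    i.i.d. exponential with the given rate, all tasks start at time 0, there is
    no queueing, and replica failures are ignored.
    Returns (mean response time, mean busy time with canceling,
             mean busy time without canceling).
    """
    rng = random.Random(seed)
    response, busy_cancel, busy_no_cancel = [], [], []
    for _ in range(trials):
        finish_times = []
        used_cancel = 0.0
        used_no_cancel = 0.0
        for _ in range(n_tasks):
            times = [rng.expovariate(rate) for _ in range(replicas)]
            first = min(times)                # result taken from the fastest replica
            finish_times.append(first)
            used_cancel += replicas * first   # other replicas are canceled at `first`
            used_no_cancel += sum(times)      # every replica runs to completion
        response.append(max(finish_times))    # the job ends when its last task ends
        busy_cancel.append(used_cancel)
        busy_no_cancel.append(used_no_cancel)
    return (statistics.mean(response),
            statistics.mean(busy_cancel),
            statistics.mean(busy_no_cancel))
```

For example, with 10 tasks and `p_fail = 0.1`, `job_reliability` gives about 0.35 without replication, about 0.58 with two job-level replicas, and about 0.90 with two task-level replicas. In the simulation, the exponential assumption makes the minimum of r replica times exponential with rate r times `rate`, so the expected busy time per task with canceling equals that of a single unreplicated run, whereas without canceling it grows r-fold; with other service-time distributions the effect of canceling generally differs. The sketch also omits the arrival process and queueing that the paper's model handles explicitly.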
publishDate 2015
dc.date.created.spa.fl_str_mv 2015-02-09
dc.date.accessioned.none.fl_str_mv 2020-08-28T15:49:14Z
dc.date.available.none.fl_str_mv 2020-08-28T15:49:14Z
dc.type.eng.fl_str_mv bookPart
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_3248
dc.type.spa.spa.fl_str_mv Book part
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1109/MASCOTS.2014.13
dc.identifier.issn.none.fl_str_mv EISBN: 978-1-4799-5610-4
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/28505
dc.language.iso.spa.fl_str_mv eng
dc.relation.citationEndPage.none.fl_str_mv 40
dc.relation.citationStartPage.none.fl_str_mv 31
dc.relation.citationTitle.none.fl_str_mv 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems
dc.relation.ispartof.spa.fl_str_mv IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, EISBN: 978-1-4799-5610-4 (2014); pp. 31-40
dc.relation.uri.spa.fl_str_mv https://ieeexplore.ieee.org/document/7033635
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv Restricted (access limited to specific groups)
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv IEEE
dc.source.spa.fl_str_mv 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems
institution Universidad del Rosario
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co