Assessing the impact of concurrent replication with canceling in Parallel Jobs
Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks will suffer a fail...
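The reliability argument in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the paper's stochastic model: it assumes, purely for illustration, that every task replica succeeds independently with the same probability, and the function name `job_success_probability` and its parameters are hypothetical.

```python
def job_success_probability(p_task, n_tasks, replicas=1, level="task"):
    """Probability that a job made of n_tasks parallel tasks completes.

    Illustrative assumption (not from the paper): every task replica
    succeeds independently with probability p_task.
    """
    if not 0.0 <= p_task <= 1.0:
        raise ValueError("p_task must lie in [0, 1]")
    if n_tasks < 1 or replicas < 1:
        raise ValueError("n_tasks and replicas must be at least 1")

    if level == "task":
        # Task-level replication: a task succeeds if at least one of its
        # replicas succeeds, and the job needs every task to succeed.
        p_one_task = 1.0 - (1.0 - p_task) ** replicas
        return p_one_task ** n_tasks
    if level == "job":
        # Job-level replication: a job replica succeeds only if all of its
        # tasks succeed, and the job needs at least one replica to succeed.
        p_one_replica = p_task ** n_tasks
        return 1.0 - (1.0 - p_one_replica) ** replicas
    raise ValueError("level must be 'task' or 'job'")


if __name__ == "__main__":
    for level in ("task", "job"):
        for replicas in (1, 2, 3):
            p = job_success_probability(0.99, 100, replicas, level)
            print(f"{level}-level, {replicas} replica(s): {p:.4f}")
```

Under these assumptions, a job with 100 tasks that each succeed with probability 0.99 completes only about 37% of the time without replication, which is why a single task failure matters so much for large jobs. The sketch ignores response time and resource utilization, which the paper's model also covers.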
- Authors:
- Qiu, Zhan
Pérez, Juan F.
- Resource type:
- Book part
- Publication date:
- 2015
- Institution:
- Universidad del Rosario
- Repository:
- Repositorio EdocUR - U. Rosario
- Language:
- eng
- OAI Identifier:
- oai:repository.urosario.edu.co:10336/28505
- Online access:
- https://doi.org/10.1109/MASCOTS.2014.13
https://repository.urosario.edu.co/handle/10336/28505
- Keywords:
- Reliability
Time factors
Computational modeling
Numerical models
Vectors
Generators
Equations
- Rights
- License
- Restricted (access limited to specific groups)
id |
EDOCUR2_8059e1d64c90de7753c4727b03ae70eb |
---|---|
oai_identifier_str |
oai:repository.urosario.edu.co:10336/28505 |
network_acronym_str |
EDOCUR2 |
network_name_str |
Repositorio EdocUR - U. Rosario |
repository_id_str |
|
authors |
Qiu, Zhan; Pérez, Juan F. |
dc.title.spa.fl_str_mv |
Assessing the impact of concurrent replication with canceling in Parallel Jobs |
dc.title.TranslatedTitle.spa.fl_str_mv |
Evaluación del impacto de la replicación concurrente con la cancelación en trabajos paralelos |
title |
Assessing the impact of concurrent replication with canceling in Parallel Jobs |
dc.subject.keyword.spa.fl_str_mv |
Reliability; Time factors; Computational modeling; Numerical models; Vectors; Generators; Equations |
topic |
Reliability; Time factors; Computational modeling; Numerical models; Vectors; Generators; Equations |
description |
Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks will suffer a failure if any of its tasks fail, requiring reprocessing and additional delays. In this paper, we explore the effect that the replication of parallel jobs has on the job reliability and response time, as well as on resource utilization. The replication mechanism consists of concurrently processing replicas, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any remaining replica in process. We propose a stochastic model that explicitly considers parallel job processing, replication at both the job and the task level, and handles general arrival processes. We develop a numerically-efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase the job reliability, but have the potential to reduce the response times. |
publishDate |
2015 |
dc.date.created.spa.fl_str_mv |
2015-02-09 |
dc.date.accessioned.none.fl_str_mv |
2020-08-28T15:49:14Z |
dc.date.available.none.fl_str_mv |
2020-08-28T15:49:14Z |
dc.type.eng.fl_str_mv |
bookPart |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_3248 |
dc.type.spa.spa.fl_str_mv |
Parte de libro |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1109/MASCOTS.2014.13 |
dc.identifier.issn.none.fl_str_mv |
EISBN: 978-1-4799-5610-4 |
dc.identifier.uri.none.fl_str_mv |
https://repository.urosario.edu.co/handle/10336/28505 |
url |
https://doi.org/10.1109/MASCOTS.2014.13 https://repository.urosario.edu.co/handle/10336/28505 |
identifier_str_mv |
EISBN: 978-1-4799-5610-4 |
dc.language.iso.spa.fl_str_mv |
eng |
language |
eng |
dc.relation.citationEndPage.none.fl_str_mv |
40 |
dc.relation.citationStartPage.none.fl_str_mv |
31 |
dc.relation.citationTitle.none.fl_str_mv |
2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems |
dc.relation.ispartof.spa.fl_str_mv |
IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, EISBN: 978-1-4799-5610-4 (2014); pp. 31-40 |
dc.relation.uri.spa.fl_str_mv |
https://ieeexplore.ieee.org/document/7033635 |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_16ec |
dc.rights.acceso.spa.fl_str_mv |
Restricted (access limited to specific groups) |
rights_invalid_str_mv |
Restricted (access limited to specific groups) http://purl.org/coar/access_right/c_16ec |
dc.format.mimetype.none.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
IEEE |
dc.source.spa.fl_str_mv |
2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems |
institution |
Universidad del Rosario |
dc.source.instname.none.fl_str_mv |
instname:Universidad del Rosario |
dc.source.reponame.none.fl_str_mv |
reponame:Repositorio Institucional EdocUR |
repository.name.fl_str_mv |
Repositorio institucional EdocUR |
repository.mail.fl_str_mv |
edocur@urosario.edu.co |
_version_ |
1814167472367992832 |