An evaluation of the state of time synchronization on leadership class supercomputers
We present a detailed examination of time agreement characteristics for nodes within extreme‐scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high‐performance computers sited at three national laborat...
- Authors:
-
Mondragón Martínez, Oscar Hernán
Jones, Terry
Bridges, Patrick
Ostrouchov, George
Koenig, Gregory A.
- Resource type:
- Journal article
- Publication date:
- 2019
- Institution:
- Universidad Autónoma de Occidente
- Repository:
- RED: Repositorio Educativo Digital UAO
- Language:
- eng
- OAI Identifier:
- oai:red.uao.edu.co:10614/11190
- Online access:
- http://hdl.handle.net/10614/11190
https://doi.org/10.1002/cpe.4341
- Keywords:
- Ingeniería de computación
Computer engineering
Clock synchronization
Large-scale systems
System software
Time service
- Rights
- openAccess
- License
- All Rights Reserved - Universidad Autónoma de Occidente
id |
REPOUAO2_6d81f1ec1de1ec178804bbd9d06bf797 |
---|---|
oai_identifier_str |
oai:red.uao.edu.co:10614/11190 |
network_acronym_str |
REPOUAO2 |
network_name_str |
RED: Repositorio Educativo Digital UAO |
repository_id_str |
|
dc.title.eng.fl_str_mv |
An evaluation of the state of time synchronization on leadership class supercomputers |
dc.creator.fl_str_mv |
Mondragón Martínez, Oscar Hernán; Jones, Terry; Bridges, Patrick; Ostrouchov, George; Koenig, Gregory A. |
dc.contributor.author.none.fl_str_mv |
Mondragón Martínez, Oscar Hernán; Jones, Terry; Bridges, Patrick; Ostrouchov, George; Koenig, Gregory A. |
dc.subject.lemb.eng.fl_str_mv |
Ingeniería de computación |
topic |
Ingeniería de computación; Computer engineering; Clock synchronization; Large-scale systems; System software; Time service |
dc.subject.lemb.spa.fl_str_mv |
Computer engineering |
dc.subject.proposal.eng.fl_str_mv |
Clock synchronization; Large-scale systems; System software; Time service |
description |
We present a detailed examination of time agreement characteristics for nodes within extreme‐scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high‐performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, explain why the current state of the field is inadequate, and propose strategies to address observed shortcomings. |
publishDate |
2019 |
dc.date.issued.none.fl_str_mv |
20180225 |
dc.date.accessioned.none.fl_str_mv |
2019-10-09T21:16:13Z |
dc.date.available.none.fl_str_mv |
2019-10-09T21:16:13Z |
dc.type.spa.fl_str_mv |
Journal article |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.eng.fl_str_mv |
http://purl.org/coar/resource_type/c_6501 |
dc.type.content.eng.fl_str_mv |
Text |
dc.type.driver.eng.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.redcol.eng.fl_str_mv |
http://purl.org/redcol/resource_type/ARTREF |
dc.type.version.eng.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
http://purl.org/coar/resource_type/c_6501 |
status_str |
publishedVersion |
dc.identifier.issn.spa.fl_str_mv |
15320634 (online) 15320626 (print) |
dc.identifier.uri.none.fl_str_mv |
http://hdl.handle.net/10614/11190 |
dc.identifier.doi.spa.fl_str_mv |
https://doi.org/10.1002/cpe.4341 |
identifier_str_mv |
15320634 (online) 15320626 (print) |
url |
http://hdl.handle.net/10614/11190 https://doi.org/10.1002/cpe.4341 |
dc.language.iso.eng.fl_str_mv |
eng |
language |
eng |
dc.relation.citationendpage.none.fl_str_mv |
16 |
dc.relation.citationissue.none.fl_str_mv |
4 |
dc.relation.citationstartpage.none.fl_str_mv |
1 |
dc.relation.citationvolume.none.fl_str_mv |
30 |
dc.relation.cites.spa.fl_str_mv |
Jones, T., Ostrouchov, G., Koenig, G. A., Mondragon, O. H., & Bridges, P. G. (2018). An evaluation of the state of time synchronization on leadership class supercomputers. Concurrency and Computation, 30 (4), 1-16. DOI: 10.1002/cpe.4341 |
dc.relation.ispartofjournal.eng.fl_str_mv |
Concurrency and Computation. Practice and Experience |
dc.rights.spa.fl_str_mv |
All Rights Reserved - Universidad Autónoma de Occidente |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.uri.eng.fl_str_mv |
https://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.accessrights.eng.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.rights.creativecommons.spa.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
rights_invalid_str_mv |
All Rights Reserved - Universidad Autónoma de Occidente https://creativecommons.org/licenses/by-nc-nd/4.0/ Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.eng.fl_str_mv |
application/pdf |
dc.format.extent.spa.fl_str_mv |
16 pages |
dc.coverage.spatial.spa.fl_str_mv |
Universidad Autónoma de Occidente. Calle 25 115-85. Km 2 vía Cali-Jamundí |
dc.publisher.eng.fl_str_mv |
Wiley |
institution |
Universidad Autónoma de Occidente |
bitstream.url.fl_str_mv |
https://red.uao.edu.co/bitstreams/e9068bfd-5488-47d6-bcf4-b5501b976cb1/download https://red.uao.edu.co/bitstreams/3d8ff5cc-7428-458a-b55c-f17f9a3f1b95/download https://red.uao.edu.co/bitstreams/2914f52a-d4c3-4c96-b1f8-894c6bafbaa5/download https://red.uao.edu.co/bitstreams/d003ff0d-bdad-4a2a-8e01-5ad29f77064b/download https://red.uao.edu.co/bitstreams/602ee01c-8d8a-47c4-9d64-6a92cd388c64/download |
bitstream.checksum.fl_str_mv |
4460e5956bc1d1639be9ae6146a50347 20b5ba22b1117f71589c7318baa2c560 cb0751115620b1659871ab4186cd148c 3dda254ecda4dc033fad01a38e031d6a fc5faa884f701dd58d5fae2a8a654e54 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Digital Universidad Autonoma de Occidente |
repository.mail.fl_str_mv |
repositorio@uao.edu.co |
_version_ |
1814259872972144640 |