An evaluation of the state of time synchronization on leadership class supercomputers

We present a detailed examination of time agreement characteristics for nodes within extreme‐scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high‐performance computers sited at three national laborat...


Authors:
Mondragón Martínez, Oscar Hernán
Jones, Terry
Bridges, Patrick
Ostrouchov, George
Koenig, Gregory A.
Resource type:
Journal article
Publication date:
2019
Institution:
Universidad Autónoma de Occidente
Repository:
RED: Repositorio Educativo Digital UAO
Language:
eng
OAI Identifier:
oai:red.uao.edu.co:10614/11190
Online access:
http://hdl.handle.net/10614/11190
https://doi.org/10.1002/cpe.4341
Keywords:
Ingeniería de computación
Computer engineering
Clock synchronization
Large-scale systems
System software
Time service
Rights
openAccess
License
All rights reserved - Universidad Autónoma de Occidente
id REPOUAO2_6d81f1ec1de1ec178804bbd9d06bf797
oai_identifier_str oai:red.uao.edu.co:10614/11190
network_acronym_str REPOUAO2
network_name_str RED: Repositorio Educativo Digital UAO
repository_id_str
dc.title.eng.fl_str_mv An evaluation of the state of time synchronization on leadership class supercomputers
dc.creator.fl_str_mv Mondragón Martínez, Oscar Hernán
Jones, Terry
Bridges, Patrick
Ostrouchov, George
Koenig, Gregory A.
dc.subject.lemb.eng.fl_str_mv Ingeniería de computación
dc.subject.lemb.spa.fl_str_mv Computer engineering
dc.subject.proposal.eng.fl_str_mv Clock synchronization
Large-scale systems
System software
Time service
description We present a detailed examination of time agreement characteristics for nodes within extreme‐scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high‐performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements and why the current state of the field is inadequate, and we propose strategies to address the observed shortcomings.
publishDate 2019
dc.date.issued.none.fl_str_mv 20180225
dc.date.accessioned.none.fl_str_mv 2019-10-09T21:16:13Z
dc.date.available.none.fl_str_mv 2019-10-09T21:16:13Z
dc.type.spa.fl_str_mv Journal article
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.eng.fl_str_mv http://purl.org/coar/resource_type/c_6501
dc.type.content.eng.fl_str_mv Text
dc.type.driver.eng.fl_str_mv info:eu-repo/semantics/article
dc.type.redcol.eng.fl_str_mv http://purl.org/redcol/resource_type/ARTREF
dc.type.version.eng.fl_str_mv info:eu-repo/semantics/publishedVersion
format http://purl.org/coar/resource_type/c_6501
status_str publishedVersion
dc.identifier.issn.spa.fl_str_mv 15320634 (online)
15320626 (print)
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/10614/11190
dc.identifier.doi.spa.fl_str_mv https://doi.org/10.1002/cpe.4341
dc.language.iso.eng.fl_str_mv eng
language eng
dc.relation.citationendpage.none.fl_str_mv 16
dc.relation.citationissue.none.fl_str_mv 4
dc.relation.citationstartpage.none.fl_str_mv 1
dc.relation.citationvolume.none.fl_str_mv 30
dc.relation.cites.spa.fl_str_mv Jones, T., Ostrouchov, G., Koenig, G. A., Mondragon, O. H., & Bridges, P. G. (2018). An evaluation of the state of time synchronization on leadership class supercomputers. Concurrency and Computation, 30 (4), 1-16. DOI: 10.1002/cpe.4341
dc.relation.ispartofjournal.eng.fl_str_mv Concurrency and Computation. Practice and Experience
dc.relation.references.none.fl_str_mv 1. Veitch D, Ridoux J, Korada SB. Robust synchronization of absolute and difference clocks over networks. IEEE/ACM Trans Networking (TON). 2009;17(2):417-430.
2. Mills DL. Internet time synchronization: the network time protocol. Commun IEEE Trans. 1991;39(10):1482-1493.
3. Top 500 Supercomputing Sites. https://www.top500.org, Accessed: 04-2015.
4. Valiant LG. A bridging model for parallel computation. Commun ACM. 1990;33(8):103-111.
5. Oliner AJ, Kulkarni AV, Aiken A. Using correlated surprise to infer shared influence. In: 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Chicago, IL: IEEE; 2010:191-200.
6. Oeste S, Knüpfer A, Ilsche T. Towards parallel performance analysis tools for the openSHMEM standard. Workshop on OpenSHMEM and Related Technologies, Annapolis, MD: Springer; 2014:90-104.
7. Marangos N, Rizomiliotis P, Mitrou L. Time synchronization: pivotal element in cloud forensics. Security and Communication Networks. 2014;9(6):571-582.
8. Mondragon OH, Bridges PG, Levy S, Ferreira KB, Widener P. Scheduling in-situ analytics in next-generation applications. In: Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, Cartagena, Colombia: IEEE; 2016:102-105.
9. Mondragon OH, Bridges PG, Jones T. Quantifying scheduling challenges for exascale system software. In: Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, Portland, OR: ACM; 2015:1-8.
10. Jones T, Dawson S, Neely R, et al. Improving the scalability of parallel jobs by adding parallel awareness to the operating system. In: Supercomputing, 2003 ACM/IEEE Conference, Phoenix, AZ: IEEE; 2003:10-10.
11. Feitelson DG, Rudolph L. Gang scheduling performance benefits for fine-grain synchronization. J Parallel Distrib Comput. 1992;16(4):306-318.
12. Brightwell R, Oldfield R, Maccabe AB, Bernholdt DE. Hobbes: Composition and virtualization as the foundations of an extreme-scale os/r. In: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, Eugene, OR: ACM; 2013:1-8.
13. Akkan H, Lang M, Liebrock L. Understanding and isolating the noise in the linux kernel. Int J High Perform Comput Appl. 2013;27(2):136-146.
14. De P, Kothari R, Mann V. Identifying sources of operating system jitter through fine-grained kernel instrumentation. In: 2007 IEEE International Conference on Cluster Computing, Austin, TX: IEEE; 2007:331-340.
15. Jones T. Linux kernel co-scheduling and bulk synchronous parallelism. Int J High Perform Comput Appl. 2012;26(2):136-145.
16. Ferreira KB, Brightwell R, Bridges PG. Characterizing application sensitivity to OS interference using kernel-level noise injection. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC'08), Austin, TX; 2008:12 pp.
17. Seelam S, Fong L, Tantawi A, Lewars J, Divirgilio J, Gildea K. Extreme scale computing: modeling the impact of system noise in multi-core clustered systems. J Parallel Distrib Comput. 2013;73(7):898-910.
18. Mondragon OH, Bridges PG, Levy S, Ferreira KB, Widener P. Understanding performance interference in next-generation hpc systems. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT: IEEE Press; 2016:384-395.
19. Levy S, Ferreira KB, Widener P, Bridges PG, Mondragon OH. How I learned to stop worrying and love in situ analytics. In: EuroMPI 2016, Edinburgh, United Kingdom; 2016:140-153.
20. Hammouda A, Siegel AR, Siegel SF. Noise-tolerant explicit stencil computations for nonuniform process execution rates. ACM Trans Parallel Comput. 2015;2(1):1-33.
21. Corbett JC, Dean J, Epstein M, et al. Spanner: Google's globally distributed database. ACM Trans Comput Syst (TOCS). 2013;31(3):1-22.
22. Liskov B. Practical uses of synchronized clocks in distributed systems. Distrib Comput. 1993;6(4):211-219.
23. Hoefler T, Schneider T, Lumsdaine A. Characterizing the influence of system noise on large-scale applications by simulation. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA: IEEE Computer Society; 2010:1-11.
24. Fidge C. Logical time in distributed computing systems. Computer. 1991;24(8):28-33.
25. Elson J, Girod L, Estrin D. Fine-grained network time synchronization using reference broadcasts. ACM SIGOPS Oper Syst Rev. 2002;36(SI):147-163.
26. Lamport L. Time, clocks, and the ordering of events in a distributed system. Commun ACM. 1978;21(7):558-565.
27. Mattern F. Virtual time and global states of distributed systems. Parallel Distrib Algo. 1989;1(23):215-226.
28. DeRose L, Poxon H. A paradigm change: from performance monitoring to performance analysis. In: Computer Architecture and High Performance Computing, 2009. SBAC-PAD '09. 21st International Symposium on, Sao Paulo, Brazil: IEEE; 2009:119-126.
29. Schmuck FB, Haskin RL. Gpfs: A shared-disk file system for large computing clusters. In: Fast, Vol. 2, Monterey, CA; 2002:231-244.
30. Becker D, Linford JC, Rabenseifner R, Wolf F. Replay-based synchronization of timestamps in event traces of massively parallel applications. In: Parallel Processing-Workshops, 2008. ICPP-W '08. International Conference on, Portland, OR: IEEE; 2008:212-219.
31. Mills DL. On the accuracy and stability of clocks synchronized by the network time protocol in the internet system. ACM SIGCOMM Comput Commun Rev. 1989;20(1):65-75.
32. Ridoux J, Veitch D, Broomhead T. The case for feed-forward clock synchronization. Networking, IEEE/ACM Trans. 2012;20(1):231-242.
33. RADClock Installation. http://www.synclab.org/radclock/installation_linux, Accessed: 02-2017.
34. IEEE 1588-2008. IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. Technical Report, IEEE; 24 July 2008:pp. 289. https://doi.org/10.1109/IEEESTD.2008.4579760
35. Correll K, Barendt N, Branicky M. Design considerations for software only implementations of the ieee 1588 precision time protocol. In: Conference on IEEE, Vol. 1588, Winterthur, Switzerland; 2005:11-15.
36. Hong C-Y, Lin C-C, Caesar M. Clockscalpel: understanding root causes of internet clock synchronization inaccuracy. Passive and Active Measurement, Atlanta, GA: Springer; 2011:204-213.
37. Maillet E, Tron C. On efficiently implementing global time for performance evaluation on multiprocessor systems. J Parallel Distrib Comput. 1995;28(1):84-93.
38. Gurewitz O, Cidon I, Sidi M. Network classless time protocol based on clock offset optimization. IEEE/ACM Trans Networking (TON). 2006;14(4):876-888.
39. Gurewitz O, Cidon I, Sidi M. One-way delay estimation using network-wide measurements. IEEE/ACM Trans Networking (TON). 2006;14(SI):2710-2724.
40. Jeske DR. On maximum-likelihood estimation of clock offset. IEEE Trans Commun. 2005;53(1):53-54.
41. Doleschal J, Knüpfer A, Müller MS, Nagel WE. Internal timer synchronization for parallel event tracing. European Parallel Virtual Machine/Message Passing Interface Users Group Meeting, Dublin, Ireland: Springer; 2008:202-209.
42. Jones T, Koenig GA. A clock synchronization strategy for minimizing clock variance at runtime in high-end computing environments. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2010 22nd International Symposium on, Petrópolis, Rio de Janeiro, Brazil: IEEE; 2010:207-214.
43. Jones T, Koenig GA. Clock synchronization in high-end computing environments: a strategy for minimizing clock variance at runtime. Concurr Comput Practice Experience. 2013;25(6):881-897.
44. OLCF. The Oak Ridge Leadership Computing Facility. https://www.olcf.ornl.gov, Accessed: 03-2015.
45. Cray XK7 Data Sheet. http://www.cray.com/Products/Computing/XK7/Specifications.aspx, Accessed: 03-2015.
46. Alverson R, Roweth D, Kaplan L. The gemini system interconnect. In: High Performance Interconnects (HOTI), 2010 IEEE 18th Annual Symposium on, Mountain View, CA: IEEE; 2010:83-87.
47. Faanes G, Bataineh A, Roweth D, et al. Cray cascade: a scalable hpc system based on a dragonfly network. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT: IEEE Computer Society Press; 2012:103.
48. Cray XC Data Sheet. http://www.cray.com/Products/Computing/XC/Specs/Spcifications-XC30.aspx, Accessed: 03-2015.
49. Alverson B, Froese E, Kaplan L, Roweth D. Cray xc series network. Cray Inc., White Paper WP-Aries01-1112; 2012.
50. Kumaran K. Introduction to Mira. In: Code for Q Workshop, Lemont, IL; 2012:24 pp.
51. Morozov V, Kumaran K, Vishwanath V, Meng J, Papka ME. Early experience on the blue gene/q supercomputing system. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, Boston, MA: IEEE; 2013:1229-1240.
52. Milano J, Lembke P, et al. IBM System Blue Gene Solution: Blue Gene/Q Hardware Overview and Installation Planning, Armonk, NY: IBM Redbooks; 2013.
53. Chiu G, Coteus P, Wisniewski R, Sexton J. BlueGene/Q Overview and Update. https://www.alcf.anl.gov/files/IBM_BGQ_Architecture_0.pdf, Accessed: 04-2015.
54. Kim J, Dally WJ, Scott S, Abts D. Technology-driven, highly-scalable dragonfly topology. In: ACM SIGARCH Computer Architecture News, Vol. 36, New York, NY: IEEE Computer Society; 2008:77-88.
55. Mink A, Carpenter RJ, Courson M. Time synchronized measurements in cluster computing systems. Parity. 2000;1:1-7.
56. Mizrahi T, Moses Y. Serving time in the cloud: Why time-as-a-service? In: Computer Communications Workshops (INFOCOM WKSHPS), 2016 IEEE Conference on, San Francisco, CA: IEEE; 2016:95-96.
57. Configuring Mellanox ConnectX for PTP. https://community.mellanox.com/docs/DOC-2403, Accessed: 02-2017.
58. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2015.
dc.rights.spa.fl_str_mv All rights reserved - Universidad Autónoma de Occidente
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.uri.eng.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.eng.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.creativecommons.spa.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
eu_rights_str_mv openAccess
dc.format.eng.fl_str_mv application/pdf
dc.format.extent.spa.fl_str_mv 16 pages
dc.coverage.spatial.spa.fl_str_mv Universidad Autónoma de Occidente. Calle 25 115-85. Km 2 vía Cali-Jamundí
dc.publisher.eng.fl_str_mv Wiley
institution Universidad Autónoma de Occidente
bitstream.url.fl_str_mv https://red.uao.edu.co/bitstreams/e9068bfd-5488-47d6-bcf4-b5501b976cb1/download
https://red.uao.edu.co/bitstreams/3d8ff5cc-7428-458a-b55c-f17f9a3f1b95/download
https://red.uao.edu.co/bitstreams/2914f52a-d4c3-4c96-b1f8-894c6bafbaa5/download
https://red.uao.edu.co/bitstreams/d003ff0d-bdad-4a2a-8e01-5ad29f77064b/download
https://red.uao.edu.co/bitstreams/602ee01c-8d8a-47c4-9d64-6a92cd388c64/download
bitstream.checksum.fl_str_mv 4460e5956bc1d1639be9ae6146a50347
20b5ba22b1117f71589c7318baa2c560
cb0751115620b1659871ab4186cd148c
3dda254ecda4dc033fad01a38e031d6a
fc5faa884f701dd58d5fae2a8a654e54
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Digital Universidad Autonoma de Occidente
repository.mail.fl_str_mv repositorio@uao.edu.co
_version_ 1808478747761836032