Improving heterogenous storage performance in HPC Cloud systems using efficient storage algorithms informed by statistical models

Scientific applications are widely used to solve complex problems in different domains. These applications usually have demanding computational requirements, so they must be executed on HPC clusters to guarantee a successful execution and find an optimal solution. In recent years...


Authors:
Marquez Franco, Jack Daniels
Resource type:
Doctoral thesis
Publication date:
2022
Institution:
Universidad Autónoma de Occidente
Repository:
RED: Repositorio Educativo Digital UAO
Language:
eng
OAI Identifier:
oai:red.uao.edu.co:10614/13693
Online access:
https://hdl.handle.net/10614/13693
https://red.uao.edu.co/
Keywords:
Doctorado en Ingeniería
Cloud computing
Evolutionary computation
HPC Cloud
EVT
Genetic algorithm
Heterogeneous storage
Rights
openAccess
License
All rights reserved - Universidad Autónoma de Occidente, 2022
id REPOUAO2_39569725d13be0b8636992e5aed55898
oai_identifier_str oai:red.uao.edu.co:10614/13693
network_acronym_str REPOUAO2
network_name_str RED: Repositorio Educativo Digital UAO
repository_id_str
dc.title.eng.fl_str_mv Improving heterogenous storage performance in HPC Cloud systems using efficient storage algorithms informed by statistical models
dc.creator.fl_str_mv Marquez Franco, Jack Daniels
dc.contributor.advisor.none.fl_str_mv Mondragon, Oscar
dc.contributor.author.none.fl_str_mv Marquez Franco, Jack Daniels
dc.contributor.corporatename.spa.fl_str_mv Universidad Autónoma de Occidente
dc.subject.spa.fl_str_mv Doctorado en Ingeniería
dc.subject.armarc.spa.fl_str_mv Computación en la nube
Algoritmos genéticos
Computación evolutiva
dc.subject.armarc.eng.fl_str_mv Cloud computing
Evolutionary computation
dc.subject.proposal.eng.fl_str_mv HPC Cloud
EVT
Genetic algorithm
Heterogeneous storage
description Scientific applications are widely used to solve complex problems in different domains. These applications usually have demanding computational requirements, so they must be executed on HPC clusters to guarantee a successful execution and find an optimal solution. In recent years, researchers have looked to cloud computing as an alternative platform for running their applications. Recent works have attempted to migrate applications because they see in cloud computing a flexibility and scalability model that can benefit them and their applications. The cloud computing economic model, where you pay only for what you use, can reduce the cost of acquisition, maintenance, and updates compared with an HPC cluster. The deployment of HPC applications on cloud computing clusters presents several challenges that have yet to be resolved. One potential problem concerns storage systems and file systems, as cloud clusters do not use the same storage and file systems as HPC clusters. Therefore, HPC applications are affected by overheads introduced by the different technologies and the entire environment. This dissertation seeks to reduce HPC applications' overhead, improving the performance of applications running on heterogeneous storage systems in the HPC Cloud. To do so, this dissertation characterizes the performance of High Performance Computing applications that make use of heterogeneous storage technologies in cloud computing clusters. This dissertation also presents and validates the use of an Extreme Value Theory-based model to characterize, analyze, and predict the performance of these applications. Finally, this dissertation presents a genetic algorithm that uses the proposed model as input to solve an Integer Linear Programming problem formulated for the placement of the files used by the applications on the heterogeneous storage devices in an HPC cloud system.
publishDate 2022
dc.date.accessioned.none.fl_str_mv 2022-03-25T16:43:19Z
dc.date.available.none.fl_str_mv 2022-03-25T16:43:19Z
dc.date.issued.none.fl_str_mv 2022-02-14
dc.type.spa.fl_str_mv Degree work - Doctorate
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.eng.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.content.eng.fl_str_mv Model
dc.type.driver.eng.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.redcol.eng.fl_str_mv https://purl.org/redcol/resource_type/TD
format http://purl.org/coar/resource_type/c_db06
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/10614/13693
dc.identifier.instname.spa.fl_str_mv Universidad Autónoma de Occidente
dc.identifier.reponame.spa.fl_str_mv Repositorio Educativo Digital
dc.identifier.repourl.spa.fl_str_mv https://red.uao.edu.co/
url https://hdl.handle.net/10614/13693
https://red.uao.edu.co/
identifier_str_mv Universidad Autónoma de Occidente
Repositorio Educativo Digital
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.cites.eng.fl_str_mv Márquez Franco, J.D. (2022). Improving heterogenous storage performance in HPC Cloud systems using efficient storage algorithms informed by statistical models [Doctoral thesis, Universidad Autónoma de Occidente]. https://hdl.handle.net/10614/13693
dc.relation.references.none.fl_str_mv [1] About Chameleon | Chameleon.
[2] Archival storage, ssd & memory.
[3] Ibrahim Abaker, Targio Hashem, Ibrar Yaqoob, Badrul Anuar, Salimah Mokhtar, Abdullah Gani, and Samee Ullah Khan. The rise of “big data” on cloud computing review and open research issues. Information Systems, 47:98–115, 2014.
[4] PS Aithal and Vaikunth Pai T. Opportunity for realizing ideal computing system using cloud computing model. International Journal of Case Studies in Business, IT and Education (IJCSBE), 1(2):60–71, 2017.
[5] Ismail M Ali, Karam M Sallam, Nour Moustafa, Ripon Chakraborty, Michael J Ryan, and Kim-Kwang Raymond Choo. An automated task scheduling model using non-dominated sorting genetic algorithm ii for fog-cloud systems. IEEE Transactions on Cloud Computing, 2020.
[6] Entisar S Alkayal, Nicholas R Jennings, and Maysoon F Abulkhair. Efficient task scheduling multi-objective particle swarm optimization in cloud computing. In 2016 IEEE 41st Conference on Local Computer Networks Workshops (LCN Workshops), pages 17–24. IEEE, 2016.
[7] José Nelson Amaral. About computing science research methodology.
[8] Pistirica Sorin Andrei, Moldoveanu Florica, Moldoveanu Alin, Asavei Victor, and Caraman Mihai Claudiu. Hardware acceleration in ceph distributed file system. In 2013 IEEE 12th International Symposium on Parallel and Distributed Computing, pages 209–215. IEEE, 2013.
[9] Peter MG Apers. Data allocation in distributed database systems. ACM Transactions on Database Systems (TODS), 13(3):263–304, 1988.
[10] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.
[11] E. Arun, A. Reji, P. Mohammed Shameem, and R.S. Shaji. A novel algorithm for load balancing in mobile cloud networks: Multi-objective optimization approach. Wireless Personal Communications, 97(2):3125–3140, 2017.
[12] Fatemeh Azimzadeh and Fatemeh Biabani. Multi-objective job scheduling algorithm in cloud computing based on reliability and time. In 2017 3th International Conference on Web Research (ICWR), pages 96–101. IEEE, 2017.
[13] Bartosz Balis, Kamil Figiela, Konrad Jopek, Maciej Malawski, and Maciej Pawlik. Porting hpc applications to the cloud: A multi-frontal solver case study. Journal of Computational Science, 18:106–116, 2017.
[14] Timur Bazhirov, Mohammad Mohammadi, Kevin Ding, and Sergey Barabash. Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing. In APS March Meeting Abstracts, volume 2017, pages C1–007, 2017.
[15] Ali Belgacem, Kadda Beghdad-Bey, and Hassina Nacer. Dynamic resource allocation method based on symbiotic organism search algorithm in cloud computing. IEEE Transactions on Cloud Computing, 2020.
[16] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future generation computer systems, 28(5):755–768, 2012.
[17] Kostiantyn Berezovskyi, Luca Santinelli, Konstantinos Bletsas, and Eduardo Tovar. Wcet measurement-based and extreme value theory characterisation of cuda kernels. In Proceedings of the 22nd International Conference on Real- Time Networks and Systems, pages 279–288, 2014.
[18] Aprigio Bezerra, Porfidio Hernández, Antonio Espinosa, and Juan Carlos Moure. Job scheduling in hadoop with shared input policy and ramdisk. In 2014 IEEE International Conference on Cluster Computing (CLUSTER), pages 355–363. IEEE, 2014.
[19] Ekaba Bisong. An overview of google cloud platform services. Building Machine Learning and Deep Learning Models on Google Cloud Platform, pages 7–10, 2019.
[20] Dhruba Borthakur et al. Hdfs architecture guide. Hadoop Apache Project, 53(1-13):2, 2008.
[21] Eric B Boyer, Matthew C Broomfield, and Terrell A Perrotti. Glusterfs one storage server to rule them all. Technical report, Los Alamos National Lab.(LANL), Los Alamos, NM (United States), 2012.
[22] Peter Braam. The lustre storage architecture. arXiv preprint arXiv:1903.01955, 2019.
[23] Piotr Bryk, Maciej Malawski, and Gideon Juve. Storage-aware algorithms for scheduling of workflow ensembles in clouds. Journal of Grid Computing, pages 359–378, 2016.
[24] Binlei Cai, Laiping Zhao, Xiaobo Zhou, Rongqi Zhang, and Keqiu Li. On evaluating the resource usage effectiveness of multi-tenant cloud storage. Journal of Systems Architecture, 98:403–412, 2019.
[25] Yuzhi Cai and Dominic Hames. Minimum sample size determination for generalized extreme value distribution. Communications in Statistics—Simulation and Computation®, 40(1):87–98, 2010.
[26] Brad Calder, JuWang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, JieshengWu, Huseyin Simitci, et al. Windows azure storage: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 143–157, 2011.
[27] Enrique Castillo, Ali S Hadi, Narayanaswamy Balakrishnan, and José-Mariá Sarabia. Extreme value and related models with applications in engineering and science. Jhon Wiley & Sons, 2005.
[28] Joan Del Castillo, Maria Padilla, Jaume Abella, and Francisco J Cazorla. Execution time distributions in embedded safety-critical systems using extreme value theory. International Journal of Data Analysis Techniques and Strategies, 9(4):348–361, 2017.
[29] Mainak Chakraborty and Ajit Pratap Kundan. Grafana. In Monitoring Cloud- Native Applications, pages 187–240. Springer, 2021.
[30] Nathanaël Cheriere, Matthieu Dorier, and Gabriel Antoniu. How fast can one resize a distributed file system? Journal of Parallel and Distributed Computing, 140:80–98, 2020.
[31] Carlos A Coello Coello, Gary B Lamont, David A Van Veldhuizen, et al. Evolutionary algorithms for solving multi-objective problems, volume 5. Springer, 2007.
[32] Carlos A Coello Coello and Efrén Mezura Montes. Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Advanced Engineering Informatics, 16(3):193–203, 2002.
[33] Susan Coghlan. The magellan final report on cloud computing. Lawrence Berkeley National Laboratory, 2011.
[34] Stuart Coles, Joanna Bawa, Lesley Trenner, and Pat Dorazio. An introduction to statistical modeling of extreme values, volume 208. Springer, 2001.
[35] Alex Conway, Ainesh Bakshi, Yizheng Jiao, William Jannen, Yang Zhan, Jun Yuan, Michael A Bender, Rob Johnson, Bradley C Kuszmaul, Donald E Porter, et al. File systems fated for senescence? nonsense, says science! In 15th {USENIX} Conference on File and Storage Technologies ({FAST} 17), pages 45–58, 2017.
[36] Robert Cypher, Alex Ho, Smaragda Konstantinidou, and Paul Messina. Architectural requirements of parallel scientific applications with explicit communication. ACM SIGARCH Computer Architecture News, 21(2):2–13, 1993.
[37] Daniel de Oliveira, Kary ACS Ocaña, Fernanda Baião, and Marta Mattoso. A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds. Journal of grid Computing, 10(3):521–552, 2012.
[38] Y. Deng, Y. Li, R. Seet, X. Tang, and W. Cai. The server allocation problem for session-based multiplayer cloud gaming. IEEE Transactions on Multimedia, 2017.
[39] Snehsudha Popatrao Dhage, Tanpure Renuka Subhash, Rutuja Vilas Kotkar, Prachi Dattatray Varpe, and Sonali Sanjay Pardeshi. An overview-google file system (gfs) and hadoop distributed file system (hdfs). SAMRIDDHI: A Journal of Physical Sciences, Engineering and Technology, 12(SUP 1):126–128, 2020.
[40] Clément Dombry, Ana Ferreira, et al. Maximum likelihood estimators based on the block maxima method. Bernoulli, 25(3):1690–1723, 2019.
[41] Jered Dominguez-Trujillo, Keira Haskins, Soheila Jafari Khouzani, Christopher Leap, Sahba Tashakkori, Quincy Wofford, Trilce Estrada, Patrick G Bridges, and Patrick M Widener. Lightweight measurement and analysis of hpc performance variability. In 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pages 50– 60. IEEE, 2020.
[42] Maicon Ança dos Santos and Gerson Geraldo H Cavalheiro. Cloud infrastructure for hpc investment analysis. Revista de Informática Teórica e Aplicada, 27(4):45–62, 2020.
[43] Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. The design and operation of cloudlab. In 2019 USENIX Annual Technical Conference (ATC 2019), page 1–14, 7 2019.
[44] Song Feng, Saralees Nadarajah, and Qi Hu. Modeling annual extreme precipitation in china using the generalized extreme value distribution. Journal of the Meteorological Society of Japan. Ser. II, 85(5):599–613, 2007.
[45] Ana Ferreira and Laurens De Haan. On the block maxima method in extreme value theory: Pwm estimators. The Annals of statistics, 43(1):276–298, 2015.
[46] M Ficco, B Di Martino, R Pietrantuono, and S Russo. Optimized task allocation on private cloud for hybrid simulation of large-scale critical systems. Future Generation Computer Systems, 74:104–118, 2017. Cited By :3 Export Date: 15 November 2017.
[47] M. Ficco, C. Esposito, F. Palmieri, and A. Castiglione. A coral-reefs and game theory-based approach for optimizing elastic cloud resource allocation. Future Generation Computer Systems, 78, 2018.
[48] Ronald Aylmer Fisher and Leonard Henry Caleb Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical proceedings of the Cambridge philosophical society, volume 24, pages 180–190. Cambridge University Press, 1928.
[49] John Fragalla. Configure, tune, and benchmark a lustre filesystem. In 2014 Oil & Gas HPC Workshop, 2014.
[50] GA Gabriele and KM Ragsdell. The generalized reduced gradient method: A reliable tool for optimal design. Journal of Engineering for Industry, 99(2):394– 400, 1977.
[51] Xiangqiang Gao, Rongke Liu, and Aryan Kaushik. Hierarchical multi-agent optimization for resource allocation in cloud computing. IEEE Transactions on Parallel and Distributed Systems, 32(3):692–707, 2020.
[52] Y Gao, J Duan, and W Shu. A novel ant optimization algorithm for task scheduling and resource allocation in cloud computing environment. Journal of Internet Technology, 16(7):1329–1338, 2015. Cited By :2 Export Date: 15 November 2017.
[53] Al Geist and Daniel A Reed. A survey of high-performance computing scaling challenges. The International Journal of High Performance Computing Applications, 31(1):104–113, 2017.
[54] Mitsuo Gen and Runwei Cheng. A survey of penalty techniques in genetic algorithms. In Proceedings of IEEE International Conference on Evolutionary Computation, pages 804–809. IEEE, 1996.
[55] Mitsuo Gen and Runwei Cheng. Genetic algorithms and engineering optimization, volume 7. John Wiley & Sons, 2000.
[56] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29–43, 2003.
[57] Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the challenges of scientific workflows. Computer, 40(12), 2007.
[58] Kannan Govindarajan, Vivekanandan Suresh Kumar, and Thamarai Selvi Somasundaram. A distributed cloud resource management framework for highperformance computing (hpc) applications. In 2016 Eighth International Conference on Advanced Computing (ICoAC), pages 1–6. IEEE, 2017.
[59] Carlos Guerrero, Isaac Lera, and Carlos Juiz. Resource optimization of container orchestration: a case study in multi-cloud microservices-based applications. The Journal of Supercomputing, 0.
[60] Giulia Guidi, Marquita Ellis, Aydin Buluc, Katherine Yelick, and David Culler. 10 years later: Cloud computing is closing the performance gap. arXiv preprint arXiv:2011.00656, 2020.
[61] Abhishek Gupta, Paolo Faraboschi, Filippo Gioachin, Laxmikant V Kale, Richard Kaufmann, Bu-Sung Lee, Verdi March, Dejan Milojicic, and Chun Hui Suen. Evaluating and improving the performance and scheduling of hpc applications in cloud. IEEE Transactions on Cloud Computing, 4(3):307–321, 2014.
[62] Mateusz Guzek, Pascal Bouvry, and El-Ghazali Talbi. A survey of evolutionary computation for resource management of processing in cloud computing [review article]. IEEE Computational Intelligence Magazine, 10(2):53–67, 5 2015.
[63] TD Gwiazda. Genetic algorithms reference vol. i. crossover for single-objective numerical optimization problems [online]. tomaszgwiazda e-books, 2006.
[64] Qiming He, Shujia Zhou, Ben Kobler, Dan Duffy, and Tom McGlynn. Case study for running hpc applications in public clouds. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 395–401, 2010.
[65] Jan Heichler. An introduction to beegfs, 2014.
[66] R. Henwood, N. W. Watkins, S. C. Chapman, and R. McLay. A parallel workload has extreme variability in a production environment. arXiv:1801.03898 [cs], 1 2018. arXiv: 1801.03898.
[67] Frank Herold, Sven Breuner, and Jan Heichler. An introduction to beegfs, 2014.
[68] Hideo Hirose. Maximum likelihood estimation in the 3-parameter weibull distribution. a look through the generalized extreme-value distribution. IEEE transactions on dielectrics and electrical insulation, 3(1):43–55, 1996.
[69] Christina Hoffa, Gaurang Mehta, Tim Freeman, Ewa Deelman, Kate Keahey, Bruce Berriman, and John Good. On the use of cloud computing for scientific workflows. In 2008 IEEE fourth international conference on eScience, pages 640–645. IEEE, 2008.
[70] Jonathan RM Hosking. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B (Methodological), 52(1):105–124, 1990.
[71] H Howie Huang, Shan Li, Alex Szalay, and Andreas Terzis. Performance modeling and analysis of flash-based storage devices. In 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–11. IEEE, 2011.
[72] Nusrat Sharmin Islam, Md Wasi-Ur-Rahman, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda. Performance characterization and acceleration of inmemory file systems for hadoop and spark applications on hpc clusters. Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, page 243–252, 2015.
[73] Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R Dulloor, et al. Basic performance measurements of the intel optane dc persistent memory module. arXiv preprint arXiv:1903.05714, 2019.
[74] A. F. Jenkinson. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quarterly Journal of the Royal Meteorological Society, 81(348):158–171, 1955.
[75] Joonyong Jeong, Jaewook Kwak, Daeyong Lee, Seungdo Choi, Jungkeol Lee, Jungwook Choi, and Yong Ho Song. Level aware data placement technique for hybrid nand flash storage of log-structured merge-tree based key-value store system. IEEE Access, 8:188256–188268, 2020.
[76] Chandan Kalita, Gautam Barua, and Priya Sehgal. Durablefs: a file system for nvram. CSI Transactions on ICT, 7(4):277–286, 2019.
[77] Scott Klasky, Matthew Wolf, Mark Ainsworth, Chuck Atkins, Jong Choi, Greg Eisenhauer, Berk Geveci, William Godoy, Mark Kim, James Kress, et al. A view from ornl: Scientific data research opportunities in the big data age. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pages 1357–1368. IEEE, 2018.
[78] Dieter Klein and Edward Hannan. An algorithm for the multiple objective integer linear programming problem. European Journal of Operational Research, 9(4):378–385, 1982.
[79] Ana Klimovic, Heiner Litz, and Christos Kozyrakis. Selecta: Heterogeneous cloud storage configuration for data analytics. In 2018 {USENIX} Annual Technical Conferen e, pages 759–773, 2018.
[80] Brian Kocoloski. Scalability in the Presence of Variability. PhD thesis, University of Pittsburgh, 2018.
[81] Antoon Kolen. A genetic algorithm for the partial binary constraint satisfaction problem: an application to a frequency assignment problem. Statistica Neerlandica, 61(1):4–15, 2007.
[82] Joanna Kołodziej. Evolutionary Hierarchical Multi-Criteria Metaheuristics for Scheduling in Large-Scale Grid Systems, volume 419. Springer, 2012.
[83] J. Kołodziej, S.U. Khan, L. Wang, and A.Y. Zomaya. Energy efficient geneticbased schedulers in computational grids. Concurrency Computation, 27(4), 2015.
[84] K. R. Krish, Ali Anwar, and Ali R. Butt. Hats: A heterogeneity-aware tiered storage for hadoop. Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014, page 502–511, 2014.
[85] K. R. Krish, Bharti Wadhwa, M. Safdar Iqbal, M. Mustafa Rafique, and Ali R. Butt. On efficient hierarchical storage for big data processing. Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016, page 403–408, 2016.
[86] KR Krish, M Safdar Iqbal, and Ali R Butt. Venu: Orchestrating ssds in hadoop storage. In 2014 IEEE International Conference on Big Data (Big Data), pages 207–212. IEEE, 2014.
[87] A M Kumar and M Venkatesan. A novel based resource allocation method on cloud computing environment using hybrid differential evolution algorithm. Journal of Computational and Theoretical Nanoscience, 14(11):5322–5326, 2017.
[88] Gregory M Kurtzer, Vanessa Sochat, and MichaelWBauer. Singularity: Scientific containers for mobility of compute. PloS one, 12(5):e0177459, 2017.
[89] Leon S Lasdon, Richard L Fox, and Margery W Ratner. Nonlinear optimization using the generalized reduced gradient method. Revue française d’automatique, informatique, recherche opérationnelle. Recherche opérationnelle, 8(V3):73–103, 1974.
[90] LS Lasdon and AD Waren. Generalized reduced gradient software for linearly and nonlinearly constrained problems. Graduate School of Business, University of Texas at Austin Austin, TX, 1977.
[91] George Lima, Dario Dias, and Edna Barros. Extreme value theory for estimating task execution time bounds: A careful look. In 2016 28th Euromicro Conference on Real-Time Systems (ECRTS), pages 200–211. IEEE, 2016.
[92] Qi Liu, Weidong Cai, Jian Shen, Xiaodong Liu, and Nigel Linge. An adaptive approach to better load balancing in a consumer-centric cloud environment. IEEE Transactions on Consumer Electronics, 62(3):243–250, 2016.
[93] Li-Hsiung Lu and Jery R Stedinger. Variance of two-and three-parameter gev/pwm quantile estimators: formulae, confidence intervals, and a comparison. Journal of Hydrology, 138(1-2):247–267, 1992.
[94] Sandeep Madireddy, Prasanna Balaprakash, Philip Carns, Robert Latham, Robert Ross, Shane Snyder, and Stefan Wild. Modeling i/o performance variability using conditional variational autoencoders. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 109–113. IEEE, 2018.
[95] Artur Malinowski and Pawel Czarnul. A solution to image processing with parallel mpi i/o and distributed nvram cache. Scalable Computing: Practice and Experience, 19(1):1–14, 2018.
[96] Giovanni Mariani, Andreea Anghel, Rik Jongerius, and Gero Dittmann. Predicting cloud performance for hpc applications before deployment. Future Generation Computer Systems, 87:618–628, 2018.
[97] Velayoudoum Marimoutou, Bechir Raggad, and Abdelwahed Trabelsi. Extreme value theory and value at risk: application to oil market. Energy Economics, 31(4):519–530, 2009.
[98] Sheri Markose and Amadeo Alentorn. The generalized extreme value dis- tribution, implied tail index, and option pricing. The Journal of Derivatives, 18(3):35–60, 2011.
[99] Jack Marquez, J. and Oscar H. Mondragon. Jack Marquez’s Dissertation.
[100] Jack D. Marquez and Mario Castillo. Performance comparison: Virtual machines and containers running artificial intelligence applications. In Álvaro Rocha, Carlos Ferrás, Paulo Carlos López-López, and Teresa Guarda, editors, Information Technology and Systems, pages 199–209, Cham, 2021. Springer International Publishing.
[101] John Paul Martin, A Kandasamy, and K Chandrasekaran. Exploring the support for high performance applications in the container runtime environment. Human-centric Computing and Information Sciences, 0.
[102] Chihiro Matsui and Ken Takeuchi. 22% higher performance, 2x scm write endurance heterogeneous storage with dual storage class memory and nand flash. In 2017 47th European Solid-State Device Research Conference (ESSDERC), pages 6–9. IEEE, 2017.
[103] Alexander J McNeil. Calculating quantile risk measures for financial return series using extreme value theory. Technical report, ETH Zurich, 1998.
[104] Peter Mell, Tim Grance, et al. The nist definition of cloud computing. Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology, 2011.
[105] Dirk Merkel. Docker: Lightweight linux containers for consistent development and deployment, 2014. [Online; accessed 2018-04-01].
[106] Zbigniew Michalewicz and Cezary Z Janikow. Handling constraints in genetic algorithms. In ICGA, pages 151–157, 1991.
[107] Rino Micheloni. Solid-state drive (ssd): A nonvolatile storage system. Proceedings of the IEEE, 105(4):583–588, 2017.
dc.rights.spa.fl_str_mv All rights reserved - Universidad Autónoma de Occidente, 2022
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.uri.eng.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.eng.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.creativecommons.spa.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
rights_invalid_str_mv All rights reserved - Universidad Autónoma de Occidente, 2022
https://creativecommons.org/licenses/by-nc-nd/4.0/
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.spa.fl_str_mv 108 pages
dc.format.mimetype.eng.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Autónoma de Occidente
dc.publisher.program.spa.fl_str_mv Doctorado en Ingeniería
dc.publisher.faculty.spa.fl_str_mv Facultad de Ingeniería
dc.publisher.place.spa.fl_str_mv Cali
institution Universidad Autónoma de Occidente
bitstream.url.fl_str_mv https://red.uao.edu.co/bitstreams/5d929a4c-68c5-449b-af5b-d93cb219e674/download
https://red.uao.edu.co/bitstreams/a5d226db-2d9b-422d-bc32-414fb4ca8eac/download
https://red.uao.edu.co/bitstreams/2549b389-7240-4a82-a29e-bd8bcd191e42/download
https://red.uao.edu.co/bitstreams/04052a08-7843-41d2-99d4-d8cd5f8eb619/download
https://red.uao.edu.co/bitstreams/2fca125a-0067-4e7a-8964-c129b26a9797/download
https://red.uao.edu.co/bitstreams/24057cc2-5421-4ae9-ae81-9a1fc922ecdd/download
https://red.uao.edu.co/bitstreams/0ea2a46d-fe12-4a77-ae2b-198874a016ee/download
bitstream.checksum.fl_str_mv 20b5ba22b1117f71589c7318baa2c560
88c94e334e3951c5dd4ab5b80f5a473c
52e1a591011d0980f4bc856bea996f63
c337c3bfb4502c7858740589644cfcc5
e1c06d85ae7b8b032bef47e42e4c08f9
4c8bfbb121aceae05b0af6fbbc8dcb12
9597336c76e9b5368f3a41bc1760969a
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Digital Universidad Autonoma de Occidente
repository.mail.fl_str_mv repositorio@uao.edu.co
_version_ 1808478995683999744
Journal of Computational Science, 18:106–116, 2017.[14] Timur Bazhirov, Mohammad Mohammadi, Kevin Ding, and Sergey Barabash. Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing. In APS March Meeting Abstracts, volume 2017, pages C1–007, 2017.[15] Ali Belgacem, Kadda Beghdad-Bey, and Hassina Nacer. Dynamic resource allocation method based on symbiotic organism search algorithm in cloud computing. IEEE Transactions on Cloud Computing, 2020.[16] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future generation computer systems, 28(5):755–768, 2012.[17] Kostiantyn Berezovskyi, Luca Santinelli, Konstantinos Bletsas, and Eduardo Tovar. Wcet measurement-based and extreme value theory characterisation of cuda kernels. In Proceedings of the 22nd International Conference on Real- Time Networks and Systems, pages 279–288, 2014.[18] Aprigio Bezerra, Porfidio Hernández, Antonio Espinosa, and Juan Carlos Moure. Job scheduling in hadoop with shared input policy and ramdisk. In 2014 IEEE International Conference on Cluster Computing (CLUSTER), pages 355–363. IEEE, 2014.[19] Ekaba Bisong. An overview of google cloud platform services. Building Machine Learning and Deep Learning Models on Google Cloud Platform, pages 7–10, 2019.[20] Dhruba Borthakur et al. Hdfs architecture guide. Hadoop Apache Project, 53(1-13):2, 2008.[21] Eric B Boyer, Matthew C Broomfield, and Terrell A Perrotti. Glusterfs one storage server to rule them all. Technical report, Los Alamos National Lab.(LANL), Los Alamos, NM (United States), 2012.[22] Peter Braam. The lustre storage architecture. arXiv preprint arXiv:1903.01955, 2019.[23] Piotr Bryk, Maciej Malawski, and Gideon Juve. Storage-aware algorithms for scheduling of workflow ensembles in clouds. 
Journal of Grid Computing, pages 359–378, 2016.[24] Binlei Cai, Laiping Zhao, Xiaobo Zhou, Rongqi Zhang, and Keqiu Li. On evaluating the resource usage effectiveness of multi-tenant cloud storage. Journal of Systems Architecture, 98:403–412, 2019.[25] Yuzhi Cai and Dominic Hames. Minimum sample size determination for generalized extreme value distribution. Communications in Statistics—Simulation and Computation®, 40(1):87–98, 2010.[26] Brad Calder, JuWang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, JieshengWu, Huseyin Simitci, et al. Windows azure storage: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 143–157, 2011.[27] Enrique Castillo, Ali S Hadi, Narayanaswamy Balakrishnan, and José-Mariá Sarabia. Extreme value and related models with applications in engineering and science. Jhon Wiley & Sons, 2005.[28] Joan Del Castillo, Maria Padilla, Jaume Abella, and Francisco J Cazorla. Execution time distributions in embedded safety-critical systems using extreme value theory. International Journal of Data Analysis Techniques and Strategies, 9(4):348–361, 2017.[29] Mainak Chakraborty and Ajit Pratap Kundan. Grafana. In Monitoring Cloud- Native Applications, pages 187–240. Springer, 2021.[30] Nathanaël Cheriere, Matthieu Dorier, and Gabriel Antoniu. How fast can one resize a distributed file system? Journal of Parallel and Distributed Computing, 140:80–98, 2020.[31] Carlos A Coello Coello, Gary B Lamont, David A Van Veldhuizen, et al. Evolutionary algorithms for solving multi-objective problems, volume 5. Springer, 2007.[32] Carlos A Coello Coello and Efrén Mezura Montes. Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Advanced Engineering Informatics, 16(3):193–203, 2002.[33] Susan Coghlan. The magellan final report on cloud computing. 
Lawrence Berkeley National Laboratory, 2011.[34] Stuart Coles, Joanna Bawa, Lesley Trenner, and Pat Dorazio. An introduction to statistical modeling of extreme values, volume 208. Springer, 2001.[35] Alex Conway, Ainesh Bakshi, Yizheng Jiao, William Jannen, Yang Zhan, Jun Yuan, Michael A Bender, Rob Johnson, Bradley C Kuszmaul, Donald E Porter, et al. File systems fated for senescence? nonsense, says science! In 15th {USENIX} Conference on File and Storage Technologies ({FAST} 17), pages 45–58, 2017.[36] Robert Cypher, Alex Ho, Smaragda Konstantinidou, and Paul Messina. Architectural requirements of parallel scientific applications with explicit communication. ACM SIGARCH Computer Architecture News, 21(2):2–13, 1993.[37] Daniel de Oliveira, Kary ACS Ocaña, Fernanda Baião, and Marta Mattoso. A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds. Journal of grid Computing, 10(3):521–552, 2012.[38] Y. Deng, Y. Li, R. Seet, X. Tang, and W. Cai. The server allocation problem for session-based multiplayer cloud gaming. IEEE Transactions on Multimedia, 2017.[39] Snehsudha Popatrao Dhage, Tanpure Renuka Subhash, Rutuja Vilas Kotkar, Prachi Dattatray Varpe, and Sonali Sanjay Pardeshi. An overview-google file system (gfs) and hadoop distributed file system (hdfs). SAMRIDDHI: A Journal of Physical Sciences, Engineering and Technology, 12(SUP 1):126–128, 2020.[40] Clément Dombry, Ana Ferreira, et al. Maximum likelihood estimators based on the block maxima method. Bernoulli, 25(3):1690–1723, 2019.[41] Jered Dominguez-Trujillo, Keira Haskins, Soheila Jafari Khouzani, Christopher Leap, Sahba Tashakkori, Quincy Wofford, Trilce Estrada, Patrick G Bridges, and Patrick M Widener. Lightweight measurement and analysis of hpc performance variability. In 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pages 50– 60. IEEE, 2020.[42] Maicon Ança dos Santos and Gerson Geraldo H Cavalheiro. 
Cloud infrastructure for hpc investment analysis. Revista de Informática Teórica e Aplicada, 27(4):45–62, 2020.[43] Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. The design and operation of cloudlab. In 2019 USENIX Annual Technical Conference (ATC 2019), page 1–14, 7 2019.[44] Song Feng, Saralees Nadarajah, and Qi Hu. Modeling annual extreme precipitation in china using the generalized extreme value distribution. Journal of the Meteorological Society of Japan. Ser. II, 85(5):599–613, 2007.[45] Ana Ferreira and Laurens De Haan. On the block maxima method in extreme value theory: Pwm estimators. The Annals of statistics, 43(1):276–298, 2015.[46] M Ficco, B Di Martino, R Pietrantuono, and S Russo. Optimized task allocation on private cloud for hybrid simulation of large-scale critical systems. Future Generation Computer Systems, 74:104–118, 2017. Cited By :3 Export Date: 15 November 2017.[47] M. Ficco, C. Esposito, F. Palmieri, and A. Castiglione. A coral-reefs and game theory-based approach for optimizing elastic cloud resource allocation. Future Generation Computer Systems, 78, 2018.[48] Ronald Aylmer Fisher and Leonard Henry Caleb Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical proceedings of the Cambridge philosophical society, volume 24, pages 180–190. Cambridge University Press, 1928.[49] John Fragalla. Configure, tune, and benchmark a lustre filesystem. In 2014 Oil & Gas HPC Workshop, 2014.[50] GA Gabriele and KM Ragsdell. The generalized reduced gradient method: A reliable tool for optimal design. Journal of Engineering for Industry, 99(2):394– 400, 1977.[51] Xiangqiang Gao, Rongke Liu, and Aryan Kaushik. 
Hierarchical multi-agent optimization for resource allocation in cloud computing. IEEE Transactions on Parallel and Distributed Systems, 32(3):692–707, 2020.[52] Y Gao, J Duan, and W Shu. A novel ant optimization algorithm for task scheduling and resource allocation in cloud computing environment. Journal of Internet Technology, 16(7):1329–1338, 2015. Cited By :2 Export Date: 15 November 2017.[53] Al Geist and Daniel A Reed. A survey of high-performance computing scaling challenges. The International Journal of High Performance Computing Applications, 31(1):104–113, 2017.[54] Mitsuo Gen and Runwei Cheng. A survey of penalty techniques in genetic algorithms. In Proceedings of IEEE International Conference on Evolutionary Computation, pages 804–809. IEEE, 1996.[55] Mitsuo Gen and Runwei Cheng. Genetic algorithms and engineering optimization, volume 7. John Wiley & Sons, 2000.[56] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29–43, 2003.[57] Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the challenges of scientific workflows. Computer, 40(12), 2007.[58] Kannan Govindarajan, Vivekanandan Suresh Kumar, and Thamarai Selvi Somasundaram. A distributed cloud resource management framework for highperformance computing (hpc) applications. In 2016 Eighth International Conference on Advanced Computing (ICoAC), pages 1–6. IEEE, 2017.[59] Carlos Guerrero, Isaac Lera, and Carlos Juiz. Resource optimization of container orchestration: a case study in multi-cloud microservices-based applications. The Journal of Supercomputing, 0.[60] Giulia Guidi, Marquita Ellis, Aydin Buluc, Katherine Yelick, and David Culler. 10 years later: Cloud computing is closing the performance gap. 
arXiv preprint arXiv:2011.00656, 2020.[61] Abhishek Gupta, Paolo Faraboschi, Filippo Gioachin, Laxmikant V Kale, Richard Kaufmann, Bu-Sung Lee, Verdi March, Dejan Milojicic, and Chun Hui Suen. Evaluating and improving the performance and scheduling of hpc applications in cloud. IEEE Transactions on Cloud Computing, 4(3):307–321, 2014.[62] Mateusz Guzek, Pascal Bouvry, and El-Ghazali Talbi. A survey of evolutionary computation for resource management of processing in cloud computing [review article]. IEEE Computational Intelligence Magazine, 10(2):53–67, 5 2015.[63] TD Gwiazda. Genetic algorithms reference vol. i. crossover for single-objective numerical optimization problems [online]. tomaszgwiazda e-books, 2006.[64] Qiming He, Shujia Zhou, Ben Kobler, Dan Duffy, and Tom McGlynn. Case study for running hpc applications in public clouds. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 395–401, 2010.[65] Jan Heichler. An introduction to beegfs, 2014.[66] R. Henwood, N. W. Watkins, S. C. Chapman, and R. McLay. A parallel workload has extreme variability in a production environment. arXiv:1801.03898 [cs], 1 2018. arXiv: 1801.03898.[67] Frank Herold, Sven Breuner, and Jan Heichler. An introduction to beegfs, 2014.[68] Hideo Hirose. Maximum likelihood estimation in the 3-parameter weibull distribution. a look through the generalized extreme-value distribution. IEEE transactions on dielectrics and electrical insulation, 3(1):43–55, 1996.[69] Christina Hoffa, Gaurang Mehta, Tim Freeman, Ewa Deelman, Kate Keahey, Bruce Berriman, and John Good. On the use of cloud computing for scientific workflows. In 2008 IEEE fourth international conference on eScience, pages 640–645. IEEE, 2008.[70] Jonathan RM Hosking. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. 
Journal of the Royal Statistical Society: Series B (Methodological), 52(1):105–124, 1990.[71] H Howie Huang, Shan Li, Alex Szalay, and Andreas Terzis. Performance modeling and analysis of flash-based storage devices. In 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–11. IEEE, 2011.[72] Nusrat Sharmin Islam, Md Wasi-Ur-Rahman, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda. Performance characterization and acceleration of inmemory file systems for hadoop and spark applications on hpc clusters. Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, page 243–252, 2015.[73] Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R Dulloor, et al. Basic performance measurements of the intel optane dc persistent memory module. arXiv preprint arXiv:1903.05714, 2019.[74] A. F. Jenkinson. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quarterly Journal of the Royal Meteorological Society, 81(348):158–171, 1955.[75] Joonyong Jeong, Jaewook Kwak, Daeyong Lee, Seungdo Choi, Jungkeol Lee, Jungwook Choi, and Yong Ho Song. Level aware data placement technique for hybrid nand flash storage of log-structured merge-tree based key-value store system. IEEE Access, 8:188256–188268, 2020.[76] Chandan Kalita, Gautam Barua, and Priya Sehgal. Durablefs: a file system for nvram. CSI Transactions on ICT, 7(4):277–286, 2019.[77] Scott Klasky, Matthew Wolf, Mark Ainsworth, Chuck Atkins, Jong Choi, Greg Eisenhauer, Berk Geveci, William Godoy, Mark Kim, James Kress, et al. A view from ornl: Scientific data research opportunities in the big data age. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pages 1357–1368. IEEE, 2018.[78] Dieter Klein and Edward Hannan. An algorithm for the multiple objective integer linear programming problem. 
European Journal of Operational Research, 9(4):378–385, 1982.[79] Ana Klimovic, Heiner Litz, and Christos Kozyrakis. Selecta: Heterogeneous cloud storage configuration for data analytics. In 2018 {USENIX} Annual Technical Conferen e, pages 759–773, 2018.[80] Brian Kocoloski. Scalability in the Presence of Variability. PhD thesis, University of Pittsburgh, 2018.[81] Antoon Kolen. A genetic algorithm for the partial binary constraint satisfaction problem: an application to a frequency assignment problem. Statistica Neerlandica, 61(1):4–15, 2007.[82] Joanna Kołodziej. Evolutionary Hierarchical Multi-Criteria Metaheuristics for Scheduling in Large-Scale Grid Systems, volume 419. Springer, 2012.[83] J. Kołodziej, S.U. Khan, L. Wang, and A.Y. Zomaya. Energy efficient geneticbased schedulers in computational grids. Concurrency Computation, 27(4), 2015.[84] K. R. Krish, Ali Anwar, and Ali R. Butt. Hats: A heterogeneity-aware tiered storage for hadoop. Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014, page 502–511, 2014.[85] K. R. Krish, Bharti Wadhwa, M. Safdar Iqbal, M. Mustafa Rafique, and Ali R. Butt. On efficient hierarchical storage for big data processing. Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016, page 403–408, 2016.[86] KR Krish, M Safdar Iqbal, and Ali R Butt. Venu: Orchestrating ssds in hadoop storage. In 2014 IEEE International Conference on Big Data (Big Data), pages 207–212. IEEE, 2014.[87] A M Kumar and M Venkatesan. A novel based resource allocation method on cloud computing environment using hybrid differential evolution algorithm. Journal of Computational and Theoretical Nanoscience, 14(11):5322–5326, 2017.[88] Gregory M Kurtzer, Vanessa Sochat, and MichaelWBauer. Singularity: Scientific containers for mobility of compute. PloS one, 12(5):e0177459, 2017.[89] Leon S Lasdon, Richard L Fox, and Margery W Ratner. 
Nonlinear optimization using the generalized reduced gradient method. Revue française d’automatique, informatique, recherche opérationnelle. Recherche opérationnelle, 8(V3):73–103, 1974.[90] LS Lasdon and AD Waren. Generalized reduced gradient software for linearly and nonlinearly constrained problems. Graduate School of Business, University of Texas at Austin Austin, TX, 1977.[91] George Lima, Dario Dias, and Edna Barros. Extreme value theory for estimating task execution time bounds: A careful look. In 2016 28th Euromicro Conference on Real-Time Systems (ECRTS), pages 200–211. IEEE, 2016.[92] Qi Liu, Weidong Cai, Jian Shen, Xiaodong Liu, and Nigel Linge. An adaptive approach to better load balancing in a consumer-centric cloud environment. IEEE Transactions on Consumer Electronics, 62(3):243–250, 2016.[93] Li-Hsiung Lu and Jery R Stedinger. Variance of two-and three-parameter gev/pwm quantile estimators: formulae, confidence intervals, and a comparison. Journal of Hydrology, 138(1-2):247–267, 1992.[94] Sandeep Madireddy, Prasanna Balaprakash, Philip Carns, Robert Latham, Robert Ross, Shane Snyder, and Stefan Wild. Modeling i/o performance variability using conditional variational autoencoders. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 109–113. IEEE, 2018.[95] Artur Malinowski and Pawel Czarnul. A solution to image processing with parallel mpi i/o and distributed nvram cache. Scalable Computing: Practice and Experience, 19(1):1–14, 2018.[96] Giovanni Mariani, Andreea Anghel, Rik Jongerius, and Gero Dittmann. Predicting cloud performance for hpc applications before deployment. Future Generation Computer Systems, 87:618–628, 2018.[97] Velayoudoum Marimoutou, Bechir Raggad, and Abdelwahed Trabelsi. Extreme value theory and value at risk: application to oil market. Energy Economics, 31(4):519–530, 2009.[98] Sheri Markose and Amadeo Alentorn. The generalized extreme value dis- tribution, implied tail index, and option pricing. 
The Journal of Derivatives, 18(3):35–60, 2011.[99] Jack Marquez, J. and Oscar H. Mondragon. Jack Marquez’s Dissertation.[100] Jack D. Marquez and Mario Castillo. Performance comparison: Virtual machines and containers running artificial intelligence applications. In Álvaro Rocha, Carlos Ferrás, Paulo Carlos López-López, and Teresa Guarda, editors, Information Technology and Systems, pages 199–209, Cham, 2021. Springer International Publishing.[101] John Paul Martin, A Kandasamy, and K Chandrasekaran. Exploring the support for high performance applications in the container runtime environment. Human-centric Computing and Information Sciences, 0.[102] Chihiro Matsui and Ken Takeuchi. 22% higher performance, 2x scm write endurance heterogeneous storage with dual storage class memory and nand flash. In 2017 47th European Solid-State Device Research Conference (ESSDERC), pages 6–9. IEEE, 2017.[103] Alexander J McNeil. Calculating quantile risk measures for financial return series using extreme value theory. Technical report, ETH Zurich, 1998.[104] Peter Mell, Tim Grance, et al. The nist definition of cloud computing. Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology, 2011.[105] Dirk Merkel. Docker: Lightweight linux containers for consistent development and deployment, 2014. [Online; accessed 2018-04-01].[106] Zbigniew Michalewicz and Cezary Z Janikow. Handling constraints in genetic algorithms. In ICGA, pages 151–157, 1991.[107] Rino Micheloni. Solid-state drive (ssd): A nonvolatile storage system. 
Proceedings of the IEEE, 105(4):583–588, 2017.Comunidad generalPublicationLICENSElicense.txtlicense.txttext/plain; charset=utf-81665https://red.uao.edu.co/bitstreams/5d929a4c-68c5-449b-af5b-d93cb219e674/download20b5ba22b1117f71589c7318baa2c560MD53ORIGINALT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdfT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdfTexto archivo completo del trabajo de grado, PDFapplication/pdf1411888https://red.uao.edu.co/bitstreams/a5d226db-2d9b-422d-bc32-414fb4ca8eac/download88c94e334e3951c5dd4ab5b80f5a473cMD54TA10159_Autorización trabajo de grado.pdfTA10159_Autorización trabajo de grado.pdfAutorización publicación del trabajo de gradoapplication/pdf792496https://red.uao.edu.co/bitstreams/2549b389-7240-4a82-a29e-bd8bcd191e42/download52e1a591011d0980f4bc856bea996f63MD55TEXTT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdf.txtT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdf.txtExtracted texttext/plain181814https://red.uao.edu.co/bitstreams/04052a08-7843-41d2-99d4-d8cd5f8eb619/downloadc337c3bfb4502c7858740589644cfcc5MD56TA10159_Autorización trabajo de grado.pdf.txtTA10159_Autorización trabajo de grado.pdf.txtExtracted texttext/plain2https://red.uao.edu.co/bitstreams/2fca125a-0067-4e7a-8964-c129b26a9797/downloade1c06d85ae7b8b032bef47e42e4c08f9MD58THUMBNAILT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdf.jpgT10159_Improving heterogeneous storage performance in hpc cloud systems using efficient storage algorithms informed by statistical models.pdf.jpgGenerated 
Thumbnailimage/jpeg6302https://red.uao.edu.co/bitstreams/24057cc2-5421-4ae9-ae81-9a1fc922ecdd/download4c8bfbb121aceae05b0af6fbbc8dcb12MD57TA10159_Autorización trabajo de grado.pdf.jpgTA10159_Autorización trabajo de grado.pdf.jpgGenerated Thumbnailimage/jpeg12536https://red.uao.edu.co/bitstreams/0ea2a46d-fe12-4a77-ae2b-198874a016ee/download9597336c76e9b5368f3a41bc1760969aMD5910614/13693oai:red.uao.edu.co:10614/136932024-03-18 16:23:16.458https://creativecommons.org/licenses/by-nc-nd/4.0/Derechos reservados - Universidad Autónoma de Occidente, 2022open.accesshttps://red.uao.edu.coRepositorio Digital Universidad Autonoma de Occidenterepositorio@uao.edu.coRUwgQVVUT1IgYXV0b3JpemEgYSBsYSBVbml2ZXJzaWRhZCBBdXTDs25vbWEgZGUgT2NjaWRlbnRlLCBkZSBmb3JtYSBpbmRlZmluaWRhLCBwYXJhIHF1ZSBlbiBsb3MgdMOpcm1pbm9zIGVzdGFibGVjaWRvcyBlbiBsYSBMZXkgMjMgZGUgMTk4MiwgbGEgTGV5IDQ0IGRlIDE5OTMsIGxhIERlY2lzacOzbiBhbmRpbmEgMzUxIGRlIDE5OTMsIGVsIERlY3JldG8gNDYwIGRlIDE5OTUgeSBkZW3DoXMgbGV5ZXMgeSBqdXJpc3BydWRlbmNpYSB2aWdlbnRlIGFsIHJlc3BlY3RvLCBoYWdhIHB1YmxpY2FjacOzbiBkZSBlc3RlIGNvbiBmaW5lcyBlZHVjYXRpdm9zLiBQQVJBR1JBRk86IEVzdGEgYXV0b3JpemFjacOzbiBhZGVtw6FzIGRlIHNlciB2w6FsaWRhIHBhcmEgbGFzIGZhY3VsdGFkZXMgeSBkZXJlY2hvcyBkZSB1c28gc29icmUgbGEgb2JyYSBlbiBmb3JtYXRvIG8gc29wb3J0ZSBtYXRlcmlhbCwgdGFtYmnDqW4gcGFyYSBmb3JtYXRvIGRpZ2l0YWwsIGVsZWN0csOzbmljbywgdmlydHVhbCwgcGFyYSB1c29zIGVuIHJlZCwgSW50ZXJuZXQsIGV4dHJhbmV0LCBpbnRyYW5ldCwgYmlibGlvdGVjYSBkaWdpdGFsIHkgZGVtw6FzIHBhcmEgY3VhbHF1aWVyIGZvcm1hdG8gY29ub2NpZG8gbyBwb3IgY29ub2Nlci4gRUwgQVVUT1IsIGV4cHJlc2EgcXVlIGVsIGRvY3VtZW50byAodHJhYmFqbyBkZSBncmFkbywgcGFzYW50w61hLCBjYXNvcyBvIHRlc2lzKSBvYmpldG8gZGUgbGEgcHJlc2VudGUgYXV0b3JpemFjacOzbiBlcyBvcmlnaW5hbCB5IGxhIGVsYWJvcsOzIHNpbiBxdWVicmFudGFyIG5pIHN1cGxhbnRhciBsb3MgZGVyZWNob3MgZGUgYXV0b3IgZGUgdGVyY2Vyb3MsIHkgZGUgdGFsIGZvcm1hLCBlbCBkb2N1bWVudG8gKHRyYWJham8gZGUgZ3JhZG8sIHBhc2FudMOtYSwgY2Fzb3MgbyB0ZXNpcykgZXMgZGUgc3UgZXhjbHVzaXZhIGF1dG9yw61hIHkgdGllbmUgbGEgdGl0dWxhcmlkYWQgc29icmUgw6lzdGUuIFBBUkFHUkFGTzogZW4gY2FzbyBkZSBwcmVzZW50YXJz
ZSBhbGd1bmEgcmVjbGFtYWNpw7NuIG8gYWNjacOzbiBwb3IgcGFydGUgZGUgdW4gdGVyY2VybywgcmVmZXJlbnRlIGEgbG9zIGRlcmVjaG9zIGRlIGF1dG9yIHNvYnJlIGVsIGRvY3VtZW50byAoVHJhYmFqbyBkZSBncmFkbywgUGFzYW50w61hLCBjYXNvcyBvIHRlc2lzKSBlbiBjdWVzdGnDs24sIEVMIEFVVE9SLCBhc3VtaXLDoSBsYSByZXNwb25zYWJpbGlkYWQgdG90YWwsIHkgc2FsZHLDoSBlbiBkZWZlbnNhIGRlIGxvcyBkZXJlY2hvcyBhcXXDrSBhdXRvcml6YWRvczsgcGFyYSB0b2RvcyBsb3MgZWZlY3RvcywgbGEgVW5pdmVyc2lkYWQgIEF1dMOzbm9tYSBkZSBPY2NpZGVudGUgYWN0w7phIGNvbW8gdW4gdGVyY2VybyBkZSBidWVuYSBmZS4gVG9kYSBwZXJzb25hIHF1ZSBjb25zdWx0ZSB5YSBzZWEgZW4gbGEgYmlibGlvdGVjYSBvIGVuIG1lZGlvIGVsZWN0csOzbmljbyBwb2Ryw6EgY29waWFyIGFwYXJ0ZXMgZGVsIHRleHRvIGNpdGFuZG8gc2llbXByZSBsYSBmdWVudGUsIGVzIGRlY2lyIGVsIHTDrXR1bG8gZGVsIHRyYWJham8geSBlbCBhdXRvci4gRXN0YSBhdXRvcml6YWNpw7NuIG5vIGltcGxpY2EgcmVudW5jaWEgYSBsYSBmYWN1bHRhZCBxdWUgdGllbmUgRUwgQVVUT1IgZGUgcHVibGljYXIgdG90YWwgbyBwYXJjaWFsbWVudGUgbGEgb2JyYS4K