A Snapshot of Parallelism in Distributed Deep Learning Training

Authors:
Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
Resource type:
Research article
Publication date:
2024
Institution:
Universidad Autónoma de Bucaramanga - UNAB
Repository:
Repositorio UNAB
Language:
spa
OAI Identifier:
oai:repository.unab.edu.co:20.500.12749/26661
Online access:
http://hdl.handle.net/20.500.12749/26661
https://doi.org/10.29375/25392115.5054
Keywords:
Deep learning
Parallelism
Artificial neural networks
Rights
License
http://purl.org/coar/access_right/c_abf2
id UNAB2_cb29868e5b8fee08a90e7761f58d303d
oai_identifier_str oai:repository.unab.edu.co:20.500.12749/26661
network_acronym_str UNAB2
network_name_str Repositorio UNAB
repository_id_str
dc.title.eng.fl_str_mv A Snapshot of Parallelism in Distributed Deep Learning Training
title A Snapshot of Parallelism in Distributed Deep Learning Training
spellingShingle A Snapshot of Parallelism in Distributed Deep Learning Training
Deep learning
Parallelism
Artificial neural networks
title_short A Snapshot of Parallelism in Distributed Deep Learning Training
title_full A Snapshot of Parallelism in Distributed Deep Learning Training
title_fullStr A Snapshot of Parallelism in Distributed Deep Learning Training
title_full_unstemmed A Snapshot of Parallelism in Distributed Deep Learning Training
title_sort A Snapshot of Parallelism in Distributed Deep Learning Training
dc.creator.fl_str_mv Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
dc.contributor.author.none.fl_str_mv Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
dc.contributor.orcid.spa.fl_str_mv Romero Sandí, Hairol [0000-0002-3199-1244]
Núñez, Gabriel [0000-0002-6907-533X]
Rojas, Elvis [0000-0002-4238-0908]
dc.subject.keywords.eng.fl_str_mv Deep learning
Parallelism
Artificial neural networks
topic Deep learning
Parallelism
Artificial neural networks
description The accelerated development of applications related to artificial intelligence has driven the creation of increasingly complex neural network models with enormous numbers of parameters, currently reaching into the trillions. This makes their training practically impossible without parallelization. Parallelism, applied through different approaches, is the mechanism that has been used to address the problem of large-scale training. This paper presents a glimpse of the state of the art on parallelism in deep learning training from multiple points of view. It covers pipeline parallelism, hybrid parallelism, mixture-of-experts and auto-parallelism, which currently play a leading role in scientific research in this area. Finally, we develop a series of experiments with data parallelism and model parallelism so that the reader can observe the performance of the two types of parallelism and understand more clearly the approach of each one.
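As a concrete illustration of the two strategies compared in the experiments, the following minimal PyTorch sketch contrasts data parallelism (each worker holds a full model replica and gradients are averaged) with model parallelism (the layers of one model are split across devices). This is not the authors' experimental code; the toy model, layer sizes, and device names are illustrative assumptions.

```python
# Minimal sketch contrasting the two strategies compared in the paper's
# experiments. This is not the authors' code: the toy model, layer sizes,
# and device names ("cuda:0", "cuda:1") are illustrative assumptions.
import torch
import torch.nn as nn


class TwoGPUModelParallel(nn.Module):
    """Model parallelism: the layers are partitioned across two GPUs and
    activations are moved between devices inside the forward pass."""

    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))      # first half runs on GPU 0
        return self.stage1(x.to("cuda:1"))   # second half runs on GPU 1


def wrap_data_parallel(model: nn.Module) -> nn.Module:
    """Data parallelism: every process keeps a full replica of the model and
    gradients are averaged across replicas after each backward pass.
    Assumes torch.distributed is already initialized (e.g. launched with torchrun)."""
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    local_rank = dist.get_rank() % torch.cuda.device_count()
    return DDP(model.to(f"cuda:{local_rank}"), device_ids=[local_rank])


if __name__ == "__main__":
    # Model-parallel forward pass over a random batch (requires two GPUs).
    model = TwoGPUModelParallel()
    out = model(torch.randn(32, 1024))
    print(out.shape)  # torch.Size([32, 10]), resident on cuda:1
```

In the data-parallel case the training batch is sharded across replicas and the main communication cost is the gradient all-reduce; in the model-parallel case a single batch traverses the devices sequentially, which pipeline parallelism later refines by splitting batches into micro-batches.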
publishDate 2024
dc.date.accessioned.none.fl_str_mv 2024-09-19T22:18:06Z
dc.date.available.none.fl_str_mv 2024-09-19T22:18:06Z
dc.date.issued.none.fl_str_mv 2024-06-18
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/article
dc.type.local.spa.fl_str_mv Artículo
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/ART
format http://purl.org/coar/resource_type/c_2df8fbb1
dc.identifier.issn.spa.fl_str_mv ISSN: 1657-2831
e-ISSN: 2539-2115
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/20.500.12749/26661
dc.identifier.instname.spa.fl_str_mv instname:Universidad Autónoma de Bucaramanga UNAB
dc.identifier.repourl.spa.fl_str_mv repourl:https://repository.unab.edu.co
dc.identifier.doi.none.fl_str_mv https://doi.org/10.29375/25392115.5054
identifier_str_mv ISSN: 1657-2831
e-ISSN: 2539-2115
instname:Universidad Autónoma de Bucaramanga UNAB
repourl:https://repository.unab.edu.co
url http://hdl.handle.net/20.500.12749/26661
https://doi.org/10.29375/25392115.5054
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/article/view/5054/3970
dc.relation.uri.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/issue/view/297
dc.relation.references.none.fl_str_mv Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2016, March 14). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv(1603.04467 [cs.DC]). doi:10.48550/arXiv.1603.04467
Agarwal, S., Yan, C., Zhang, Z., & Venkataraman, S. (2023, October). BagPipe: Accelerating Deep Recommendation Model Training. SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles (SOSP '23)(pp. 348-363). Koblenz, Germany: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3600006.3613142
Akintoye, S. B., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. doi:10.1109/ACCESS.2022.3193690
Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET) (pp. 1-6). Antalya, Turkey: IEEE. doi:10.1109/ICEngTechnol.2017.8308186
Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). Dallas, TX, USA: IEEE. doi:10.1109/SC41404.2022.00051
Batur Dinler, Ö., Şahin, B. C., & Abualigah, L. (2021, November 30). Comparison of Performance of Phishing Web Sites with Different DeepLearning4J Models. European Journal of Science and Technology, (28), 425-431. doi:10.31590/ejosat.1004778
Ben-Nun, T., & Hoefler, T. (2019, August 30). Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis. ACM Computing Surveys (CSUR), 52(4), 1-43, Article No. 65. doi:10.1145/3320060
Cai, Z., Yan, X., Ma, K., Yidi, W., Huang, Y., Cheng, J., . . . Yu, F. (2022, August 1). TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism. IEEE Transactions on Parallel and Distributed Systems, 33(8), 1967-1981. doi:10.1109/TPDS.2021.3132413
Camp, D., Garth, C., Childs, H., Pugmire, D., & Joy, K. (2011, November). Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1702-1713. doi:10.1109/TVCG.2010.259
Chen, C.-C., Yang, C.-L., & Cheng, H.-Y. (2019, October 28). Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform. arXiv:1809.02839v4 [cs.DC]. doi:10.48550/arXiv.1809.02839
Chen, M. (2023, March 15). Analysis of Data Parallelism Methods with Deep Neural Network. ITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering (pp. 1857-1861). Xiamen, China: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3573428.3573755
Chen, T., Huang, S., Xie, Y., Jiao, B., Jiang, D., Zhou, H., . . . Wei, F. (2022, June 2). Task-Specific Expert Pruning for Sparse Mixture-of-Experts. arXiv:2206.00277v2 [cs.LG], 1-13. doi:10.48550/arXiv.2206.00277
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., . . . Zhang, Z. (2015, December 3). MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv, arXiv:1512.01274v1 [cs.DC], 1-6. doi:10.48550/arXiv.1512.01274
Chen, Z., Deng, Y., Wu, Y., Gu, Q., & Li, Y. (2022). Towards Understanding the Mixture-of-Experts Layer in Deep Learning. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Ed.), Advances in Neural Information Processing Systems. New Orleans, Louisiana, USA. Retrieved from https://openreview.net/forum?id=MaYzugDmQV
Collobert, R., Bengio, S., & Mariéthoz, J. (2002, October 30). Torch: a modular machine learning software library. Research Report, IDIAP, Martigny, Switzerland. Retrieved from https://publications.idiap.ch/downloads/reports/2002/rr02-46.pdf
Dai, D., Dong, L., Ma, S., Zheng, B., Sui, Z., Chang, B., & Wei, F. (2022, May). StableMoE: Stable Routing Strategy for Mixture of Experts. In S. Muresan, P. Nakov, & A. Villavicencio (Ed.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1: Long Papers, pp. 7085–7095. Dublin, Ireland: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.489
Duan, Y., Lai, Z., Li, S., Liu, W., Ge, K., Liang, P., & Li, D. (2022). HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 313-323). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00043
Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021, February). DAPPLE: a pipelined data parallel approach for training large models. PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3437801.3441593
Fedus, W., Zoph, B., & Shazeer, N. (2022, January 1). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. (A. Clark, Ed.) The Journal of Machine Learning Research, 23(1), Article No. 120, 5232-5270. Retrieved from https://dl.acm.org/doi/abs/10.5555/3586589.3586709
Gholami, A., Azad, A., Jin, P., Keutzer, K., & Buluc, A. (2018). Integrated Model, Batch, and Domain Parallelism in Training Neural Networks. SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (pp. 77-86). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3210377.3210394
Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. doi:10.48550/arXiv.1911.04610
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 8). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC], 1-14. doi:10.48550/arXiv.1806.03377
Hazimeh, H., Zhao, Z., Aakanksha, C., Sathiamoorthy, M., Chen, Y., Mazumder, R., . . . Chi, E. H. (2024). DSelect-k: differentiable selection in the mixture of experts with applications to multi-task learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 2246, pp. 29335-29347. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3542507
He, C., Li, S., Soltanolkotabi, M., & Avestimehr, S. (2021, July). PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 4150-4159. PMLR. Retrieved from https://proceedings.mlr.press/v139/he21a.html
He, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., & Tang, J. (2021, March 24). FastMoE: A Fast Mixture-of-Expert Training System. arXiv:2103.13262v1 [cs.LG], 1-11. doi:10.48550/arXiv.2103.13262
Hey, T. (2020, October 1). Opportunities and Challenges from Artificial Intelligence and Machine Learning for the Advancement of Science, Technology, and the Office of Science Missions. Technical Report, USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR), United States. doi:10.2172/1734848
Hopfield, J. J. (1988, September). Artificial neural networks. IEEE Circuits and Devices Magazine, 4(5), 3-10. doi:10.1109/101.8118
Howison, M., Bethel, E. W., & Childs, H. (2012, January). Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics, 18(1), 17-29. doi:10.1109/TVCG.2011.24
Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2021, October 28). Pipeline Parallelism for Inference on Heterogeneous Edge Computing. arXiv:2110.14895v1 [cs.DC], 1-12. doi:10.48550/arXiv.2110.14895
Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, December 8). GPipe: efficient training of giant neural networks using pipeline parallelism. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, & E. B. Fox (Ed.), Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS'19). Article No. 10, pp. 103 - 112. Vancouver, BC, Canada: Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3454287.3454297
Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., . . . Xiong, Y. (2023, June 5). Tutel: Adaptive Mixture-of-Experts at Scale. arXiv:2206.03382v2 [cs.DC], 1-19. doi:10.48550/arXiv.2206.03382
Janbi, N., Katib, I., & Mehmood, R. (2023, May). Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture. Intelligent Systems with Applications, 18, 200231. doi:10.1016/j.iswa.2023.200231
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, November 3). Caffe: Convolutional Architecture for Fast Feature Embedding. MM '14: Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). Orlando, Florida, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2647868.2654889
Jia, Z., Lin, S., Qi, C. R., & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. In J. Dy, & A. Krause (Ed.), Proceedings of the 35th International Conference on Machine Learning. 80, pp. 2274-2283. PMLR. Retrieved from https://proceedings.mlr.press/v80/jia18a.html
Jiang, W., Zhang, Y., Liu, P., Peng, J., Yang, L. T., Ye, G., & Jin, H. (2020, January). Exploiting potential of deep neural networks by layer-wise fine-grained parallelism. Future Generation Computer Systems, 102, 210-221. doi:10.1016/j.future.2019.07.054
Kamruzzaman, M., Swanson, S., & Tullsen, D. M. (2013, November 17). Load-balanced pipeline parallelism. SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Article No. 14, pp. 1-12. Denver, Colorado, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2503210.2503295
Kirby, A. C., Samsi, S., Jones, M., Reuther, A., Kepner, J., & Gadepally, V. (2020, September). Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-7). IEEE. doi:10.1109/HPEC43674.2020.9286180
Kossmann, F., Jia, Z., & Aiken, A. (2022, August 2). Optimizing Mixture of Experts using Dynamic Recompilations. arXiv:2205.01848v2 [cs.LG], 1-13. doi:10.48550/arXiv.2205.01848
Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. doi:10.48550/arXiv.1404.5997
Kukačka, J., Golkov, V., & Cremers, D. (2017, October 29). Regularization for Deep Learning: A Taxonomy. arXiv:1710.10686v1 [cs.LG], 1-23. doi:10.48550/arXiv.1710.10686
Li, C., Yao, Z., Wu, X., Zhang, M., Holmes, C., Li, C., & He, Y. (2024, January 14). DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. arXiv:2212.03597v3 [cs.LG], 1-19. doi:10.48550/arXiv.2212.03597
Li, J., Jiang, Y., Zhu, Y., Wang, C., & Xu, H. (2023, July). Accelerating Distributed MoE Training and Inference with Lina. 2023 USENIX Annual Technical Conference (USENIX ATC 23) (pp. 945-959). USENIX Association, Boston, MA, USA. Retrieved from https://www.usenix.org/conference/atc23/presentation/li-jiamin
Li, S., & Hoefler, T. (2021, November). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476145
Li, S., Liu, H., Bian, Z., Fang, J., Huang, H., Liu, Y., . . . You, Y. (2023, August). Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing (pp. 766-775). Salt Lake City, UT, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3605573.3605613
Li, S., Mangoubi, O., Xu, L., & Guo, T. (2021). Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS) (pp. 528-538). DC, USA: IEEE. doi:10.1109/ICDCS51616.2021.00057
Li, Y., Huang, J., Li, Z., Zhou, S., Jiang, W., & Wang, J. (2023). HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. ICPP '22: Proceedings of the 51st International Conference on Parallel Processing (pp. 1-11). Bordeaux, France: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3545008.3545024
Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022, December). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999-7019. doi:10.1109/TNNLS.2021.3084827
Li, Z., Zhuang, S., Guo, S., Zhuo, D., Zhang, H., Song, D., & Stoica, I. (2021). TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 6543-6552. PMLR. Retrieved from https://proceedings.mlr.press/v139/li21y.html
Liang, P., Tang, Y., Zhang, X., Bai, Y., Su, T., Lai, Z., . . . Li, D. (2023, August). A Survey on Auto-Parallelism of Large-Scale Deep Learning Training. IEEE Transactions on Parallel and Distributed Systems, 34(8), 2377-2390. doi:10.1109/TPDS.2023.3281931
Liu, D., Chen, X., Zhou, Z., & Ling, Q. (2020, May 15). HierTrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism in Mobile-Edge-Cloud Computing. IEEE Open Journal of the Communications Society, 1, 634-645. doi:10.1109/OJCOMS.2020.2994737
Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00042
Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., & Chi, E. H. (2018, July). Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1930-1939). London, United Kingdom: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3219819.3220007
Manaswi, N. K. (2018). Understanding and Working with Keras. In N. K. Manaswi, Deep Learning with Applications Using Python: Chatbots and Face, Object, and Speech Recognition With TensorFlow and Keras (pp. 31–43). Berkeley, CA, USA: Apress. doi:10.1007/978-1-4842-3516-4
Mastoras, A., & Gross, T. R. (2018, February 24). Understanding Parallelization Tradeoffs for Linear Pipelines. In Q. Chen, Z. Huang, & P. Balaji (Ed.), PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (pp. 1-10). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3178442.3178443
Miao, X., Wang, Y., Jiang, Y., Shi, C., Nie, X., Zhang, H., & Cui, B. (2022, November 1). Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Proceedings of the VLDB Endowment, 16(3), 470-479. doi:10.14778/3570690.3570697
Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., . . . Dean, J. (2017, June 25). Device Placement Optimization with Reinforcement Learning. arXiv:1706.04972v2 [cs.LG], 1-11. doi:10.48550/arXiv.1706.04972
Mittal, S., & Vaishay, S. (2019, October). A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 99, 101635. doi:10.1016/j.sysarc.2019.101635
Moreno-Alvarez, S., Haut, J. M., Paoletti, M. E., & Rico-Gallego, J. A. (2021, June 21). Heterogeneous model parallelism for deep neural networks. Neurocomputing, 441, 1-12. doi:10.1016/j.neucom.2021.01.125
Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019, October). PipeDream: generalized pipeline parallelism for DNN training. SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3341301.3359646
Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V., . . . Zaharia, M. (2021). Efficient large-scale language model training on GPU clusters using megatron-LM. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 58, pp. 1-15. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476209
Nie, X., Miao, X., Cao, S., Ma, L., Liu, Q., Xue, J., . . . Cui, B. (2022, October 9). EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate. arXiv:2112.14397v2 [cs.LG], 1-14. doi:10.48550/arXiv.2112.14397
Oyama, Y., Maruyama, N., Dryden, N., McCarthy, E., Harrington, P., Balewski, J., . . . Van Essen, B. (2021, July 1). The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism. IEEE Transactions on Parallel and Distributed Systems, 32(7), 1641-1652. doi:10.1109/TPDS.2020.3047974
Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020, July). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. Retrieved from https://www.usenix.org/conference/atc20/presentation/park
Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Presa, M. R., . . . Iyengar, S. S. (2018, September 18). A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR), 51(5), 1-36, Article No. 92. doi:10.1145/3234150

Rajbhandari, S., Li, C., Yao, Z., Zhang, M., Aminabadi, R. Y., Awan, A. A., . . . He, Y. (2022, July). DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, & S. Sabato (Ed.), Proceedings of the 39th International Conference on Machine Learning. 162, pp. 18332-18346. PMLR. Retrieved from https://proceedings.mlr.press/v162/rajbhandari22a.html
Rasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020, August). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3505 - 3506). Virtual Event, CA, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3394486.3406703
Ravanelli, M., Parcollet, T., & Bengio, Y. (2019). The Pytorch-kaldi Speech Recognition Toolkit. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465-6469). Brighton, UK: IEEE. doi:10.1109/ICASSP.2019.8683713
Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., . . . Houlsby, N. (2024, December). Scaling vision with sparse mixture of experts. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 657, pp. 8583-8595. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3540918
Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. J. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 1540, pp. 177-192. Springer, Cham. doi:10.1007/978-3-031-04209-6_13
Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR 2017), (pp. 1-19). Toulon, France. Retrieved from https://openreview.net/forum?id=B1ckMDqlg
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2020, March 13). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053v4 [cs.CL], 1-15. doi:10.48550/arXiv.1909.08053
Song, L., Mao, J., Zhuo, Y., Qian, X., Li, H., & Chen, Y. (2019). HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 56-68). Washington, DC, USA: IEEE. doi:10.1109/HPCA.2019.00027
Stevens, R., Taylor, V., Nichols, J., Maccabe, A. B., Yelick, K., & Brown, D. (2020, February 1). AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science. Technical Report, USDOE; Lawrence Berkeley National Laboratory (LBNL); Argonne National Laboratory (ANL); Oak Ridge National Laboratory (ORNL), United States. doi:10.2172/1604756
Subhlok, J., Stichnoth, J. M., O'Hallaron, D. O., & Gross, T. (1993, July 1). Exploiting task and data parallelism on a multicomputer. ACM SIGPLAN Notices, 28(7), 13-22. doi:10.1145/173284.155334
Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. doi:10.1109/CANDARW51189.2020.00036
Tanaka, M., Taura, K., Hanawa, T., & Torisawa, K. (2021). Automatic Graph Partitioning for Very Large-scale Deep Learning. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1004-1013). Portland, OR, USA: IEEE. doi:10.1109/IPDPS49936.2021.00109
Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33, Article No. 30. doi:10.1145/3377454
Wang, H., Imes, C., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2023). Quantpipe: Applying Adaptive Post-Training Quantization For Distributed Transformer Pipelines In Dynamic Edge Environments. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). Rhodes Island, Greece: IEEE. doi:10.1109/ICASSP49357.2023.10096632
Wang, S.-C. (2003). Artificial Neural Network. In S.-C. Wang, Interdisciplinary Computing in Java Programming (1 ed., Vol. 743, pp. 81-100). Boston, MA, USA: Springer. doi:10.1007/978-1-4615-0377-4_5
Wang, Y., Feng, B., Wang, Z., Geng, T., Barker, K., Li, A., & Ding, Y. (2023, July). MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23) (pp. 779-795). Boston, MA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi23/presentation/wang-yuke
Wu, J. (2017, May 1). Introduction to Convolutional Neural Networks. Nanjing University, National Key Lab for Novel Software Technology, China. Retrieved from https://cs.nju.edu.cn/wujx/paper/CNN.pdf
Yang, B., Zhang, J., Li, J., Ré, C., Aberger, C. R., & De Sa, C. (2021, March 15). PipeMare: Asynchronous Pipeline Parallel DNN Training. Proceedings of the 4th Machine Learning and Systems Conference, 3, pp. 269-296. San Jose, CA, USA. Retrieved from https://proceedings.mlsys.org/paper_files/paper/2021/file/9412531719be7ccf755c4ff98d0969dc-Paper.pdf
Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. The Tenth International Conference on Learning Representations (ICLR 2022), (pp. 1-15). Retrieved from https://openreview.net/forum?id=cw-EmNq5zfD
Yoon, J., Byeon, Y., Kim, J., & Lee, H. (2022, July 15). EdgePipe: Tailoring Pipeline Parallelism With Deep Neural Networks for Volatile Wireless Edge Devices. IEEE Internet of Things Journal, 9(14), 11633 - 11647. doi:10.1109/JIOT.2021.3131407
Yuan, L., He, Q., Chen, F., Dou, R., Jin, H., & Yang, Y. (2023, April 30). PipeEdge: A Trusted Pipelining Collaborative Edge Training based on Blockchain. In Y. Ding, J. Tang, J. Sequeda, L. Aroyo, C. Castillo, & G.-J. Houben (Ed.), WWW '23: Proceedings of the ACM Web Conference 2023 (pp. 3033-3043). Austin, TX, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3543507.3583413
Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. doi:10.1109/DAC18074.2021.9586300
Zhang, J., Niu, G., Dai, Q., Li, H., Wu, Z., Dong, F., & Wu, Z. (2023, October 28). PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters. Neurocomputing, 555, 126661. doi:10.1016/j.neucom.2023.126661
Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. doi:10.1016/j.future.2023.04.033
Zhang, S., Diao, L., Wang, S., Cao, Z., Gu, Y., Si, C., . . . Lin, W. (2023, February 16). Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform. arXiv:2302.08141v1 [cs.DC], 1-16. doi:10.48550/arXiv.2302.08141
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2021, January 14). BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training. arXiv:2012.12544v2 [cs.DC]. doi:10.48550/arXiv.2012.12544
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2022). BaPipe: Balanced Pipeline Parallelism for DNN Training. Parallel Processing Letters, 32(03n04), 2250005, 1-17. doi:10.1142/S0129626422500050
Zhao, S., Li, F., Chen, X., Guan, X., Jiang, J., Huang, D., . . . Cui, H. (2022, March 1). vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training. IEEE Transactions on Parallel and Distributed Systems, 33(3), 489-506. doi:10.1109/TPDS.2021.3094364
Zheng, L., Li, Z., Zhang, H., Zhuang, Y., Chen, Z., Huang, Y., . . . Stoica, I. (2022, July). Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22) (pp. 559-578). Carlsbad, CA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin
Zhou, Q., Guo, S., Qu, Z., Li, P., Li, L., Guo, M., & Wang, K. (2021, May 1). Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems, 32(5), 1030-1043. doi:10.1109/TPDS.2020.3040601
Zhu, X. (2023, April 28). Implement deep neuron networks on VPipe parallel system: a ResNet variant implementation. In X. Li (Ed.), Proceedings Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022). 12610, p. 126104I. Wuhan, China: International Society for Optics and Photonics, SPIE. doi:10.1117/12.2671359
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Autónoma de Bucaramanga UNAB
dc.source.spa.fl_str_mv Vol. 25 Núm. 1 (2024): Revista Colombiana de Computación (Enero-Junio); 60-73
institution Universidad Autónoma de Bucaramanga - UNAB
bitstream.url.fl_str_mv https://repository.unab.edu.co/bitstream/20.500.12749/26661/1/Art%c3%adculo.pdf
https://repository.unab.edu.co/bitstream/20.500.12749/26661/2/license.txt
https://repository.unab.edu.co/bitstream/20.500.12749/26661/3/Art%c3%adculo.pdf.jpg
bitstream.checksum.fl_str_mv 2ceb8ac67b998e81e420ed5c099d82b1
855f7d18ea80f5df821f7004dff2f316
65f3f84da420bac7596ec6e98c70869b
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional | Universidad Autónoma de Bucaramanga - UNAB
repository.mail.fl_str_mv repositorio@unab.edu.co
_version_ 1812205590485663744
spelling Romero Sandí, Hairol41889347-9c69-46f3-a39e-cac8fb987a7cNúñez, Gabriela45e771e-ff8c-4a47-8f30-619fc2bfb589Rojas, Elvisf3118ba3-c006-4846-ba9d-c1a41452acadRomero Sandí, Hairol [0000-0002-3199-1244]Núñez, Gabriel [0000-0002-6907-533X]Rojas, Elvis [0000-0002-4238-0908]2024-09-19T22:18:06Z2024-09-19T22:18:06Z2024-06-18ISSN: 1657-2831e-ISSN: 2539-2115http://hdl.handle.net/20.500.12749/26661instname:Universidad Autónoma de Bucaramanga UNABrepourl:https://repository.unab.edu.cohttps://doi.org/10.29375/25392115.5054application/pdfspaUniversidad Autónoma de Bucaramanga UNABhttps://revistas.unab.edu.co/index.php/rcc/article/view/5054/3970https://revistas.unab.edu.co/index.php/rcc/issue/view/297Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2016, March 14). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv(1603.04467 [cs.DC]). doi:10.48550/arXiv.1603.04467Agarwal, S., Yan, C., Zhang, Z., & Venkataraman, S. (2023, October). BagPipe: Accelerating Deep Recommendation Model Training. SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles (SOSP '23)(pp. 348-363). Koblenz, Germany: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3600006.3613142Akintoye, S. B., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. doi:10.1109/ACCESS.2022.3193690Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET)(pp. 1-6). Antalya, Turkey: IEEE. doi:10.1109/ICEngTechnol.2017.8308186Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis(pp. 1-15). Dallas, TX, USA: IEEE. doi:10.1109/SC41404.2022.00051Batur Dinler, Ö., Şahin, B. C., & Abualigah, L. (2021, November 30). Comparison of Performance of Phishing Web Sites with Different DeepLearning4J Models. European Journal of Science and Technology(28), 425-431. doi:10.31590/ejosat.1004778Ben-Nun, T., & Hoefler, T. (2019, August 30). Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis. ACM Computing Surveys (CSUR), 52(4), 1-43, Article No. 65. doi:10.1145/3320060Cai, Z., Yan, X., Ma, K., Yidi, W., Huang, Y., Cheng, J., . . . Yu, F. (2022, August 1). TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism. IEEE Transactions on Parallel and Distributed Systems, 33(8), 1967-1981. doi:10.1109/TPDS.2021.3132413Camp, D., Garth, C., Childs, H., Pugmire, D., & Joy, K. (2011, November). Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1702-1713. doi:10.1109/TVCG.2010.259Chen, C.-C., Yang, C.-L., & Cheng, H.-Y. (2019, October 28). Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform. arXiv:1809.02839v4 [cs.DC]. doi:10.48550/arXiv.1809.02839Chen, M. (2023, March 15). Analysis of Data Parallelism Methods with Deep Neural Network. ITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering(pp. 1857-1861). 
Xiamen, China: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3573428.3573755Chen, T., Huang, S., Xie, Y., Jiao, B., Jiang, D., Zhou, H., . . . Wei, F. (2022, June 2). Task-Specific Expert Pruning for Sparse Mixture-of-Experts. arXiv:2206.00277v2 [cs.LG], 1-13. doi:10.48550/arXiv.2206.00277Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., . . . Zhang, Z. (2015, December 3). MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv, arXiv:1512.01274v1 [cs.DC], 1-6. doi:10.48550/arXiv.1512.01274Chen, Z., Deng, Y., Wu, Y., Gu, Q., & Li, Y. (2022). Towards Understanding the Mixture-of-Experts Layer in Deep Learning. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Ed.), Advances in Neural Information Precessing Systems. New Orleans, Louisiana, USA. Retrieved from https://openreview.net/forum?id=MaYzugDmQVCollobert, R., Bengio, S., & Mariéthoz, J. (2002, October 30). Torch: a modular machine learning software library. Research Report, IDIAP, Martigny, Switezerland. Retrieved from https://publications.idiap.ch/downloads/reports/2002/rr02-46.pdfDai, D., Dong, L., Ma, S., Zheng, B., Sui, Z., Chang, B., & Wei, F. (2022, May). StableMoE: Stable Routing Strategy for Mixture of Experts. In S. Muresan, P. Nakov, & A. Villavicencio (Ed.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1: Long Papers, pp. 7085–7095. Dublin, Ireland: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.489Duan, Y., Lai, Z., Li, S., Liu, W., Ge, K., Liang, P., & Li, D. (2022). HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 313-323). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00043Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021, February). DAPPLE: a pipelined data parallel approach for training large models. PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3437801.3441593Fedus, W., Zoph, B., & Shazeer, N. (2022, January 1). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. (A. Clark, Ed.) The Journal of Machine Learning Research, 23(1), Article No. 120, 5232-5270. Retrieved from https://dl.acm.org/doi/abs/10.5555/3586589.3586709Gholami, A., Azad, A., Jin, P., Keutzer, K., & Buluc, A. (2018). Integrated Model, Batch, and Domain Parallelism in Training Neural Networks. SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (pp. 77-86). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3210377.3210394Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. doi:10.48550/arXiv.1911.04610Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 8). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC], 1-14. doi:10.48550/arXiv.1806.03377Hazimeh, H., Zhao, Z., Aakanksha, C., Sathiamoorthy, M., Chen, Y., Mazumder, R., . . . Chi, E. H. (2024). DSelect-k: differentiable selection in the mixture of experts with applications to multi-task learning. In M. Ranzato, A. 
Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 2246, pp. 29335-29347. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3542507He, C., Li, S., Soltanolkotabi, M., & Avestimehr, S. (2021, July). PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 4150-4159. PMLR. Retrieved from https://proceedings.mlr.press/v139/he21a.htmlHe, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., & Tang, J. (2021, March 24). FastMoE: A Fast Mixture-of-Expert Training System. arXiv:2103.13262v1 [cs.LG], 1-11. doi:10.48550/arXiv.2103.13262Hey, T. (2020, October 1). Opportunities and Challenges from Artificial Intelligence and Machine Learning for the Advancement of Science, Technology, and the Office of Science Missions. Technical Report, USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR), United States. doi:10.2172/1734848Hopfield, J. J. (1988, September). Artificial neural networks. IEEE Circuits and Devices Magazine, 4(5), 3-10. doi:10.1109/101.8118Howison, M., Bethel, E. W., & Childs, H. (2012, January). Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics, 18(1), 17-29. doi:10.1109/TVCG.2011.24Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2021, October 28). Pipeline Parallelism for Inference on Heterogeneous Edge Computing. arXiv:2110.14895v1 [cs.DC], 1-12. doi:10.48550/arXiv.2110.14895Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, December 8). GPipe: efficient training of giant neural networks using pipeline parallelism. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, & E. B. Fox (Ed.), Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS'19). Article No. 10, pp. 103 - 112. Vancouver, BC, Canada: Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3454287.3454297Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., . . . Xiong, Y. (2023, June 5). Tutel: Adaptive Mixture-of-Experts at Scale. arXiv:2206.03382v2 [cs.DC], 1-19. doi:10.48550/arXiv.2206.03382Janbi, N., Katib, I., & Mehmood, R. (2023, May). Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture. Intelligent Systems with Applications, 18, 200231. doi:10.1016/j.iswa.2023.200231Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, November 3). Caffe: Convolutional Architecture for Fast Feature Embedding. MM '14: Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). Orlando, Florida, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2647868.2654889Jia, Z., Lin, S., Qi, C. R., & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. In J. Dy, & A. Krause (Ed.), Proceedings of the 35th International Conference on Machine Learning. 80, pp. 2274-2283. PMLR. Retrieved from https://proceedings.mlr.press/v80/jia18a.htmlJiang, W., Zhang, Y., Liu, P., Peng, J., Yang, L. T., Ye, G., & Jin, H. (2020, January). Exploiting potential of deep neural networks by layer-wise fine-grained parallelism. Future Generation Computer Systems, 102, 210-221. 
doi:10.1016/j.future.2019.07.054Kamruzzaman, M., Swanson, S., & Tullsen, D. M. (2013, November 17). Load-balanced pipeline parallelism. SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Article No. 14, pp. 1-12. Denver, Colorado, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2503210.2503295Kirby, A. C., Samsi, S., Jones, M., Reuther, A., Kepner, J., & Gadepally, V. (2020, September). Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid. (2007.07336 [cs.LG]), 1-7. doi:10.1109/HPEC43674.2020.9286180Kossmann, F., Jia, Z., & Aiken, A. (2022, August 2). Optimizing Mixture of Experts using Dynamic Recompilations. arXiv:2205.01848v2 [cs.LG] , 1-13. doi:10.48550/arXiv.2205.01848Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. doi:10.48550/arXiv.1404.5997Kukačka, J., Golkov, V., & Cremers, D. (2017, October 29). Regularization for Deep Learning: A Taxonomy. arXiv:1710.10686v1 [cs.LG], 1-23. doi:10.48550/arXiv.1710.10686Li, C., Yao, Z., Wu, X., Zhang, M., Holmes, C., Li, C., & He, Y. (2024, January 14). DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. arXiv:2212.03597v3 [cs.LG], 1-19. doi:10.48550/arXiv.2212.03597Li, J., Jiang, Y., Zhu, Y., Wang, C., & Xu, H. (2023, July). Accelerating Distributed MoE Training and Inference with Lina. 2023 USENIX Annual Technical Conference (USENIX ATC 23) (pp. 945-959). USENIX Association, Boston, MA, USA. Retrieved from https://www.usenix.org/conference/atc23/presentation/li-jiaminLi, S., & Hoefler, T. (2021, November). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476145Li, S., Liu, H., Bian, Z., Fang, J., Huang, H., Liu, Y., . . . You, Y. (2023, August). Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing (pp. 766-775). Salt Lake City, UT, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3605573.3605613Li, S., Mangoubi, O., Xu, L., & Guo, T. (2021). Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS) (pp. 528-538). DC, USA: IEEE. doi:10.1109/ICDCS51616.2021.00057Li, Y., Huang, J., Li, Z., Zhou, S., Jiang, W., & Wang, J. (2023). HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. ICPP '22: Proceedings of the 51st International Conference on Parallel Processing (pp. 1-11). Bordeaux, France: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3545008.3545024Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022, December). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999-7019. doi:10.1109/TNNLS.2021.3084827Li, Z., Zhuang, S., Guo, S., Zhuo, D., Zhang, H., Song, D., & Stoica, I. (2021). TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In M. Meila, & T. 
Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 6543-6552. PMLR. Retrieved from https://proceedings.mlr.press/v139/li21y.htmlLiang, P., Tang, Y., Zhang, X., Bai, Y., Su, T., Lai, Z., . . . Li, D. (2023, August). A Survey on Auto-Parallelism of Large-Scale Deep Learning Training. IEEE Transactions on Parallel and Distributed Systems, 34(8), 2377-2390. doi:10.1109/TPDS.2023.3281931Liu, D., Chen, X., Zhou, Z., & Ling, Q. (2020, May 15). HierTrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism in Mobile-Edge-Cloud Computing. IEEE Open Journal of the Communications Society, 1, 634-645. doi:10.1109/OJCOMS.2020.2994737Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00042Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., & Chi, E. H. (2018, July). Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1930-1939). London, United Kingdom: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3219819.3220007Manaswi, N. K. (2018). Understanding and Working with Keras. In N. K. Manaswi, Deep Learning with Applications Using Python: Chatbots and Face, Object, and Speech Recognition With TensorFlow and Keras (pp. 31–43). Berkeley, CA, USA: Apress. doi:10.1007/978-1-4842-3516-4Mastoras, A., & Gross, T. R. (2018, February 24). Understanding Parallelization Tradeoffs for Linear Pipelines. In Q. Chen, Z. Huang, & P. Balaji (Ed.), PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (pp. 1-10). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3178442.3178443Miao, X., Wang, Y., Jiang, Y., Shi, C., Nie, X., Zhang, H., & Cui, B. (2022, November 1). Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Proceedings of the VLDB Endowment, 16(3), 470-479. doi:10.14778/3570690.3570697Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., . . . Dean, J. (2017, June 25). Device Placement Optimization with Reinforcement Learning. arXiv:1706.04972v2 [cs.LG], 1-11. doi:10.48550/arXiv.1706.04972Mittal, S., & Vaishay, S. (2019, October). A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 99, 101635. doi:10.1016/j.sysarc.2019.101635Moreno-Alvarez, S., Haut, J. M., Paoletti, M. E., & Rico-Gallego, J. A. (2021, June 21). Heterogeneous model parallelism for deep neural networks. Neurocomputing, 441, 1-12. doi:10.1016/j.neucom.2021.01.125Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019, October). PipeDream: generalized pipeline parallelism for DNN training. SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3341301.3359646Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V., . . . Zaharia, M. (2021). Efficient large-scale language model training on GPU clusters using megatron-LM. 
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 58, pp. 1-15. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476209Nie, X., Miao, X., Cao, S., Ma, L., Liu, Q., Xue, J., . . . Cui, B. (2022, October 9). EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate. arXiv:2112.14397v2 [cs.LG], 1-14. doi:10.48550/arXiv.2112.14397Oyama, Y., Maruyama, N., Dryden, N., McCarthy, E., Harrington, P., Balewski, J., . . . Van Essen, B. (2021, July 1). The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism. IEEE Transactions on Parallel and Distributed Systems, 32(7), 1641-1652. doi:10.1109/TPDS.2020.3047974Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020, July). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. Retrieved from https://www.usenix.org/conference/atc20/presentation/parkPouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Presa, M. R., . . . Iyengar, S. S. (2018, September 18). A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR), 51(5), 1-36, Article No. 92. doi:10.1145/3234150Rajbhandari, S., Li, C., Yao, Z., Zhang, M., Aminabadi, R. Y., Awan, A. A., . . . He, Y. (2022, July). DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, & S. Sabato (Ed.), Proceedings of the 39th International Conference on Machine Learning. 162, pp. 18332-18346. PMLR. Retrieved from https://proceedings.mlr.press/v162/rajbhandari22a.htmlRasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020, August). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3505 - 3506). Virtual Event, CA, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3394486.3406703Ravanelli, M., Parcollet, T., & Bengio, Y. (2019). The Pytorch-kaldi Speech Recognition Toolkit. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465-6469). Brighton, UK: IEEE. doi:10.1109/ICASSP.2019.8683713Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., . . . Houlsby, N. (2024, December). Scaling vision with sparse mixture of experts. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 657, pp. 8583-8595. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3540918Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. J. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 1540, pp. 177-192. Springer, Cham. doi:10.1007/978-3-031-04209-6_13Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). 
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR 2017), (pp. 1-19). Toulon, France. Retrieved from https://openreview.net/forum?id=B1ckMDqlgShoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2020, March 13). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053v4 [cs.CL], 1-15. doi:10.48550/arXiv.1909.08053Song, L., Mao, J., Zhuo, Y., Qian, X., Li, H., & Chen, Y. (2019). HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 56-68). Washington, DC, USA: IEEE. doi:10.1109/HPCA.2019.00027Stevens, R., Taylor, V., Nichols, J., Maccabe, A. B., Yelick, K., & Brown, D. (2020, February 1). AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science. Technical Report, USDOE; Lawrence Berkeley National Laboratory (LBNL); Argonne National Laboratory (ANL); Oak Ridge National Laboratory (ORNL), United States. doi:10.2172/1604756Subhlok, J., Stichnoth, J. M., O'Hallaron, D. O., & Gross, T. (1993, July 1). Exploiting task and data parallelism on a multicomputer. ACM SIGPLAN Notices, 28(7), 13-22. doi:10.1145/173284.155334Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. doi:10.1109/CANDARW51189.2020.00036Tanaka, M., Taura, K., Hanawa, T., & Torisawa, K. (2021). Automatic Graph Partitioning for Very Large-scale Deep Learning. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1004-1013). Portland, OR, USA: IEEE. doi:10.1109/IPDPS49936.2021.00109Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33, Article No. 30. doi:10.1145/3377454Wang, H., Imes, C., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2023). Quantpipe: Applying Adaptive Post-Training Quantization For Distributed Transformer Pipelines In Dynamic Edge Environments. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). Rhodes Island, Greece: IEEE. doi:10.1109/ICASSP49357.2023.10096632Wang, S.-C. (2003). Artificial Neural Network. In S.-C. Wang, Interdisciplinary Computing in Java Programming (1 ed., Vol. 743, pp. 81-100). Boston, MA, USA: Springer. doi:10.1007/978-1-4615-0377-4_5Wang, Y., Feng, B., Wang, Z., Geng, T., Barker, K., Li, A., & Ding, Y. (2023, July). MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23) (pp. 779-795). Boston, MA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi23/presentation/wang-yukeWu, J. (2017, May 1). Introduction to Convolutional Neural Networks. Nanjing Universit, National Key Lab for Novel Software Technology, China. Retrieved from https://cs.nju.edu.cn/wujx/paper/CNN.pdfYang, B., Zhang, J., Li , J., Ré, C., Aberger, C. R., & De Sa, C. (2021, March 15). Proceedings of the 4th Machine Learning and Systems Conference, 3, pp. 269-296. San Jose, CA, USA. 
Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. The Tenth International Conference on Learning Representations (ICLR 2022) (pp. 1-15). Retrieved from https://openreview.net/forum?id=cw-EmNq5zfD
Yoon, J., Byeon, Y., Kim, J., & Lee, H. (2022, July 15). EdgePipe: Tailoring Pipeline Parallelism With Deep Neural Networks for Volatile Wireless Edge Devices. IEEE Internet of Things Journal, 9(14), 11633-11647. doi:10.1109/JIOT.2021.3131407
Yuan, L., He, Q., Chen, F., Dou, R., Jin, H., & Yang, Y. (2023, April 30). PipeEdge: A Trusted Pipelining Collaborative Edge Training based on Blockchain. In Y. Ding, J. Tang, J. Sequeda, L. Aroyo, C. Castillo, & G.-J. Houben (Eds.), WWW '23: Proceedings of the ACM Web Conference 2023 (pp. 3033-3043). Austin, TX, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3543507.3583413
Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. doi:10.1109/DAC18074.2021.9586300
Zhang, J., Niu, G., Dai, Q., Li, H., Wu, Z., Dong, F., & Wu, Z. (2023, October 28). PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters. Neurocomputing, 555, 126661. doi:10.1016/j.neucom.2023.126661
Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. doi:10.1016/j.future.2023.04.033
Zhang, S., Diao, L., Wang, S., Cao, Z., Gu, Y., Si, C., . . . Lin, W. (2023, February 16). Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform. arXiv:2302.08141v1 [cs.DC], 1-16. doi:10.48550/arXiv.2302.08141
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2021, January 14). BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training. arXiv:2012.12544v2 [cs.DC]. doi:10.48550/arXiv.2012.12544
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2022). BaPipe: Balanced Pipeline Parallelism for DNN Training. Parallel Processing Letters, 32(03n04), 2250005, 1-17. doi:10.1142/S0129626422500050
Zhao, S., Li, F., Chen, X., Guan, X., Jiang, J., Huang, D., . . . Cui, H. (2022, March 1). vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training. IEEE Transactions on Parallel and Distributed Systems, 33(3), 489-506. doi:10.1109/TPDS.2021.3094364
Zheng, L., Li, Z., Zhang, H., Zhuang, Y., Chen, Z., Huang, Y., . . . Stoica, I. (2022, July). Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22) (pp. 559-578). Carlsbad, CA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin
Zhou, Q., Guo, S., Qu, Z., Li, P., Li, L., Guo, M., & Wang, K. (2021, May 1). Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems, 32(5), 1030-1043. doi:10.1109/TPDS.2020.3040601
Zhu, X. (2023, April 28). Implement deep neuron networks on VPipe parallel system: a ResNet variant implementation. In X. Li (Ed.), Proceedings of the Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022). 12610, p. 126104I. Wuhan, China: International Society for Optics and Photonics, SPIE. doi:10.1117/12.2671359
Vol. 25 No. 1 (2024): Revista Colombiana de Computación (January-June); 60-73