A Snapshot of Parallelism in Distributed Deep Learning Training

Authors:
Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
Resource type:
Research article
Publication date:
2024
Institution:
Universidad Autónoma de Bucaramanga - UNAB
Repository:
Repositorio UNAB
Language:
spa
OAI Identifier:
oai:repository.unab.edu.co:20.500.12749/26661
Online access:
http://hdl.handle.net/20.500.12749/26661
https://doi.org/10.29375/25392115.5054
Keywords:
Deep learning
Parallelism
Artificial neural networks
Rights
License
http://purl.org/coar/access_right/c_abf2
id UNAB2_cb29868e5b8fee08a90e7761f58d303d
oai_identifier_str oai:repository.unab.edu.co:20.500.12749/26661
network_acronym_str UNAB2
network_name_str Repositorio UNAB
repository_id_str
dc.title.eng.fl_str_mv A Snapshot of Parallelism in Distributed Deep Learning Training
title A Snapshot of Parallelism in Distributed Deep Learning Training
spellingShingle A Snapshot of Parallelism in Distributed Deep Learning Training
Deep learning
Parallelism
Artificial neural networks
title_short A Snapshot of Parallelism in Distributed Deep Learning Training
title_full A Snapshot of Parallelism in Distributed Deep Learning Training
title_fullStr A Snapshot of Parallelism in Distributed Deep Learning Training
title_full_unstemmed A Snapshot of Parallelism in Distributed Deep Learning Training
title_sort A Snapshot of Parallelism in Distributed Deep Learning Training
dc.creator.fl_str_mv Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
dc.contributor.author.none.fl_str_mv Romero Sandí, Hairol
Núñez, Gabriel
Rojas, Elvis
dc.contributor.orcid.spa.fl_str_mv Romero Sandí, Hairol [0000-0002-3199-1244]
Núñez, Gabriel [0000-0002-6907-533X]
Rojas, Elvis [0000-0002-4238-0908]
dc.subject.keywords.eng.fl_str_mv Deep learning
Parallelism
Artificial neural networks
topic Deep learning
Parallelism
Artificial neural networks
description The accelerated development of applications related to artificial intelligence has driven the creation of increasingly complex neural network models with enormous numbers of parameters, currently reaching into the trillions. This makes their training practically impossible without parallelization. Parallelism, applied through different approaches, is the mechanism that has been used to address the problem of large-scale training. This paper presents a glimpse of the state of the art on parallelism in deep learning training from multiple points of view. It covers pipeline parallelism, hybrid parallelism, mixture-of-experts and auto-parallelism, which currently play a leading role in scientific research in this area. Finally, we develop a series of experiments with data parallelism and model parallelism so that the reader can observe the performance of the two types of parallelism and understand more clearly the approach of each one.
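As a concrete illustration of the two strategies compared in the experiments, the following minimal PyTorch sketch contrasts data parallelism (each worker holds a full model replica and gradients are averaged) with model parallelism (the layers of one model are split across devices). This is not the authors' experimental code; the toy model, layer sizes, and device names are illustrative assumptions.

```python
# Minimal sketch contrasting the two strategies compared in the paper's
# experiments. This is not the authors' code: the toy model, layer sizes,
# and device names ("cuda:0", "cuda:1") are illustrative assumptions.
import torch
import torch.nn as nn


class TwoGPUModelParallel(nn.Module):
    """Model parallelism: the layers are partitioned across two GPUs and
    activations are moved between devices inside the forward pass."""

    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))      # first half runs on GPU 0
        return self.stage1(x.to("cuda:1"))   # second half runs on GPU 1


def wrap_data_parallel(model: nn.Module) -> nn.Module:
    """Data parallelism: every process keeps a full replica of the model and
    gradients are averaged across replicas after each backward pass.
    Assumes torch.distributed is already initialized (e.g. launched with torchrun)."""
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    local_rank = dist.get_rank() % torch.cuda.device_count()
    return DDP(model.to(f"cuda:{local_rank}"), device_ids=[local_rank])


if __name__ == "__main__":
    # Model-parallel forward pass over a random batch (requires two GPUs).
    model = TwoGPUModelParallel()
    out = model(torch.randn(32, 1024))
    print(out.shape)  # torch.Size([32, 10]), resident on cuda:1
```

In the data-parallel case the training batch is sharded across replicas and the main communication cost is the gradient all-reduce; in the model-parallel case a single batch traverses the devices sequentially, which pipeline parallelism later refines by splitting batches into micro-batches.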
publishDate 2024
dc.date.accessioned.none.fl_str_mv 2024-09-19T22:18:06Z
dc.date.available.none.fl_str_mv 2024-09-19T22:18:06Z
dc.date.issued.none.fl_str_mv 2024-06-18
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/article
dc.type.local.spa.fl_str_mv Artículo
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/ART
format http://purl.org/coar/resource_type/c_2df8fbb1
dc.identifier.issn.spa.fl_str_mv ISSN: 1657-2831
e-ISSN: 2539-2115
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/20.500.12749/26661
dc.identifier.instname.spa.fl_str_mv instname:Universidad Autónoma de Bucaramanga UNAB
dc.identifier.repourl.spa.fl_str_mv repourl:https://repository.unab.edu.co
dc.identifier.doi.none.fl_str_mv https://doi.org/10.29375/25392115.5054
identifier_str_mv ISSN: 1657-2831
e-ISSN: 2539-2115
instname:Universidad Autónoma de Bucaramanga UNAB
repourl:https://repository.unab.edu.co
url http://hdl.handle.net/20.500.12749/26661
https://doi.org/10.29375/25392115.5054
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/article/view/5054/3970
dc.relation.uri.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/issue/view/297
dc.relation.references.none.fl_str_mv Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2016, March 14). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv(1603.04467 [cs.DC]). doi:10.48550/arXiv.1603.04467
Agarwal, S., Yan, C., Zhang, Z., & Venkataraman, S. (2023, October). BagPipe: Accelerating Deep Recommendation Model Training. SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles (SOSP '23)(pp. 348-363). Koblenz, Germany: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3600006.3613142
Akintoye, S. B., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. doi:10.1109/ACCESS.2022.3193690
Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET) (pp. 1-6). Antalya, Turkey: IEEE. doi:10.1109/ICEngTechnol.2017.8308186
Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). Dallas, TX, USA: IEEE. doi:10.1109/SC41404.2022.00051
Batur Dinler, Ö., Şahin, B. C., & Abualigah, L. (2021, November 30). Comparison of Performance of Phishing Web Sites with Different DeepLearning4J Models. European Journal of Science and Technology, (28), 425-431. doi:10.31590/ejosat.1004778
Ben-Nun, T., & Hoefler, T. (2019, August 30). Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis. ACM Computing Surveys (CSUR), 52(4), 1-43, Article No. 65. doi:10.1145/3320060
Cai, Z., Yan, X., Ma, K., Yidi, W., Huang, Y., Cheng, J., . . . Yu, F. (2022, August 1). TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism. IEEE Transactions on Parallel and Distributed Systems, 33(8), 1967-1981. doi:10.1109/TPDS.2021.3132413
Camp, D., Garth, C., Childs, H., Pugmire, D., & Joy, K. (2011, November). Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1702-1713. doi:10.1109/TVCG.2010.259
Chen, C.-C., Yang, C.-L., & Cheng, H.-Y. (2019, October 28). Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform. arXiv:1809.02839v4 [cs.DC]. doi:10.48550/arXiv.1809.02839
Chen, M. (2023, March 15). Analysis of Data Parallelism Methods with Deep Neural Network. ITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering (pp. 1857-1861). Xiamen, China: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3573428.3573755
Chen, T., Huang, S., Xie, Y., Jiao, B., Jiang, D., Zhou, H., . . . Wei, F. (2022, June 2). Task-Specific Expert Pruning for Sparse Mixture-of-Experts. arXiv:2206.00277v2 [cs.LG], 1-13. doi:10.48550/arXiv.2206.00277
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., . . . Zhang, Z. (2015, December 3). MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv, arXiv:1512.01274v1 [cs.DC], 1-6. doi:10.48550/arXiv.1512.01274
Chen, Z., Deng, Y., Wu, Y., Gu, Q., & Li, Y. (2022). Towards Understanding the Mixture-of-Experts Layer in Deep Learning. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Ed.), Advances in Neural Information Processing Systems. New Orleans, Louisiana, USA. Retrieved from https://openreview.net/forum?id=MaYzugDmQV
Collobert, R., Bengio, S., & Mariéthoz, J. (2002, October 30). Torch: a modular machine learning software library. Research Report, IDIAP, Martigny, Switzerland. Retrieved from https://publications.idiap.ch/downloads/reports/2002/rr02-46.pdf
Dai, D., Dong, L., Ma, S., Zheng, B., Sui, Z., Chang, B., & Wei, F. (2022, May). StableMoE: Stable Routing Strategy for Mixture of Experts. In S. Muresan, P. Nakov, & A. Villavicencio (Ed.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1: Long Papers, pp. 7085–7095. Dublin, Ireland: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.489
Duan, Y., Lai, Z., Li, S., Liu, W., Ge, K., Liang, P., & Li, D. (2022). HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 313-323). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00043
Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021, February). DAPPLE: a pipelined data parallel approach for training large models. PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3437801.3441593
Fedus, W., Zoph, B., & Shazeer, N. (2022, January 1). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. (A. Clark, Ed.) The Journal of Machine Learning Research, 23(1), Article No. 120, 5232-5270. Retrieved from https://dl.acm.org/doi/abs/10.5555/3586589.3586709
Gholami, A., Azad, A., Jin, P., Keutzer, K., & Buluc, A. (2018). Integrated Model, Batch, and Domain Parallelism in Training Neural Networks. SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (pp. 77-86). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3210377.3210394
Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. doi:10.48550/arXiv.1911.04610
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 8). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC], 1-14. doi:10.48550/arXiv.1806.03377
Hazimeh, H., Zhao, Z., Aakanksha, C., Sathiamoorthy, M., Chen, Y., Mazumder, R., . . . Chi, E. H. (2024). DSelect-k: differentiable selection in the mixture of experts with applications to multi-task learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 2246, pp. 29335-29347. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3542507
He, C., Li, S., Soltanolkotabi, M., & Avestimehr, S. (2021, July). PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 4150-4159. PMLR. Retrieved from https://proceedings.mlr.press/v139/he21a.html
He, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., & Tang, J. (2021, March 24). FastMoE: A Fast Mixture-of-Expert Training System. arXiv:2103.13262v1 [cs.LG], 1-11. doi:10.48550/arXiv.2103.13262
Hey, T. (2020, October 1). Opportunities and Challenges from Artificial Intelligence and Machine Learning for the Advancement of Science, Technology, and the Office of Science Missions. Technical Report, USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR), United States. doi:10.2172/1734848
Hopfield, J. J. (1988, September). Artificial neural networks. IEEE Circuits and Devices Magazine, 4(5), 3-10. doi:10.1109/101.8118
Howison, M., Bethel, E. W., & Childs, H. (2012, January). Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics, 18(1), 17-29. doi:10.1109/TVCG.2011.24
Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2021, October 28). Pipeline Parallelism for Inference on Heterogeneous Edge Computing. arXiv:2110.14895v1 [cs.DC], 1-12. doi:10.48550/arXiv.2110.14895
Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, December 8). GPipe: efficient training of giant neural networks using pipeline parallelism. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, & E. B. Fox (Ed.), Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS'19). Article No. 10, pp. 103 - 112. Vancouver, BC, Canada: Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3454287.3454297
Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., . . . Xiong, Y. (2023, June 5). Tutel: Adaptive Mixture-of-Experts at Scale. arXiv:2206.03382v2 [cs.DC], 1-19. doi:10.48550/arXiv.2206.03382
Janbi, N., Katib, I., & Mehmood, R. (2023, May). Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture. Intelligent Systems with Applications, 18, 200231. doi:10.1016/j.iswa.2023.200231
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, November 3). Caffe: Convolutional Architecture for Fast Feature Embedding. MM '14: Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). Orlando, Florida, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2647868.2654889
Jia, Z., Lin, S., Qi, C. R., & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. In J. Dy, & A. Krause (Ed.), Proceedings of the 35th International Conference on Machine Learning. 80, pp. 2274-2283. PMLR. Retrieved from https://proceedings.mlr.press/v80/jia18a.html
Jiang, W., Zhang, Y., Liu, P., Peng, J., Yang, L. T., Ye, G., & Jin, H. (2020, January). Exploiting potential of deep neural networks by layer-wise fine-grained parallelism. Future Generation Computer Systems, 102, 210-221. doi:10.1016/j.future.2019.07.054
Kamruzzaman, M., Swanson, S., & Tullsen, D. M. (2013, November 17). Load-balanced pipeline parallelism. SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Article No. 14, pp. 1-12. Denver, Colorado, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2503210.2503295
Kirby, A. C., Samsi, S., Jones, M., Reuther, A., Kepner, J., & Gadepally, V. (2020, September). Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-7). IEEE. doi:10.1109/HPEC43674.2020.9286180
Kossmann, F., Jia, Z., & Aiken, A. (2022, August 2). Optimizing Mixture of Experts using Dynamic Recompilations. arXiv:2205.01848v2 [cs.LG], 1-13. doi:10.48550/arXiv.2205.01848
Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. doi:10.48550/arXiv.1404.5997
Kukačka, J., Golkov, V., & Cremers, D. (2017, October 29). Regularization for Deep Learning: A Taxonomy. arXiv:1710.10686v1 [cs.LG], 1-23. doi:10.48550/arXiv.1710.10686
Li, C., Yao, Z., Wu, X., Zhang, M., Holmes, C., Li, C., & He, Y. (2024, January 14). DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. arXiv:2212.03597v3 [cs.LG], 1-19. doi:10.48550/arXiv.2212.03597
Li, J., Jiang, Y., Zhu, Y., Wang, C., & Xu, H. (2023, July). Accelerating Distributed MoE Training and Inference with Lina. 2023 USENIX Annual Technical Conference (USENIX ATC 23) (pp. 945-959). USENIX Association, Boston, MA, USA. Retrieved from https://www.usenix.org/conference/atc23/presentation/li-jiamin
Li, S., & Hoefler, T. (2021, November). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476145
Li, S., Liu, H., Bian, Z., Fang, J., Huang, H., Liu, Y., . . . You, Y. (2023, August). Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing (pp. 766-775). Salt Lake City, UT, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3605573.3605613
Li, S., Mangoubi, O., Xu, L., & Guo, T. (2021). Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS) (pp. 528-538). DC, USA: IEEE. doi:10.1109/ICDCS51616.2021.00057
Li, Y., Huang, J., Li, Z., Zhou, S., Jiang, W., & Wang, J. (2023). HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. ICPP '22: Proceedings of the 51st International Conference on Parallel Processing (pp. 1-11). Bordeaux, France: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3545008.3545024
Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022, December). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999-7019. doi:10.1109/TNNLS.2021.3084827
Li, Z., Zhuang, S., Guo, S., Zhuo, D., Zhang, H., Song, D., & Stoica, I. (2021). TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 6543-6552. PMLR. Retrieved from https://proceedings.mlr.press/v139/li21y.html
Liang, P., Tang, Y., Zhang, X., Bai, Y., Su, T., Lai, Z., . . . Li, D. (2023, August). A Survey on Auto-Parallelism of Large-Scale Deep Learning Training. IEEE Transactions on Parallel and Distributed Systems, 34(8), 2377-2390. doi:10.1109/TPDS.2023.3281931
Liu, D., Chen, X., Zhou, Z., & Ling, Q. (2020, May 15). HierTrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism in Mobile-Edge-Cloud Computing. IEEE Open Journal of the Communications Society, 1, 634-645. doi:10.1109/OJCOMS.2020.2994737
Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00042
Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., & Chi, E. H. (2018, July). Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1930-1939). London, United Kingdom: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3219819.3220007
Manaswi, N. K. (2018). Understanding and Working with Keras. In N. K. Manaswi, Deep Learning with Applications Using Python: Chatbots and Face, Object, and Speech Recognition With TensorFlow and Keras (pp. 31–43). Berkeley, CA, USA: Apress. doi:10.1007/978-1-4842-3516-4
Mastoras, A., & Gross, T. R. (2018, February 24). Understanding Parallelization Tradeoffs for Linear Pipelines. In Q. Chen, Z. Huang, & P. Balaji (Ed.), PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (pp. 1-10). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3178442.3178443
Miao, X., Wang, Y., Jiang, Y., Shi, C., Nie, X., Zhang, H., & Cui, B. (2022, November 1). Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Proceedings of the VLDB Endowment, 16(3), 470-479. doi:10.14778/3570690.3570697
Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., . . . Dean, J. (2017, June 25). Device Placement Optimization with Reinforcement Learning. arXiv:1706.04972v2 [cs.LG], 1-11. doi:10.48550/arXiv.1706.04972
Mittal, S., & Vaishay, S. (2019, October). A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 99, 101635. doi:10.1016/j.sysarc.2019.101635
Moreno-Alvarez, S., Haut, J. M., Paoletti, M. E., & Rico-Gallego, J. A. (2021, June 21). Heterogeneous model parallelism for deep neural networks. Neurocomputing, 441, 1-12. doi:10.1016/j.neucom.2021.01.125
Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019, October). PipeDream: generalized pipeline parallelism for DNN training. SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3341301.3359646
Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V., . . . Zaharia, M. (2021). Efficient large-scale language model training on GPU clusters using megatron-LM. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 58, pp. 1-15. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476209
Nie, X., Miao, X., Cao, S., Ma, L., Liu, Q., Xue, J., . . . Cui, B. (2022, October 9). EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate. arXiv:2112.14397v2 [cs.LG], 1-14. doi:10.48550/arXiv.2112.14397
Oyama, Y., Maruyama, N., Dryden, N., McCarthy, E., Harrington, P., Balewski, J., . . . Van Essen, B. (2021, July 1). The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism. IEEE Transactions on Parallel and Distributed Systems, 32(7), 1641-1652. doi:10.1109/TPDS.2020.3047974
Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020, July). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. Retrieved from https://www.usenix.org/conference/atc20/presentation/park
Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Presa, M. R., . . . Iyengar, S. S. (2018, September 18). A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR), 51(5), 1-36, Article No. 92. doi:10.1145/3234150

Rajbhandari, S., Li, C., Yao, Z., Zhang, M., Aminabadi, R. Y., Awan, A. A., . . . He, Y. (2022, July). DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, & S. Sabato (Ed.), Proceedings of the 39th International Conference on Machine Learning. 162, pp. 18332-18346. PMLR. Retrieved from https://proceedings.mlr.press/v162/rajbhandari22a.html
Rasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020, August). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3505 - 3506). Virtual Event, CA, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3394486.3406703
Ravanelli, M., Parcollet, T., & Bengio, Y. (2019). The Pytorch-kaldi Speech Recognition Toolkit. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465-6469). Brighton, UK: IEEE. doi:10.1109/ICASSP.2019.8683713
Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., . . . Houlsby, N. (2024, December). Scaling vision with sparse mixture of experts. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 657, pp. 8583-8595. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3540918
Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. J. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 1540, pp. 177-192. Springer, Cham. doi:10.1007/978-3-031-04209-6_13
Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR 2017), (pp. 1-19). Toulon, France. Retrieved from https://openreview.net/forum?id=B1ckMDqlg
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2020, March 13). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053v4 [cs.CL], 1-15. doi:10.48550/arXiv.1909.08053
Song, L., Mao, J., Zhuo, Y., Qian, X., Li, H., & Chen, Y. (2019). HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 56-68). Washington, DC, USA: IEEE. doi:10.1109/HPCA.2019.00027
Stevens, R., Taylor, V., Nichols, J., Maccabe, A. B., Yelick, K., & Brown, D. (2020, February 1). AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science. Technical Report, USDOE; Lawrence Berkeley National Laboratory (LBNL); Argonne National Laboratory (ANL); Oak Ridge National Laboratory (ORNL), United States. doi:10.2172/1604756
Subhlok, J., Stichnoth, J. M., O'Hallaron, D. O., & Gross, T. (1993, July 1). Exploiting task and data parallelism on a multicomputer. ACM SIGPLAN Notices, 28(7), 13-22. doi:10.1145/173284.155334
Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. doi:10.1109/CANDARW51189.2020.00036
Tanaka, M., Taura, K., Hanawa, T., & Torisawa, K. (2021). Automatic Graph Partitioning for Very Large-scale Deep Learning. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1004-1013). Portland, OR, USA: IEEE. doi:10.1109/IPDPS49936.2021.00109
Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33, Article No. 30. doi:10.1145/3377454
Wang, H., Imes, C., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2023). Quantpipe: Applying Adaptive Post-Training Quantization For Distributed Transformer Pipelines In Dynamic Edge Environments. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). Rhodes Island, Greece: IEEE. doi:10.1109/ICASSP49357.2023.10096632
Wang, S.-C. (2003). Artificial Neural Network. In S.-C. Wang, Interdisciplinary Computing in Java Programming (1 ed., Vol. 743, pp. 81-100). Boston, MA, USA: Springer. doi:10.1007/978-1-4615-0377-4_5
Wang, Y., Feng, B., Wang, Z., Geng, T., Barker, K., Li, A., & Ding, Y. (2023, July). MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23) (pp. 779-795). Boston, MA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi23/presentation/wang-yuke
Wu, J. (2017, May 1). Introduction to Convolutional Neural Networks. Nanjing University, National Key Lab for Novel Software Technology, China. Retrieved from https://cs.nju.edu.cn/wujx/paper/CNN.pdf
Yang, B., Zhang, J., Li, J., Ré, C., Aberger, C. R., & De Sa, C. (2021, March 15). PipeMare: Asynchronous Pipeline Parallel DNN Training. Proceedings of the 4th Machine Learning and Systems Conference, 3, pp. 269-296. San Jose, CA, USA. Retrieved from https://proceedings.mlsys.org/paper_files/paper/2021/file/9412531719be7ccf755c4ff98d0969dc-Paper.pdf
Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. The Tenth International Conference on Learning Representations (ICLR 2022), (pp. 1-15). Retrieved from https://openreview.net/forum?id=cw-EmNq5zfD
Yoon, J., Byeon, Y., Kim, J., & Lee, H. (2022, July 15). EdgePipe: Tailoring Pipeline Parallelism With Deep Neural Networks for Volatile Wireless Edge Devices. IEEE Internet of Things Journal, 9(14), 11633 - 11647. doi:10.1109/JIOT.2021.3131407
Yuan, L., He, Q., Chen, F., Dou, R., Jin, H., & Yang, Y. (2023, April 30). PipeEdge: A Trusted Pipelining Collaborative Edge Training based on Blockchain. In Y. Ding, J. Tang, J. Sequeda, L. Aroyo, C. Castillo, & G.-J. Houben (Ed.), WWW '23: Proceedings of the ACM Web Conference 2023 (pp. 3033-3043). Austin, TX, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3543507.3583413
Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. doi:10.1109/DAC18074.2021.9586300
Zhang, J., Niu, G., Dai, Q., Li, H., Wu, Z., Dong, F., & Wu, Z. (2023, October 28). PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters. Neurocomputing, 555, 126661. doi:10.1016/j.neucom.2023.126661
Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. doi:10.1016/j.future.2023.04.033
Zhang, S., Diao, L., Wang, S., Cao, Z., Gu, Y., Si, C., . . . Lin, W. (2023, February 16). Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform. arXiv:2302.08141v1 [cs.DC], 1-16. doi:10.48550/arXiv.2302.08141
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2021, January 14). BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training. arXiv:2012.12544v2 [cs.DC]. doi:10.48550/arXiv.2012.12544
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2022). BaPipe: Balanced Pipeline Parallelism for DNN Training. Parallel Processing Letters, 32(03n04), 2250005, 1-17. doi:10.1142/S0129626422500050
Zhao, S., Li, F., Chen, X., Guan, X., Jiang, J., Huang, D., . . . Cui, H. (2022, March 1). vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training. IEEE Transactions on Parallel and Distributed Systems, 33(3), 489-506. doi:10.1109/TPDS.2021.3094364
Zheng, L., Li, Z., Zhang, H., Zhuang, Y., Chen, Z., Huang, Y., . . . Stoica, I. (2022, July). Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22) (pp. 559-578). Carlsbad, CA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin
Zhou, Q., Guo, S., Qu, Z., Li, P., Li, L., Guo, M., & Wang, K. (2021, May 1). Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems, 32(5), 1030-1043. doi:10.1109/TPDS.2020.3040601
Zhu, X. (2023, April 28). Implement deep neuron networks on VPipe parallel system: a ResNet variant implementation. In X. Li (Ed.), Proceedings Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022). 12610, p. 126104I. Wuhan, China: International Society for Optics and Photonics, SPIE. doi:10.1117/12.2671359
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Autónoma de Bucaramanga UNAB
dc.source.spa.fl_str_mv Vol. 25 Núm. 1 (2024): Revista Colombiana de Computación (Enero-Junio); 60-73
institution Universidad Autónoma de Bucaramanga - UNAB
bitstream.url.fl_str_mv https://repository.unab.edu.co/bitstream/20.500.12749/26661/1/Art%c3%adculo.pdf
https://repository.unab.edu.co/bitstream/20.500.12749/26661/2/license.txt
https://repository.unab.edu.co/bitstream/20.500.12749/26661/3/Art%c3%adculo.pdf.jpg
bitstream.checksum.fl_str_mv 2ceb8ac67b998e81e420ed5c099d82b1
855f7d18ea80f5df821f7004dff2f316
65f3f84da420bac7596ec6e98c70869b
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional | Universidad Autónoma de Bucaramanga - UNAB
repository.mail.fl_str_mv repositorio@unab.edu.co
_version_ 1812205590485663744
spelling Romero Sandí, Hairol41889347-9c69-46f3-a39e-cac8fb987a7cNúñez, Gabriela45e771e-ff8c-4a47-8f30-619fc2bfb589Rojas, Elvisf3118ba3-c006-4846-ba9d-c1a41452acadRomero Sandí, Hairol [0000-0002-3199-1244]Núñez, Gabriel [0000-0002-6907-533X]Rojas, Elvis [0000-0002-4238-0908]2024-09-19T22:18:06Z2024-09-19T22:18:06Z2024-06-18ISSN: 1657-2831e-ISSN: 2539-2115http://hdl.handle.net/20.500.12749/26661instname:Universidad Autónoma de Bucaramanga UNABrepourl:https://repository.unab.edu.cohttps://doi.org/10.29375/25392115.5054application/pdfspaUniversidad Autónoma de Bucaramanga UNABhttps://revistas.unab.edu.co/index.php/rcc/article/view/5054/3970https://revistas.unab.edu.co/index.php/rcc/issue/view/297Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2016, March 14). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv(1603.04467 [cs.DC]). doi:10.48550/arXiv.1603.04467Agarwal, S., Yan, C., Zhang, Z., & Venkataraman, S. (2023, October). BagPipe: Accelerating Deep Recommendation Model Training. SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles (SOSP '23)(pp. 348-363). Koblenz, Germany: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3600006.3613142Akintoye, S. B., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. doi:10.1109/ACCESS.2022.3193690Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET)(pp. 1-6). Antalya, Turkey: IEEE. doi:10.1109/ICEngTechnol.2017.8308186Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis(pp. 1-15). Dallas, TX, USA: IEEE. doi:10.1109/SC41404.2022.00051Batur Dinler, Ö., Şahin, B. C., & Abualigah, L. (2021, November 30). Comparison of Performance of Phishing Web Sites with Different DeepLearning4J Models. European Journal of Science and Technology(28), 425-431. doi:10.31590/ejosat.1004778Ben-Nun, T., & Hoefler, T. (2019, August 30). Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis. ACM Computing Surveys (CSUR), 52(4), 1-43, Article No. 65. doi:10.1145/3320060Cai, Z., Yan, X., Ma, K., Yidi, W., Huang, Y., Cheng, J., . . . Yu, F. (2022, August 1). TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism. IEEE Transactions on Parallel and Distributed Systems, 33(8), 1967-1981. doi:10.1109/TPDS.2021.3132413Camp, D., Garth, C., Childs, H., Pugmire, D., & Joy, K. (2011, November). Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1702-1713. doi:10.1109/TVCG.2010.259Chen, C.-C., Yang, C.-L., & Cheng, H.-Y. (2019, October 28). Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform. arXiv:1809.02839v4 [cs.DC]. doi:10.48550/arXiv.1809.02839Chen, M. (2023, March 15). Analysis of Data Parallelism Methods with Deep Neural Network. ITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering(pp. 1857-1861). 
Xiamen, China: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3573428.3573755Chen, T., Huang, S., Xie, Y., Jiao, B., Jiang, D., Zhou, H., . . . Wei, F. (2022, June 2). Task-Specific Expert Pruning for Sparse Mixture-of-Experts. arXiv:2206.00277v2 [cs.LG], 1-13. doi:10.48550/arXiv.2206.00277Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., . . . Zhang, Z. (2015, December 3). MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv, arXiv:1512.01274v1 [cs.DC], 1-6. doi:10.48550/arXiv.1512.01274Chen, Z., Deng, Y., Wu, Y., Gu, Q., & Li, Y. (2022). Towards Understanding the Mixture-of-Experts Layer in Deep Learning. In A. H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Ed.), Advances in Neural Information Precessing Systems. New Orleans, Louisiana, USA. Retrieved from https://openreview.net/forum?id=MaYzugDmQVCollobert, R., Bengio, S., & Mariéthoz, J. (2002, October 30). Torch: a modular machine learning software library. Research Report, IDIAP, Martigny, Switezerland. Retrieved from https://publications.idiap.ch/downloads/reports/2002/rr02-46.pdfDai, D., Dong, L., Ma, S., Zheng, B., Sui, Z., Chang, B., & Wei, F. (2022, May). StableMoE: Stable Routing Strategy for Mixture of Experts. In S. Muresan, P. Nakov, & A. Villavicencio (Ed.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1: Long Papers, pp. 7085–7095. Dublin, Ireland: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.489Duan, Y., Lai, Z., Li, S., Liu, W., Ge, K., Liang, P., & Li, D. (2022). HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 313-323). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00043Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021, February). DAPPLE: a pipelined data parallel approach for training large models. PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3437801.3441593Fedus, W., Zoph, B., & Shazeer, N. (2022, January 1). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. (A. Clark, Ed.) The Journal of Machine Learning Research, 23(1), Article No. 120, 5232-5270. Retrieved from https://dl.acm.org/doi/abs/10.5555/3586589.3586709Gholami, A., Azad, A., Jin, P., Keutzer, K., & Buluc, A. (2018). Integrated Model, Batch, and Domain Parallelism in Training Neural Networks. SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (pp. 77-86). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3210377.3210394Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. doi:10.48550/arXiv.1911.04610Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 8). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC], 1-14. doi:10.48550/arXiv.1806.03377Hazimeh, H., Zhao, Z., Aakanksha, C., Sathiamoorthy, M., Chen, Y., Mazumder, R., . . . Chi, E. H. (2024). DSelect-k: differentiable selection in the mixture of experts with applications to multi-task learning. In M. Ranzato, A. 
Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 2246, pp. 29335-29347. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3542507He, C., Li, S., Soltanolkotabi, M., & Avestimehr, S. (2021, July). PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. In M. Meila, & T. Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 4150-4159. PMLR. Retrieved from https://proceedings.mlr.press/v139/he21a.htmlHe, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., & Tang, J. (2021, March 24). FastMoE: A Fast Mixture-of-Expert Training System. arXiv:2103.13262v1 [cs.LG], 1-11. doi:10.48550/arXiv.2103.13262Hey, T. (2020, October 1). Opportunities and Challenges from Artificial Intelligence and Machine Learning for the Advancement of Science, Technology, and the Office of Science Missions. Technical Report, USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR), United States. doi:10.2172/1734848Hopfield, J. J. (1988, September). Artificial neural networks. IEEE Circuits and Devices Magazine, 4(5), 3-10. doi:10.1109/101.8118Howison, M., Bethel, E. W., & Childs, H. (2012, January). Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems. IEEE Transactions on Visualization and Computer Graphics, 18(1), 17-29. doi:10.1109/TVCG.2011.24Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2021, October 28). Pipeline Parallelism for Inference on Heterogeneous Edge Computing. arXiv:2110.14895v1 [cs.DC], 1-12. doi:10.48550/arXiv.2110.14895Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, December 8). GPipe: efficient training of giant neural networks using pipeline parallelism. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, & E. B. Fox (Ed.), Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS'19). Article No. 10, pp. 103 - 112. Vancouver, BC, Canada: Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3454287.3454297Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., . . . Xiong, Y. (2023, June 5). Tutel: Adaptive Mixture-of-Experts at Scale. arXiv:2206.03382v2 [cs.DC], 1-19. doi:10.48550/arXiv.2206.03382Janbi, N., Katib, I., & Mehmood, R. (2023, May). Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture. Intelligent Systems with Applications, 18, 200231. doi:10.1016/j.iswa.2023.200231Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, November 3). Caffe: Convolutional Architecture for Fast Feature Embedding. MM '14: Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). Orlando, Florida, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2647868.2654889Jia, Z., Lin, S., Qi, C. R., & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. In J. Dy, & A. Krause (Ed.), Proceedings of the 35th International Conference on Machine Learning. 80, pp. 2274-2283. PMLR. Retrieved from https://proceedings.mlr.press/v80/jia18a.htmlJiang, W., Zhang, Y., Liu, P., Peng, J., Yang, L. T., Ye, G., & Jin, H. (2020, January). Exploiting potential of deep neural networks by layer-wise fine-grained parallelism. Future Generation Computer Systems, 102, 210-221. 
doi:10.1016/j.future.2019.07.054Kamruzzaman, M., Swanson, S., & Tullsen, D. M. (2013, November 17). Load-balanced pipeline parallelism. SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Article No. 14, pp. 1-12. Denver, Colorado, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/2503210.2503295Kirby, A. C., Samsi, S., Jones, M., Reuther, A., Kepner, J., & Gadepally, V. (2020, September). Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid. (2007.07336 [cs.LG]), 1-7. doi:10.1109/HPEC43674.2020.9286180Kossmann, F., Jia, Z., & Aiken, A. (2022, August 2). Optimizing Mixture of Experts using Dynamic Recompilations. arXiv:2205.01848v2 [cs.LG] , 1-13. doi:10.48550/arXiv.2205.01848Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. doi:10.48550/arXiv.1404.5997Kukačka, J., Golkov, V., & Cremers, D. (2017, October 29). Regularization for Deep Learning: A Taxonomy. arXiv:1710.10686v1 [cs.LG], 1-23. doi:10.48550/arXiv.1710.10686Li, C., Yao, Z., Wu, X., Zhang, M., Holmes, C., Li, C., & He, Y. (2024, January 14). DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. arXiv:2212.03597v3 [cs.LG], 1-19. doi:10.48550/arXiv.2212.03597Li, J., Jiang, Y., Zhu, Y., Wang, C., & Xu, H. (2023, July). Accelerating Distributed MoE Training and Inference with Lina. 2023 USENIX Annual Technical Conference (USENIX ATC 23) (pp. 945-959). USENIX Association, Boston, MA, USA. Retrieved from https://www.usenix.org/conference/atc23/presentation/li-jiaminLi, S., & Hoefler, T. (2021, November). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476145Li, S., Liu, H., Bian, Z., Fang, J., Huang, H., Liu, Y., . . . You, Y. (2023, August). Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing (pp. 766-775). Salt Lake City, UT, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3605573.3605613Li, S., Mangoubi, O., Xu, L., & Guo, T. (2021). Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS) (pp. 528-538). DC, USA: IEEE. doi:10.1109/ICDCS51616.2021.00057Li, Y., Huang, J., Li, Z., Zhou, S., Jiang, W., & Wang, J. (2023). HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. ICPP '22: Proceedings of the 51st International Conference on Parallel Processing (pp. 1-11). Bordeaux, France: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3545008.3545024Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022, December). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999-7019. doi:10.1109/TNNLS.2021.3084827Li, Z., Zhuang, S., Guo, S., Zhuo, D., Zhang, H., Song, D., & Stoica, I. (2021). TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In M. Meila, & T. 
Zhang (Ed.), Proceedings of the 38th International Conference on Machine Learning. 139, pp. 6543-6552. PMLR. Retrieved from https://proceedings.mlr.press/v139/li21y.htmlLiang, P., Tang, Y., Zhang, X., Bai, Y., Su, T., Lai, Z., . . . Li, D. (2023, August). A Survey on Auto-Parallelism of Large-Scale Deep Learning Training. IEEE Transactions on Parallel and Distributed Systems, 34(8), 2377-2390. doi:10.1109/TPDS.2023.3281931Liu, D., Chen, X., Zhou, Z., & Ling, Q. (2020, May 15). HierTrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism in Mobile-Edge-Cloud Computing. IEEE Open Journal of the Communications Society, 1, 634-645. doi:10.1109/OJCOMS.2020.2994737Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. doi:10.1109/CLUSTER51413.2022.00042Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., & Chi, E. H. (2018, July). Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1930-1939). London, United Kingdom: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3219819.3220007Manaswi, N. K. (2018). Understanding and Working with Keras. In N. K. Manaswi, Deep Learning with Applications Using Python: Chatbots and Face, Object, and Speech Recognition With TensorFlow and Keras (pp. 31–43). Berkeley, CA, USA: Apress. doi:10.1007/978-1-4842-3516-4Mastoras, A., & Gross, T. R. (2018, February 24). Understanding Parallelization Tradeoffs for Linear Pipelines. In Q. Chen, Z. Huang, & P. Balaji (Ed.), PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (pp. 1-10). Vienna, Austria: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3178442.3178443Miao, X., Wang, Y., Jiang, Y., Shi, C., Nie, X., Zhang, H., & Cui, B. (2022, November 1). Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Proceedings of the VLDB Endowment, 16(3), 470-479. doi:10.14778/3570690.3570697Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., . . . Dean, J. (2017, June 25). Device Placement Optimization with Reinforcement Learning. arXiv:1706.04972v2 [cs.LG], 1-11. doi:10.48550/arXiv.1706.04972Mittal, S., & Vaishay, S. (2019, October). A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 99, 101635. doi:10.1016/j.sysarc.2019.101635Moreno-Alvarez, S., Haut, J. M., Paoletti, M. E., & Rico-Gallego, J. A. (2021, June 21). Heterogeneous model parallelism for deep neural networks. Neurocomputing, 441, 1-12. doi:10.1016/j.neucom.2021.01.125Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019, October). PipeDream: generalized pipeline parallelism for DNN training. SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3341301.3359646Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V., . . . Zaharia, M. (2021). Efficient large-scale language model training on GPU clusters using megatron-LM. 
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 58, pp. 1-15. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3458817.3476209Nie, X., Miao, X., Cao, S., Ma, L., Liu, Q., Xue, J., . . . Cui, B. (2022, October 9). EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate. arXiv:2112.14397v2 [cs.LG], 1-14. doi:10.48550/arXiv.2112.14397Oyama, Y., Maruyama, N., Dryden, N., McCarthy, E., Harrington, P., Balewski, J., . . . Van Essen, B. (2021, July 1). The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism. IEEE Transactions on Parallel and Distributed Systems, 32(7), 1641-1652. doi:10.1109/TPDS.2020.3047974Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020, July). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. Retrieved from https://www.usenix.org/conference/atc20/presentation/parkPouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Presa, M. R., . . . Iyengar, S. S. (2018, September 18). A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR), 51(5), 1-36, Article No. 92. doi:10.1145/3234150Rajbhandari, S., Li, C., Yao, Z., Zhang, M., Aminabadi, R. Y., Awan, A. A., . . . He, Y. (2022, July). DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, & S. Sabato (Ed.), Proceedings of the 39th International Conference on Machine Learning. 162, pp. 18332-18346. PMLR. Retrieved from https://proceedings.mlr.press/v162/rajbhandari22a.htmlRasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020, August). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3505 - 3506). Virtual Event, CA, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3394486.3406703Ravanelli, M., Parcollet, T., & Bengio, Y. (2019). The Pytorch-kaldi Speech Recognition Toolkit. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465-6469). Brighton, UK: IEEE. doi:10.1109/ICASSP.2019.8683713Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., . . . Houlsby, N. (2024, December). Scaling vision with sparse mixture of experts. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Ed.), NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. Article No. 657, pp. 8583-8595. Curran Associates Inc., Red Hook, NY, USA. doi:10.5555/3540261.3540918Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. J. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 1540, pp. 177-192. Springer, Cham. doi:10.1007/978-3-031-04209-6_13Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). 
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR 2017), (pp. 1-19). Toulon, France. Retrieved from https://openreview.net/forum?id=B1ckMDqlgShoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2020, March 13). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053v4 [cs.CL], 1-15. doi:10.48550/arXiv.1909.08053Song, L., Mao, J., Zhuo, Y., Qian, X., Li, H., & Chen, Y. (2019). HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 56-68). Washington, DC, USA: IEEE. doi:10.1109/HPCA.2019.00027Stevens, R., Taylor, V., Nichols, J., Maccabe, A. B., Yelick, K., & Brown, D. (2020, February 1). AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science. Technical Report, USDOE; Lawrence Berkeley National Laboratory (LBNL); Argonne National Laboratory (ANL); Oak Ridge National Laboratory (ORNL), United States. doi:10.2172/1604756Subhlok, J., Stichnoth, J. M., O'Hallaron, D. O., & Gross, T. (1993, July 1). Exploiting task and data parallelism on a multicomputer. ACM SIGPLAN Notices, 28(7), 13-22. doi:10.1145/173284.155334Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. doi:10.1109/CANDARW51189.2020.00036Tanaka, M., Taura, K., Hanawa, T., & Torisawa, K. (2021). Automatic Graph Partitioning for Very Large-scale Deep Learning. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1004-1013). Portland, OR, USA: IEEE. doi:10.1109/IPDPS49936.2021.00109Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33, Article No. 30. doi:10.1145/3377454Wang, H., Imes, C., Kundu, S., Beerel, P. A., Crago, S. P., & Walters, J. P. (2023). Quantpipe: Applying Adaptive Post-Training Quantization For Distributed Transformer Pipelines In Dynamic Edge Environments. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). Rhodes Island, Greece: IEEE. doi:10.1109/ICASSP49357.2023.10096632Wang, S.-C. (2003). Artificial Neural Network. In S.-C. Wang, Interdisciplinary Computing in Java Programming (1 ed., Vol. 743, pp. 81-100). Boston, MA, USA: Springer. doi:10.1007/978-1-4615-0377-4_5Wang, Y., Feng, B., Wang, Z., Geng, T., Barker, K., Li, A., & Ding, Y. (2023, July). MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms. 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23) (pp. 779-795). Boston, MA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi23/presentation/wang-yukeWu, J. (2017, May 1). Introduction to Convolutional Neural Networks. Nanjing Universit, National Key Lab for Novel Software Technology, China. Retrieved from https://cs.nju.edu.cn/wujx/paper/CNN.pdfYang, B., Zhang, J., Li , J., Ré, C., Aberger, C. R., & De Sa, C. (2021, March 15). Proceedings of the 4th Machine Learning and Systems Conference, 3, pp. 269-296. San Jose, CA, USA. 
Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. The Tenth International Conference on Learning Representations (ICLR 2022) (pp. 1-15). Retrieved from https://openreview.net/forum?id=cw-EmNq5zfD
Yoon, J., Byeon, Y., Kim, J., & Lee, H. (2022, July 15). EdgePipe: Tailoring Pipeline Parallelism With Deep Neural Networks for Volatile Wireless Edge Devices. IEEE Internet of Things Journal, 9(14), 11633-11647. doi:10.1109/JIOT.2021.3131407
Yuan, L., He, Q., Chen, F., Dou, R., Jin, H., & Yang, Y. (2023, April 30). PipeEdge: A Trusted Pipelining Collaborative Edge Training based on Blockchain. In Y. Ding, J. Tang, J. Sequeda, L. Aroyo, C. Castillo, & G.-J. Houben (Eds.), WWW '23: Proceedings of the ACM Web Conference 2023 (pp. 3033-3043). Austin, TX, USA: Association for Computing Machinery, New York, NY, USA. doi:10.1145/3543507.3583413
Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. doi:10.1109/DAC18074.2021.9586300
Zhang, J., Niu, G., Dai, Q., Li, H., Wu, Z., Dong, F., & Wu, Z. (2023, October 28). PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters. Neurocomputing, 555, 126661. doi:10.1016/j.neucom.2023.126661
Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. doi:10.1016/j.future.2023.04.033
Zhang, S., Diao, L., Wang, S., Cao, Z., Gu, Y., Si, C., . . . Lin, W. (2023, February 16). Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform. arXiv:2302.08141v1 [cs.DC], 1-16. doi:10.48550/arXiv.2302.08141
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2021, January 14). BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training. arXiv:2012.12544v2 [cs.DC]. doi:10.48550/arXiv.2012.12544
Zhao, L., Xu, R., Wang, T., Tian, T., Wang, X., Wu, W., . . . Jin, X. (2022). BaPipe: Balanced Pipeline Parallelism for DNN Training. Parallel Processing Letters, 32(03n04), 2250005, 1-17. doi:10.1142/S0129626422500050
Zhao, S., Li, F., Chen, X., Guan, X., Jiang, J., Huang, D., . . . Cui, H. (2022, March 1). vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training. IEEE Transactions on Parallel and Distributed Systems, 33(3), 489-506. doi:10.1109/TPDS.2021.3094364
Zheng, L., Li, Z., Zhang, H., Zhuang, Y., Chen, Z., Huang, Y., . . . Stoica, I. (2022, July). Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22) (pp. 559-578). Carlsbad, CA, USA: USENIX Association. Retrieved from https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin
Zhou, Q., Guo, S., Qu, Z., Li, P., Li, L., Guo, M., & Wang, K. (2021, May 1). Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization. IEEE Transactions on Parallel and Distributed Systems, 32(5), 1030-1043. doi:10.1109/TPDS.2020.3040601
Zhu, X. (2023, April 28). Implement deep neuron networks on VPipe parallel system: a ResNet variant implementation. In X. Li (Ed.), Proceedings of the Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022). 12610, p. 126104I. Wuhan, China: International Society for Optics and Photonics, SPIE. doi:10.1117/12.2671359
Vol. 25 No. 1 (2024): Revista Colombiana de Computación (January-June); 60-73