A Study of Pipeline Parallelism in Deep Neural Networks

The application of artificial intelligence to solve complex problems is becoming increasingly popular. The emergence of chatbots based on artificial intelligence and natural language processing has driven the creation of ever larger and more sophisticated neural network models, which are the basis of current developments in artificial intelligence. These neural networks can comprise billions of parameters, and training them is not feasible without parallelism-based approaches. This paper focuses on pipeline parallelism, one of the most important types of parallelism used to train neural network models in deep learning. We review the key concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We analyze important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally, carrying out parallel training runs and taking into account aspects such as the number of stages in the training pipeline and the type of balance.


Authors:
Núñez, Gabriel
Romero Sandí, Hairol
Rojas, Elvis
Meneses, Esteban
Resource type:
Research article
Publication date:
2024
Institution:
Universidad Autónoma de Bucaramanga - UNAB
Repository:
Repositorio UNAB
Language:
spa
OAI Identifier:
oai:repository.unab.edu.co:20.500.12749/26659
Online access:
http://hdl.handle.net/20.500.12749/26659
https://doi.org/10.29375/25392115.5056
Keywords:
Deep learning
Parallelism
Artificial neural networks
Distributed training
Rights
License
http://purl.org/coar/access_right/c_abf2
dc.title.eng.fl_str_mv A Study of Pipeline Parallelism in Deep Neural Networks
title A Study of Pipeline Parallelism in Deep Neural Networks
dc.creator.fl_str_mv Núñez, Gabriel
Romero Sandí, Hairol
Rojas, Elvis
Meneses, Esteban
dc.contributor.author.none.fl_str_mv Núñez, Gabriel
Romero Sandí, Hairol
Rojas, Elvis
Meneses, Esteban
dc.contributor.orcid.spa.fl_str_mv Núñez, Gabriel [0000-0002-6907-533X]
Romero Sandí, Hairol [0000-0002-3199-1244]
Rojas, Elvis [0000-0002-4238-0908]
Meneses, Esteban [0000-0002-4307-6000]
dc.subject.keywords.eng.fl_str_mv Deep learning
Parallelism
Artificial neural networks
Distributed training
description The application of artificial intelligence to solve complex problems is becoming increasingly popular. The emergence of chatbots based on artificial intelligence and natural language processing has driven the creation of ever larger and more sophisticated neural network models, which are the basis of current developments in artificial intelligence. These neural networks can comprise billions of parameters, and training them is not feasible without parallelism-based approaches. This paper focuses on pipeline parallelism, one of the most important types of parallelism used to train neural network models in deep learning. We review the key concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We analyze important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally, carrying out parallel training runs and taking into account aspects such as the number of stages in the training pipeline and the type of balance.
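To make the ideas of pipeline stages, balance, and micro-batches concrete, the following minimal sketch uses FairScale's Pipe wrapper, one of the three libraries analyzed in the article. It is an illustration under stated assumptions, not the paper's experimental setup: the toy model, the [3, 1] layer balance, the chunks=4 micro-batch count, and the two-GPU placement are hypothetical, and the argument names follow the GPipe-style API that FairScale's Pipe inherits from torchgpipe, so they should be checked against the installed FairScale version.

```python
# Minimal pipeline-parallelism sketch in the GPipe style exposed by
# torchgpipe and FairScale's Pipe. The toy model, the [3, 1] balance,
# chunks=4, and the two-GPU placement are illustrative assumptions,
# not values taken from the paper.
import torch
import torch.nn as nn
from fairscale.nn import Pipe  # pip install fairscale

# A sequential model whose top-level layers can be split into stages.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),  # stage 0: first three layers
    nn.Linear(4096, 10),    # stage 1: last layer
)

# balance -> how many consecutive layers each pipeline stage receives
# chunks  -> number of micro-batches each mini-batch is split into
pipe = Pipe(model, balance=[3, 1], chunks=4)

optimizer = torch.optim.SGD(pipe.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Assumes two GPUs with stage 0 placed on cuda:0 and stage 1 on cuda:1
# (the default placement order): inputs go to the first stage's device,
# labels to the last stage's device, where the output is produced.
x = torch.randn(32, 1024, device="cuda:0")
y = torch.randint(0, 10, (32,), device="cuda:1")

optimizer.zero_grad()
loss = criterion(pipe(x), y)   # forward pass flows through both stages
loss.backward()                # GPipe-style backward over micro-batches
optimizer.step()
```

Here balance=[3, 1] splits the four top-level layers into two stages, and chunks=4 divides each mini-batch into four micro-batches so the stages can work on different micro-batches concurrently, which is the core mechanism of pipeline parallelism that the article studies.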
publishDate 2024
dc.date.accessioned.none.fl_str_mv 2024-09-19T21:46:23Z
dc.date.available.none.fl_str_mv 2024-09-19T21:46:23Z
dc.date.issued.none.fl_str_mv 2024-06-18
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/article
dc.type.local.spa.fl_str_mv Artículo
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/ART
dc.identifier.issn.spa.fl_str_mv ISSN: 1657-2831
e-ISSN: 2539-2115
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/20.500.12749/26659
dc.identifier.instname.spa.fl_str_mv instname:Universidad Autónoma de Bucaramanga UNAB
dc.identifier.repourl.spa.fl_str_mv repourl:https://repository.unab.edu.co
dc.identifier.doi.none.fl_str_mv https://doi.org/10.29375/25392115.5056
dc.language.iso.spa.fl_str_mv spa
dc.relation.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/article/view/5056/3969
dc.relation.uri.spa.fl_str_mv https://revistas.unab.edu.co/index.php/rcc/issue/view/297
dc.relation.references.none.fl_str_mv Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., . . . Zheng, X. (2016). TensorFlow: A System for Large-Scale Machine Learning. e Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16). November 2–4 (pp. 264-283). Savannah, GA, USA: USENIX Association. https://doi.org/10.48550/arXiv.1605.08695
Akintoye, S., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. https://doi.org/10.1109/ACCESS.2022.3193690
Alshamrani, R., & Ma, X. (2022). Deep Learning. In C. L. McNeely, & L. A. Schintler (Eds.), Encyclopedia of Big Data (pp. 373-377). Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-32010-6_5
Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed- Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). Dallas, TX, USA: IEEE. https://doi.org/10.1109/SC41404.2022.00051
Chatelain, A., Djeghri, A., Hesslow, D., & Launay, J. (2022). Is the Number of Trainable Parameters All That Actually Matters? In M. F. Pradier, A. Schein, S. Hyland, F. J. Ruiz, & J. Z. Forde (Ed.), Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops. 163, pp. 27-32. PMLR. https://proceedings.mlr.press/v163/chatelain22a.html
Chen, M. (2023). Analysis of Data Parallelism Methods with Deep Neural Network. EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, October 21 - 23 (pp. 1857 - 1861). Xiamen, China: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3573428.3573755
Chen, Z., Xu, C., Qian, W., & Zhou, A. (2023). Elastic Averaging for Efficient Pipelined DNN Training. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP '23 (pp. 380-391). Montreal, QC, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3572848.3577484
Chilimbi, T., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014). Project Adam: Building an Efficient and Scalable Deep Learning Training System. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14). October 6–8 (pp. 570-582). Broomfield, CO: USENIX Association. https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q. V., . . . Ng, A. Y. (2012). Large Scale Distributed Deep Networks. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Ed.), Advances in Neural Information Processing Systems (NIPS 2012). 25, pp. 1223-1231. Curran Associates. https://proceedings.neurips.cc/paper_files/paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf
Deep Learning. (2020). In A. Tatnall (Ed.), Encyclopedia of Education and Information Technologies (First ed., p. 558). Springer Cham. https://doi.org/10.1007/978-3-030-10576-1_300164
Deeplearning4j: Deeplearning4j Suite Overview. (2023, July). https://deeplearning4j.konduit.ai/
DeepSpeed authors: Deepspeed (overview and features). (2023, July). (Microsoft) https://www.deepspeed.ai/
FairScale authors. (2021). FairScale: A general purpose modular PyTorch library for high performance and large scale training. https://github.com/facebookresearch/fairscale
Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021). DAPPLE: a pipelined data parallel approach for training large models. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3437801.3441593
Farkas, A., Kertész, G., & Lovas, R. (2020). Parallel and Distributed Training of Deep Neural Networks: A brief overview. 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) (pp. 165-170). Reykjavík, Iceland: IEEE. https://doi.org/10.1109/INES49302.2020.9147123
Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. https://doi.org/10.48550/arXiv.1911.04610
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 18). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC]. https://doi.org/10.48550/arXiv.1806.03377
Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, July 25). GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv:1811.06965v5 [cs.CV], 1-11. https://doi.org/10.48550/arXiv.1811.06965
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, June 20). Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv:1408.5093v1 [cs.CV], 1-4.
Keras: Keras API references. (2023, July). https://keras.io/api/
Kim, C., Lee, H., Jeong, M., Baek, W., Yoon, B., Kim, I., . . . Kim, S. (2020, April 21). torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models. arXiv:2004.09910v1 [cs.DC], 1-10. https://doi.org/10.48550/arXiv.2004.09910
Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. https://doi.org/10.48550/arXiv.1404.5997
Li, S., & Hoefler, T. (2021). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3458817.3476145
Liang, G., & Alsmadi, I. (2022, February 12). Benchmark Assessment for DeepSpeed Optimization Library. arXiv:2202.12831v1 [cs.LG], 1-8. https://doi.org/10.48550/arXiv.2202.12831
Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00042
Luo, Z., Yi, X., Long, G., Fan, S., Wu, C., Yang, J., & Lin, W. (2022). Efficient Pipeline Planning for Expedited Distributed DNN Training. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 340-349). IEEE. https://doi.org/10.1109/INFOCOM48880.2022.9796787
Mofrad, M. H., Melhem, R., Ahmad, Y., & Hammoud, M. (2020). Studying the Effects of Hashing of Sparse Deep Neural Networks on Data and Model Parallelisms. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-7). Waltham, MA, USA: IEEE. https://doi.org/10.1109/HPEC43674.2020.9286195
MXNet: MXNet API docs. (2023, July). https://mxnet.apache.org/versions/1.9.1
Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019). PipeDream: generalized pipeline parallelism for DNN training. Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19) (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3341301.3359646
Padua, D. (2011). Pipelining. In D. Padua (Ed.), Encyclopedia of Parallel Computing (pp. 1562–1563). Boston, MA, USA: Springer. https://doi.org/10.1007/978-0-387-09766-4_335
Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. https://www.usenix.org/conference/atc20/presentation/park
PlaidML: PlaidML API docs. (2023, July). https://github.com/plaidml/plaidml
PyTorch: PyTorch documentation. (2023, July). https://pytorch.org/
Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020, May 13). ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. arXiv:1910.02054v3 [cs.LG], 1-24. https://doi.org/10.48550/arXiv.1910.02054
Rasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event. July 6 - 10. CA, USA: Association for Computing Machinery. https://doi.org/10.1145/3394486.3406703
Rojas, E., Pérez, D., Calhoun, J. C., Bautista Gomez, L., Jones, T., & Meneses, E. (2021). Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. 2021 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 492-503). Portland, OR, USA: IEEE. https://doi.org/10.1109/Cluster48925.2021.00045
Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 8th Latin American Conference, CARLA 2021, October 6–8, 2021, Revised Selected Papers. 1540, pp. 177-192. Guadalajara, Mexico: Springer, Cham. https://doi.org/10.1007/978-3-031-04209-6_13
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., . . . Fei-Fei, L. (2015, January 30). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575v3 [cs.CV]. https://doi.org/10.48550/arXiv.1409.0575
Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. https://doi.org/10.1109/CANDARW51189.2020.00036
TensorFlow: Overview. (2023, July). https://www.tensorflow.org/
Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. International Conference on Learning Representations. https://openreview.net/forum?id=cw-EmNq5zfD
Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2016). Application-Level Optimization of Big Data Transfers through Pipelining, Parallelism and Concurrency. IEEE Transactions on Cloud Computing, 4(1), 63 - 75. https://doi.org/10.1109/TCC.2015.2415804
Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. https://doi.org/10.1109/DAC18074.2021.9586300
Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. https://doi.org/10.1016/j.future.2023.04.033
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Autónoma de Bucaramanga UNAB
dc.source.spa.fl_str_mv Vol. 25 Núm. 1 (2024): Revista Colombiana de Computación (Enero-Junio); 48-59
bitstream.url.fl_str_mv https://repository.unab.edu.co/bitstream/20.500.12749/26659/1/Art%c3%adculo.pdf
https://repository.unab.edu.co/bitstream/20.500.12749/26659/2/license.txt
https://repository.unab.edu.co/bitstream/20.500.12749/26659/3/Art%c3%adculo.pdf.jpg
bitstream.checksum.fl_str_mv 311600f7d85e89b47f78a563a27ac609
855f7d18ea80f5df821f7004dff2f316
7a4fd4d21dc3293f1bab2cffedaeaefe
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional | Universidad Autónoma de Bucaramanga - UNAB
repository.mail.fl_str_mv repositorio@unab.edu.co