Comparative study of SAC and PPO in multi-agent reinforcement learning using Unity ML-Agents

The Unity ML-Agents toolkit, on which this project is based, is open source under the Apache 2.0 license.

Author:: Bayona Latorre, Andrés Leonardo

Resource type:: Undergraduate thesis

Publication date:: 2023

Institution:: Universidad de los Andes

Repository:: Séneca: repositorio Uniandes

Language:: English

OAI identifier:: oai:repositorio.uniandes.edu.co:1992/68811

Advisor:: Takahashi Rodríguez, Silvia

Jury:: Takahashi Rodríguez, Silvia

Keywords:: RL, MARL, Unity ML-Agents, PPO, SAC

Subject:: Engineering

Date issued:: 2023-07-25

Date accessioned and made available:: 2023-07-27T13:39:23Z

Resource type (controlled vocabularies):: info:eu-repo/semantics/bachelorThesis; http://purl.org/coar/resource_type/c_7a1f; http://purl.org/redcol/resource_type/TP

Version:: Accepted version (info:eu-repo/semantics/acceptedVersion)

Content type:: Text

Handle:: http://hdl.handle.net/1992/68811

Repository URL:: https://repositorio.uniandes.edu.co/

References::
Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, 41-48. https://doi.org/10.1145/1553374.1553380
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2), 156-172. https://doi.org/10.1109/TSMCC.2007.913919
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor (arXiv:1801.01290). arXiv. http://arxiv.org/abs/1801.01290
Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2020). Unity: A general platform for intelligent agents (arXiv:1809.02627). arXiv. http://arxiv.org/abs/1809.02627
Neumann, C., Duboscq, J., Dubuc, C., Ginting, A., Irwan, A. M., Agil, M., Widdig, A., & Engelhardt, A. (2011). Assessing dominance hierarchies: Validation and advantages of progressive evaluation with Elo-rating. Animal Behaviour, 82(4), 911-921.
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Müller, K.-R. (2021). Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3), 247-278. https://doi.org/10.1109/JPROC.2021.3060483
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms (arXiv:1707.06347). arXiv. http://arxiv.org/abs/1707.06347
U. T. (2022, December 14). ml-agents/Training-Configuration-File.md at develop · Unity-Technologies/ml-agents. GitHub. https://github.com/Unity-Technologies/ml-agents
Wong, A., Bäck, T., Kononova, A. V., & Plaat, A. (2023). Deep multiagent reinforcement learning: Challenges and directions. Artificial Intelligence Review, 56(6), 5023-5056. https://doi.org/10.1007/s10462-022-10299-x
ABL. (2023, May 30). PPOvsSAC [Video]. YouTube. https://www.youtube.com/watch?v=ZtdtpRmoFSE
ABL. (2023a, May 30). PPOvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=N-aRvKfYnpI
ABL. (2023c, May 30). SACvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=744kTLEubK0

License:: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Terms of use:: https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf

Access rights:: Open access (info:eu-repo/semantics/openAccess; http://purl.org/coar/access_right/c_abf2)

Extent:: 32 pages

Format:: application/pdf

Publisher:: Universidad de los Andes

Program:: Ingeniería de Sistemas y Computación

Faculty:: Facultad de Ingeniería

Department:: Departamento de Ingeniería de Sistemas y Computación

Abstract:: This document presents a comparative study of the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms in the context of Multi-Agent Reinforcement Learning (MARL) using the Unity ML-Agents framework. The objective is to investigate the performance and adaptability of these algorithms in dynamic environments. A collaborative-competitive multi-agent problem is formulated around a food-gathering task. The proposed solution includes a dynamic environment generator and reward-shaping training techniques. The results showcase the effectiveness of SAC and PPO in learning complex behaviors and strategies in the target MARL task. Using dynamic environments and reward shaping enables the agents to exhibit intelligent and adaptive behaviors. This study highlights the potential of MARL algorithms for addressing real-world challenges and their suitability for training agents in dynamic environments with the Unity ML-Agents framework.

Degree:: Systems and Computing Engineer (Ingeniero de Sistemas y Computación), undergraduate level
