Comparative study of SAC and PPO in multi-agent reinforcement learning using unity ML-agents

The Unity ML-Agents toolkit, on which this project is based, is open source under the Apache 2.0 license.

Authors:
Bayona Latorre, Andrés Leonardo
Resource type:
Undergraduate thesis
Publication date:
2023
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/68811
Online access:
http://hdl.handle.net/1992/68811
Keywords:
RL
MARL
Unity ML Agents
PPO
SAC
Engineering
Rights
openAccess
License
Attribution-NonCommercial-NoDerivatives 4.0 International
id UNIANDES2_755b0503293e4faf8eeb02b9310f28f1
oai_identifier_str oai:repositorio.uniandes.edu.co:1992/68811
network_acronym_str UNIANDES2
network_name_str Séneca: repositorio Uniandes
repository_id_str
dc.title.none.fl_str_mv Comparative study of SAC and PPO in multi-agent reinforcement learning using unity ML-agents
dc.creator.fl_str_mv Bayona Latorre, Andrés Leonardo
dc.contributor.advisor.none.fl_str_mv Takahashi Rodríguez, Silvia
dc.contributor.author.none.fl_str_mv Bayona Latorre, Andrés Leonardo
dc.contributor.jury.none.fl_str_mv Takahashi Rodríguez, Silvia
dc.subject.keyword.none.fl_str_mv RL
MARL
Unity ML Agents
PPO
SAC
dc.subject.themes.es_CO.fl_str_mv Engineering
description The Unity ML-Agents toolkit, on which this project is based, is open source under the Apache 2.0 license.
publishDate 2023
dc.date.accessioned.none.fl_str_mv 2023-07-27T13:39:23Z
dc.date.available.none.fl_str_mv 2023-07-27T13:39:23Z
dc.date.issued.none.fl_str_mv 2023-07-25
dc.type.es_CO.fl_str_mv Undergraduate thesis
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/bachelorThesis
dc.type.version.none.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_7a1f
dc.type.content.es_CO.fl_str_mv Text
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/TP
format http://purl.org/coar/resource_type/c_7a1f
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/1992/68811
dc.identifier.instname.es_CO.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.es_CO.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.es_CO.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.es_CO.fl_str_mv eng
dc.rights.license.spa.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri.*.fl_str_mv https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.extent.es_CO.fl_str_mv 32 pages
dc.format.mimetype.es_CO.fl_str_mv application/pdf
dc.publisher.es_CO.fl_str_mv Universidad de los Andes
dc.publisher.program.es_CO.fl_str_mv Ingeniería de Sistemas y Computación
dc.publisher.faculty.es_CO.fl_str_mv Facultad de Ingeniería
dc.publisher.department.es_CO.fl_str_mv Departamento de Ingeniería Sistemas y Computación
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/6f52bbe9-4145-4c23-80d2-d4ad02d4e167/download
https://repositorio.uniandes.edu.co/bitstreams/5bed9e62-c7ac-4ca6-9c6a-d7f9a7fad853/download
https://repositorio.uniandes.edu.co/bitstreams/cadff679-f3f3-43fa-a543-d6313c0a4932/download
https://repositorio.uniandes.edu.co/bitstreams/8dc46e46-d21e-4a9a-af8f-2d95aeb9af9e/download
https://repositorio.uniandes.edu.co/bitstreams/a41e72fc-98bf-465c-9359-efef18b27044/download
https://repositorio.uniandes.edu.co/bitstreams/4c924d1c-04fb-4464-8f10-4cbcd0b4a765/download
https://repositorio.uniandes.edu.co/bitstreams/aa155513-7759-4100-8af1-c6b38cf0d58a/download
bitstream.checksum.fl_str_mv 71ca133b9b8e69f3c444e28dd4b7c03c
cc9e77cd3a5e2d42e5bb622fcb880b2e
a6a3c410d5de894f4fceeba305acd22a
2e366d81617bb4e8863c2b3efeda7d61
5aa5c691a1ffe97abd12c2966efcb8d6
e5bae4911a4d5ba7ff99fbf4765019eb
8a27b410b22e89cdaf3d5a02d712c78f
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
_version_ 1812134055305543680
abstract This document presents a comparative study of the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms in the context of Multi-Agent Reinforcement Learning (MARL) using the Unity ML-Agents framework. The objective is to investigate the performance and adaptability of these algorithms in dynamic environments. A collaborative-competitive multi-agent problem is formulated as a food-gathering task. The proposed solution includes a dynamic environment generator and reward-shaping training techniques. The results show that SAC and PPO are effective at learning complex behaviors and strategies in the target MARL task, and that dynamic environments and reward shaping enable the agents to exhibit intelligent and adaptive behaviors. This study highlights the potential of MARL algorithms for addressing real-world challenges and their suitability for training agents in dynamic environments with the Unity ML-Agents framework.
degree Ingeniero de Sistemas y Computación (undergraduate)