Comparative study of SAC and PPO in multi-agent reinforcement learning using unity ML-agents
This document presents a comparative study of the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms in the context of Multi-Agent Reinforcement Learning (MARL) using the Unity ML-Agents framework. The objective is to investigate the performance and adaptability of these algorithms in dynamic environments. A collaborative-competitive multi-agent problem is formulated in the context of a food-gathering task. The proposed solution includes a dynamic environment generator and reward-shaping training techniques. The results showcase the effectiveness of SAC and PPO in learning complex behaviors and strategies in the target MARL task. Using dynamic environments and reward shaping enables the agents to exhibit intelligent and adaptive behaviors. This study highlights the potential of MARL algorithms in addressing real-world challenges and their suitability for training agents in dynamic environments with the Unity ML-Agents framework. The Unity ML-Agents toolkit, on which this project is based, is open source under the Apache 2.0 license.
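As a concrete illustration of how a PPO-versus-SAC comparison is typically set up in ML-Agents, the sketch below builds two trainer configurations following the schema documented in the toolkit's Training-Configuration-File.md (cited in the references below) and writes them to YAML files for `mlagents-learn`. This is a minimal sketch, not the configuration used in the thesis: the behavior name `FoodCollector` and every hyperparameter value are illustrative assumptions.

```python
# Illustrative sketch (not the thesis configuration): two ML-Agents trainer
# configurations, one PPO and one SAC, serialized to YAML for `mlagents-learn`.
# The behavior name "FoodCollector" and all hyperparameter values are assumed
# for illustration; field names follow the public Training-Configuration-File.md.
import yaml  # requires PyYAML

common_network = {"normalize": False, "hidden_units": 128, "num_layers": 2}
common_rewards = {"extrinsic": {"gamma": 0.99, "strength": 1.0}}

ppo_config = {
    "behaviors": {
        "FoodCollector": {  # hypothetical behavior name
            "trainer_type": "ppo",
            "hyperparameters": {
                "batch_size": 1024,
                "buffer_size": 10240,     # on-policy rollout buffer
                "learning_rate": 3.0e-4,
                "beta": 5.0e-3,           # entropy regularization strength
                "epsilon": 0.2,           # PPO clipping range
                "lambd": 0.95,            # GAE lambda
                "num_epoch": 3,
            },
            "network_settings": common_network,
            "reward_signals": common_rewards,
            "max_steps": 500_000,
            "time_horizon": 64,
            "summary_freq": 10_000,
        }
    }
}

sac_config = {
    "behaviors": {
        "FoodCollector": {
            "trainer_type": "sac",
            "hyperparameters": {
                "batch_size": 256,
                "buffer_size": 500_000,     # off-policy replay buffer
                "buffer_init_steps": 1000,  # random steps before updates begin
                "learning_rate": 3.0e-4,
                "tau": 0.005,               # target-network update rate
                "init_entcoef": 0.5,        # initial entropy coefficient
            },
            "network_settings": common_network,
            "reward_signals": common_rewards,
            "max_steps": 500_000,
            "time_horizon": 64,
            "summary_freq": 10_000,
        }
    }
}

# Write both configurations so each can be passed to the trainer CLI.
for filename, cfg in [("ppo_config.yaml", ppo_config), ("sac_config.yaml", sac_config)]:
    with open(filename, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
```

Each file could then be passed to the trainer, for example `mlagents-learn ppo_config.yaml --run-id=ppo_run`. Swapping only `trainer_type` and its algorithm-specific hyperparameters while keeping the network, reward-signal, and environment settings identical is one straightforward way to keep an on-policy (PPO) versus off-policy (SAC) comparison fair.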
- Authors:
- Bayona Latorre, Andrés Leonardo
- Resource type:
- Undergraduate thesis (trabajo de grado de pregrado)
- Publication date:
- 2023
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: Uniandes institutional repository
- Language:
- English (eng)
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/68811
- Online access:
- http://hdl.handle.net/1992/68811
- Keywords:
- RL
- MARL
- Unity ML Agents
- PPO
- SAC
- Engineering (Ingeniería)
- Rights:
- openAccess
- License:
- Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Full metadata record:

| Field | Value |
|---|---|
| dc.title | Comparative study of SAC and PPO in multi-agent reinforcement learning using unity ML-agents |
| dc.creator / dc.contributor.author | Bayona Latorre, Andrés Leonardo |
| dc.contributor.advisor | Takahashi Rodríguez, Silvia |
| dc.contributor.jury | Takahashi Rodríguez, Silvia |
| dc.subject.keyword | RL; MARL; Unity ML Agents; PPO; SAC |
| dc.subject.themes | Ingeniería (Engineering) |
| dc.description | The Unity ML-Agents toolkit on which this project is based is open source under the Apache 2.0 license. (Full abstract given above.) |
| dc.date.accessioned | 2023-07-27T13:39:23Z |
| dc.date.available | 2023-07-27T13:39:23Z |
| dc.date.issued | 2023-07-25 |
| dc.type | Trabajo de grado - Pregrado (undergraduate thesis); info:eu-repo/semantics/bachelorThesis |
| dc.type.version | info:eu-repo/semantics/acceptedVersion |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f |
| dc.type.content | Text |
| dc.type.redcol | http://purl.org/redcol/resource_type/TP |
| dc.identifier.uri | http://hdl.handle.net/1992/68811 |
| dc.identifier.instname | instname:Universidad de los Andes |
| dc.identifier.reponame | reponame:Repositorio Institucional Séneca |
| dc.identifier.repourl | repourl:https://repositorio.uniandes.edu.co/ |
| dc.language.iso | eng |
| dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 Internacional |
| dc.rights.uri | https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess |
| dc.rights.coar | http://purl.org/coar/access_right/c_abf2 |
| dc.format.extent | 32 pages |
| dc.format.mimetype | application/pdf |
| dc.publisher | Universidad de los Andes |
| dc.publisher.program | Ingeniería de Sistemas y Computación |
| dc.publisher.faculty | Facultad de Ingeniería |
| dc.publisher.department | Departamento de Ingeniería Sistemas y Computación |

dc.relation.references:

- Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, 41-48. https://doi.org/10.1145/1553374.1553380
- Busoniu, L., Babuska, R., & De Schutter, B. (2008). A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2), 156-172. https://doi.org/10.1109/TSMCC.2007.913919
- Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (arXiv:1801.01290). arXiv. http://arxiv.org/abs/1801.01290
- Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2020). Unity: A General Platform for Intelligent Agents (arXiv:1809.02627). arXiv. http://arxiv.org/abs/1809.02627
- Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Muller, K.-R. (2021). Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE, 109(3), 247-278. https://doi.org/10.1109/JPROC.2021.3060483
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms (arXiv:1707.06347). arXiv. http://arxiv.org/abs/1707.06347
- Wong, A., Bäck, T., Kononova, A. V., & Plaat, A. (2023). Deep multiagent reinforcement learning: Challenges and directions. Artificial Intelligence Review, 56(6), 5023-5056. https://doi.org/10.1007/s10462-022-10299-x
- Unity Technologies. (2022, December 14). ml-agents/Training-Configuration-File.md at develop · Unity-Technologies/ml-agents. GitHub. https://github.com/Unity-Technologies/ml-agents
- Neumann, C., Duboscq, J., Dubuc, C., Ginting, A., Irwan, A. M., Agil, M., Widdig, A., & Engelhardt, A. (2011). Assessing dominance hierarchies: Validation and advantages of progressive evaluation with Elo-rating. Animal Behaviour, 82(4), 911-921. ISSN 0003-3472
- ABL. (2023, May 30). PPOvsSAC [Video]. YouTube. https://www.youtube.com/watch?v=ZtdtpRmoFSE
- ABL. (2023, May 30). PPOvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=N-aRvKfYnpI
- ABL. (2023, May 30). SACvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=744kTLEubK0