Reinforcement learning for finance: A review

This paper provides a comprehensive review of the application of reinforcement learning (RL) in the domain of finance, shedding light on the groundbreaking progress achieved and the challenges that lie ahead. We explore how RL, a subfield of machine learning, has been instrumental in...


Authors:
León Nieto, Diego Ismael
Resource type:
Journal article
Publication date:
2023
Institution:
Universidad Externado de Colombia
Repository:
Biblioteca Digital Universidad Externado de Colombia
Language:
spa
OAI Identifier:
oai:bdigital.uexternado.edu.co:001/15362
Online access:
https://bdigital.uexternado.edu.co/handle/001/15362
https://doi.org/10.18601/17941113.n24.02
Keywords:
Reinforcement learning;
machine learning;
Markov decision process;
finance
aprendizaje por refuerzo;
aprendizaje automático;
procesos de decisión de Markov;
finanzas
Rights
openAccess
License
Diego Ismael León Nieto - 2023
id uexternad2_cb2d656e5451ac27748630fc1298772a
oai_identifier_str oai:bdigital.uexternado.edu.co:001/15362
network_acronym_str uexternad2
network_name_str Biblioteca Digital Universidad Externado de Colombia
repository_id_str
dc.title.spa.fl_str_mv Reinforcement learning for finance: A review
dc.title.translated.eng.fl_str_mv Reinforcement learning for finance: A review
title Reinforcement learning for finance: A review
spellingShingle Reinforcement learning for finance: A review
Reinforcement learning;
machine learning;
Markov decision process;
finance
aprendizaje por refuerzo;
aprendizaje automático;
procesos de decisión de Markov;
finanzas
title_short Reinforcement learning for finance: A review
title_full Reinforcement learning for finance: A review
title_fullStr Reinforcement learning for finance: A review
title_full_unstemmed Reinforcement learning for finance: A review
title_sort Reinforcement learning for finance: A review
dc.creator.fl_str_mv León Nieto, Diego Ismael
dc.contributor.author.spa.fl_str_mv León Nieto, Diego Ismael
dc.subject.eng.fl_str_mv Reinforcement learning;
machine learning;
Markov decision process;
finance
topic Reinforcement learning;
machine learning;
Markov decision process;
finance
aprendizaje por refuerzo;
aprendizaje automático;
procesos de decisión de Markov;
finanzas
dc.subject.spa.fl_str_mv aprendizaje por refuerzo;
aprendizaje automático;
procesos de decisión de Markov;
finanzas
description This paper provides a comprehensive review of the application of reinforcement learning (RL) in the domain of finance, shedding light on the groundbreaking progress achieved and the challenges that lie ahead. We explore how RL, a subfield of machine learning, has been instrumental in solving complex financial problems by enabling decision-making processes that optimize long-term rewards. RL is a powerful machine learning technique that can be used to train agents to make decisions in complex environments. In finance, RL has been used to solve a variety of problems, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising. In this paper, we review recent developments in RL for finance. We begin by introducing RL and Markov decision processes (MDPs), the mathematical framework for RL. We then discuss the various RL algorithms that have been used in finance, with a focus on value-based and policy-based methods. We also discuss the use of neural networks in RL for finance. Finally, we discuss the results of recent studies that have used RL to solve financial problems. We conclude by discussing the challenges and opportunities for future research in RL for finance.
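To make the value-based methods mentioned in the abstract concrete, the following is a minimal sketch of tabular Q-learning on a toy MDP. It is illustrative only and is not taken from the paper: the chain environment, reward of 1 at the terminal state, and all hyperparameters are invented for demonstration.

```python
import random

# Toy 5-state chain MDP: states 0..4, state 4 is terminal with reward 1.
# Tabular Q-learning with epsilon-greedy exploration (illustrative sketch).
N_STATES = 5
ACTIONS = [0, 1]          # 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(s, a):
    """Deterministic chain dynamics: moving right heads toward the goal."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: q[(s, a_)])
            s2, r, done = step(s, a)
            # One-step Q-learning update toward the bootstrapped target
            target = r + (0.0 if done else GAMMA * max(q[(s2, a_)] for a_ in ACTIONS))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q

q = train()
# Greedy policy extracted from the learned values: it should move right
# in every non-terminal state, since that is the shortest path to the reward.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

The same update rule underlies the finance applications the review surveys (e.g. Q-learning for hedging in Cannelli et al., 2020, and the QLBS model of Halperin, 2020), with the toy chain replaced by a market environment and the table replaced by a neural network approximator.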
publishDate 2023
dc.date.accessioned.none.fl_str_mv 2023-11-30T09:55:17Z
2024-06-07T07:31:14Z
dc.date.available.none.fl_str_mv 2023-11-30T09:55:17Z
2024-06-07T07:31:14Z
dc.date.issued.none.fl_str_mv 2023-11-30
dc.type.spa.fl_str_mv Artículo de revista
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coar.spa.fl_str_mv http://purl.org/coar/resource_type/c_6501
dc.type.coarversion.spa.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.content.spa.fl_str_mv Text
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/article
dc.type.local.eng.fl_str_mv Journal article
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/ARTREF
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/publishedVersion
format http://purl.org/coar/resource_type/c_6501
status_str publishedVersion
dc.identifier.doi.none.fl_str_mv 10.18601/17941113.n24.02
dc.identifier.eissn.none.fl_str_mv 2346-2140
dc.identifier.issn.none.fl_str_mv 1794-1113
dc.identifier.uri.none.fl_str_mv https://bdigital.uexternado.edu.co/handle/001/15362
dc.identifier.url.none.fl_str_mv https://doi.org/10.18601/17941113.n24.02
identifier_str_mv 10.18601/17941113.n24.02
2346-2140
1794-1113
url https://bdigital.uexternado.edu.co/handle/001/15362
https://doi.org/10.18601/17941113.n24.02
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.bitstream.none.fl_str_mv https://revistas.uexternado.edu.co/index.php/odeon/article/download/9072/15142
dc.relation.citationedition.spa.fl_str_mv Núm. 24, Año 2023: Enero-Junio
dc.relation.citationendpage.none.fl_str_mv 24
dc.relation.citationissue.spa.fl_str_mv 24
dc.relation.citationstartpage.none.fl_str_mv 7
dc.relation.ispartofjournal.spa.fl_str_mv ODEON
dc.relation.references.spa.fl_str_mv Andreae, J. H. (1963). STELLA: A scheme for a learning machine. IFAC Proceedings Volumes, 1(2), 497-502. https://doi.org/10.1016/S1474-6670(17)69682-4
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828. https://doi.org/10.1109/TPAMI.2013.50
Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271-1291. https://doi.org/10.1080/14697688.2019.1571683
Camerer, C. F. (2003). Behavioural studies of strategic thinking in games. Trends in Cognitive Sciences, 7(5), 225-231. https://doi.org/10.1016/S1364-6613(03)00094-9
Cannelli, L., Nuti, G., Sala, M., & Szehr, O. (2020). Hedging using reinforcement learning: Contextual K-armed bandit versus Q-learning. Working paper, arXiv:2007.01623.
Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10–27. https://doi.org/10.3905/jfds.2020.1.052
Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., & Abbeel, P. (2016). RL2: Fast reinforcement learning via slow reinforcement learning. Working paper, arXiv:1611.02779.
Errecalde, M. L., Muchut, A., Aguirre, G., & Montoya, C. I. (2000). Aprendizaje por Refuerzo aplicado a la resolución de problemas no triviales. In II Workshop de Investigadores en Ciencias de la Computación.
Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … & Welty, C. (2010). Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3), 59-79. https://doi.org/10.1609/aimag.v31i3.2303
Foerster, J., Assael, I. A., De Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. Advances in Neural Information processing systems, 29, 1-9.
Gosavi, A. (2009). Reinforcement learning: A tutorial survey and recent advances. INFORMS Journal on Computing, 21(2), 178-192. https://doi.org/10.1287/ijoc.1080.0305
Hambly, B., Xu, R., & Yang, H. (2021). Recent advances in reinforcement learning in finance. arXiv preprint arXiv:2112.04553. https://arxiv.org/abs/2112.04553
Halperin, I. (2019). The QLBS Q-learner goes NuQlear: Fitted Q iteration, inverse RL, and option portfolios. Quantitative Finance, 19(9), 1543–1553. https://doi.org/10.1080/14697688.2019.1622302
Halperin, I. (2020). QLBS: Q-learner in the Black-Scholes-Merton world. The Journal of Derivatives, 28(1), 99-122. https://doi.org/10.3905/jod.2020.1.108
Hu, Y. J., & Lin, S. J. (2019). Deep reinforcement learning for optimizing finance portfolio management. In 2019 Amity International Conference on Artificial Intelligence (AICAI) (pp. 14-20). IEEE. https://doi.org/10.1109/AICAI.2019.8701368
Kaelbling, L. P. (1993). Learning in embedded systems. MIT Press.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285. https://doi.org/10.1613/jair.301
Kapoor, A., Gulli, A., Pal, S., & Chollet, F. (2022). Deep Learning with TensorFlow and Keras: Build and deploy supervised, unsupervised, deep, and reinforcement learning models. Packt Publishing Ltd.
Kohl, N., & Stone, P. (2004, April). Policy gradient reinforcement learning for fast quadrupedal locomotion. In IEEE International Conference on Robotics and Automation, 2004. https://doi.org/10.1109/ROBOT.2004.1307456
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436-444. https://doi.org/10.1038/nature14539
Li, Y., Szepesvari, C., & Schuurmans, D. (2009). Learning exercise policies for American options. In Artificial intelligence and statistics (pp. 352–359). PMLR. https://proceedings.mlr.press/v5/li09d.html
Michie, D. & Chambers, R. A. (1968). BOXES: An experiment in adaptive control. In E. Dale & D. Michie (eds.), Machine Intelligence. Oliver and Boyd.
Millea, A., & Edalat, A. (2022). Using deep reinforcement learning with hierarchical risk parity for portfolio optimization. International Journal of Financial Studies, 11(1), 10. https://doi.org/10.3390/ijfs11010010
Minsky, M. L. (1954). Theory of neural-analog reinforcement systems and its application to the brain-model problem. Princeton University.
Nath, S., Liu, V., Chan, A., Li, X., White, A., & White, M. (2020). Training recurrent neural networks online by learning explicit state variables. In International conference on learning representations.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489. https://doi.org/10.1038/nature16961
Schlegel, M., Chung, W., Graves, D., Qian, J., & White, M. (2019). Importance resampling for off-policy prediction. Advances in Neural Information Processing Systems, 32.
Sun, Q., & Si, Y. W. (2022). Supervised actor-critic reinforcement learning with action feedback for algorithmic trading. Applied Intelligence, 53, 16875-16892. https://doi.org/10.1007/s10489-022-04322-5
Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine learning proceedings 1990 (pp. 216-224). https://doi.org/10.1016/B978-1-55860-141-3.50030-4
Sutton, R. S. (1991). Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bulletin, 2(4), 160-163. https://doi.org/10.1145/122344.122377
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An introduction. MIT Press.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68. https://doi.org/10.1145/203330.203343
Théate, T., & Ernst, D. (2021). An application of deep reinforcement learning to algorithmic trading. Expert Systems with Applications, 173, 114632. https://doi.org/10.1016/j.eswa.2021.114632
Thrun, S. B., & Möller, K. (1991). Active exploration in dynamic environments. Advances in Neural Information Processing Systems, 4. https://proceedings.neurips.cc/paper/1991/hash/e5f6ad6ce374177eef023bf5d0c018b6-Abstract.html
Taylor, M. E., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(7), 1635-1685. https://doi.org/10.5555/1577069.1755839
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. Transaction Publishers.
Torres Cortés, L. J., Velázquez Vadillo, F., & Turner Barragán, E. H. (2017). El principio de optimalidad de Bellman aplicado a la estructura financiera corporativa. Caso Mexicano. Análisis Económico, 32(81), 151-181.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence 2008.
dc.rights.spa.fl_str_mv Diego Ismael León Nieto - 2023
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0
rights_invalid_str_mv Diego Ismael León Nieto - 2023
http://purl.org/coar/access_right/c_abf2
http://creativecommons.org/licenses/by-nc-sa/4.0
eu_rights_str_mv openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Externado de Colombia
dc.source.spa.fl_str_mv https://revistas.uexternado.edu.co/index.php/odeon/article/view/9072
institution Universidad Externado de Colombia
bitstream.url.fl_str_mv https://bdigital.uexternado.edu.co/bitstreams/3d76f126-dda0-4139-9c6e-34b6aaf9b010/download
bitstream.checksum.fl_str_mv 817edc58063324f0a09380207d67b493
bitstream.checksumAlgorithm.fl_str_mv MD5
repository.name.fl_str_mv Universidad Externado de Colombia
repository.mail.fl_str_mv metabiblioteca@metabiblioteca.org
_version_ 1814100370711904256