Gradiente estocástico y aproximación estocástica aplicados a Q-learning

This project aims to prove the convergence of Q-learning, an algorithm applied to finite discrete-time Markov decision processes when not enough information is available. Thus, the algorithm seeks to solve the optimality equations (also known as Bellman's equations). With thi...

Authors: Ñungo Manrique, José Sebastián

Resource type: Undergraduate thesis

Publication date: 2020

Institution: Universidad de los Andes

Repository: Séneca: repositorio Uniandes

Language: spa

id	UNIANDES2_c9c34d1ba7b215d5f69e539e4b6eef33
oai_identifier_str	oai:repositorio.uniandes.edu.co:1992/51295
network_acronym_str	UNIANDES2
network_name_str	Séneca: repositorio Uniandes
repository_id_str
dc.title.spa.fl_str_mv	Gradiente estocástico y aproximación estocástica aplicados a Q-learning
title	Gradiente estocástico y aproximación estocástica aplicados a Q-learning
spellingShingle	Gradiente estocástico y aproximación estocástica aplicados a Q-learning Optimización matemática Funciones convexas Métodos iterativos (Matemáticas) Aproximación estocástica Aprendizaje por refuerzo (Aprendizaje automático) Procesos de Markov Procesos estocásticos Matemáticas
dc.creator.fl_str_mv	Ñungo Manrique, José Sebastián
dc.contributor.advisor.none.fl_str_mv	Junca Peláez, Mauricio José
dc.contributor.author.none.fl_str_mv	Ñungo Manrique, José Sebastián
dc.contributor.jury.none.fl_str_mv	Velasco Gregory, Mauricio Fernando
dc.subject.armarc.spa.fl_str_mv	Optimización matemática; Funciones convexas; Métodos iterativos (Matemáticas); Aproximación estocástica; Aprendizaje por refuerzo (Aprendizaje automático); Procesos de Markov; Procesos estocásticos
topic	Optimización matemática; Funciones convexas; Métodos iterativos (Matemáticas); Aproximación estocástica; Aprendizaje por refuerzo (Aprendizaje automático); Procesos de Markov; Procesos estocásticos; Matemáticas
dc.subject.themes.none.fl_str_mv	Matemáticas
description	This project aims to prove the convergence of Q-learning, an algorithm applied to finite discrete-time Markov decision processes when not enough information is available. Thus, the algorithm seeks to solve the optimality equations (also known as Bellman's equations). With this purpose in mind, the project discusses four main topics: 1. Finite discrete-time Markov decision processes, the model of interest from the outset. 2. Stochastic approximation (SA), the algorithm that serves as a general framework for many algorithms, including Q-learning; under some assumptions, the convergence of SA is established. 3. The stochastic gradient descent method, the main tool for establishing the convergence of the SA algorithm (and of many machine learning algorithms). 4. Reinforcement learning, the branch to which the Q-learning algorithm belongs.
publishDate	2020
dc.date.issued.none.fl_str_mv	2020
dc.date.accessioned.none.fl_str_mv	2021-08-10T18:19:10Z
dc.date.available.none.fl_str_mv	2021-08-10T18:19:10Z
dc.type.spa.fl_str_mv	Trabajo de grado - Pregrado
dc.type.coarversion.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.spa.fl_str_mv	info:eu-repo/semantics/bachelorThesis
dc.type.coar.spa.fl_str_mv	http://purl.org/coar/resource_type/c_7a1f
dc.type.content.spa.fl_str_mv	Text
dc.type.redcol.spa.fl_str_mv	http://purl.org/redcol/resource_type/TP
format	http://purl.org/coar/resource_type/c_7a1f
dc.identifier.uri.none.fl_str_mv	http://hdl.handle.net/1992/51295
dc.identifier.pdf.none.fl_str_mv	22979.pdf
dc.identifier.instname.spa.fl_str_mv	instname:Universidad de los Andes
dc.identifier.reponame.spa.fl_str_mv	reponame:Repositorio Institucional Séneca
dc.identifier.repourl.spa.fl_str_mv	repourl:https://repositorio.uniandes.edu.co/
url	http://hdl.handle.net/1992/51295
identifier_str_mv	22979.pdf instname:Universidad de los Andes reponame:Repositorio Institucional Séneca repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv	spa
language	spa
dc.rights.uri.*.fl_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv	http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
dc.format.extent.none.fl_str_mv	55 hojas
dc.format.mimetype.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidad de los Andes
dc.publisher.program.none.fl_str_mv	Matemáticas
dc.publisher.faculty.none.fl_str_mv	Facultad de Ciencias
dc.publisher.department.none.fl_str_mv	Departamento de Matemáticas
publisher.none.fl_str_mv	Universidad de los Andes
institution	Universidad de los Andes
bitstream.url.fl_str_mv	https://repositorio.uniandes.edu.co/bitstreams/faf92b47-0871-4308-a4b8-2d8cd8edac43/download https://repositorio.uniandes.edu.co/bitstreams/1d2aa866-724b-4766-9f88-000494fb46a5/download https://repositorio.uniandes.edu.co/bitstreams/1363739d-583e-42d7-966e-172fdfeabf16/download
bitstream.checksum.fl_str_mv	c273893031a5e1a1e4db8ab9fff6e6c9 5b576351daaa6cea902579a45dfe11e4 22b9a4c0b9f744f9e9288c8c91ecefe4
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositorio institucional Séneca
repository.mail.fl_str_mv	adminrepositorio@uniandes.edu.co
_version_	1837005455050670080
spelling	By consulting and using this resource, you accept the terms of use established by the authors. Degree: Matemático (Pregrado). Advisor: Junca Peláez, Mauricio José (https://scholar.google.es/citations?user=CoIlxH0AAAAJ; ORCID 0000-0002-5541-0758; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000155861). Files: 22979.pdf.jpg (thumbnail, image/jpeg, 4924); 22979.pdf.txt (extracted text, text/plain, 72619); 22979.pdf (original, application/pdf, 843026). Record timestamp: 2024-03-13 15:52:10.646.
