Stochastic gradient and stochastic approximation applied to Q-learning

Authors:
Ñungo Manrique, José Sebastián
Resource type:
Undergraduate thesis
Publication date:
2020
Institution:
Universidad de los Andes
Repository:
Séneca: Uniandes repository
Language:
spa
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/51295
Online access:
http://hdl.handle.net/1992/51295
Keywords:
Mathematical optimization
Convex functions
Iterative methods (Mathematics)
Stochastic approximation
Reinforcement learning (Machine learning)
Markov processes
Stochastic processes
Mathematics
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-sa/4.0/
Description
Summary: The project is motivated by the goal of proving the convergence of Q-learning, an algorithm for finite discrete-time Markov decision processes in which the transition probabilities and rewards are not fully known. What the algorithm seeks is to solve the optimality equations (Bellman's equations). With this purpose in mind, the project discusses four main topics:
1. Finite discrete-time Markov decision processes, the model of interest from the beginning (the Bellman optimality equation is recalled below).
2. Stochastic approximation (SA), the scheme that serves as the general framework for many algorithms, including Q-learning; under some premises the convergence of SA can be established (the standard update and step-size conditions are recalled below).
3. The stochastic gradient descent method, the main tool by which the convergence of the SA algorithm, and of many machine learning algorithms, can be established.
4. Reinforcement learning, the branch of machine learning in which the Q-learning algorithm is found (a tabular Q-learning sketch in Python follows the equations below).
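For reference (these formulas are standard and not quoted from the thesis itself), the optimality equations that Q-learning targets, for a finite discounted MDP with transition kernel p, reward r, and discount factor \(\gamma \in [0,1)\), read

\[
Q^*(s,a) \;=\; \sum_{s'} p(s' \mid s,a)\,\Bigl[\, r(s,a,s') + \gamma \max_{a'} Q^*(s',a') \,\Bigr], \qquad s \in S,\ a \in A.
\]

Q-learning approximates the fixed point \(Q^*\) from sampled transitions, without knowledge of p or r.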
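Likewise, the Q-learning update is the standard stochastic-approximation (Robbins-Monro) iteration. Writing \(\alpha_t\) for the step size, one observed transition \((s_t, a_t, r_t, s_{t+1})\) yields

\[
Q_{t+1}(s_t,a_t) \;=\; Q_t(s_t,a_t) + \alpha_t \Bigl[\, r_t + \gamma \max_{a'} Q_t(s_{t+1},a') - Q_t(s_t,a_t) \,\Bigr],
\]

and the convergence premises alluded to above are the usual step-size conditions \(\sum_t \alpha_t = \infty\) and \(\sum_t \alpha_t^2 < \infty\), together with every state-action pair being visited infinitely often.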
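A minimal tabular Q-learning sketch in Python, assuming a hypothetical Gym-style environment object `env` exposing reset() and step(action); the environment interface, the function name q_learning, and the 1/N(s,a) step-size choice are illustrative assumptions, not taken from the thesis:

import numpy as np

def q_learning(env, n_states, n_actions, episodes=5000,
               gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with Robbins-Monro step sizes.

    `env` is assumed (hypothetically) to expose reset() -> state and
    step(action) -> (next_state, reward, done), Gym-style.
    """
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))  # per-pair visit counts

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration keeps every pair visited
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, done = env.step(action)

            # alpha_t = 1 / N(s, a) satisfies sum alpha = inf,
            # sum alpha^2 < inf for each visited pair
            visits[state, action] += 1
            alpha = 1.0 / visits[state, action]

            # stochastic-approximation step toward the Bellman target
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])

            state = next_state
    return Q

The 1/N(s,a) schedule is only one admissible choice; any step sizes meeting the Robbins-Monro conditions above serve the same role in the convergence argument.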