Gradiente estocástico y aproximación estocástica aplicados a Q-learning

The project is motivated to demonstrate the convergence of Q-learning. This is an algorithm applied to finite Markov decision processes in discrete time, where there is not enough information. Thus, what the algorithm seeks is to solve the optimality equations (or Bellman's equations). With thi...