Implementation and evaluation of reinforcement learning algorithms for drone control tasks in a realistic simulation environment


Authors:
Garzón Albarracin, Juan Felipe
Resource type:
Publication date:
2021
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/50934
Online access:
http://hdl.handle.net/1992/50934
Keywords:
Drones
Remotely piloted vehicles
Reinforcement learning (Machine learning)
Neural networks (Computers)
Automatic control
Engineering
Rights
openAccess
License
https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
Description
Summary: The recent success of Deep Reinforcement Learning (DRL) algorithms has opened their use in a variety of environments and dynamical systems. We present the behavior of a complex dynamic system (a quadrotor) on basic tasks such as hovering and X-Y displacement in a realistic simulator. The DRL algorithms used were designed for continuous action spaces: Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). We tested dense and sparse reward functions and varied the negative reward component to demonstrate the impact of these parameters on a fast and repeatable learning process. We found that the reward function has a major impact on the agent's learning process: a correct choice can shorten training times and improve repeatability. Compared with dense rewards, sparse rewards yield less repeatable results and perform poorly on tasks such as hovering and reaching X-Y points. Negative rewards directly affect the learning process when using PPO.
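The dense-versus-sparse distinction the summary draws can be sketched as two alternative reward functions for a hover task. This is an illustrative assumption, not the thesis's actual implementation; the function names, tolerance, and bonus values are hypothetical.

```python
import math

def dense_hover_reward(pos, target, alive_bonus=1.0):
    """Dense (shaped) reward: a graded signal at every timestep,
    decreasing with distance from the hover target."""
    # math.dist computes the Euclidean distance between two points
    return alive_bonus - math.dist(pos, target)

def sparse_hover_reward(pos, target, tol=0.1):
    """Sparse reward: feedback only when the drone is within a small
    tolerance of the target; zero everywhere else."""
    return 1.0 if math.dist(pos, target) <= tol else 0.0
```

A dense signal gives the agent a gradient to follow from the first episode, while the sparse variant pays out only near the goal, which is consistent with the summary's finding that sparse rewards are less repeatable on hovering and X-Y reaching tasks.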