Descripción de escenas por medio de aprendizaje profundo (Scene description by means of deep learning)
- Authors:
- Rincón Núñez, Adalberto
- Resource type:
- Undergraduate degree project
- Publication date:
- 2018
- Institution:
- Universidad Autónoma de Occidente
- Repository:
- RED: Repositorio Educativo Digital UAO
- Language:
- spa
- OAI Identifier:
- oai:red.uao.edu.co:10614/10557
- Online access:
- http://hdl.handle.net/10614/10557
- Keywords:
- Mechatronics engineering
- Neural networks (Computers)
- Image description
- Rights
- openAccess
- License
- All rights reserved - Universidad Autónoma de Occidente
Summary: This document describes a project that reproduces published results in image captioning (automatic image description), building on prior work by experts in the field. The work was carried out on the free software platform Python using the TensorFlow library, and culminated in the training of a recurrent neural network (RNN) that receives an image as input and outputs the same image annotated with a written description of its content. Training required two kinds of data: a large set of images and the corresponding descriptions of each. Both were taken from the COCO (Common Objects in Context) platform, a large database for detection, segmentation, and description; specifically, the datasets released for the "COCO Captioning Challenge", a contest held in 2015. These datasets were pre-processed: for the images, this meant extracting features with a convolutional neural network (CNN), in this case the Inception network, and storing the resulting features in a file that was later used to train the RNN. The document also explains the procedure followed so that the same code can be used to train a network that describes images in Spanish. The project then concludes with the training of a network that can describe selected environments, provided that new datasets are created for them. Finally, the several networks were tested: the results were satisfactory in that the descriptions matched the supplied images, although in some tests the descriptions contained errors, such as the gender of a person or the color of objects in the picture.
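The pipeline summarized above (CNN features as the starting state of an RNN, which then emits a caption word by word) can be sketched minimally. This is a hedged, numpy-only toy, not the project's actual TensorFlow/Inception code: the vocabulary, dimensions, and random untrained weights are all illustrative assumptions, and greedy decoding stands in for the full training and inference procedure.

```python
import numpy as np

# Toy vocabulary; the real project used COCO captions (these tokens are hypothetical).
vocab = ["<start>", "<end>", "a", "dog", "on", "grass"]
V = len(vocab)
H, F = 8, 10  # hidden-state size and image-feature size (illustrative)

rng = np.random.default_rng(0)
Wf = rng.normal(size=(H, F)) * 0.1  # projects CNN features to the initial hidden state
Wx = rng.normal(size=(H, V)) * 0.1  # input weights for the one-hot previous token
Wh = rng.normal(size=(H, H)) * 0.1  # recurrent weights
Wo = rng.normal(size=(V, H)) * 0.1  # projection from hidden state to vocabulary logits

def caption(feature, max_len=10):
    """Greedy decoding: the hidden state is initialized from the image
    feature vector, and each step feeds back the token just emitted."""
    h = np.tanh(Wf @ feature)
    tok = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        x = np.zeros(V)
        x[tok] = 1.0
        h = np.tanh(Wx @ x + Wh @ h)       # simple RNN cell update
        tok = int(np.argmax(Wo @ h))       # pick the most likely next token
        if vocab[tok] == "<end>":
            break
        words.append(vocab[tok])
    return " ".join(words)

# Stand-in for an Inception feature vector extracted from one image.
feat = rng.normal(size=F)
print(caption(feat))
```

With trained weights, `caption` would return a sentence describing the image; here the weights are random, so the output is meaningless but shows the data flow: image features enter once, then the RNN unrolls over tokens until it predicts `<end>` or hits the length limit.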