MINT : Multi Instance Network, an efficient framework for video object segmentation


Authors:
Jeanneret Sanmiguel, Guillaume
Resource type:
Publication date:
2019
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/43963
Online access:
http://hdl.handle.net/1992/43963
Keywords:
Multimedia systems - Research
Digital video - Research
Image processing - Digital techniques - Research
Computer simulation - Research
Engineering
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description
Summary: Video Object Segmentation consists of segmenting an object across all the frames of a video. Temporal and spatial cues are essential signals for this task. This thesis proposes MINT (Multi Instance Network), a method that takes into account shape priors, location, and temporal information to segment the object of interest, with a per-frame inference time below that of state-of-the-art methods. MINT generates segmentations at 50.51 FPS using the complete model and at 81.74 FPS using the Fast version. Furthermore, MINT advances the task of Video Object Segmentation by processing multiple instances in a single forward pass without any post-processing. MINT is trained and tested on the largest multi-instance Video Object Segmentation dataset, YouTube-VOS, achieving an overall performance of 0.592.