Open-ended visual recognition

Authors:
González Osorio, Cristina Isabel
Resource type:
Doctoral thesis
Publication date:
2024
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/75696
Online access:
https://hdl.handle.net/1992/75696
Keywords:
Artificial Intelligence
Computer Vision
Visual Recognition
Engineering
Rights:
openAccess
License:
Attribution-NonCommercial-NoDerivatives 4.0 International
Description
Summary: Visual recognition has traditionally been constrained by closed-set classifications, limiting adaptability to new categories and complex real-world scenarios. This thesis tackles these limitations by leveraging natural language to enhance spatially-aware, open-ended recognition. We first formulate a task that grounds noun phrases in visual data through panoptic segmentation, achieving fine-grained spatial and semantic alignment. We then explore the generalizability of this approach by developing a multi-modal Transformer model and demonstrating its effectiveness in related tasks such as referring expression segmentation. Next, we extend these techniques to open-vocabulary recognition, decoupling recognition from segmentation and aligning regions with semantic concepts to enable recognition of unseen categories. Finally, we address the challenge of open-ended recognition by introducing a robust evaluation framework that captures nuanced semantic relationships. These contributions collectively advance visual recognition by integrating language for enhanced flexibility, adaptability, and comprehensive scene understanding, while providing robust tools for evaluation in open-ended tasks.
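
The open-vocabulary step summarized above (class-agnostic segmentation followed by matching each region to category names in a shared embedding space) can be pictured with the minimal sketch below. This is not the thesis implementation: the function names, the random embeddings, and the CLIP-style vision-language encoder they stand in for are illustrative assumptions only.

    import numpy as np

    def l2_normalize(x, axis=-1, eps=1e-8):
        """Normalize vectors to unit length along the given axis."""
        return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

    def classify_regions(region_embeddings, text_embeddings, category_names, temperature=0.07):
        """Assign each class-agnostic region the category whose text embedding is
        most similar (cosine similarity), so categories unseen during segmentation
        training can still be recognized.

        region_embeddings: (R, D) visual features, one per predicted region/mask.
        text_embeddings:   (C, D) language features, one per candidate category name.
        """
        regions = l2_normalize(np.asarray(region_embeddings, dtype=np.float32))
        texts = l2_normalize(np.asarray(text_embeddings, dtype=np.float32))
        logits = regions @ texts.T / temperature           # (R, C) similarity scores
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)          # softmax over categories
        best = probs.argmax(axis=1)
        return [(category_names[c], float(probs[r, c])) for r, c in enumerate(best)]

    # Toy usage with random features standing in for encoder outputs
    # (hypothetical values, not results from the thesis).
    rng = np.random.default_rng(0)
    region_feats = rng.normal(size=(3, 512))   # 3 predicted regions
    text_feats = rng.normal(size=(4, 512))     # 4 candidate category names
    names = ["zebra", "traffic cone", "picnic table", "drone"]
    print(classify_regions(region_feats, text_feats, names))

Because classification happens only through the region-text similarity, the set of candidate category names can be changed at inference time without retraining the segmentation model, which is the essence of the decoupling the abstract describes.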