Open-ended visual recognition
- Authors:
- González Osorio, Cristina Isabel
- Resource type:
- Doctoral thesis
- Publication date:
- 2024
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/75696
- Online access:
- https://hdl.handle.net/1992/75696
- Keywords:
- Artificial Intelligence
- Computer Vision
- Visual Recognition
- Engineering
- Rights:
- openAccess
- License:
- Attribution-NonCommercial-NoDerivatives 4.0 International
Summary: Visual recognition has traditionally been constrained by closed-set classification, which limits adaptability to new categories and to complex real-world scenarios. This thesis tackles these limitations by leveraging natural language to enhance spatially aware, open-ended recognition. We first formulate a task that grounds noun phrases in visual data through panoptic segmentation, achieving fine-grained spatial and semantic alignment. We then explore the generalizability of this approach by developing a multi-modal Transformer model and demonstrating its effectiveness on related tasks such as referring expression segmentation. Next, we extend these techniques to open-vocabulary recognition by decoupling recognition from segmentation and aligning regions with semantic concepts, which enables the recognition of unseen categories. Finally, we address the challenge of open-ended recognition by introducing a robust evaluation framework that captures nuanced semantic relationships. Together, these contributions advance visual recognition by integrating language for greater flexibility, adaptability, and comprehensive scene understanding, while providing robust tools for evaluating open-ended tasks.