Large-scale non-linear multimodal semantic embedding


Authors:
Vanegas Ramírez, Jorge Andrés
Resource type:
Doctoral thesis
Publication date:
2018
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/63954
Online access:
https://repositorio.unal.edu.co/handle/unal/63954
http://bdigital.unal.edu.co/64612/
Keywords:
0 Computer science, information and general works
62 Engineering and allied operations
Multi-modal information
Multimodal Data Analysis
Machine Learning
Latent semantic embedding
Kernel methods
Large-scale datasets
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
Description
Summary: The main goal of this thesis is to investigate effective and efficient methods to combine complementary evidence and to model the relationships between multiple modalities of multimedia data, in order to improve access to and analysis of the information and, ultimately, to obtain valuable insights about the data. This thesis proposes multimodal latent semantic embedding as the strategy for combining and exploiting the different views of this heterogeneous source of knowledge, by modeling relations between the different modalities and finding a new common low-dimensional semantic representation space. For richer modeling, kernel-based methods are adopted, since they usually produce accurate and robust results. Unfortunately, kernel-based methods have a high computational complexity that makes them infeasible for large data collections. This drawback raises one of the most important challenges addressed in this thesis: investigating alternatives for handling large-scale datasets on modest computational architectures. Several kernelized semantic embedding methods based on matrix factorization are proposed, developed, and evaluated in this thesis. Thanks to the non-linear capabilities of kernel representations, the proposed methods can model the complex relationships between the different modalities, making it possible to construct a richer multimodal representation even when one of the modalities has incomplete data. In addition, the proposed methods are designed under a scalable architecture based on two main strategies, online learning and learning on a budget, which keep computational requirements low in terms of both memory usage and processing time.
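The record does not spell out the kernelized matrix-factorization algorithms themselves, so the following is only a hedged, hypothetical sketch of the general idea: deriving a shared low-dimensional semantic space from the kernel matrices of two modalities. It uses a kernel-PCA-style eigendecomposition of the summed, centered kernels as a simple stand-in, not the thesis's actual methods; the function names (`rbf_kernel`, `shared_kernel_embedding`) are illustrative inventions.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # RBF (Gaussian) kernel matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def shared_kernel_embedding(Kx, Ky, dim=2):
    # Hypothetical sketch: combine the two modality kernels (here, by
    # summation), center the result, and take the top eigenvectors as a
    # common low-dimensional semantic representation of the n samples.
    K = Kx + Ky
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = J @ K @ J
    vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    top = np.argsort(vals)[::-1][:dim]       # pick the `dim` largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

The O(n^2) kernel matrices and O(n^3) eigendecomposition in this sketch are exactly the costs that motivate the online and budgeted strategies described below.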
An extensive experimental evaluation shows that the proposed multimodal strategies achieve state-of-the-art results in several data analysis tasks, such as multi-label and multi-class classification and cross-modal retrieval, under different learning setups, including supervised, semi-supervised, and transductive learning. Furthermore, thanks to the online learning and learning-on-a-budget strategies proposed in this thesis, scalability is preserved, making it possible to handle large-scale multimodal collections.
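The thesis's own budgeted learners are not detailed in this record; as a generic illustration of the learning-on-a-budget idea, the classic budgeted kernel perceptron below processes samples one at a time (online) and caps memory by keeping at most `budget` support vectors, evicting the oldest when the cap is exceeded. The class and parameter names are assumptions for this sketch.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian kernel between two individual samples.
    return np.exp(-gamma * np.sum((x - z) ** 2))

class BudgetKernelPerceptron:
    # Online kernel classifier with a fixed memory budget: at most
    # `budget` support vectors are stored, oldest evicted first.
    def __init__(self, budget=50, gamma=1.0):
        self.budget, self.gamma = budget, gamma
        self.sv, self.alpha = [], []

    def decision(self, x):
        # Kernel expansion over the current support set.
        return sum(a * rbf(x, z, self.gamma)
                   for a, z in zip(self.alpha, self.sv))

    def partial_fit(self, x, y):            # y in {-1, +1}
        if y * self.decision(x) <= 0:       # mistake-driven update
            self.sv.append(x)
            self.alpha.append(y)
            if len(self.sv) > self.budget:  # enforce the budget
                self.sv.pop(0)
                self.alpha.pop(0)
```

Because both the support set and the per-sample update cost are bounded by the budget, memory usage and processing time stay constant as the stream grows, which is the property the record attributes to the thesis's scalable architecture.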