Large-scale non-linear multimodal semantic embedding
The main goal of this thesis is to investigate effective and efficient methods to combine complementary evidence, and model the relationships between multiple modalities of multimedia data in order to improve the access and analysis of the information, to finally obtain valuable insights about the d...
- Authors:
-
Vanegas Ramírez, Jorge Andrés
- Resource type:
- Doctoral thesis
- Publication date:
- 2018
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/63954
- Online access:
- https://repositorio.unal.edu.co/handle/unal/63954
http://bdigital.unal.edu.co/64612/
- Keywords:
- 0 Generalidades / Computer science, information and general works
62 Ingeniería y operaciones afines / Engineering
Multi-modal information
Multimodal Data Analysis
Machine Learning
Latent semantic embedding
Kernel methods
Large-scale datasets
Información multimodal
Análisis de datos multimodales
Aprendizaje de máquina
Indexación semántica latente
Métodos del kernel
Conjuntos de datos a gran escala
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
id |
UNACIONAL2_fa3c9cdf1b1ccdaaa94b1144c1063439 |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/63954 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Large-scale non-linear multimodal semantic embedding |
title |
Large-scale non-linear multimodal semantic embedding |
spellingShingle |
Large-scale non-linear multimodal semantic embedding 0 Generalidades / Computer science, information and general works 62 Ingeniería y operaciones afines / Engineering Multi-modal information Multimodal Data Analysis Machine Learning Latent semantic embedding Kernel methods Large-scale datasets Información multimodal Análisis de datos multimodales Aprendizaje de máquina Indexación semántica latente Métodos del kernel Conjuntos de datos a gran escala |
title_short |
Large-scale non-linear multimodal semantic embedding |
title_full |
Large-scale non-linear multimodal semantic embedding |
title_fullStr |
Large-scale non-linear multimodal semantic embedding |
title_full_unstemmed |
Large-scale non-linear multimodal semantic embedding |
title_sort |
Large-scale non-linear multimodal semantic embedding |
dc.creator.fl_str_mv |
Vanegas Ramírez, Jorge Andrés |
dc.contributor.advisor.spa.fl_str_mv |
Escalante Balderas, Hugo Jair (Thesis advisor) |
dc.contributor.author.spa.fl_str_mv |
Vanegas Ramírez, Jorge Andrés |
dc.contributor.spa.fl_str_mv |
Gonzalez Osorio, Fabio Augusto |
dc.subject.ddc.spa.fl_str_mv |
0 Generalidades / Computer science, information and general works 62 Ingeniería y operaciones afines / Engineering |
topic |
0 Generalidades / Computer science, information and general works 62 Ingeniería y operaciones afines / Engineering Multi-modal information Multimodal Data Analysis Machine Learning Latent semantic embedding Kernel methods Large-scale datasets Información multimodal Análisis de datos multimodales Aprendizaje de máquina Indexación semántica latente Métodos del kernel Conjuntos de datos a gran escala |
dc.subject.proposal.spa.fl_str_mv |
Multi-modal information Multimodal Data Analysis Machine Learning Latent semantic embedding Kernel methods Large-scale datasets Información multimodal Análisis de datos multimodales Aprendizaje de máquina Indexación semántica latente Métodos del kernel Conjuntos de datos a gran escala |
description |
The main goal of this thesis is to investigate effective and efficient methods for combining complementary evidence and modeling the relationships between multiple modalities of multimedia data, in order to improve access to and analysis of the information and, ultimately, to obtain valuable insights about the data. The thesis proposes multimodal latent semantic embedding as the strategy for combining and exploiting the different views of this heterogeneous source of knowledge: it models the relations between the modalities and finds a new, common, low-dimensional semantic representation space. For richer modeling, it proposes the use of kernel-based methods, which usually yield accurate and robust results. Unfortunately, kernel-based methods have a high computational complexity that makes them infeasible for large data collections. This drawback gives rise to one of the most important challenges addressed in this thesis: investigating alternatives for handling large-scale datasets on modest computational architectures. Several kernelized semantic embedding methods based on matrix factorization are proposed, developed, and evaluated. Thanks to the non-linear capabilities of kernel representations, the proposed methods can model the complex relationships between the different modalities, which makes it possible to build a richer multimodal representation even when one of the modalities has incomplete data. In addition, the proposed methods are designed under a scalable architecture based on two main strategies, online learning and learning-in-a-budget, which keep computational requirements low in terms of memory usage and processing time.
An extensive experimental evaluation shows that the proposed multimodal strategies achieve state-of-the-art results in several data analysis tasks, such as multi-label annotation, multi-class classification, and cross-modal retrieval, and under different learning setups, such as supervised, semi-supervised, and transductive learning. Furthermore, thanks to the online learning and learning-in-a-budget strategies proposed in this thesis, scalability is preserved, which makes it possible to handle large-scale multimodal collections. |
publishDate |
2018 |
dc.date.issued.spa.fl_str_mv |
2018-06-19 |
dc.date.accessioned.spa.fl_str_mv |
2019-07-02T22:19:50Z |
dc.date.available.spa.fl_str_mv |
2019-07-02T22:19:50Z |
dc.type.spa.fl_str_mv |
Degree work - Doctorate |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_db06 |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TD |
format |
http://purl.org/coar/resource_type/c_db06 |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/63954 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/64612/ |
url |
https://repositorio.unal.edu.co/handle/unal/63954 http://bdigital.unal.edu.co/64612/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas Ingeniería de Sistemas |
dc.relation.references.spa.fl_str_mv |
Vanegas Ramírez, Jorge Andrés (2018) Large-scale non-linear multimodal semantic embedding. Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Atribución-NoComercial 4.0 Internacional Derechos reservados - Universidad Nacional de Colombia http://creativecommons.org/licenses/by-nc/4.0/ http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/63954/1/doctoral-thesis-jorge.pdf https://repositorio.unal.edu.co/bitstream/unal/63954/2/doctoral-thesis-jorge.pdf.jpg |
bitstream.checksum.fl_str_mv |
99de1dd16e5fcaaaa0c6f929d0c43bb6 f1460fdfab2dcbdcdbb4700932f60926 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089436038692864 |