Dimension reduction methods analysis in multilingual context in the educational field

Authors:
Sánchez Delgado, Juan Sebastián
Resource type:
Undergraduate thesis
Publication date:
2023
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/74209
Online access:
https://hdl.handle.net/1992/74209
Keywords:
NLP
Reduction techniques
PCA
TSNE
UMAP
Multilanguage
Engineering
Rights
embargoedAccess
License
Attribution-NonCommercial-NoDerivatives 4.0 International
Description
Summary: In recent years, technological solutions based on context-aware, attention-based natural language processing have been an active research area. Recent advances in this area have reshaped the field and made it possible to build high-performance models for advanced applications such as conversational agents and translators. The growing need for more complex models has caused a staggering increase not only in feature size but also in the number of embedding dimensions. This document aims to analyze different dimensionality reduction techniques applied to translated versions of various courses, in order to determine which approach yields the best clustering results. In contrast to previous work, the main focus is on finding which technique is best suited to a small number of dimensions (between 2 and 4), so that visualization methods can be used to check whether the current translations are similar enough in semantics and grammar to be considered valid.
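
The kind of comparison the summary describes can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the embeddings below are synthetic stand-ins, the helper names `pca_reduce` and `pair_match_rate` are hypothetical, and the "match rate" score is only one assumed way to judge whether translations stay close after reduction. Only PCA is shown, implemented with NumPy; t-SNE and UMAP would need additional libraries and are left out.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def pair_match_rate(Z, n_items):
    """Fraction of items in language A whose nearest neighbour among the
    language B points is their own translation (hypothetical score)."""
    a, b = Z[:n_items], Z[n_items:]
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) == np.arange(n_items)))


# Synthetic stand-in for course embeddings in two languages (assumption):
# each "translation" is the same base vector plus a little noise.
rng = np.random.default_rng(0)
n_items, dim = 20, 64
base = rng.normal(size=(n_items, dim))
lang_a = base + 0.05 * rng.normal(size=base.shape)
lang_b = base + 0.05 * rng.normal(size=base.shape)
X = np.vstack([lang_a, lang_b])

# Compare the small target dimensionalities mentioned in the summary.
scores = {k: pair_match_rate(pca_reduce(X, k), n_items) for k in (2, 3, 4)}
for k, score in scores.items():
    print(f"PCA to {k} dims: translation match rate {score:.2f}")
```

With real data, `X` would hold the model's embeddings for each course version, and the same scoring function could be applied to the output of any other reduction technique to compare them on equal terms.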