Dimension reduction methods analysis in multilingual context in the educational field
- Authors:
- Sánchez Delgado, Juan Sebastián
- Resource type:
- Undergraduate thesis
- Publication date:
- 2023
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/74209
- Online access:
- https://hdl.handle.net/1992/74209
- Keywords:
- NLP
Reduction techniques
PCA
TSNE
UMAP
Multilanguage
Engineering
- Rights
- embargoedAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
Summary: In recent years, technological solutions based on context-aware, attention-based natural language processing have been an active research area. Recent advances in this field have transformed the landscape and made it possible to build high-performance models for advanced applications such as conversational agents and translators. The growing need for more complex models has caused a staggering increase not only in feature size but also in the number of embedding dimensions. This document analyzes different dimensionality reduction techniques applied to translated versions of various courses in order to determine which approach yields the best clustering results. In contrast to previous work, the main focus is to find which technique is best suited to a small number of dimensions (between 2 and 4), so that visualization methods can show whether the current translations are similar enough in semantics and grammar to be considered valid.