Automatic analysis of documents with historical content in Spanish

Authors:
Ocampo Vargas, María José
Resource type:
Undergraduate thesis
Publication date:
2020
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
spa
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/51460
Online access:
http://hdl.handle.net/1992/51460
Keywords:
Natural language processing (Computing)
Old Spanish
History
Engineering
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description
Summary: Documents written in ancient languages present several challenges for processing and information extraction, so specialized systems are needed to analyze them. This project develops a tool that uses natural language processing techniques to help historians navigate the content of the book Historia general y natural de las Indias, islas y tierra firme del mar océano, written by Gonzalo Fernández de Oviedo. To this end, the project proposes the construction of: a dictionary mapping between current Spanish and Middle Spanish to normalize spelling; a model for recognizing 18 types of entities; a process for extracting the syntactic relationships between the entities; and a Web page for viewing all the results obtained. As a result, 98.39% of volume two of Oviedo's book is corrected; a model is successfully trained to recognize the entity types proposed by the expert, with which a total of 19,496 entities are recognized; and a complete Web page is developed that builds a graph of relationships between entities and displays all the results obtained.
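
The dictionary-based spelling normalization mentioned in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the mapping entries, the function names `normalize` and `coverage`, and the idea of measuring coverage against a modern vocabulary are all assumptions made here for illustration.

```python
import re

# Hypothetical mapping from older spellings to modern Spanish forms.
# These entries are illustrative only and are not taken from the thesis.
OLD_TO_MODERN = {
    "dixo": "dijo",
    "fizo": "hizo",
    "avia": "había",
    "mugeres": "mujeres",
}

# In Python 3, \w matches Unicode letters, so accented words stay whole.
WORD = re.compile(r"\w+")


def normalize(text, mapping=OLD_TO_MODERN):
    """Replace every word found in the mapping; leave everything else untouched."""

    def swap(match):
        word = match.group(0)
        modern = mapping.get(word.lower())
        if modern is None:
            return word
        # Preserve an initial capital from the original word.
        return modern.capitalize() if word[0].isupper() else modern

    return WORD.sub(swap, text)


def coverage(text, vocabulary):
    """Fraction of words in the text that appear in a modern vocabulary (assumed metric)."""
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)
```

For example, `normalize("Dixo que avia mugeres.")` rewrites the three older spellings and leaves "que" and the punctuation as they are. A ratio such as the one returned by `coverage` is one plausible way to quantify how much of a volume has been normalized; the summary does not state how its 98.39% figure was computed.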