Análisis automático de documentos con contenido histórico en español [Automatic analysis of documents with historical content in Spanish]
- Authors:
- Ocampo Vargas, María José
- Resource type:
- Undergraduate thesis
- Publication date:
- 2020
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- spa
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/51460
- Online access:
- http://hdl.handle.net/1992/51460
- Keywords:
- Natural language processing (Computing)
- Old Spanish
- History
- Engineering
- Rights
- openAccess
- License
- http://creativecommons.org/licenses/by-nc-nd/4.0/
Summary: Documents written in ancient languages are difficult to process and mine for information, so specialized systems are needed to analyze them. This project develops a tool that uses natural language processing techniques to help historians navigate the content of the book Historia general y natural de las Indias, islas y tierra firme del mar océano (General and Natural History of the Indies, Islands and Mainland of the Ocean Sea), written by Gonzalo Fernández de Oviedo. To that end, the project proposes building: a dictionary between present-day Spanish and Middle Spanish to normalize spelling; a model that recognizes 18 types of entities; a process that extracts the syntactic relationships between entities; and a web page for viewing all the results. As a result, 98.39% of volume two of Oviedo's book is corrected; a model is successfully trained to recognize the entity types proposed by the expert, and it recognizes a total of 19,496 entities; and a complete web page is developed that builds a graph of relationships between entities and displays all the results obtained.
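The dictionary-based spelling normalization mentioned in the summary can be pictured with a minimal sketch. This is not the thesis's implementation: the `NORMALIZATION` entries, the `normalize` function, and the tokenization rule are all assumptions chosen for illustration, and the three word pairs are common older spellings rather than entries from the project's dictionary.

```python
import re

# Hypothetical entries mapping older spellings to present-day ones.
# These are illustrative only and are not taken from the thesis.
NORMALIZATION = {
    "dixo": "dijo",
    "fazer": "hacer",
    "avia": "había",
}

# Split text into alternating runs of word and non-word characters,
# so that spacing and punctuation survive normalization unchanged.
TOKEN = re.compile(r"\w+|\W+")


def normalize(text, mapping=NORMALIZATION):
    """Replace each known older spelling with its modern form."""
    out = []
    for token in TOKEN.findall(text):
        replacement = mapping.get(token.lower())
        if replacement is None:
            # Unknown word, whitespace, or punctuation: keep as is.
            out.append(token)
        elif token[0].isupper():
            # Preserve an initial capital from the source text.
            out.append(replacement.capitalize())
        else:
            out.append(replacement)
    return "".join(out)


print(normalize("Dixo que avia de fazer."))
```

A lookup table like this only handles spellings it already lists; how the project built its dictionary and how it measured the corrected share of volume two are not described in this record.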