Development of a model for measuring logical implication in elementary mathematics problems

Today there are language models embedded in systems that can outperform humans on a variety of tests. How, though, can we measure the coherence of these models? In this work we propose an approach that uses the transformer architecture to address the problem of logical implication (LI), that is, determining which sentences follow from others within a text. This is achieved through the transformer's attention mechanism and next-token prediction. We found that, with a very simple transformer-based model, LI can be identified in counting and probability problems with 60% accuracy on a sample of 95 mathematical exercises on various topics. This method could help improve the precision with which the coherence of language models is evaluated, providing the data needed for a detailed analysis of their errors and for examining the logical validity of their correct answers.
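The thesis itself is not part of this record, but the mechanism the abstract describes — using a language model's next-token predictions to judge whether one sentence follows from another — can be sketched in a few lines. The snippet below is only an illustrative assumption, not the author's implementation: the "gpt2" checkpoint, the neutral baseline context, and the difference-of-log-probabilities scoring rule are all choices made for the example.

# Hedged sketch (not the thesis's model): score whether a hypothesis sentence
# "follows from" a premise using a causal LM's next-token log-probabilities.
# The "gpt2" checkpoint, baseline context, and scoring rule are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def avg_logprob(context: str, continuation: str) -> float:
    """Average log-probability the model assigns to `continuation`, token by token, given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i predicts token i+1, so shift logits and targets by one.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    token_logprobs = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    n_cont = cont_ids.shape[1]
    # Keep only the positions that score the continuation tokens.
    return token_logprobs[0, -n_cont:].mean().item()


def implication_score(premise: str, hypothesis: str) -> float:
    """How much more likely the hypothesis becomes once the premise is given (PMI-style, assumed rule)."""
    baseline = tokenizer.bos_token  # neutral context; an assumption, not the thesis's choice
    return avg_logprob(premise + " ", hypothesis) - avg_logprob(baseline, hypothesis)


premise = "Una bolsa contiene 3 bolas rojas y 2 azules; se extrae una al azar."
hypothesis = "La probabilidad de sacar una bola roja es 3/5."
print(implication_score(premise, hypothesis))  # higher values suggest the sentence follows from the premise

In practice a decision threshold would be calibrated on labeled premise/hypothesis pairs; the 60% accuracy on 95 exercises reported in the abstract refers to the author's own model, which this sketch does not reproduce.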

Full description

Authors:
Sánchez Tovar, Edwin Alejandro
Resource type:
https://purl.org/coar/resource_type/c_7a1f (bachelor thesis)
Publication date:
2024
Institution:
Universidad El Bosque
Repository:
Repositorio U. El Bosque
Language:
spa
OAI Identifier:
oai:repositorio.unbosque.edu.co:20.500.12495/13595
Online access:
https://hdl.handle.net/20.500.12495/13595
Keywords:
Axiomas e IA
Implicación lógica
IA en matemáticas
Aprendizaje automático
Aprendizaje profundo
Inteligencia artificial
Modelos de lenguaje
510
Axioms and AI
Logical implication
AI in mathematics
Machine learning
Deep learning
Artificial intelligence
Language model
Rights
openAccess
License
Attribution 4.0 International
dc.title.none.fl_str_mv Desarrollo de un modelo para la medición de la implicación lógica en problemas de matemática elemental
dc.title.translated.none.fl_str_mv Development of a model for measuring logical implication in elementary mathematics problems
dc.creator.fl_str_mv Sánchez Tovar, Edwin Alejandro
dc.contributor.advisor.none.fl_str_mv González Galeano, Andrei Alain
dc.contributor.author.none.fl_str_mv Sánchez Tovar, Edwin Alejandro
dc.subject.none.fl_str_mv Axiomas e IA
Implicación lógica
IA en matemáticas
Aprendizaje automático
Aprendizaje profundo
Inteligencia artificial
Modelos de lenguaje
dc.subject.ddc.none.fl_str_mv 510
dc.subject.keywords.none.fl_str_mv Axioms and AI
Logical implication
AI in mathematics
Machine learning
Deep learning
Artificial intelligence
Language model
description Actualmente, existen modelos de lenguaje integrados en sistemas que pueden superar las capacidades humanas en una variedad de pruebas. Sin embargo, ¿cómo podemos medir la coherencia de estos modelos? En este trabajo, proponemos un enfoque que utiliza la arquitectura de transformers para abordar el problema de la implicación lógica (IL), es decir, determinar qué oraciones se derivan de otras dentro de un texto. Esto se logra mediante el uso de su mecanismo de atención y predicción del siguiente token. Se encontró que, con un modelo muy simple basado en la arquitectura del transformer, es posible la identificación de la IL en problemas de conteo y probabilidad con una precisión del 60 % en una muestra de 95 ejercicios matemáticos de diversos temas. Este método podría contribuir a mejorar la precisión con la que se evalúa la coherencia de los modelos de lenguaje, proporcionando los datos necesarios para realizar un análisis detallado de sus errores y examinar la validez lógica de sus respuestas correctas.
dc.date.accessioned.none.fl_str_mv 2024-12-05T14:25:31Z
dc.date.available.none.fl_str_mv 2024-12-05T14:25:31Z
dc.date.issued.none.fl_str_mv 2024-11
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_7a1f
dc.type.local.spa.fl_str_mv Tesis/Trabajo de grado - Monografía - Pregrado
dc.type.coar.none.fl_str_mv https://purl.org/coar/resource_type/c_7a1f
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/bachelorThesis
dc.type.coarversion.none.fl_str_mv https://purl.org/coar/version/c_ab4af688f83e57aa
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/20.500.12495/13595
dc.identifier.instname.spa.fl_str_mv instname:Universidad El Bosque
dc.identifier.reponame.spa.fl_str_mv reponame:Repositorio Institucional Universidad El Bosque
dc.identifier.repourl.none.fl_str_mv repourl:https://repositorio.unbosque.edu.co
dc.language.iso.fl_str_mv spa
dc.rights.en.fl_str_mv Attribution 4.0 International
dc.rights.uri.none.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.local.spa.fl_str_mv Acceso abierto
dc.rights.accessrights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.program.spa.fl_str_mv Matemáticas
dc.publisher.grantor.spa.fl_str_mv Universidad El Bosque
dc.publisher.faculty.spa.fl_str_mv Facultad de Ciencias