Extracción de información de documentos de identidad utilizando técnicas de aprendizaje de máquina

Automatic information extraction from identity documents is a fundamental task in digital processes such as onboarding, product applications and identity validation, among others. Information extraction consists of identifying, locating, classifying and recognizing the text of the key fields present in a document, in this case an identity document. For identity documents, the key fields include names, last names, document numbers and dates, among others. The information extraction problem has traditionally been solved using rule-based algorithms and classic OCR engines. In recent years, machine learning approaches based on NLP (natural language processing) and CV (computer vision) models have been applied to solve the problem in a more flexible and efficient way (Subramani et al., 2020). This work proposes to solve the information extraction problem with an object detection approach. An object detection model based on transformers (Carion et al., 2020) was implemented, trained and evaluated, reaching precision values above 95% in detecting key fields on identity documents.

Authors:
Márquez Aristizábal, Hugo Alejandro
Resource type:
Publication date:
2022
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/82000
Online access:
https://repositorio.unal.edu.co/handle/unal/82000
https://repositorio.unal.edu.co/
Keywords:
000 - Ciencias de la computación, información y obras generales::003 - Sistemas
Identidad digital
Reconocimiento óptico de caracteres
Redes neuronales (Computadores)
Identidad digital
OCR
Extracción de información
Detección de objetos
Digital identity
Information extraction
Object detection
Rights
openAccess
License
Reconocimiento 4.0 Internacional
id UNACIONAL2_a5ef20270ff23252f268e25a89d661f9
oai_identifier_str oai:repositorio.unal.edu.co:unal/82000
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv Extracción de información de documentos de identidad utilizando técnicas de aprendizaje de máquina
dc.title.translated.eng.fl_str_mv Information extraction from identification documents using machine learning techniques
dc.creator.fl_str_mv Márquez Aristizábal, Hugo Alejandro
dc.contributor.advisor.none.fl_str_mv Villa Garzón, Fernán Alonso
dc.contributor.author.none.fl_str_mv Márquez Aristizábal, Hugo Alejandro
dc.subject.ddc.spa.fl_str_mv 000 - Ciencias de la computación, información y obras generales::003 - Sistemas
dc.subject.other.none.fl_str_mv Identidad digital
Reconocimiento óptico de caracteres
dc.subject.lemb.none.fl_str_mv Redes neuronales (Computadores)
dc.subject.proposal.spa.fl_str_mv Identidad digital
OCR
Extracción de información
Detección de objetos
dc.subject.proposal.eng.fl_str_mv Digital identity
Information extraction
Object detection
description Automatic information extraction from identity documents is a fundamental task in digital processes such as onboarding, product applications and identity validation, among others. Information extraction consists of identifying, locating, classifying and recognizing the text of the key fields present in a document, in this case an identity document. For identity documents, the key fields include names, last names, document numbers and dates, among others. The information extraction problem has traditionally been solved using rule-based algorithms and classic OCR engines. In recent years, machine learning approaches based on NLP (natural language processing) and CV (computer vision) models have been applied to solve the problem in a more flexible and efficient way (Subramani et al., 2020). This work proposes to solve the information extraction problem with an object detection approach. An object detection model based on transformers (Carion et al., 2020) was implemented, trained and evaluated, reaching precision values above 95% in detecting key fields on identity documents. (Text taken from the source.)
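
To make the object-detection formulation above concrete, here is a minimal Python sketch of the general idea, not the thesis code: a DETR-style detector (Carion et al., 2020) whose classification head is replaced so that each detected object is a key field on an identity document. The checkpoint name facebook/detr-resnet-50, the field label set, the input file name and the 0.9 score threshold are illustrative assumptions; in a full pipeline, an OCR step over each detected box would recover the field text.

# Minimal sketch, assuming the Hugging Face transformers and torch libraries.
# The label set, checkpoint and threshold are illustrative, not the thesis setup.
from PIL import Image
import torch
from transformers import DetrImageProcessor, DetrForObjectDetection

# Hypothetical label set: one class per key field of an identity document.
FIELD_CLASSES = ["names", "last_names", "document_number", "date"]

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=len(FIELD_CLASSES),   # re-initialize the head for the field classes
    ignore_mismatched_sizes=True,    # required when swapping out the COCO head
)
# ... fine-tune `model` on annotated identity-document images here ...

image = Image.open("id_document.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a confidence threshold and map boxes back to pixel space.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(
    detections["scores"], detections["labels"], detections["boxes"]
):
    print(FIELD_CLASSES[label.item()], f"{score:.2f}",
          [round(v, 1) for v in box.tolist()])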
publishDate 2022
dc.date.accessioned.none.fl_str_mv 2022-08-22T21:32:48Z
dc.date.available.none.fl_str_mv 2022-08-22T21:32:48Z
dc.date.issued.none.fl_str_mv 2022-06-23
dc.type.spa.fl_str_mv Trabajo de grado - Maestría
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/82000
dc.identifier.instname.spa.fl_str_mv Universidad Nacional de Colombia
dc.identifier.reponame.spa.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl.spa.fl_str_mv https://repositorio.unal.edu.co/
url https://repositorio.unal.edu.co/handle/unal/82000
https://repositorio.unal.edu.co/
identifier_str_mv Universidad Nacional de Colombia
Repositorio Institucional Universidad Nacional de Colombia
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.references.spa.fl_str_mv Al-Badr, B., & Mahmoud, S. A. (1995). Survey and bibliography of Arabic optical text recognition. Signal Processing, 41(1), 49–77. https://doi.org/10.1016/0165-1684(94)00090-M
Amin, A., & Shiu, R. (2001). Page Segmentation and Classification utilizing Bottom-up Approach. International Journal of Image and Graphics, 01(02), 345–361. https://doi.org/10.1142/S0219467801000219
Appalaraju, S., Jasani, B., Kota, B. U., Xie, Y., & Manmatha, R. (2021). DocFormer: End-to-End Transformer for Document Understanding (arXiv:2106.11539). arXiv. http://arxiv.org/abs/2106.11539
Arlazarov, V. V., Bulatov, K., Chernov, T., & Arlazarov, V. L. (2019). MIDV-500: A dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics, 43(5), 818–824. https://doi.org/10.18287/2412-6179-2019-43-5-818-824
Bahdanau, D., Cho, K., & Bengio, Y. (2016). Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv:1409.0473 [Cs, Stat]. http://arxiv.org/abs/1409.0473
Bajaj, R., Dey, L., & Chaudhury, S. (2002). Devnagari numeral recognition by combining decision of multiple connectionist classifiers. Sadhana, 27(1), 59–72. https://doi.org/10.1007/BF02703312
Bello, I., Zoph, B., Vaswani, A., Shlens, J., & Le, Q. V. (2020). Attention Augmented Convolutional Networks. ArXiv:1904.09925 [Cs]. http://arxiv.org/abs/1904.09925
Bertolotti, M. (2011). Optical Pattern Recognition, edited by F. T. S. Yu and S. Jutamulia.
Bhavani, S., & Thanushkodi, D. K. (2010). A Survey On Coding Algorithms In Medical Image Compression. 02(05), 7.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. ArXiv:2005.12872 [Cs]. http://arxiv.org/abs/2005.12872
Castelblanco, A., Solano, J., Lopez, C., Rivera, E., Tengana, L., & Ochoa, M. (2020). Machine Learning Techniques for Identity Document Verification in Uncontrolled Environments: A Case Study. In K. M. Figueroa Mora, J. Anzurez Marín, J. Cerda, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, & J. A. Olvera-López (Eds.), Pattern Recognition (Vol. 12088, pp. 271–281). Springer International Publishing. https://doi.org/10.1007/978-3-030-49076-8_26
Chan, W., Saharia, C., Hinton, G., Norouzi, M., & Jaitly, N. (2020). Imputer: Sequence Modelling via Imputation and Dynamic Programming. ArXiv:2002.08926 [Cs, Eess]. http://arxiv.org/abs/2002.08926
Chaudhuri, A., Mandaviya, K., Badelia, P., & K Ghosh, S. (2017). Optical Character Recognition Systems for Different Languages with Soft Computing (Vol. 352). Springer International Publishing. https://doi.org/10.1007/978-3-319-50252-6
Dalal, N., & Triggs, B. (2005). Histograms of Oriented Gradients for Human Detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 1, 886–893. https://doi.org/10.1109/CVPR.2005.177
De Brabandere, B., Neven, D., & Van Gool, L. (2017). Semantic Instance Segmentation with a Discriminative Loss Function. ArXiv:1708.02551 [Cs]. http://arxiv.org/abs/1708.02551
Delteil, T., Belval, E., Chen, L., Goncalves, L., & Mahadevan, V. (2022). MATrIX -- Modality-Aware Transformer for Information eXtraction (arXiv:2205.08094). arXiv. http://arxiv.org/abs/2205.08094
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645. https://doi.org/10.1109/TPAMI.2009.167
Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A. C. (2017). DSSD: Deconvolutional Single Shot Detector. ArXiv:1701.06659 [Cs]. http://arxiv.org/abs/1701.06659
Ghazvininejad, M., Levy, O., Liu, Y., & Zettlemoyer, L. (2019). Mask-Predict: Parallel Decoding of Conditional Masked Language Models. ArXiv:1904.09324 [Cs, Stat]. http://arxiv.org/abs/1904.09324
Girshick, R. (2015). Fast R-CNN. ArXiv:1504.08083 [Cs]. http://arxiv.org/abs/1504.08083
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv:1311.2524 [Cs]. http://arxiv.org/abs/1311.2524
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. 8.
Graves, A., & Schmidhuber, J. (2007). Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. In Artificial Neural Networks – ICANN 2007 (Vol. 4668, pp. 549–558). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-74690-4_56
Gu, J., Bradbury, J., Xiong, C., Li, V. O. K., & Socher, R. (2018). Non-Autoregressive Neural Machine Translation. ArXiv:1711.02281 [Cs]. http://arxiv.org/abs/1711.02281
Gu, J., Kuen, J., Morariu, V. I., Zhao, H., Barmpalios, N., Jain, R., Nenkova, A., & Sun, T. (2022). Unified Pretraining Framework for Document Understanding (arXiv:2204.10939). arXiv. http://arxiv.org/abs/2204.10939
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2018). Mask R-CNN. ArXiv:1703.06870 [Cs]. http://arxiv.org/abs/1703.06870
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. ArXiv:1512.03385 [Cs]. http://arxiv.org/abs/1512.03385
Huang, Y., Lv, T., Cui, L., Lu, Y., & Wei, F. (2022). LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking (arXiv:2204.08387). arXiv. http://arxiv.org/abs/2204.08387
Islam, N., Islam, Z., & Noor, N. (2016). A Survey on Optical Character Recognition System. Journal of Information, 10(2), 4.
Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., & Qu, R. (2019). A Survey of Deep Learning-Based Object Detection. IEEE Access, 7, 128837–128868. https://doi.org/10.1109/ACCESS.2019.2939201
Katti, A. R., Reisswig, C., Guder, C., Brarda, S., Bickel, S., Höhne, J., & Faddoul, J. B. (2018). Chargrid: Towards Understanding 2D Documents. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4459–4469. https://doi.org/10.18653/v1/D18-1476
Kim, G., Hong, T., Yim, M., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., & Park, S. (2021). Donut: Document Understanding Transformer without OCR (arXiv:2111.15664). arXiv. http://arxiv.org/abs/2111.15664
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
Le, A. D., Pham, D. V., & Nguyen, T. A. (2019). Deep Learning Approach for Receipt Recognition. In T. K. Dang, J. Küng, M. Takizawa, & S. H. Bui (Eds.), Future Data and Security Engineering (Vol. 11814, pp. 705–712). Springer International Publishing. https://doi.org/10.1007/978-3-030-35653-8_50
Lebourgeois, F., Bublinski, Z., & Emptoz, H. (1992). A fast and efficient method for extracting text paragraphs and graphics from unconstrained documents. Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems, 272–276. https://doi.org/10.1109/ICPR.1992.201771
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Li, X., Zheng, Y., Hu, Y., Cao, H., Wu, Y., Jiang, D., Liu, Y., & Ren, B. (2022). Relational Representation Learning in Visually-Rich Documents (arXiv:2205.02411). arXiv. http://arxiv.org/abs/2205.02411
Li, Y., Qian, Y., Yu, Y., Qin, X., Zhang, C., Liu, Y., Yao, K., Han, J., Liu, J., & Ding, E. (2021). StrucTexT: Structured Text Understanding with Multi-Modal Transformers (arXiv:2108.02923). arXiv. http://arxiv.org/abs/2108.02923
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature Pyramid Networks for Object Detection. ArXiv:1612.03144 [Cs]. http://arxiv.org/abs/1612.03144
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2018). Focal Loss for Dense Object Detection. ArXiv:1708.02002 [Cs]. http://arxiv.org/abs/1708.02002
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., & Dollár, P. (2015). Microsoft COCO: Common Objects in Context. ArXiv:1405.0312 [Cs]. http://arxiv.org/abs/1405.0312
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep Learning for Generic Object Detection: A Survey. International Journal of Computer Vision, 128(2), 261–318. https://doi.org/10.1007/s11263-019-01247-4
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. ArXiv:1512.02325 [Cs], 9905, 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. ArXiv:1411.4038 [Cs]. http://arxiv.org/abs/1411.4038
Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767. https://doi.org/10.1016/j.imavis.2004.02.006
Mori, S., Nishida, H., & Yamada, H. (1999). Optical Character Recognition.
Namysl, M., & Konya, I. (2019). Efficient, Lexicon-Free OCR using Deep Learning. ArXiv:1906.01969 [Cs]. http://arxiv.org/abs/1906.01969
Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing, 452, 48–62. https://doi.org/10.1016/j.neucom.2021.03.091
Oord, A. van den, Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G. van den, Lockhart, E., Cobo, L. C., Stimberg, F., Casagrande, N., Grewe, D., Noury, S., Dieleman, S., Elsen, E., Kalchbrenner, N., Zen, H., Graves, A., King, H., … Hassabis, D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis. ArXiv:1711.10433 [Cs]. http://arxiv.org/abs/1711.10433
Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A Survey on Performance Metrics for Object-Detection Algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., & Tran, D. (2018). Image Transformer. ArXiv:1802.05751 [Cs]. http://arxiv.org/abs/1802.05751
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. ArXiv:1802.05365 [Cs]. http://arxiv.org/abs/1802.05365
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. ArXiv:1506.02640 [Cs]. http://arxiv.org/abs/1506.02640
Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. ArXiv:1612.08242 [Cs]. http://arxiv.org/abs/1612.08242
Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. ArXiv:1804.02767 [Cs]. http://arxiv.org/abs/1804.02767
Ren, M., & Zemel, R. S. (2017). End-to-End Instance Segmentation with Recurrent Attention. ArXiv:1605.09410 [Cs]. http://arxiv.org/abs/1605.09410
Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. ArXiv:1506.01497 [Cs]. http://arxiv.org/abs/1506.01497
Rezatofighi, S. H., G, V. K. B., Milan, A., Abbasnejad, E., Dick, A., & Reid, I. (2017). DeepSetNet: Predicting Sets with Deep Neural Networks. ArXiv:1611.08998 [Cs]. http://arxiv.org/abs/1611.08998
Romera-Paredes, B., & Torr, P. H. S. (2016). Recurrent Instance Segmentation. ArXiv:1511.08250 [Cs]. http://arxiv.org/abs/1511.08250
Rothe, R., Guillaumin, M., & Van Gool, L. (2015). Non-maximum Suppression for Object Detection by Passing Messages Between Windows. In D. Cremers, I. Reid, H. Saito, & M.-H. Yang (Eds.), Computer Vision – ACCV 2014 (Vol. 9003, pp. 290–306). Springer International Publishing. https://doi.org/10.1007/978-3-319-16865-4_19
Sabu, A. M., & Das, A. S. (2018). A Survey on various Optical Character Recognition Techniques. 2018 Conference on Emerging Devices and Smart Systems (ICEDSS), 152–155. https://doi.org/10.1109/ICEDSS.2018.8544323
Salvador, A., Bellver, M., Campos, V., Baradad, M., Marques, F., Torres, J., & Giro-i-Nieto, X. (2019). Recurrent Neural Networks for Semantic Instance Segmentation. ArXiv:1712.00617 [Cs]. http://arxiv.org/abs/1712.00617
Satti, D. A. (2013). Offline Urdu Nastaliq OCR for Printed Text using Analytical Approach. 161.
Shen, H., & Coughlan, J. M. (2012). Towards a Real-Time System for Finding and Reading Signs for Visually Impaired Users. In K. Miesenberger, A. Karshmer, P. Penaz, & W. Zagler (Eds.), Computers Helping People with Special Needs (Vol. 7383, pp. 41–47). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-31534-3_7
Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D: Nonlinear Phenomena, 404, 132306. https://doi.org/10.1016/j.physd.2019.132306
Shi, B., Bai, X., & Yao, C. (2015). An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. ArXiv:1507.05717 [Cs]. http://arxiv.org/abs/1507.05717
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv:1409.1556 [Cs]. http://arxiv.org/abs/1409.1556
Smith, R. (2007). An Overview of the Tesseract OCR Engine. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, 629–633. https://doi.org/10.1109/ICDAR.2007.4376991
Stewart, R., & Andriluka, M. (2015). End-to-end people detection in crowded scenes. ArXiv:1506.04878 [Cs]. http://arxiv.org/abs/1506.04878
Subramani, N., Matton, A., Greaves, M., & Lam, A. (2020). A Survey of Deep Learning Approaches for OCR and Document Understanding. ArXiv:2011.13534 [Cs]. http://arxiv.org/abs/2011.13534
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going Deeper with Convolutions. ArXiv:1409.4842 [Cs]. http://arxiv.org/abs/1409.4842
Tan, M., & Le, Q. V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ArXiv:1905.11946 [Cs, Stat]. http://arxiv.org/abs/1905.11946
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 1, I-511-I–518. https://doi.org/10.1109/CVPR.2001.990517
Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1192–1200. https://doi.org/10.1145/3394486.3403172
Xu, Y., Xu, Y., Lv, T., Cui, L., Wei, F., Wang, G., Lu, Y., Florencio, D., Zhang, C., Che, W., Zhang, M., & Zhou, L. (2022). LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (arXiv:2012.14740). arXiv. http://arxiv.org/abs/2012.14740
Zaidi, S. S. A., Ansari, M. S., Aslam, A., Kanwal, N., Asghar, M., & Lee, B. (2021). A Survey of Modern Deep Learning based Object Detection Models. ArXiv:2104.11892 [Cs, Eess]. http://arxiv.org/abs/2104.11892
Zhang, P., Xu, Y., Cheng, Z., Pu, S., Lu, J., Qiao, L., Niu, Y., & Wu, F. (2021). TRIE: End-to-End Text Reading and Information Extraction for Document Understanding (arXiv:2005.13118). arXiv. http://arxiv.org/abs/2005.13118
Zhang, S., Wen, L., Bian, X., Lei, Z., & Li, S. Z. (2018). Single-Shot Refinement Neural Network for Object Detection. ArXiv:1711.06897 [Cs]. http://arxiv.org/abs/1711.06897
Zhang, Y., Hare, J., & Prügel-Bennett, A. (2020). Deep Set Prediction Networks. ArXiv:1906.06565 [Cs, Stat]. http://arxiv.org/abs/1906.06565
Zhang, Z., Ma, J., Du, J., Wang, L., & Zhang, J. (2022). Multimodal Pre-training Based on Graph Attention Network for Document Understanding (arXiv:2203.13530). arXiv. http://arxiv.org/abs/2203.13530
Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., & Ling, H. (2019). M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network. ArXiv:1811.04533 [Cs]. http://arxiv.org/abs/1811.04533
Zhao, X., Niu, E., Wu, Z., & Wang, X. (2019). CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor. ArXiv:1903.12363 [Cs]. http://arxiv.org/abs/1903.12363
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2021). Deformable DETR: Deformable Transformers for End-to-End Object Detection. 16.
Zou, Z., Shi, Z., Guo, Y., & Ye, J. (2019). Object Detection in 20 Years: A Survey. 40.
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Reconocimiento 4.0 Internacional
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
rights_invalid_str_mv Reconocimiento 4.0 Internacional
http://creativecommons.org/licenses/by/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.spa.fl_str_mv 67 páginas
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Universidad Nacional de Colombia
dc.publisher.program.spa.fl_str_mv Medellín - Minas - Maestría en Ingeniería - Analítica
dc.publisher.department.spa.fl_str_mv Departamento de la Computación y la Decisión
dc.publisher.faculty.spa.fl_str_mv Facultad de Minas
dc.publisher.place.spa.fl_str_mv Medellín
dc.publisher.branch.spa.fl_str_mv Universidad Nacional de Colombia - Sede Medellín
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/82000/1/1017231914.2022.pdf
https://repositorio.unal.edu.co/bitstream/unal/82000/2/license.txt
https://repositorio.unal.edu.co/bitstream/unal/82000/3/1017231914.2022.pdf.jpg
bitstream.checksum.fl_str_mv 158554034e1cafc545f3e08c65c9e66d
8153f7789df02f0a4c9e079953658ab2
2e11d994dbf59e37fa0a57813b5f1fd3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co
_version_ 1814089392781787136