Information extraction from identification documents using machine learning techniques
Automatic information extraction from identity documents is a fundamental task in digital processes such as registration, product applications, and identity validation, among others. Information extraction consists of identifying, locating, classifying, and recognizing the text of the key fields present in a document, in this case an identity document. For identity documents, the key fields are items such as first names, last names, document numbers, and dates. The problem has traditionally been solved with rule-based algorithms and classic OCR engines; in recent years, machine learning models from NLP (natural language processing) and CV (computer vision) have been applied to solve it in a more flexible and efficient way (Subramani et al., 2020). This work approaches information extraction as an object detection problem: an object detection model based on transformers (Carion et al., 2020) was implemented, trained, and evaluated, reaching precision above 95% in detecting key fields on identity documents.
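
To make the approach concrete, below is a minimal inference sketch with a DETR-style detector, using the Hugging Face `transformers` implementation. The COCO-pretrained `facebook/detr-resnet-50` checkpoint and the input file name are illustrative assumptions, not the thesis artifacts; the thesis trains its own detector whose classes are the key fields of an identity document.

```python
# Minimal sketch: detecting field regions on an ID document with a DETR-style
# detector (Carion et al., 2020). The checkpoint is the public COCO-pretrained
# model and "id_document.jpg" is a made-up file name; a model fine-tuned on
# ID-document key fields (names, document number, dates, ...) is assumed.
import torch
from PIL import Image
from transformers import DetrForObjectDetection, DetrImageProcessor

checkpoint = "facebook/detr-resnet-50"  # stand-in for a fine-tuned checkpoint
processor = DetrImageProcessor.from_pretrained(checkpoint)
model = DetrForObjectDetection.from_pretrained(checkpoint)
model.eval()

image = Image.open("id_document.jpg").convert("RGB")  # hypothetical input
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Discard low-confidence queries and rescale boxes to the original image size.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]

for score, label, box in zip(
    detections["scores"], detections["labels"], detections["boxes"]
):
    # With a fine-tuned model, each box would delimit one key field whose
    # crop can then be handed to an OCR engine for text recognition.
    print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())
```

In such a pipeline the detector only localizes and classifies the fields; reading the text inside each detected box remains a separate OCR step.
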
- Authors: Márquez Aristizábal, Hugo Alejandro
- Resource type: Trabajo de grado - Maestría (master's thesis)
- Degree: Magíster en Ingeniería - Analítica
- Publication date: 2022
- Institution: Universidad Nacional de Colombia
- Repository: Universidad Nacional de Colombia
- Language: spa (Spanish)
- OAI identifier: oai:repositorio.unal.edu.co:unal/82000
- Keywords:
  - 000 - Ciencias de la computación, información y obras generales::003 - Sistemas
  - Identidad digital
  - Reconocimiento óptico de caracteres
  - Redes neuronales (Computadores)
  - OCR
  - Extracción de información
  - Detección de objetos
  - Digital identity
  - Information extraction
  - Object detection
- Rights: openAccess
- License: Reconocimiento 4.0 Internacional (CC BY 4.0)
id: UNACIONAL2_a5ef20270ff23252f268e25a89d661f9
oai_identifier_str: oai:repositorio.unal.edu.co:unal/82000
network_acronym_str: UNACIONAL2
network_name_str: Universidad Nacional de Colombia
repository_id_str:
dc.title.spa.fl_str_mv: Extracción de información de documentos de identidad utilizando técnicas de aprendizaje de máquina
dc.title.translated.eng.fl_str_mv: Information extraction from identification documents using machine learning techniques
title: Extracción de información de documentos de identidad utilizando técnicas de aprendizaje de máquina
dc.creator.fl_str_mv: Márquez Aristizábal, Hugo Alejandro
dc.contributor.advisor.none.fl_str_mv: Villa Garzón, Fernán Alonso
dc.contributor.author.none.fl_str_mv: Márquez Aristizábal, Hugo Alejandro
dc.subject.ddc.spa.fl_str_mv: 000 - Ciencias de la computación, información y obras generales::003 - Sistemas
dc.subject.other.none.fl_str_mv: Identidad digital ; Reconocimiento óptico de caracteres
dc.subject.lemb.none.fl_str_mv: Redes neuronales (Computadores)
dc.subject.proposal.spa.fl_str_mv: Identidad digital ; OCR ; Extracción de información ; Detección de objetos
dc.subject.proposal.eng.fl_str_mv: Digital identity ; Information extraction ; Object detection
description: Automatic information extraction from identity documents is a fundamental task in digital processes such as registration, product applications, and identity validation, among others. Information extraction consists of identifying, locating, classifying, and recognizing the text of the key fields present in a document, in this case an identity document. For identity documents, the key fields are items such as first names, last names, document numbers, and dates. The problem has traditionally been solved with rule-based algorithms and classic OCR engines. In recent years, machine learning models from NLP (natural language processing) and CV (computer vision) have been applied to solve it in a more flexible and efficient way (Subramani et al., 2020). This work approaches information extraction as an object detection problem: an object detection model based on transformers (Carion et al., 2020) was implemented, trained, and evaluated, reaching precision above 95% in detecting key fields on identity documents. (Text taken from the source)
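
The precision figure reported in the description can be made concrete with a small sketch. It assumes the common object-detection convention that a prediction counts as a true positive when it overlaps an unmatched, same-class ground-truth box with IoU >= 0.5; the matching rule and the sample boxes are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch of detection precision: the fraction of predicted field
# boxes that match a ground-truth box of the same class with IoU >= 0.5.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision(predictions, ground_truth, threshold=0.5):
    """Each prediction may consume at most one unmatched ground-truth box."""
    matched, true_positives = set(), 0
    for label, box in predictions:
        for i, (gt_label, gt_box) in enumerate(ground_truth):
            if i not in matched and label == gt_label and iou(box, gt_box) >= threshold:
                matched.add(i)
                true_positives += 1
                break
    return true_positives / len(predictions) if predictions else 0.0

# Hypothetical detections on one document: (field class, box in pixels).
preds = [("document_number", (40, 120, 300, 150)), ("first_name", (40, 60, 220, 90))]
gt = [("document_number", (42, 118, 298, 152)), ("first_name", (38, 58, 225, 92))]
print(precision(preds, gt))  # -> 1.0 for this toy example
```
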
publishDate: 2022
dc.date.accessioned.none.fl_str_mv: 2022-08-22T21:32:48Z
dc.date.available.none.fl_str_mv: 2022-08-22T21:32:48Z
dc.date.issued.none.fl_str_mv: 2022-06-23
dc.type.spa.fl_str_mv: Trabajo de grado - Maestría
dc.type.driver.spa.fl_str_mv: info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv: info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv: Text
dc.type.redcol.spa.fl_str_mv: http://purl.org/redcol/resource_type/TM
status_str: acceptedVersion
dc.identifier.uri.none.fl_str_mv: https://repositorio.unal.edu.co/handle/unal/82000
dc.identifier.instname.spa.fl_str_mv: Universidad Nacional de Colombia
dc.identifier.reponame.spa.fl_str_mv: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl.spa.fl_str_mv: https://repositorio.unal.edu.co/
url: https://repositorio.unal.edu.co/handle/unal/82000 ; https://repositorio.unal.edu.co/
identifier_str_mv: Universidad Nacional de Colombia ; Repositorio Institucional Universidad Nacional de Colombia
dc.language.iso.spa.fl_str_mv: spa
language: spa
dc.relation.references.spa.fl_str_mv:
- Al-Badr, B., & Mahmoud, S. A. (1995). Survey and bibliography of Arabic optical text recognition. Signal Processing, 41(1), 49–77. https://doi.org/10.1016/0165-1684(94)00090-M
- Amin, A., & Shiu, R. (2001). Page Segmentation and Classification utilizing Bottom-up Approach. International Journal of Image and Graphics, 01(02), 345–361. https://doi.org/10.1142/S0219467801000219
- Appalaraju, S., Jasani, B., Kota, B. U., Xie, Y., & Manmatha, R. (2021). DocFormer: End-to-End Transformer for Document Understanding (arXiv:2106.11539). arXiv. http://arxiv.org/abs/2106.11539
- Arlazarov, V. V., Bulatov, K., Chernov, T., & Arlazarov, V. L. (2019). MIDV-500: A dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics, 43(5), 818–824. https://doi.org/10.18287/2412-6179-2019-43-5-818-824
- Bahdanau, D., Cho, K., & Bengio, Y. (2016). Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv:1409.0473 [Cs, Stat]. http://arxiv.org/abs/1409.0473
- Bajaj, R., Dey, L., & Chaudhury, S. (2002). Devnagari numeral recognition by combining decision of multiple connectionist classifiers. Sadhana, 27(1), 59–72. https://doi.org/10.1007/BF02703312
- Bello, I., Zoph, B., Vaswani, A., Shlens, J., & Le, Q. V. (2020). Attention Augmented Convolutional Networks. ArXiv:1904.09925 [Cs]. http://arxiv.org/abs/1904.09925
- Bertolotti, M. (2011). Optical Pattern Recognition, edited by F.T.S Yu and S. Jutamulia.
- Bhavani, S., & Thanushkodi, D. K. (2010). A Survey On Coding Algorithms In Medical Image Compression. 02(05), 7.
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. ArXiv:2005.12872 [Cs]. http://arxiv.org/abs/2005.12872
- Castelblanco, A., Solano, J., Lopez, C., Rivera, E., Tengana, L., & Ochoa, M. (2020). Machine Learning Techniques for Identity Document Verification in Uncontrolled Environments: A Case Study. In K. M. Figueroa Mora, J. Anzurez Marín, J. Cerda, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, & J. A. Olvera-López (Eds.), Pattern Recognition (Vol. 12088, pp. 271–281). Springer International Publishing. https://doi.org/10.1007/978-3-030-49076-8_26
- Chan, W., Saharia, C., Hinton, G., Norouzi, M., & Jaitly, N. (2020). Imputer: Sequence Modelling via Imputation and Dynamic Programming. ArXiv:2002.08926 [Cs, Eess]. http://arxiv.org/abs/2002.08926
- Chaudhuri, A., Mandaviya, K., Badelia, P., & K Ghosh, S. (2017). Optical Character Recognition Systems for Different Languages with Soft Computing (Vol. 352). Springer International Publishing. https://doi.org/10.1007/978-3-319-50252-6
- Dalal, N., & Triggs, B. (2005). Histograms of Oriented Gradients for Human Detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 1, 886–893. https://doi.org/10.1109/CVPR.2005.177
- De Brabandere, B., Neven, D., & Van Gool, L. (2017). Semantic Instance Segmentation with a Discriminative Loss Function. ArXiv:1708.02551 [Cs]. http://arxiv.org/abs/1708.02551
- Delteil, T., Belval, E., Chen, L., Goncalves, L., & Mahadevan, V. (2022). MATrIX -- Modality-Aware Transformer for Information eXtraction (arXiv:2205.08094). arXiv. http://arxiv.org/abs/2205.08094
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
- Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645. https://doi.org/10.1109/TPAMI.2009.167
- Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A. C. (2017). DSSD: Deconvolutional Single Shot Detector. ArXiv:1701.06659 [Cs]. http://arxiv.org/abs/1701.06659
- Ghazvininejad, M., Levy, O., Liu, Y., & Zettlemoyer, L. (2019). Mask-Predict: Parallel Decoding of Conditional Masked Language Models. ArXiv:1904.09324 [Cs, Stat]. http://arxiv.org/abs/1904.09324
- Girshick, R. (2015). Fast R-CNN. ArXiv:1504.08083 [Cs]. http://arxiv.org/abs/1504.08083
- Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv:1311.2524 [Cs]. http://arxiv.org/abs/1311.2524
- Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. 8.
- Graves, A., & Schmidhuber, J. (2007). Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. In Artificial Neural Networks – ICANN 2007 (Vol. 4668, pp. 549–558). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-74690-4_56
- Gu, J., Bradbury, J., Xiong, C., Li, V. O. K., & Socher, R. (2018). Non-Autoregressive Neural Machine Translation. ArXiv:1711.02281 [Cs]. http://arxiv.org/abs/1711.02281
- Gu, J., Kuen, J., Morariu, V. I., Zhao, H., Barmpalios, N., Jain, R., Nenkova, A., & Sun, T. (2022). Unified Pretraining Framework for Document Understanding (arXiv:2204.10939). arXiv. http://arxiv.org/abs/2204.10939
- He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2018). Mask R-CNN. ArXiv:1703.06870 [Cs]. http://arxiv.org/abs/1703.06870
- He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. ArXiv:1512.03385 [Cs]. http://arxiv.org/abs/1512.03385
- Huang, Y., Lv, T., Cui, L., Lu, Y., & Wei, F. (2022). LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking (arXiv:2204.08387). arXiv. http://arxiv.org/abs/2204.08387
- Islam, N., Islam, Z., & Noor, N. (2016). A Survey on Optical Character Recognition System. Journal of Information, 10(2), 4.
- Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., & Qu, R. (2019). A Survey of Deep Learning-Based Object Detection. IEEE Access, 7, 128837–128868. https://doi.org/10.1109/ACCESS.2019.2939201
- Katti, A. R., Reisswig, C., Guder, C., Brarda, S., Bickel, S., Höhne, J., & Faddoul, J. B. (2018). Chargrid: Towards Understanding 2D Documents. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4459–4469. https://doi.org/10.18653/v1/D18-1476
- Kim, G., Hong, T., Yim, M., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., & Park, S. (2021). Donut: Document Understanding Transformer without OCR (arXiv:2111.15664). arXiv. http://arxiv.org/abs/2111.15664
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
- Le, A. D., Pham, D. V., & Nguyen, T. A. (2019). Deep Learning Approach for Receipt Recognition. In T. K. Dang, J. Küng, M. Takizawa, & S. H. Bui (Eds.), Future Data and Security Engineering (Vol. 11814, pp. 705–712). Springer International Publishing. https://doi.org/10.1007/978-3-030-35653-8_50
- Lebourgeois, F., Bublinski, Z., & Emptoz, H. (1992). A fast and efficient method for extracting text paragraphs and graphics from unconstrained documents. Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems, 272–276. https://doi.org/10.1109/ICPR.1992.201771
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
- Li, X., Zheng, Y., Hu, Y., Cao, H., Wu, Y., Jiang, D., Liu, Y., & Ren, B. (2022). Relational Representation Learning in Visually-Rich Documents (arXiv:2205.02411). arXiv. http://arxiv.org/abs/2205.02411
- Li, Y., Qian, Y., Yu, Y., Qin, X., Zhang, C., Liu, Y., Yao, K., Han, J., Liu, J., & Ding, E. (2021). StrucTexT: Structured Text Understanding with Multi-Modal Transformers (arXiv:2108.02923). arXiv. http://arxiv.org/abs/2108.02923
- Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature Pyramid Networks for Object Detection. ArXiv:1612.03144 [Cs]. http://arxiv.org/abs/1612.03144
- Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2018). Focal Loss for Dense Object Detection. ArXiv:1708.02002 [Cs]. http://arxiv.org/abs/1708.02002
- Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., & Dollár, P. (2015). Microsoft COCO: Common Objects in Context. ArXiv:1405.0312 [Cs]. http://arxiv.org/abs/1405.0312
- Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep Learning for Generic Object Detection: A Survey. International Journal of Computer Vision, 128(2), 261–318. https://doi.org/10.1007/s11263-019-01247-4
- Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. ArXiv:1512.02325 [Cs], 9905, 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
- Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. ArXiv:1411.4038 [Cs]. http://arxiv.org/abs/1411.4038
- Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
- Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767. https://doi.org/10.1016/j.imavis.2004.02.006
- Mori, S., Nishida, H., & Yamada, H. (1999). Optical Character Recognition.
- Namysl, M., & Konya, I. (2019). Efficient, Lexicon-Free OCR using Deep Learning. ArXiv:1906.01969 [Cs]. http://arxiv.org/abs/1906.01969
- Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing, 452, 48–62. https://doi.org/10.1016/j.neucom.2021.03.091
- Oord, A. van den, Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G. van den, Lockhart, E., Cobo, L. C., Stimberg, F., Casagrande, N., Grewe, D., Noury, S., Dieleman, S., Elsen, E., Kalchbrenner, N., Zen, H., Graves, A., King, H., … Hassabis, D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis. ArXiv:1711.10433 [Cs]. http://arxiv.org/abs/1711.10433
- Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A Survey on Performance Metrics for Object-Detection Algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
- Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., & Tran, D. (2018). Image Transformer. ArXiv:1802.05751 [Cs]. http://arxiv.org/abs/1802.05751
- Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. ArXiv:1802.05365 [Cs]. http://arxiv.org/abs/1802.05365
- Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. ArXiv:1506.02640 [Cs]. http://arxiv.org/abs/1506.02640
- Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. ArXiv:1612.08242 [Cs]. http://arxiv.org/abs/1612.08242
- Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. ArXiv:1804.02767 [Cs]. http://arxiv.org/abs/1804.02767
- Ren, M., & Zemel, R. S. (2017). End-to-End Instance Segmentation with Recurrent Attention. ArXiv:1605.09410 [Cs]. http://arxiv.org/abs/1605.09410
- Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. ArXiv:1506.01497 [Cs]. http://arxiv.org/abs/1506.01497
- Rezatofighi, S. H., G, V. K. B., Milan, A., Abbasnejad, E., Dick, A., & Reid, I. (2017). DeepSetNet: Predicting Sets with Deep Neural Networks. ArXiv:1611.08998 [Cs]. http://arxiv.org/abs/1611.08998
- Romera-Paredes, B., & Torr, P. H. S. (2016). Recurrent Instance Segmentation. ArXiv:1511.08250 [Cs]. http://arxiv.org/abs/1511.08250
- Rothe, R., Guillaumin, M., & Van Gool, L. (2015). Non-maximum Suppression for Object Detection by Passing Messages Between Windows. In D. Cremers, I. Reid, H. Saito, & M.-H. Yang (Eds.), Computer Vision – ACCV 2014 (Vol. 9003, pp. 290–306). Springer International Publishing. https://doi.org/10.1007/978-3-319-16865-4_19
- Sabu, A. M., & Das, A. S. (2018). A Survey on various Optical Character Recognition Techniques. 2018 Conference on Emerging Devices and Smart Systems (ICEDSS), 152–155. https://doi.org/10.1109/ICEDSS.2018.8544323
- Salvador, A., Bellver, M., Campos, V., Baradad, M., Marques, F., Torres, J., & Giro-i-Nieto, X. (2019). Recurrent Neural Networks for Semantic Instance Segmentation. ArXiv:1712.00617 [Cs]. http://arxiv.org/abs/1712.00617
- Satti, D. A. (2013). Offline Urdu Nastaliq OCR for Printed Text using Analytical Approach. 161.
- Shen, H., & Coughlan, J. M. (2012). Towards a Real-Time System for Finding and Reading Signs for Visually Impaired Users. In K. Miesenberger, A. Karshmer, P. Penaz, & W. Zagler (Eds.), Computers Helping People with Special Needs (Vol. 7383, pp. 41–47). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-31534-3_7
- Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D: Nonlinear Phenomena, 404, 132306. https://doi.org/10.1016/j.physd.2019.132306
- Shi, B., Bai, X., & Yao, C. (2015). An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. ArXiv:1507.05717 [Cs]. http://arxiv.org/abs/1507.05717
- Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv:1409.1556 [Cs]. http://arxiv.org/abs/1409.1556
- Smith, R. (2007). An Overview of the Tesseract OCR Engine. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2, 629–633. https://doi.org/10.1109/ICDAR.2007.4376991
- Stewart, R., & Andriluka, M. (2015). End-to-end people detection in crowded scenes. ArXiv:1506.04878 [Cs]. http://arxiv.org/abs/1506.04878
- Subramani, N., Matton, A., Greaves, M., & Lam, A. (2020). A Survey of Deep Learning Approaches for OCR and Document Understanding. ArXiv:2011.13534 [Cs]. http://arxiv.org/abs/2011.13534
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going Deeper with Convolutions. ArXiv:1409.4842 [Cs]. http://arxiv.org/abs/1409.4842
- Tan, M., & Le, Q. V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ArXiv:1905.11946 [Cs, Stat]. http://arxiv.org/abs/1905.11946
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
- Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 1, I-511–I-518. https://doi.org/10.1109/CVPR.2001.990517
- Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1192–1200. https://doi.org/10.1145/3394486.3403172
- Xu, Y., Xu, Y., Lv, T., Cui, L., Wei, F., Wang, G., Lu, Y., Florencio, D., Zhang, C., Che, W., Zhang, M., & Zhou, L. (2022). LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (arXiv:2012.14740). arXiv. http://arxiv.org/abs/2012.14740
- Zaidi, S. S. A., Ansari, M. S., Aslam, A., Kanwal, N., Asghar, M., & Lee, B. (2021). A Survey of Modern Deep Learning based Object Detection Models. ArXiv:2104.11892 [Cs, Eess]. http://arxiv.org/abs/2104.11892
- Zhang, P., Xu, Y., Cheng, Z., Pu, S., Lu, J., Qiao, L., Niu, Y., & Wu, F. (2021). TRIE: End-to-End Text Reading and Information Extraction for Document Understanding (arXiv:2005.13118). arXiv. http://arxiv.org/abs/2005.13118
- Zhang, S., Wen, L., Bian, X., Lei, Z., & Li, S. Z. (2018). Single-Shot Refinement Neural Network for Object Detection. ArXiv:1711.06897 [Cs]. http://arxiv.org/abs/1711.06897
- Zhang, Y., Hare, J., & Prügel-Bennett, A. (2020). Deep Set Prediction Networks. ArXiv:1906.06565 [Cs, Stat]. http://arxiv.org/abs/1906.06565
- Zhang, Z., Ma, J., Du, J., Wang, L., & Zhang, J. (2022). Multimodal Pre-training Based on Graph Attention Network for Document Understanding (arXiv:2203.13530). arXiv. http://arxiv.org/abs/2203.13530
- Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., & Ling, H. (2019). M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network. ArXiv:1811.04533 [Cs]. http://arxiv.org/abs/1811.04533
- Zhao, X., Niu, E., Wu, Z., & Wang, X. (2019). CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor. ArXiv:1903.12363 [Cs]. http://arxiv.org/abs/1903.12363
- Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2021). Deformable DETR: Deformable Transformers for End-to-End Object Detection. 16.
- Zou, Z., Shi, Z., Guo, Y., & Ye, J. (2019). Object Detection in 20 Years: A Survey. 40.
dc.rights.coar.fl_str_mv: http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv: Reconocimiento 4.0 Internacional
dc.rights.uri.spa.fl_str_mv: http://creativecommons.org/licenses/by/4.0/
dc.rights.accessrights.spa.fl_str_mv: info:eu-repo/semantics/openAccess
rights_invalid_str_mv: Reconocimiento 4.0 Internacional ; http://creativecommons.org/licenses/by/4.0/ ; http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv: openAccess
dc.format.extent.spa.fl_str_mv: 67 páginas
dc.format.mimetype.spa.fl_str_mv: application/pdf
dc.publisher.spa.fl_str_mv: Universidad Nacional de Colombia
dc.publisher.program.spa.fl_str_mv: Medellín - Minas - Maestría en Ingeniería - Analítica
dc.publisher.department.spa.fl_str_mv: Departamento de la Computación y la Decisión
dc.publisher.faculty.spa.fl_str_mv: Facultad de Minas
dc.publisher.place.spa.fl_str_mv: Medellín
dc.publisher.branch.spa.fl_str_mv: Universidad Nacional de Colombia - Sede Medellín
institution: Universidad Nacional de Colombia
bitstream.url.fl_str_mv: https://repositorio.unal.edu.co/bitstream/unal/82000/1/1017231914.2022.pdf ; https://repositorio.unal.edu.co/bitstream/unal/82000/2/license.txt ; https://repositorio.unal.edu.co/bitstream/unal/82000/3/1017231914.2022.pdf.jpg
bitstream.checksum.fl_str_mv: 158554034e1cafc545f3e08c65c9e66d ; 8153f7789df02f0a4c9e079953658ab2 ; 2e11d994dbf59e37fa0a57813b5f1fd3
bitstream.checksumAlgorithm.fl_str_mv: MD5 ; MD5 ; MD5
repository.name.fl_str_mv: Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv: repositorio_nal@unal.edu.co
_version_: 1814089392781787136