Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis

Tensors, an extension of arrays, are widely used in a variety of programming tasks. Tensors are the building blocks of many modern machine learning frameworks and are fundamental in the definition of deep learning models. Linters are indispensable tools for today's developers, as they help the...

Full description

Autores:
Cruz Camacho, Elkin Alejandro
Tipo de recurso:
Fecha de publicación:
2019
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/76335
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/76335
http://bdigital.unal.edu.co/72620/
Palabra clave:
Tensors
Lenguage Python
Programming
Tensores
Lenguaje Python
Programación
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
id UNACIONAL2_7fcde0c2049999d95b18c5f19fbcfdee
oai_identifier_str oai:repositorio.unal.edu.co:unal/76335
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
title Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
spellingShingle Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
Tensors
Lenguage Python
Programming
Tensores
Lenguaje Python
Programación
title_short Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
title_full Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
title_fullStr Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
title_full_unstemmed Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
title_sort Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis
dc.creator.fl_str_mv Cruz Camacho, Elkin Alejandro
dc.contributor.author.spa.fl_str_mv Cruz Camacho, Elkin Alejandro
dc.contributor.spa.fl_str_mv González Osorio, Fabio Augusto
Restrepo Calle, Felipe
dc.subject.proposal.spa.fl_str_mv Tensors
Lenguage Python
Programming
Tensores
Lenguaje Python
Programación
topic Tensors
Lenguage Python
Programming
Tensores
Lenguaje Python
Programación
description Tensors, an extension of arrays, are widely used in a variety of programming tasks. Tensors are the building blocks of many modern machine learning frameworks and are fundamental in the definition of deep learning models. Linters are indispensable tools for today's developers, as they help the developers to check code before executing it. Despite the popularity of tensors, linters for Python that check and flag code with tensors are nonexistent. Given the tremendous amount of work done in Python with (and without) tensors, it is quite baffling that little work has been done in this regard. Abstract Interpretation is a methodology/framework for statically analysing code. The idea of Abstract Interpretation is to soundly overapproximate the result of running a piece of code over all possible inputs of the program. A sound overapproximation ensures that the Abstract Interpreter will never omit a true negative, i.e.~if a piece of code is not flagged by the Abstract Interpreter, then it can be safely assumed that the code will not fail. The Abstract Interpreter can be modified so that it only outputs true positives, although losing soundness, i.e.~the interpreter can flag which parts of the code are going to fail no matter how the code is run. In this work, we specify a subset of Python with emphasis on tensor operations. Our operational Python semantics is based on The Python Language Reference. We define an Abstract Interpreter and present its implementation. We show how each part of the Abstract Interpreter was built: the Abstract Domains defined and the abstract semantics. We present the structure of Pytropos, the Abstract Interpreter implemented. Pytropos is able to check NumPy array operations taking into account broadcasting and complex NumPy functions as \texttt{array} and \texttt{dot}. We constructed 74 unit test cases checking the capabilities of Pytropos and 20 property test cases checking compliance with the official Python implementation. We show what and how many bugs the Abstract Interpreter was able to find.
publishDate 2019
dc.date.issued.spa.fl_str_mv 2019
dc.date.accessioned.spa.fl_str_mv 2020-03-30T06:18:19Z
dc.date.available.spa.fl_str_mv 2020-03-30T06:18:19Z
dc.type.spa.fl_str_mv Trabajo de grado - Maestría
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/76335
dc.identifier.eprints.spa.fl_str_mv http://bdigital.unal.edu.co/72620/
url https://repositorio.unal.edu.co/handle/unal/76335
http://bdigital.unal.edu.co/72620/
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.ispartof.spa.fl_str_mv Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas
Ingeniería de Sistemas
dc.relation.haspart.spa.fl_str_mv 0 Generalidades / Computer science, information and general works
51 Matemáticas / Mathematics
62 Ingeniería y operaciones afines / Engineering
dc.relation.references.spa.fl_str_mv Cruz Camacho, Elkin Alejandro (2019) Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv Derechos reservados - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Atribución-NoComercial 4.0 Internacional
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
rights_invalid_str_mv Atribución-NoComercial 4.0 Internacional
Derechos reservados - Universidad Nacional de Colombia
http://creativecommons.org/licenses/by-nc/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/76335/1/ElkinCruz.2019.pdf
https://repositorio.unal.edu.co/bitstream/unal/76335/2/pytropos.zip
https://repositorio.unal.edu.co/bitstream/unal/76335/3/ElkinCruz.2019.pdf.jpg
bitstream.checksum.fl_str_mv 8e99ad38089d82738643e96f0ca2d5df
4adffa7f6a6b355ddccae96a75c5e25f
f8441b1bfa3b51e5797fca41dd6c72a6
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co
_version_ 1814090196149338112
spelling Atribución-NoComercial 4.0 InternacionalDerechos reservados - Universidad Nacional de Colombiahttp://creativecommons.org/licenses/by-nc/4.0/info:eu-repo/semantics/openAccesshttp://purl.org/coar/access_right/c_abf2González Osorio, Fabio AugustoRestrepo Calle, FelipeCruz Camacho, Elkin Alejandro9b89c809-e46d-465e-9c8b-2672f0860eca3002020-03-30T06:18:19Z2020-03-30T06:18:19Z2019https://repositorio.unal.edu.co/handle/unal/76335http://bdigital.unal.edu.co/72620/Tensors, an extension of arrays, are widely used in a variety of programming tasks. Tensors are the building blocks of many modern machine learning frameworks and are fundamental in the definition of deep learning models. Linters are indispensable tools for today's developers, as they help the developers to check code before executing it. Despite the popularity of tensors, linters for Python that check and flag code with tensors are nonexistent. Given the tremendous amount of work done in Python with (and without) tensors, it is quite baffling that little work has been done in this regard. Abstract Interpretation is a methodology/framework for statically analysing code. The idea of Abstract Interpretation is to soundly overapproximate the result of running a piece of code over all possible inputs of the program. A sound overapproximation ensures that the Abstract Interpreter will never omit a true negative, i.e.~if a piece of code is not flagged by the Abstract Interpreter, then it can be safely assumed that the code will not fail. The Abstract Interpreter can be modified so that it only outputs true positives, although losing soundness, i.e.~the interpreter can flag which parts of the code are going to fail no matter how the code is run. In this work, we specify a subset of Python with emphasis on tensor operations. Our operational Python semantics is based on The Python Language Reference. We define an Abstract Interpreter and present its implementation. We show how each part of the Abstract Interpreter was built: the Abstract Domains defined and the abstract semantics. We present the structure of Pytropos, the Abstract Interpreter implemented. Pytropos is able to check NumPy array operations taking into account broadcasting and complex NumPy functions as \texttt{array} and \texttt{dot}. We constructed 74 unit test cases checking the capabilities of Pytropos and 20 property test cases checking compliance with the official Python implementation. We show what and how many bugs the Abstract Interpreter was able to find.Resumen: Los tensores, una extensión de los arrays, se usan de manera extensiva en una gran variedad de problemas en programación. Los tensores son los bloques básicos de construcción de múltiples frameworks de Aprendizaje de Máquina y son fundamentales en la definición de modelos de Aprendizaje Profundo. Los linters son herramientas indispensables para los programadores de hoy en día, ya que estos ayudan a los desarrolladores a revisar el código antes de ejecutarlo. Aunque los tensores son muy populares no existen tensores que revisen código con operaciones tensoriales. Dada la gran cantidad de trabajo hecho en Python con (y sin) tensores, es sorprendente el poco trabajo que se ha hecho en esta área. La Interpretación Abstracta es una metodología/framework diseñada para analizar código de forma estática. La idea de Interpretación Abstracta es sobreaproximar de manera "sound" el resultado de ejecutar una pieza de código sobre todos las posibles entradas del programa. Una sobreaproximación "sound" asegura que el Interprete Abstracto nunca omitirá un verdadero negativo, es decir, si una pieza de código no es señalada como incorrecta por el Interprete Abstracto entonces se puede asumir con seguridad que el código nunca fallará. El Interprete Abstracto puede ser modificado para que sólo informe acerca de verdaderos positivos, aunque se pierda la propiedad de "soundness", es decir, el interprete sólo informa acerca de las partes de código que fallarán sin importar que suceda. En este trabajo, formalizamos un subconjunto de Python con énfasis en operaciones con tensores. Nuestra formalización de la semántica de Python está basada en la Referencia oficial del Lenguage Python. Definimos un Interprete Abstracto y presentamos su implementación. Mostramos como cada parte del Interprete Abstracto fué definido: su Dominio Abstracto y semántica abstracta. Presentamos la estructura de Pytropos, la implementación del Interprete Abstracto. Pytropos es capaz de revisar las operaciones de arreglos de NumPy teniendo en cuenta broadcasting y algunas funciones complejas de NumPy como \texttt{array} y \texttt{dot}. Construimos 74 casos de prueba unitaros, los cuales chequean la capacidad de Pytropos, además de 20 casos de prueba de propiedades, los cuales chequean que las reglas semánticas de Pytropos correspondan con la forma en la que Python es ejecutado. Mostramos las capacidades del Interprete Abstracto por medio de ejemplos de los test unitarios.Maestríaapplication/pdfspaUniversidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de SistemasIngeniería de Sistemas0 Generalidades / Computer science, information and general works51 Matemáticas / Mathematics62 Ingeniería y operaciones afines / EngineeringCruz Camacho, Elkin Alejandro (2019) Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape Analysis. Maestría thesis, Universidad Nacional de Colombia - Sede Bogotá.Static Analysis of Python Programs using Abstract Interpretation: An Application to Tensor Shape AnalysisTrabajo de grado - Maestríainfo:eu-repo/semantics/masterThesisinfo:eu-repo/semantics/acceptedVersionTexthttp://purl.org/redcol/resource_type/TMTensorsLenguage PythonProgrammingTensoresLenguaje PythonProgramaciónORIGINALElkinCruz.2019.pdfapplication/pdf840142https://repositorio.unal.edu.co/bitstream/unal/76335/1/ElkinCruz.2019.pdf8e99ad38089d82738643e96f0ca2d5dfMD51pytropos.zipapplication/octet-stream605345https://repositorio.unal.edu.co/bitstream/unal/76335/2/pytropos.zip4adffa7f6a6b355ddccae96a75c5e25fMD52THUMBNAILElkinCruz.2019.pdf.jpgElkinCruz.2019.pdf.jpgGenerated Thumbnailimage/jpeg4413https://repositorio.unal.edu.co/bitstream/unal/76335/3/ElkinCruz.2019.pdf.jpgf8441b1bfa3b51e5797fca41dd6c72a6MD53unal/76335oai:repositorio.unal.edu.co:unal/763352024-07-12 23:31:49.578Repositorio Institucional Universidad Nacional de Colombiarepositorio_nal@unal.edu.co