The Graph Pattern Matching Problem through Parameterized Matching

We propose a new approach to solve graph isomorphism using parameterized matching. Parameterized matching is a string matching problem where two strings parameterized-match if there exists a bijective function, on the symbols of the alphabet, that maps one of the strings into the other. Given that p...

Full description

Autores:
Mendivelso Moreno, Juan Carlos
Tipo de recurso:
Doctoral thesis
Fecha de publicación:
2015
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/52980
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/52980
http://bdigital.unal.edu.co/47461/
Palabra clave:
0 Generalidades / Computer science, information and general works
Parameterized matching
Graph theory
Graph algorithms
Graph matching
Pattern matching
Graph isomorphism
Subgraph isomorphism
Attributed graphs
Graph queries
Social networks
Búsqueda parametrizada
TeorÍa de grafos
Algoritmos de grafos
Búsquedas en grafos
Búsqueda de patrones
Isomorphismo de grafos
Isomorphismo de subgrafos
Grafos sem´anticos
Redes sociales
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
id UNACIONAL2_9dd13c333fdeafd920fab4d85d70cd1e
oai_identifier_str oai:repositorio.unal.edu.co:unal/52980
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv The Graph Pattern Matching Problem through Parameterized Matching
title The Graph Pattern Matching Problem through Parameterized Matching
spellingShingle The Graph Pattern Matching Problem through Parameterized Matching
0 Generalidades / Computer science, information and general works
Parameterized matching
Graph theory
Graph algorithms
Graph matching
Pattern matching
Graph isomorphism
Subgraph isomorphism
Attributed graphs
Graph queries
Social networks
Búsqueda parametrizada
TeorÍa de grafos
Algoritmos de grafos
Búsquedas en grafos
Búsqueda de patrones
Isomorphismo de grafos
Isomorphismo de subgrafos
Grafos sem´anticos
Redes sociales
title_short The Graph Pattern Matching Problem through Parameterized Matching
title_full The Graph Pattern Matching Problem through Parameterized Matching
title_fullStr The Graph Pattern Matching Problem through Parameterized Matching
title_full_unstemmed The Graph Pattern Matching Problem through Parameterized Matching
title_sort The Graph Pattern Matching Problem through Parameterized Matching
dc.creator.fl_str_mv Mendivelso Moreno, Juan Carlos
dc.contributor.author.spa.fl_str_mv Mendivelso Moreno, Juan Carlos
dc.contributor.spa.fl_str_mv Pinzón Ardila, Yoan José
dc.subject.ddc.spa.fl_str_mv 0 Generalidades / Computer science, information and general works
topic 0 Generalidades / Computer science, information and general works
Parameterized matching
Graph theory
Graph algorithms
Graph matching
Pattern matching
Graph isomorphism
Subgraph isomorphism
Attributed graphs
Graph queries
Social networks
Búsqueda parametrizada
TeorÍa de grafos
Algoritmos de grafos
Búsquedas en grafos
Búsqueda de patrones
Isomorphismo de grafos
Isomorphismo de subgrafos
Grafos sem´anticos
Redes sociales
dc.subject.proposal.spa.fl_str_mv Parameterized matching
Graph theory
Graph algorithms
Graph matching
Pattern matching
Graph isomorphism
Subgraph isomorphism
Attributed graphs
Graph queries
Social networks
Búsqueda parametrizada
TeorÍa de grafos
Algoritmos de grafos
Búsquedas en grafos
Búsqueda de patrones
Isomorphismo de grafos
Isomorphismo de subgrafos
Grafos sem´anticos
Redes sociales
description We propose a new approach to solve graph isomorphism using parameterized matching. Parameterized matching is a string matching problem where two strings parameterized-match if there exists a bijective function, on the symbols of the alphabet, that maps one of the strings into the other. Given that parameterized matching is defined for linear structures, we define the concept of graph linearization to represent the topology of a graph as a walk on it. Then, our approach to determine whether two graphs are isomorphic consists of determining whether there exists a walk in one of the graphs that parameterized-matches a linearization of the other graph. Our solution has two main steps: linearization and matching. We develop an efficient linearization algorithm, that generates short linearizations with an approximation guarantee, and develop a graph matching algorithm. We show that this solution also works for subgraph isomorphism, which is the problem of determining whether an input graph H is isomorphic to a subgraph of another input graph G. We evaluate our approach experimentally on graphs of different types and sizes, and compare to the performance of VF2, which is a prominent algorithm for graph isomorphism. Our empirical measurements show that graph linearization finds a matching graph faster than VF2 in many cases, especially in Miyazaki-constructed graphs which are known to be one of the hardest cases for graph isomorphism algorithms. We extend this approach to query attributed graphs. An attributed graph is a graph data structure, in which nodes and edges may have identifiers, types and other attributes. Attributed graphs are used in many application domains, for example to model social networks in which nodes represent people, photos, and postings and edges represent friendship, person-tagged-in-photo and mentioned-in-post relationships. Queries are used to extract information from such graphs. Several graph queries are expressed as graph pattern matching, which is the problem of finding all instances of pattern match query P in a larger attributed graph G. A pattern match query may specify both a graph structure and predicates on the attributes of the graph elements. Clearly, this problem is associated to subgraph isomorphism. Furthermore, we define a more general class of graph queries called generalized pattern queries on attributed multigraphs. The goal of this class is to find paths and subgraphs that satisfy query reachability and predicates. The query language is expressive: It allows (i) using regular expression operators (e.g., Kleene star and union); (ii) specifying structural predicates on graph nodes and edges; and (iii) using attribute predicates on nodes and edges. Pattern match queries, reachability queries, their combination, and even more queries can be expressed through generalized pattern queries. We use our approach to solve this new type of queries. The proposed technique has two phases. First, the query is linearized, i.e., represented as a graph walk that covers all nodes and edges. There are several linearizations for a given query; we derive heuristics to produce a good linearization that is short and places selective predicates early in the linearization. Second, we search for a bijective function that maps each element of the query to an element of the attributed multigraph that satisfies the reachability requirements and the predicates. Specifically, we develop an algorithm that matches the linearization by traversing the attributed graph in a manner similar to a breadth first traversal constrained by the linearization. We evaluate our solution experimentally using a real graph (the DBLP citation network) to assess its practicality and efficiency. Our results show that our techniques and optimizations are effective in querying attributed graphs, offering several factors of reduction in query response time when graph statistics are utilized.
publishDate 2015
dc.date.issued.spa.fl_str_mv 2015-03-19
dc.date.accessioned.spa.fl_str_mv 2019-06-29T15:54:32Z
dc.date.available.spa.fl_str_mv 2019-06-29T15:54:32Z
dc.type.spa.fl_str_mv Trabajo de grado - Doctorado
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.spa.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TD
format http://purl.org/coar/resource_type/c_db06
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/52980
dc.identifier.eprints.spa.fl_str_mv http://bdigital.unal.edu.co/47461/
url https://repositorio.unal.edu.co/handle/unal/52980
http://bdigital.unal.edu.co/47461/
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.ispartof.spa.fl_str_mv Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial
Departamento de Ingeniería de Sistemas e Industrial
dc.relation.references.spa.fl_str_mv Mendivelso Moreno, Juan Carlos (2015) The Graph Pattern Matching Problem through Parameterized Matching. Doctorado thesis, Universidad Nacional de Colombia.
dc.rights.spa.fl_str_mv Derechos reservados - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Atribución-NoComercial 4.0 Internacional
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
rights_invalid_str_mv Atribución-NoComercial 4.0 Internacional
Derechos reservados - Universidad Nacional de Colombia
http://creativecommons.org/licenses/by-nc/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/52980/1/80851676.2015.pdf
https://repositorio.unal.edu.co/bitstream/unal/52980/2/80851676.2015.pdf.jpg
bitstream.checksum.fl_str_mv 03d64beeb2b6ddb415e0dc3aae71f12c
e934af73a362a38172dd97048e041d6a
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co
_version_ 1814089871046737920
spelling Atribución-NoComercial 4.0 InternacionalDerechos reservados - Universidad Nacional de Colombiahttp://creativecommons.org/licenses/by-nc/4.0/info:eu-repo/semantics/openAccesshttp://purl.org/coar/access_right/c_abf2Pinzón Ardila, Yoan JoséMendivelso Moreno, Juan Carlos9853244c-7db1-49ee-97fa-cecbf67845a83002019-06-29T15:54:32Z2019-06-29T15:54:32Z2015-03-19https://repositorio.unal.edu.co/handle/unal/52980http://bdigital.unal.edu.co/47461/We propose a new approach to solve graph isomorphism using parameterized matching. Parameterized matching is a string matching problem where two strings parameterized-match if there exists a bijective function, on the symbols of the alphabet, that maps one of the strings into the other. Given that parameterized matching is defined for linear structures, we define the concept of graph linearization to represent the topology of a graph as a walk on it. Then, our approach to determine whether two graphs are isomorphic consists of determining whether there exists a walk in one of the graphs that parameterized-matches a linearization of the other graph. Our solution has two main steps: linearization and matching. We develop an efficient linearization algorithm, that generates short linearizations with an approximation guarantee, and develop a graph matching algorithm. We show that this solution also works for subgraph isomorphism, which is the problem of determining whether an input graph H is isomorphic to a subgraph of another input graph G. We evaluate our approach experimentally on graphs of different types and sizes, and compare to the performance of VF2, which is a prominent algorithm for graph isomorphism. Our empirical measurements show that graph linearization finds a matching graph faster than VF2 in many cases, especially in Miyazaki-constructed graphs which are known to be one of the hardest cases for graph isomorphism algorithms. We extend this approach to query attributed graphs. An attributed graph is a graph data structure, in which nodes and edges may have identifiers, types and other attributes. Attributed graphs are used in many application domains, for example to model social networks in which nodes represent people, photos, and postings and edges represent friendship, person-tagged-in-photo and mentioned-in-post relationships. Queries are used to extract information from such graphs. Several graph queries are expressed as graph pattern matching, which is the problem of finding all instances of pattern match query P in a larger attributed graph G. A pattern match query may specify both a graph structure and predicates on the attributes of the graph elements. Clearly, this problem is associated to subgraph isomorphism. Furthermore, we define a more general class of graph queries called generalized pattern queries on attributed multigraphs. The goal of this class is to find paths and subgraphs that satisfy query reachability and predicates. The query language is expressive: It allows (i) using regular expression operators (e.g., Kleene star and union); (ii) specifying structural predicates on graph nodes and edges; and (iii) using attribute predicates on nodes and edges. Pattern match queries, reachability queries, their combination, and even more queries can be expressed through generalized pattern queries. We use our approach to solve this new type of queries. The proposed technique has two phases. First, the query is linearized, i.e., represented as a graph walk that covers all nodes and edges. There are several linearizations for a given query; we derive heuristics to produce a good linearization that is short and places selective predicates early in the linearization. Second, we search for a bijective function that maps each element of the query to an element of the attributed multigraph that satisfies the reachability requirements and the predicates. Specifically, we develop an algorithm that matches the linearization by traversing the attributed graph in a manner similar to a breadth first traversal constrained by the linearization. We evaluate our solution experimentally using a real graph (the DBLP citation network) to assess its practicality and efficiency. Our results show that our techniques and optimizations are effective in querying attributed graphs, offering several factors of reduction in query response time when graph statistics are utilized.Resumen. En esta tesis se propone un nuevo enfoque de solución para resolver el problema de isomorfismo de grafos usando búsqueda parametrizada. La búsqueda parametrizada es un problema de búsqueda de cadenas de texto en el cual dos cadenas coinciden si existe una biyección que mapee los símbolos de una cadena en los símbolos de la otra. Dado que la búsqueda parametrizada está definida para estructuras lineales, se define el concepto de linearización de grafos para representar la topología de un grafo como un camino sobre este. Entonces, la solución para determinar si dos grafos son isomorfos consiste en determinar si existe un camino en uno de los grafos que haga coincidencia parametrizada con la linearización del otro grafo. La solución propuesta tiene dos pasos: linearización y búsqueda. Se presenta un algoritmo eficiente que genera linearizaciones aproximadamente óptimas en longitud, y un algoritmo de búsqueda. Se demuestra que esta solución también resuelve el problema de isomorfismo de subgrafos, en el cual se determina si un grafo H es isomorfo a un subgrafo de otro grafo G. Se evaluó experimentalmente la solución con grafos de diferentes tipos y tamaños. Se comparó su desempeño con el de VF2, el cual es un algoritmo competitivo de isomorfismo de grafos. Los resultados experimentales muestran que la solución propuesta es más eficiente que VF2 en varios casos, en especial en grafos Miyazaki, los cuales se caracterizan por constituir uno de los casos más difíciles para los algoritmos de isomorfismo de grafos. Este enfoque de solución se extiende para resolver consultas sobre grafos semánticos. Un grafo semántico es un grafo en el cual los nodes y arcos pueden tener identificadores, tipos y otros atributos. Estos grafos tienen aplicaciones importantes en diversas áreas, como por ejemplo para modelar redes sociales en las que los nodos representan personas, fotos y publicaciones y los arcos representan relaciones de amistad, etiquetado y mención. Se usan consultas para extraer información de estos grafos. Varias de estas consultas se expresan como búsqueda de patrones, la cual consiste en encontrar las coincidencias del grafo patrón P en un grafo semántico G. El grafo patrón especifica tanto la estructura de las coincidencias a encontrar, como los predicados sobre los atributos que deben cumplir los nodos y los arcos de las mismas. Claramente, este problema está asociado al isomorfismo de subgrafos. Además, se define un tipo de consultas más general sobre grafos semánticos. Estos nuevos patrones se denominan grafos patrón generalizados. El objetivo de estos es encontrar caminos y subgrafos que satisfagan ciertos requisitos semánticos, de estructura y de alcance. Estos patrones son expresivos, pues permiten (i) usar operadores de expresiones regulares (e.g., la estrella de Kleene y la unión); (ii) especificar predicados estructurales en los nodos y arcos; y (iii) evaluar predicados sobre los atributos de los nodos y arcos. Los patrones grafo tradicionales, las consultas de alcance, la combinación de estos y más se pueden representar a través de grafos patrón generalizados. Se usa el enfoque de solución propuesto para resolver los grafos patrón generalizados. La solución tiene dos fases. Primero, el patrón es linearizado, i.e., representado como un camino que incluye todos sus nodos y arcos. Hay muchas linearizaciones para un patrón dado; se proponen heurísticas para producir linearizaciones cortas que ubican los predicados selectivos al comienzo. Segundo, se busca una función biyectiva que mapee cada nodo en el patrón a un nodo en el grafo semántico que satisfaga los requisitos de alcance y los predicados. Específicamente, se propone un algoritmo de búsqueda que recorre el grafo semántico siguiendo una búsqueda en amplitud restringida por la linearización. La solución se evaluó experimentalmente usando un grafo semántico real (la red de citaciones DBLP) para evaluar su practicidad y eficiencia. Los resultados experimentales muestran que las técnicas y optimizaciones propuestas son efectivas en consultar grafos semánticos, ofreciendo un alto factor de reducción en el tiempo de ejecución cuando se utilizan las estadísticas del grafo semántico.Doctoradoapplication/pdfspaUniversidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e IndustrialDepartamento de Ingeniería de Sistemas e IndustrialMendivelso Moreno, Juan Carlos (2015) The Graph Pattern Matching Problem through Parameterized Matching. Doctorado thesis, Universidad Nacional de Colombia.0 Generalidades / Computer science, information and general worksParameterized matchingGraph theoryGraph algorithmsGraph matchingPattern matchingGraph isomorphismSubgraph isomorphismAttributed graphsGraph queriesSocial networksBúsqueda parametrizadaTeorÍa de grafosAlgoritmos de grafosBúsquedas en grafosBúsqueda de patronesIsomorphismo de grafosIsomorphismo de subgrafosGrafos sem´anticosRedes socialesThe Graph Pattern Matching Problem through Parameterized MatchingTrabajo de grado - Doctoradoinfo:eu-repo/semantics/doctoralThesisinfo:eu-repo/semantics/acceptedVersionhttp://purl.org/coar/resource_type/c_db06Texthttp://purl.org/redcol/resource_type/TDORIGINAL80851676.2015.pdfapplication/pdf2317386https://repositorio.unal.edu.co/bitstream/unal/52980/1/80851676.2015.pdf03d64beeb2b6ddb415e0dc3aae71f12cMD51THUMBNAIL80851676.2015.pdf.jpg80851676.2015.pdf.jpgGenerated Thumbnailimage/jpeg4428https://repositorio.unal.edu.co/bitstream/unal/52980/2/80851676.2015.pdf.jpge934af73a362a38172dd97048e041d6aMD52unal/52980oai:repositorio.unal.edu.co:unal/529802024-03-04 23:09:30.84Repositorio Institucional Universidad Nacional de Colombiarepositorio_nal@unal.edu.co