The Graph Pattern Matching Problem through Parameterized Matching
- Authors:
- Mendivelso Moreno, Juan Carlos
- Resource type:
- Doctoral thesis
- Publication date:
- 2015
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/52980
- Online access:
- https://repositorio.unal.edu.co/handle/unal/52980
http://bdigital.unal.edu.co/47461/
- Keywords:
- 0 Generalities / Computer science, information and general works
Parameterized matching
Graph theory
Graph algorithms
Graph matching
Pattern matching
Graph isomorphism
Subgraph isomorphism
Attributed graphs
Graph queries
Social networks
Búsqueda parametrizada
Teoría de grafos
Algoritmos de grafos
Búsquedas en grafos
Búsqueda de patrones
Isomorfismo de grafos
Isomorfismo de subgrafos
Grafos semánticos
Redes sociales
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
Summary: We propose a new approach to solving graph isomorphism using parameterized matching. Parameterized matching is a string matching problem in which two strings parameterized-match if there exists a bijective function on the symbols of the alphabet that maps one of the strings into the other. Because parameterized matching is defined for linear structures, we define the concept of graph linearization, which represents the topology of a graph as a walk on it. Our approach to determining whether two graphs are isomorphic then consists of determining whether there exists a walk in one of the graphs that parameterized-matches a linearization of the other graph. Our solution has two main steps: linearization and matching. We develop an efficient linearization algorithm that generates short linearizations with an approximation guarantee, and we develop a graph matching algorithm. We show that this solution also works for subgraph isomorphism, the problem of determining whether an input graph H is isomorphic to a subgraph of another input graph G. We evaluate our approach experimentally on graphs of different types and sizes and compare its performance with that of VF2, a prominent algorithm for graph isomorphism. Our empirical measurements show that graph linearization finds a matching graph faster than VF2 in many cases, especially on Miyazaki-constructed graphs, which are known to be among the hardest cases for graph isomorphism algorithms.

We extend this approach to query attributed graphs. An attributed graph is a graph data structure in which nodes and edges may have identifiers, types, and other attributes. Attributed graphs are used in many application domains, for example to model social networks in which nodes represent people, photos, and postings, and edges represent friendship, person-tagged-in-photo, and mentioned-in-post relationships. Queries are used to extract information from such graphs.
Several graph queries are expressed as graph pattern matching, which is the problem of finding all instances of a pattern match query P in a larger attributed graph G. A pattern match query may specify both a graph structure and predicates on the attributes of the graph elements. This problem is closely related to subgraph isomorphism. Furthermore, we define a more general class of graph queries, called generalized pattern queries, on attributed multigraphs. The goal of this class is to find paths and subgraphs that satisfy the query's reachability requirements and predicates. The query language is expressive: it allows (i) using regular expression operators (e.g., Kleene star and union); (ii) specifying structural predicates on graph nodes and edges; and (iii) using attribute predicates on nodes and edges. Pattern match queries, reachability queries, their combinations, and other queries can be expressed through generalized pattern queries. We use our approach to answer this new type of query.

The proposed technique has two phases. First, the query is linearized, i.e., represented as a graph walk that covers all nodes and edges. There are several linearizations for a given query; we derive heuristics to produce a good linearization, one that is short and places selective predicates early. Second, we search for a bijective function that maps each element of the query to an element of the attributed multigraph that satisfies the reachability requirements and the predicates. Specifically, we develop an algorithm that matches the linearization by traversing the attributed graph in a manner similar to a breadth-first traversal constrained by the linearization. We evaluate our solution experimentally using a real graph (the DBLP citation network) to assess its practicality and efficiency.
Our results show that our techniques and optimizations are effective in querying attributed graphs, offering a several-fold reduction in query response time when graph statistics are used.
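
The definition of parameterized matching given in the summary can be illustrated with a small check on two strings. The sketch below is not taken from the thesis: the function name `parameterized_match` is hypothetical, and it assumes that every symbol is treated as a parameter (no fixed symbols), as the summary's definition suggests.

```python
def parameterized_match(s: str, t: str) -> bool:
    """Return True if some bijection on symbols maps s into t.

    Assumption: every symbol may be renamed; there are no fixed symbols.
    """
    if len(s) != len(t):
        return False
    forward = {}   # symbol of s -> symbol of t
    backward = {}  # symbol of t -> symbol of s
    for a, b in zip(s, t):
        # setdefault stores the pairing the first time a symbol is seen and
        # returns the stored partner afterwards; any disagreement means the
        # mapping would not be a bijection.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

Under these assumptions, `parameterized_match("abab", "cdcd")` holds because renaming `a` to `c` and `b` to `d` works, while `parameterized_match("abc", "xxy")` does not because `a` and `b` would both map to `x`.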
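
The idea of representing a graph's topology as a walk that covers all nodes and edges can be sketched as follows. This is a naive illustration, not the linearization algorithm of the thesis: it makes no attempt to produce short walks and carries no approximation guarantee, since it simply crosses every edge twice. The function name `naive_linearization` and the adjacency-list input format are assumptions, and the graph is assumed to be connected, undirected, simple, and small enough for recursion.

```python
def naive_linearization(adj: dict) -> list:
    """Return a closed walk that traverses every edge of the graph.

    Assumptions: adj maps each node to a list of neighbours, and the graph
    is connected, undirected, and has no self-loops or parallel edges.
    """
    start = next(iter(adj))
    walk = [start]
    visited_edges = set()

    def explore(u):
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in visited_edges:
                continue
            visited_edges.add(edge)
            walk.append(v)   # step along the new edge u-v
            explore(v)       # cover any edges not yet seen from v
            walk.append(u)   # step back along the same edge

    explore(start)
    return walk


# Hypothetical example: a triangle on nodes 0, 1, 2.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
walk = naive_linearization(triangle)
```

The resulting walk has 2|E| + 1 nodes, which gives a sense of why a linearization algorithm that produces shorter walks is worth the effort.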