The present invention, in some embodiments thereof, relates to processing a query to a graph database and, more specifically, but not exclusively, to processing a query to a graph database that is performed using multiple processors. The methods and systems described herein take advantage of multiprocessing hardware, which is nowadays highly accessible and available, and harness such multiprocessing platforms to process queries to the graph database.
Graph databases have become popular as structures for data storage. The graph database provides a visually coherent and intuitive presentation of the data it holds, allowing a human to easily follow a data pattern and the interactions between a plurality of data items and/or properties. Data items (also referred to as members) are stored as nodes in the graph database, while relationships between the various data items are represented as directional edges connecting the nodes.
Reference is now made to FIG. 1 which is a schematic illustration of an exemplary graph database and an exemplary query to the exemplary graph database. A graph database 101 may include a plurality of nodes 103, each containing one or more data items and/or properties describing the data item. The nodes 103 are interconnected by a plurality of directional edges 104 describing the relationships between the plurality of nodes 103. Each of the nodes 103 includes a node identifier and one or more data items and/or properties. Each of the edges 104 includes an identifier and is associated with an edge group (edge type). The edge 104 may include additional information with respect to the relationships between the nodes 103. A query 102 to the graph database 101 (usually expressed as a query applied to a graph and/or a query against a graph) is also composed of nodes 103 and edges 104, and it essentially asks for a sub-graph of the graph database 101 rooted at a root node 105. During processing of the query 102, the graph database 101 is searched for a sub-graph which is isomorphic to the query 102 (i.e., which has the same pattern as indicated by the structure of the query 102) in order to identify a match. A match is identified when the structure of the query 102 is found within the graph database 101 with respect to the nodes 103 and the edges 104. Identifying a match of the query 102 against the graph database 101 may include a Boolean match, a match of one or more specific nodes (also referred to as target query node(s)) and/or a complete match of the whole structure of the query 102 within the graph database 101. For a Boolean match, processing the query 102 produces a Boolean indication, i.e., either a match or the absence of a match.
Processing a target query node refers to identifying a match of the complete query sub-graph but reporting only the match(es), within the graph database 101, of one or more of the nodes of the query 102. A complete match refers to a match of the complete query 102 against the graph database 101, in which the whole matched structure is reported.
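By way of illustration only, the three kinds of match described above may be sketched as follows. The sketch is a naive sequential backtracking search and is not the method of the embodiments described herein. All names in it (Graph, find_matches, boolean_match, target_match, complete_match) and the sample data are hypothetical, and it assumes that a query node matches a database node when every property listed in the query node is present with the same value in the database node.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Node identifier -> properties of the data item held by that node.
    nodes: dict = field(default_factory=dict)
    # Directional edges stored as (source id, edge type, destination id).
    edges: set = field(default_factory=set)


def find_matches(db, query, root):
    """Yield each one-to-one mapping {query node -> database node}
    under which every query edge exists in the database."""
    order = [root] + [n for n in query.nodes if n != root]

    def consistent(mapping):
        # Check only query edges whose two endpoints are already mapped.
        for src, etype, dst in query.edges:
            if src in mapping and dst in mapping:
                if (mapping[src], etype, mapping[dst]) not in db.edges:
                    return False
        return True

    def extend(mapping, depth):
        if depth == len(order):
            yield dict(mapping)  # copy, since mapping is mutated below
            return
        q = order[depth]
        for candidate, props in db.nodes.items():
            if candidate in mapping.values():
                continue  # keep the mapping one-to-one
            if any(props.get(k) != v for k, v in query.nodes[q].items()):
                continue  # assumed rule: query properties must all agree
            mapping[q] = candidate
            if consistent(mapping):
                yield from extend(mapping, depth + 1)
            del mapping[q]

    yield from extend({}, 0)


def boolean_match(db, query, root):
    # Boolean match: report only whether at least one match exists.
    return next(find_matches(db, query, root), None) is not None


def target_match(db, query, root, targets):
    # Target query node match: match the whole query, report only the targets.
    return {tuple(m[t] for t in targets) for m in find_matches(db, query, root)}


def complete_match(db, query, root):
    # Complete match: report the full mapping of every match.
    return list(find_matches(db, query, root))


# Hypothetical sample data.
db = Graph(
    nodes={
        1: {"kind": "person", "name": "Ann"},
        2: {"kind": "person", "name": "Bob"},
        3: {"kind": "city", "name": "Oslo"},
    },
    edges={(1, "knows", 2), (1, "lives_in", 3), (2, "lives_in", 3)},
)
query = Graph(
    nodes={"p": {"kind": "person"}, "f": {"kind": "person"}, "c": {"kind": "city"}},
    edges={("p", "knows", "f"), ("f", "lives_in", "c")},
)

print(boolean_match(db, query, "p"))
print(target_match(db, query, "p", ["c"]))
print(complete_match(db, query, "p"))
```

In this sketch the same search serves all three result types, and only the reporting differs: a Boolean match may stop at the first mapping found, whereas a target query node match and a complete match enumerate every mapping.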
Currently, processing the query 102 to the graph database 101 is mainly performed sequentially. A single search is performed at a time, in which a specific sub-graph is searched for. In certain sub-domains, for example XML processing, some parallel processing exists, but not on a large scale.
As technology advances, multiprocessing hardware is becoming available, for example, multi-core processors and/or hardware based on a single instruction, multiple data (SIMD) architecture that are capable of simultaneously executing one or more threads. A thread is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler. SIMD platforms employ processor arrays in which a single instruction or operation may be processed in parallel over data arrays containing multiple data items which are mostly independent of each other. The combination of a multithreading platform with a SIMD architecture allows for massive vector processing, enabling parallelization in processing large data arrays containing data items that are mostly independent of each other. An example of a SIMD platform is a graphics processing unit (GPU), which is widespread in processing stations, for example, desktop computers, laptop computers and/or servers. GPUs are designed to process display data and have evolved to include massive arrays of processors in order to effectively and quickly process high resolution, high definition display data for fast moving scenes, for example, for motion pictures and/or for gaming applications.
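The programming pattern described above, in which a single operation is applied to many mutually independent data items, may be sketched as follows. This is an illustration only: it uses threads from the Python standard library to model the pattern and does not execute on SIMD hardware. The helper name apply_to_lanes, the lanes parameter and the sample pixel values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_lanes(operation, data, lanes=4):
    """Apply one operation to every item of `data`, splitting the array
    into at most `lanes` chunks that are processed independently."""
    chunk = max(1, (len(data) + lanes - 1) // lanes)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        # map() preserves chunk order, so results line up with the input.
        results = list(pool.map(lambda part: [operation(x) for x in part], chunks))
    return [y for part in results for y in part]


# Hypothetical display data: brighten each pixel, saturating at 255.
pixels = [0, 64, 128, 255]
brightened = apply_to_lanes(lambda p: min(255, p + 32), pixels)
print(brightened)
```

Because no item depends on the result of any other item, the chunks may be processed in any order or all at once, which is the property that makes such workloads suitable for vector processing.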
Multiprocessing platforms may be used for many applications other than graphics and video processing. Applications in which there is no or limited dependency between the data items involved in the processing may employ a vector processing approach using SIMD platforms in order to reduce processing time and support low latency systems. In order to execute applications using SIMD platforms, the algorithms embodied within the applications may require some modification in order to execute on SIMD hardware.