One or more example embodiments relate to a method and apparatus for generating a graph.
Graphs are widely used to model real-world objects in many domains such as social networks, the web, business intelligence, biology, and neuroscience, due to their simplicity and generality. As many applications such as graph-based distributed on-line transaction processing (OLTP) query processing, the Internet of Things (IoT), and the human connectome encounter exponential growth in graph sizes, both fast and scalable graph processing methods and synthetic graph generation methods have become more important than ever before. There are two strong motivations for studying a fast and scalable graph generator: (1) a lack of large-scale realistic graphs for evaluating the performance of graph processing methods; and (2) a lack of low-level core techniques that may be used for generating a benchmark database for rich graph models.
For the first motivation, a number of graph processing systems that may process a graph of one trillion edges have already been developed. All of these graph processing systems have used synthetic graphs for evaluating their performance, since sharing and utilizing large-scale real-world graphs is very limited because such graphs are proprietary or practically impossible to collect. However, most trillion-scale synthetic graphs used so far are unrealistic: they have a huge number of repeated edges and do not follow the power-law degree distribution, whereas having few repeated edges and following the power-law degree distribution are considered important properties of “realistic” graphs. Thus, a scalable realistic synthetic graph generator is critical for more accurate evaluation of graph processing methods.
The existing synthetic graph generators either cannot generate a graph of a trillion edges by using a cluster of commodity machines, or require a supercomputer for generating such graphs. A number of models or methods have been proposed to generate realistic synthetic graphs, and recursive MATrix (RMAT) and Kronecker are the most commonly used among them. RMAT is based on a recursive matrix model that recursively selects a quadrant on an adjacency matrix to generate an edge and repeats the same procedure until a total of |E| edges are generated for a graph G=(V, E). Kronecker has two models for graph generation: stochastic Kronecker graph (SKG) and deterministic Kronecker graph (DKG). SKG is a generalized model of RMAT in terms of the number of probability parameters.
Hereinafter, RMAT and Kronecker will be described.
“RMAT”
RMAT stands for recursive MATrix graph generator and may be the most commonly used model to generate realistic synthetic scale-free graphs. The basic idea behind RMAT is recursive quadrant selection on an adjacency matrix for edge generation, and RMAT repeats such edge generation until a whole graph is generated. Conceptually, RMAT partitions an adjacency matrix into four quadrants, which have probability parameters α, β, γ and δ, as in FIG. 1A. A higher parameter value indicates a higher probability for the corresponding quadrant to be selected, and the sum of the four parameters should be 1.0. For a graph G=(V, E), V denotes the set of vertices, E denotes the set of edges, and the size of the adjacency matrix is |V|×|V|, where a row indicates a source vertex and a column indicates a destination vertex. Each quadrant is recursively partitioned into four sub-quadrants with the same probability parameters α, β, γ and δ until the size of a sub-quadrant becomes 1×1. Therefore, the number of recursive selection operations required for generating an edge is log₂ |V|. FIG. 1B shows an example of generation of an edge in RMAT, where the operations of recursive quadrant selection are β→δ→γ→β (|V|=16). When the recursive quadrant selection reaches a 1×1 cell of the adjacency matrix, for example, the (x, y) cell, RMAT appends an edge (x, y) to the resulting graph.
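The recursive quadrant selection described above may be sketched as follows. This is a minimal illustration and not the claimed method. The function names `rmat_edge` and `rmat_graph`, the quadrant layout (α top-left, β top-right, γ bottom-left, δ bottom-right), the requirement that |V| be a power of two, and the use of a set to discard repeated edges are all assumptions made for the example.

```python
import random


def rmat_edge(num_vertices, alpha, beta, gamma, delta, rng):
    """Select one adjacency-matrix cell (x, y) by recursive quadrant selection.

    Assumed layout: alpha = top-left, beta = top-right,
    gamma = bottom-left, delta = bottom-right.
    """
    assert num_vertices > 0 and num_vertices & (num_vertices - 1) == 0, \
        "this sketch assumes |V| is a power of two"
    assert abs(alpha + beta + gamma + delta - 1.0) < 1e-9
    x = y = 0              # x: row (source vertex), y: column (destination vertex)
    size = num_vertices
    while size > 1:        # log2(|V|) quadrant selections per edge
        size //= 2
        r = rng.random()
        if r < alpha:
            pass                      # top-left: neither offset changes
        elif r < alpha + beta:
            y += size                 # top-right
        elif r < alpha + beta + gamma:
            x += size                 # bottom-left
        else:
            x += size                 # bottom-right
            y += size
    return x, y


def rmat_graph(num_vertices, num_edges, alpha, beta, gamma, delta, seed=0):
    """Repeat edge generation until num_edges distinct edges exist.

    Holding every edge in a set is the assumed source of O(|E|) memory use.
    All four parameters are assumed positive so that every cell is reachable.
    """
    assert num_edges <= num_vertices * num_vertices
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        edges.add(rmat_edge(num_vertices, alpha, beta, gamma, delta, rng))
    return edges


if __name__ == "__main__":
    graph = rmat_graph(16, 20, 0.57, 0.19, 0.19, 0.05, seed=1)
    print(len(graph), "edges generated")
```

Under the assumed layout, the selection sequence β→δ→γ→β with |V|=16 moves the column offset by 8, then both offsets by 4, then the row offset by 2, then the column offset by 1, which lands on the cell (6, 13).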
“Kronecker”
SKG stands for stochastic Kronecker graph generator and is a generalized model of RMAT. While RMAT considers only 2×2 probability parameters, SKG considers n×n probability parameters. That is, RMAT is a special case of SKG (n=2). RMAT and SKG are similar to each other in that both use a probability matrix, but are dissimilar in terms of the stochastic process. While RMAT generates an edge via a series of recursive operations in a dynamic manner, SKG generates an edge in a static manner. In detail, SKG checks, for every cell of the adjacency matrix, whether the corresponding edge is generated or not with respect to the probability of the cell. FIG. 1C shows an example of generation of an edge in SKG, where a black cell has the probability of β×δ×γ×β for generation. When a randomly generated value for the cell is within the probability, the corresponding edge is generated.
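The static, cell-by-cell check described above may be sketched as follows. This is only an illustration under stated assumptions: the names `cell_probability` and `skg_graph` are hypothetical, |V| is assumed to be a power of n, and the seed values in the usage lines are illustrative and not taken from this document. The probability of a cell is computed as the product of one seed entry per recursion level, which for n=2 gives a product such as β×δ×γ×β.

```python
import random


def cell_probability(x, y, seed_matrix, num_vertices):
    """Probability of cell (x, y): one seed-matrix entry per recursion level."""
    n = len(seed_matrix)
    prob = 1.0
    size = num_vertices
    while size > 1:
        size //= n
        # The sub-block that contains (x, y) at this level picks the factor.
        prob *= seed_matrix[x // size][y // size]
        x %= size
        y %= size
    return prob


def skg_graph(seed_matrix, num_vertices, seed=0):
    """Visit every cell of the |V| x |V| adjacency matrix: O(|V|^2) checks."""
    n = len(seed_matrix)
    assert n >= 2
    size = 1
    while size < num_vertices:
        size *= n
    assert size == num_vertices, "this sketch assumes |V| is a power of n"
    rng = random.Random(seed)
    edges = []
    for x in range(num_vertices):
        for y in range(num_vertices):
            # The edge is generated when the random value falls within the probability.
            if rng.random() < cell_probability(x, y, seed_matrix, num_vertices):
                edges.append((x, y))
    return edges


if __name__ == "__main__":
    k = [[0.9, 0.5], [0.5, 0.2]]   # illustrative seed matrix [[alpha, beta], [gamma, delta]]
    print(cell_probability(6, 13, k, 16))   # beta * delta * gamma * beta
    print(len(skg_graph(k, 16, seed=3)), "edges generated")
```

The double loop makes the cost of this approach visible: every one of the |V|×|V| cells is tested, regardless of how few edges result.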
The entire |V|×|V| probability matrix of SKG may be presented using the Kronecker product. The Kronecker product is an operator that generalizes the outer product to two given matrices of arbitrary size.
Definition 1. Kronecker Product:
Given two matrices A and B, where A is of size n×m, and B is of size p×q, the Kronecker product between A and B is defined as
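\[
\begin{bmatrix}
a_{1,1}B & \cdots & a_{1,m}B \\
\vdots & \ddots & \vdots \\
a_{n,1}B & \cdots & a_{n,m}B
\end{bmatrix}
\]
and expressed by Equation 1 in more detail.
\[
A \otimes B =
\begin{bmatrix}
a_{1,1}b_{1,1} & \cdots & a_{1,1}b_{1,q} & & a_{1,m}b_{1,1} & \cdots & a_{1,m}b_{1,q} \\
\vdots & \ddots & \vdots & \cdots & \vdots & \ddots & \vdots \\
a_{1,1}b_{p,1} & \cdots & a_{1,1}b_{p,q} & & a_{1,m}b_{p,1} & \cdots & a_{1,m}b_{p,q} \\
\vdots & & & \ddots & & & \vdots \\
a_{n,1}b_{1,1} & \cdots & a_{n,1}b_{1,q} & & a_{n,m}b_{1,1} & \cdots & a_{n,m}b_{1,q} \\
\vdots & \ddots & \vdots & \cdots & \vdots & \ddots & \vdots \\
a_{n,1}b_{p,1} & \cdots & a_{n,1}b_{p,q} & & a_{n,m}b_{p,1} & \cdots & a_{n,m}b_{p,q}
\end{bmatrix}
\qquad \text{[Equation 1]}
\]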
By using Definition 1, the probability matrix P of SKG may be presented as Equation 2.
\[
P = K \otimes K \otimes K \cdots \otimes K = K^{\otimes \log_n |V|}
\qquad \text{[Equation 2]}
\]
In Equation 2, K is a given n×n seed probability matrix.
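The Kronecker product of Definition 1 and the repeated product of Equation 2 may be illustrated with a short sketch. The helper names `kron` and `kron_power` are hypothetical, matrices are plain nested lists, and the numeric seed values are illustrative only. The sketch relies on the index relation that entry (i, j) of A⊗B equals a[i // p][j // q] × b[i % p][j % q] for 0-based indices, where B is of size p×q.

```python
def kron(a, b):
    """Kronecker product of an n x m matrix a and a p x q matrix b."""
    n, m = len(a), len(a[0])
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(m * q)]
        for i in range(n * p)
    ]


def kron_power(k, times):
    """K (x) K (x) ... (x) K with `times` factors, times >= 1."""
    assert times >= 1
    result = k
    for _ in range(times - 1):
        result = kron(result, k)
    return result


if __name__ == "__main__":
    print(kron([[1, 2], [3, 4]], [[0, 5], [6, 7]]))
    # [[0, 5, 0, 10], [6, 7, 12, 14], [0, 15, 0, 20], [18, 21, 24, 28]]

    alpha, beta, gamma, delta = 0.57, 0.19, 0.19, 0.05   # illustrative values
    seed_matrix = [[alpha, beta], [gamma, delta]]
    prob = kron_power(seed_matrix, 4)    # log2(16) = 4 factors, giving a 16 x 16 matrix
    # Cell (6, 13) corresponds to the level-by-level choices beta, delta, gamma, beta.
    print(abs(prob[6][13] - beta * delta * gamma * beta) < 1e-12)   # True
```

Materializing the full matrix in this way is practical only for very small |V|, since the result has |V|×|V| entries.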
Although both RMAT and Kronecker (that is, SKG) are effective for generating synthetic graphs that are realistic, RMAT and Kronecker are not very efficient for generating large-scale graphs in terms of space and time complexities. RMAT has a space complexity of O(|E|), which is relatively high, while Kronecker has a time complexity of O(|V|²), which is also relatively high. Thus, RMAT tends to have a limit on the size of a graph to generate due to its high memory requirement, and Kronecker tends to have a limit due to its high computational overhead. There is a benchmark called Graph500. Graph500 generates a trillion-scale graph and runs a simple query on it for measuring the performance of supercomputers. This benchmark uses the SKG model of Kronecker for graph generation. A number of supercomputers may generate a trillion-scale graph through the benchmark. However, the benchmark uses a huge amount of computing resources, typically several thousand server computers connected via a high-speed network, for example, InfiniBand. To most researchers, it may be practically impossible to use such equipment for graph generation. Therefore, it is a challenging problem to generate a trillion-scale synthetic graph efficiently using only a small amount of computing resources.
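To give a rough sense of the two complexities at trillion scale, the following back-of-the-envelope sketch may help. Every figure in it is an assumption chosen for illustration and does not come from this document: |V| = 2^36 vertices, |E| = 2^40 edges (about 1.1 trillion), and 16 bytes per stored edge (two 8-byte vertex identifiers). The function names are hypothetical.

```python
def rmat_memory_bytes(num_edges, bytes_per_edge=16):
    """O(|E|) space: every generated edge is held in memory (assumed 16 bytes each)."""
    return num_edges * bytes_per_edge


def skg_cell_checks(num_vertices):
    """O(|V|^2) time: one random test per adjacency-matrix cell."""
    return num_vertices ** 2


if __name__ == "__main__":
    num_vertices = 2 ** 36   # assumed for illustration
    num_edges = 2 ** 40      # about 1.1 trillion edges, assumed for illustration

    print(rmat_memory_bytes(num_edges) / 2 ** 40, "TiB")       # 16.0 TiB
    print(f"{skg_cell_checks(num_vertices):.2e} cell checks")   # 4.72e+21 cell checks
```

Under these assumptions, the edge set alone would exceed the memory of a small commodity cluster, and the number of cell checks would be far beyond what such a cluster could perform in a practical amount of time.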
For the second motivation, there have been many efforts to automatically generate a benchmark database for semantically rich graph models, similar to the Transaction Processing Performance Council (TPC) benchmarks for the relational model. With the proliferation of linked data in the real world, managing graph-structured linked data becomes more and more important in many domains. A number of performance benchmarks have been proposed for linked data management systems. One representative method, gMark, may generate a graph having various properties such as multiple node types and edge predicates. Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) Datagen is another representative method for social network benchmarks.
These methods have focused on rich semantics of a graph rather than the size of the graph so far. As a result, the sizes of resulting databases tend to be very small. However, both rich semantics and scalability are important for an ideal benchmark. Thus, it is another challenging problem to develop scalability techniques that may be used for generating such a benchmark.