Graph analysis is a recent methodology for data analysis that represents a data set as a graph so that fine-grained relationships between data entities are captured as edges between nodes that represent records or objects. Graph analysis provides many benefits. It enables consideration of relationships between data entities in natural ways. This is especially useful for analysis of indirect, multi-hop relationships such as graph traversal paths.
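As an illustration of a multi-hop relationship, the following minimal sketch builds a small undirected graph and computes hop counts with a breadth-first traversal. The record names, the edge list, and the helper functions `build_adjacency` and `hops_from` are hypothetical and are not drawn from any particular graph system.

```python
from collections import deque

# Hypothetical data: each node is a record, each edge is a direct relationship.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map from (source, destination) pairs."""
    adjacency = {}
    for src, dst in edge_list:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    return adjacency


def hops_from(adjacency, start):
    """Breadth-first traversal: map each reachable node to its hop count from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


adjacency = build_adjacency(edges)
hops = hops_from(adjacency, "alice")
```

Here `hops` relates "alice" to "dave" through three intermediate edges, an indirect relationship that is not stated by any single record.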
By running graph analysis algorithms on top of a graph representation, valuable non-obvious insights into the data set may emerge. Suitable graph analysis frameworks and systems can deliver this information faster than is possible by analyzing the data directly in its relational form with traditional database tools such as Structured Query Language (SQL). Other data analysis methodologies, such as certain kernels in machine learning or statistical analysis, can also be formulated as graph problems.
In typical enterprise systems, however, most data sets are already maintained in relational database systems. This is not accidental. Relational (or tabular) representation of data provides many proven benefits in terms of performance and convenience for maintaining and querying business-critical data. Therefore, there exists a gap between how data is stored and maintained relationally and how the data needs to be analyzed as a graph.
Industry addresses this gap with one of two approaches. The first approach uses a database system that directly manages data as a graph model, such as a graph database. Neo4j is a popular implementation of a graph database. However, graph databases do not perform well with analytic workloads or with many clients. This is partly because graph databases need to provide the same data maintenance features that were optimized long ago for relational databases. For example, graph databases do not perform as well for transactional workloads as relational databases do.
The second approach uses a framework that is specialized for graph analysis. GraphX and GraphLab are examples of such frameworks. However, users must manually provide a graph representation of the data to these systems. Consequently, with either approach, it takes much time and effort to configure a graph representation of a data set, which makes applying graph analysis to a data set harder and more error-prone.
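The manual effort can be illustrated with a minimal sketch that hand-maps relational rows to nodes and edges. The `customers` and `orders` tables, their columns, and the `rows_to_graph` helper are hypothetical, and the in-memory SQLite database merely stands in for an enterprise relational system.

```python
import sqlite3

# Hypothetical relational schema: orders reference customers by foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'phone'), (12, 2, 'laptop');
""")


def rows_to_graph(connection):
    """Hand-written mapping: each row becomes a node, each foreign key an edge."""
    nodes = {}
    edges = []
    for cust_id, name in connection.execute("SELECT id, name FROM customers"):
        nodes[("customer", cust_id)] = {"name": name}
    for order_id, cust_id, product in connection.execute(
        "SELECT id, customer_id, product FROM orders"
    ):
        nodes[("order", order_id)] = {"product": product}
        # The foreign key orders.customer_id is interpreted as an edge.
        edges.append((("order", order_id), ("customer", cust_id)))
    return nodes, edges


nodes, edges = rows_to_graph(conn)
```

Every table and every foreign key needs its own mapping code of this kind, and that code must be revised whenever the schema changes, which is where much of the time and the risk of error arises.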