1. Field
The present disclosure relates to graph queries and other analytics computations. More specifically, this disclosure relates to a method and system for scalable processing of graph queries and other analytics applications.
2. Related Art
Analytics algorithms and applications often have to deal with graphs, a general data structure well suited for modeling real-world objects, events, and facts, together with the relations among them. With big data analytics at the forefront of algorithm research and business innovation, the ability to process big graph data is becoming increasingly important, yet standard approaches to big data such as Hadoop do not scale well on graphs. This is because graph computations usually do not fit the map-reduce pattern of computation assumed by Hadoop and similar big data platforms. This “impedance mismatch” has motivated the development of dedicated analytics packages and libraries designed specifically for graphs, such as Giraph, GraphLab, Boost Graph Library (BGL), and Neo4j.
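By way of illustration only, the following minimal Python sketch suggests one common reason a graph traversal sits awkwardly in a map-reduce pattern: a breadth-first search written as map and reduce phases must rescan the entire edge list once per level of the graph. The function name `bfs_map_reduce`, the edge-list representation, and the in-memory "phases" are assumptions made for this sketch; they do not represent Hadoop's API or any particular system discussed herein.

```python
from collections import defaultdict


def bfs_map_reduce(edges, source):
    """Breadth-first distances computed as repeated map/reduce rounds.

    Hypothetical helper for illustration; returns (distances, rounds).
    """
    dist = {source: 0}
    rounds = 0
    while True:
        rounds += 1
        # Map phase: scan EVERY edge, emitting a candidate distance for the
        # target vertex whenever the source vertex has already been reached.
        emitted = defaultdict(list)
        for u, v in edges:
            if u in dist:
                emitted[v].append(dist[u] + 1)
        # Reduce phase: keep the smallest candidate distance per vertex.
        changed = False
        for v, candidates in emitted.items():
            best = min(candidates)
            if v not in dist or best < dist[v]:
                dist[v] = best
                changed = True
        # Stop once a full round discovers nothing new.
        if not changed:
            return dist, rounds


# A five-vertex directed chain: 0 -> 1 -> 2 -> 3 -> 4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist, rounds = bfs_map_reduce(edges, 0)
print(dist, rounds)
```

On this four-edge chain the sketch performs five full passes over the edge list (four that each discover one new vertex, plus one that detects termination), so the work grows with the number of edges multiplied by the depth of the traversal, whereas a queue-based traversal would touch each edge once.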
Open-source graph tools such as BGL and Neo4j do not scale well in comparison to other high-performance graph engines. Giraph is built on top of Hadoop's map-reduce framework, and it remains to be seen whether Giraph can meet the speed requirements for big graphs. GraphLab is an open-source package for machine learning with a parallel programming abstraction targeted at sparse iterative graph algorithms. Using the original C/C++ implementation, the inventors of GraphLab benchmarked its performance against a comparable Hadoop implementation, with the following results: using 16 processors, GraphLab completed a Co-Expectation-Maximization (Co-EM) task in less than 30 minutes, whereas the same task took Hadoop 7.5 hours using an average of 95 central processing units (CPUs). Wikipedia states that GraphLab is about 50× faster than Mahout, a Hadoop-based machine learning implementation. Although GraphLab shows improvements over previous tools, more scalable and extensible tools are still needed for analyzing big graph data.