Exponential gains in hardware technology have enabled sophisticated machine learning and data mining techniques to be applied to increasingly challenging real-world problems. While high-level parallel frameworks simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms. Efficient distributed parallel algorithms for handling large scale data are required. However, designing and implementing efficient and provably correct parallel algorithms is extremely challenging.
In recent years, large-scale distributed graph-structured computation has been central to tasks ranging from targeted advertising to natural language processing. However, for efficient use of large-scale graphs there is a need for scalable analytics processing capabilities.