Graph analysis is a form of data analytics where the underlying dataset is represented as a graph. Graph databases are rapidly emerging to support graph analysis.
To process huge datasets that do not fit within the memory of a single computer, academia and industry use distributed graph processing systems. In these systems, graph data is partitioned over multiple computers of a cluster, and the computation is performed in a distributed manner. Several distributed systems for large graph analysis have been developed that emphasize scalability.
A common usage pattern of such systems is a three-step approach:
1. Explore and validate the results of different types of analyses on a small dataset using only a laptop.
2. Once a satisfactory analysis result has been found, run the analysis chosen in step 1 on a real dataset using a big server-class machine.
3. Once the dataset has grown too big to be analyzed on a single machine, distribute the workload over multiple machines.
There are several reasons why this three-step approach is so common. Data scientists evaluating different graph analytic systems do not want to install and configure a system on a server or cluster before they can try it out. The very same system should run on a laptop and offer the same functionality, albeit with lower performance.
Graph analytic systems often offer many variations of the same analysis technique with different precision and performance trade-offs. The most pragmatic way to find out which variation gives the best result is to try all of them and compare. If the dataset is very large, this process can be very time-consuming. It is easier to first run all variations on a much smaller dataset that has the same characteristics as the real dataset.
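The comparison of variations on a small dataset can be sketched as follows. This is only an illustration under stated assumptions: PageRank is chosen as an example analysis (the text does not name one), the "variations" are modeled simply as different convergence tolerances, and the function names `pagerank` and `compare_variants` as well as the tiny graph `small_graph` are hypothetical and do not belong to any particular graph analytic system.

```python
import time

def pagerank(adj, damping=0.85, tol=1e-6, max_iter=200):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}.

    Returns the rank dict and the number of iterations performed.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without out-edges is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new_rank[w] += share
        # L1 distance between successive rank vectors is the stopping criterion.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration

def compare_variants(adj, tolerances):
    """Run each variant (here: each tolerance) and record cost and precision."""
    # A tightly converged run serves as the reference for measuring precision.
    reference, _ = pagerank(adj, tol=1e-12, max_iter=1000)
    results = []
    for tol in tolerances:
        start = time.perf_counter()
        rank, iterations = pagerank(adj, tol=tol)
        elapsed = time.perf_counter() - start
        max_error = max(abs(rank[v] - reference[v]) for v in adj)
        results.append({"tol": tol, "iterations": iterations,
                        "seconds": elapsed, "max_error": max_error})
    return results

# Hypothetical small dataset standing in for a sample of the real graph.
small_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
    "e": [],
}

for row in compare_variants(small_graph, [1e-2, 1e-4, 1e-6]):
    print(row)
```

Each row shows the cost (iterations, time) against the precision (deviation from the reference run), which is the kind of trade-off a user would inspect on a laptop before committing to one variation for the full dataset.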
Computation time on big server-class machines or clusters is usually expensive in terms of both money and energy. Knowing in advance which analysis gives the best result allows users to plan and therefore use computing time more efficiently.