Graph analysis is a recently popularized methodology in data analysis that considers fine-grained relationships between data entities. Traditional data processing systems (e.g., databases) do not handle graph analysis well, especially for large graphs. Distributed frameworks specialized for graph processing, including GraphX and GraphLab, have emerged in response.
However, having been designed with batch processing in mind, these distributed frameworks are inefficient for interactive use cases, in which concurrent clients on remote machines want to use a distributed system in ad hoc and/or interactive ways.
Because the existing distributed systems are designed for batch processing, they expose only a rudimentary application program interface (API) to users. Consequently, a user must either use a dedicated (and inflexible) client program provided by the distributed framework, or develop a custom batch program that uses the low-level API to control the distributed system. As such, interactive remote control of these distributed systems is difficult.
These distributed systems also do not fully support cancellation of jobs that are already running. Thus, a client must either wait until a submitted job or query finishes, or else shut down the distributed system to cancel the job. Shutting down may be problematic, because a distributed graph processing system may handle huge graphs that take several hours or days to load into memory and prepare for subsequent analysis, and that preparation may have to be repeated after a shutdown.
For distributed graph processing systems, canceling a running job is a complicated task because existing mechanisms for distributed cancellation are cumbersome, unreliable, destructive of state, or otherwise inadequate. Furthermore, existing distributed systems do not gracefully handle exceptions. Thus, existing distributed job control is suboptimal.