Graph analysis has recently become important for big data analysis. Two kinds of systems for graph processing are established among data scientists.
The first is graph database management systems (DBMS's). Graph DBMS's focus on persisting graphs. They provide special functionality for storing, retrieving, and modifying graphs in a transactional manner. Examples are Neo4j and Oracle Big Data Appliance (BDA).
Graph DBMS's store the data on nonvolatile media such as hard disk drives (HDDs) or solid state drives (SSDs). Such drives, as used for example by Oracle Exadata, can hold huge amounts of data, on the order of a petabyte.
The second is graph analytic engines. These engines focus on analyzing graphs. They specialize in analytic speed and retrieval of nontrivial information. Examples are Giraph, GraphX, Turi, and Oracle Parallel Graph Analytics (PGX).
Graph analytic engines typically load the entire graph into the main memory of the computer. The amount of available memory on modern computers is only on the order of ten terabytes. The benefit of memory is that it can be accessed much faster than HDDs and SSDs, which makes memory better suited to analysis workloads.
A typical workflow may be:
1. Store/update the original graph in the graph database;
2. Load a snapshot of the graph from the graph database into the graph analytic engine, perhaps on a different computer than the one hosting the graph database;
3. Run analyses on the graph stored in memory using the graph analytic engine; and
4. Send the results back to the graph DBMS for persistence.
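The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the graph DBMS, a plain dictionary stands in for the graph analytic engine, and the table names (`edges`, `results`) and function names are hypothetical. A trivial out-degree count takes the place of a real analysis.

```python
import sqlite3

# Stand-in for the graph DBMS (assumption): an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
db.execute("CREATE TABLE results (vertex TEXT PRIMARY KEY, out_degree INTEGER)")


def store_graph(conn, edges):
    """Step 1: store/update the original graph in the database."""
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", edges)
    conn.commit()


def load_snapshot(conn):
    """Step 2: load a snapshot of the graph into an in-memory adjacency map."""
    adjacency = {}
    for src, dst in conn.execute("SELECT src, dst FROM edges"):
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # keep vertices with no outgoing edges
    return adjacency


def analyze(adjacency):
    """Step 3: run an analysis in memory (here, a trivial out-degree count)."""
    return {vertex: len(neighbors) for vertex, neighbors in adjacency.items()}


def persist_results(conn, out_degree):
    """Step 4: send the results back to the database for persistence."""
    conn.executemany(
        "INSERT OR REPLACE INTO results (vertex, out_degree) VALUES (?, ?)",
        out_degree.items(),
    )
    conn.commit()


store_graph(db, [("a", "b"), ("a", "c"), ("b", "c")])
snapshot = load_snapshot(db)
out_degree = analyze(snapshot)
persist_results(db, out_degree)
```

Note that step 2 reads every edge into memory, which is exactly where the capacity limit of main memory becomes a concern for large graphs.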
The lower capacity of memory creates a problem when a graph needs to be loaded into the graph analytic engine for analysis purposes. If the graph is too big for main memory, engine throughput may degrade due to swapping.
Often only a specific subset of the original graph, i.e., a subgraph, is of interest to the user. For example, only the subgraph containing information from a specific time period might be interesting for analysis. In such a case, the user wants to specify the subgraph of interest, which may be small enough to fit within the size constraints of main memory.
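The time-period example can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each edge carries a creation date; the `Edge` type and the function names are hypothetical. The filter is a generator, so edges outside the period are discarded as they stream past and only the subgraph is ever materialized in memory.

```python
from datetime import date
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    dst: str
    created: date  # assumed per-edge timestamp


def edges_in_period(edges, start, end):
    """Yield only the edges created in the half-open period [start, end)."""
    for edge in edges:
        if start <= edge.created < end:
            yield edge


def build_subgraph(edges):
    """Materialize an adjacency map from the (already filtered) edges."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge.src, []).append(edge.dst)
        adjacency.setdefault(edge.dst, [])
    return adjacency


all_edges = [
    Edge("a", "b", date(2019, 6, 1)),
    Edge("b", "c", date(2020, 3, 15)),
    Edge("c", "d", date(2020, 11, 2)),
    Edge("d", "a", date(2021, 1, 20)),
]

# Keep only the edges created during 2020.
subgraph = build_subgraph(
    edges_in_period(all_edges, date(2020, 1, 1), date(2021, 1, 1))
)
```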
Furthermore, composing a graph query is typically done manually. The user may need to compose a different query for each different graph storage system. This may require knowledge of data formats, text files, and databases such as relational, NoSQL, and HBase databases. Accepting and evaluating ad hoc queries from users could also pose a security risk, for example through SQL injection and related attacks.
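The injection risk can be made concrete with a small Python sketch. It assumes a relational edge table in SQLite; the table layout and function names are hypothetical. The first function splices user input into the query text, so a crafted label changes the meaning of the query. The second passes the input as a bound parameter, so it is only ever compared as a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO edges (src, dst, label) VALUES (?, ?, ?)",
    [("a", "b", "knows"), ("b", "c", "owns")],
)


def edges_with_label_unsafe(conn, label):
    # Vulnerable: the user-supplied label becomes part of the SQL text.
    query = "SELECT src, dst FROM edges WHERE label = '" + label + "'"
    return conn.execute(query).fetchall()


def edges_with_label_safe(conn, label):
    # Safer: the label is bound as a parameter and never parsed as SQL.
    query = "SELECT src, dst FROM edges WHERE label = ?"
    return conn.execute(query, (label,)).fetchall()


malicious = "x' OR '1'='1"

leaked = edges_with_label_unsafe(conn, malicious)   # condition is always true
blocked = edges_with_label_safe(conn, malicious)    # no edge has this label
normal = edges_with_label_safe(conn, "knows")
```

With the crafted input, the unsafe variant returns every edge in the table, while the parameterized variant returns nothing.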