Graph analysis is a recently popularized way of analyzing data that considers not only the properties of entities but also the relationships between them. Most graph analysis algorithms are not very computation-intensive but are heavily memory-bound. Graph analysis algorithms are also inherently parallel, and graph analysis systems exploit this parallelism, for example by executing code with multiple threads.
In graph analysis, the original dataset is represented as a graph. Each data entity may be represented as a vertex within the graph. A relationship between two entities may be represented as an edge between corresponding vertices. Other information in the original dataset may be encoded as vertex or edge properties, which are values associated with vertices and edges. For example, a vertex that represents a person in the original dataset can be associated with extra properties such as a serial number, personal name, or street address.
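As a minimal illustration of vertex and edge properties, the sketch below stores a tiny graph in Python dictionaries. The class `PropertyGraph` and the property names (`name`, `serial`, `address`, `relation`, `since`) are hypothetical and not taken from any particular graph system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class PropertyGraph:
    """Hypothetical property graph: each vertex and each edge carries a property map."""
    vertex_props: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    edge_props: Dict[Tuple[int, int], Dict[str, Any]] = field(default_factory=dict)

    def add_vertex(self, vid: int, **props: Any) -> None:
        # A data entity becomes a vertex with its own properties.
        self.vertex_props[vid] = dict(props)

    def add_edge(self, src: int, dst: int, **props: Any) -> None:
        # A relationship becomes a directed edge with its own properties.
        if src not in self.vertex_props or dst not in self.vertex_props:
            raise KeyError("both endpoints must be added as vertices first")
        self.edge_props[(src, dst)] = dict(props)

    def neighbors(self, vid: int) -> List[int]:
        # A linear scan keeps the sketch short; many systems use an adjacency index instead.
        return [dst for (src, dst) in self.edge_props if src == vid]


g = PropertyGraph()
g.add_vertex(1, name="Alice", serial=1001, address="1 Main St")
g.add_vertex(2, name="Bob", serial=1002, address="2 Oak Ave")
g.add_edge(1, 2, relation="knows", since=2015)
print(g.neighbors(1))             # [2]
print(g.vertex_props[1]["name"])  # Alice
```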
Graph analysis may be used by many types of applications, such as semantic webs, molecular networks, and social networks. A real-world graph dataset may be huge and time-consuming to analyze. To accelerate the analysis of a large graph, multithreaded computation may exploit multiple central processing unit (CPU) cores, so that an analytic computation is performed simultaneously on different portions of the graph. A large-scale server system may support hundreds or thousands of hardware threads.
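A minimal sketch of performing one computation on different portions of a graph at once, assuming a compressed sparse row (CSR) style `offsets` array in which the out-degree of vertex `v` is `offsets[v + 1] - offsets[v]`. The function names are hypothetical. The sketch shows only the partitioning structure: in CPython, the global interpreter lock limits the real speedup of pure-Python work across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(num_vertices: int, num_parts: int) -> List[range]:
    """Split vertex IDs 0..num_vertices-1 into contiguous, nearly equal ranges."""
    base, extra = divmod(num_vertices, num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def count_edges_in_range(offsets: List[int], vertices: range) -> int:
    # Sum the out-degrees of the vertices in this portion of the graph.
    return sum(offsets[v + 1] - offsets[v] for v in vertices)


def parallel_edge_count(offsets: List[int], num_threads: int) -> int:
    parts = partition(len(offsets) - 1, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each worker thread analyzes a different portion of the graph.
        partial_counts = pool.map(lambda r: count_edges_in_range(offsets, r), parts)
        return sum(partial_counts)


# 4 vertices with out-degrees 2, 1, 0, 2.
print(parallel_edge_count([0, 2, 3, 3, 5], num_threads=3))  # 5
```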
Multithreaded execution, however, may have drawbacks, such as increased memory consumption. Some graph analysis algorithms, when implemented with multithreading, need a thread-local data structure for each thread to hold temporary or other local data. In some cases, each thread needs its own thread-local copy of a vertex or edge property, with a separate value for many or all vertices or edges of the graph. With a large number of threads, this memory overhead may become problematic. For example, throughput may decline if accessing a huge graph causes thrashing of one or more memory tiers, such as the static random access memory (SRAM) of an on-chip cache, or off-chip memory such as dynamic RAM (DRAM) or flash memory.
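The overhead of per-thread property copies grows as the product of thread count, element count, and value size. The arithmetic below uses hypothetical sizes (one billion vertices, an 8-byte property, 256 threads) purely to show the scale; `thread_local_bytes` is an illustrative helper, not part of any system.

```python
def thread_local_bytes(num_threads: int, num_elements: int, bytes_per_value: int) -> int:
    """Memory for one property that every thread copies for every vertex (or edge)."""
    return num_threads * num_elements * bytes_per_value


# Hypothetical sizes: 1 billion vertices, an 8-byte (e.g. double) property, 256 threads.
overhead = thread_local_bytes(num_threads=256, num_elements=1_000_000_000, bytes_per_value=8)
print(overhead // 10**9, "GB")  # 2048 GB
```

A single shared copy of the same property would need only 8 GB, so the thread-local copies multiply that cost by the thread count.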
There are two ways to address this problem. One way is to limit how many threads run concurrently, so that the memory used by thread-local data does not exceed physical RAM capacity. The obvious downside is that using fewer threads (and thus fewer busy CPU cores) may increase analysis execution time.
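The first approach can be sketched as choosing the largest thread count whose thread-local data still fits beside the shared graph. This assumes that per-thread memory is known in advance and identical for every thread, which a real user may not know; the helper name and the sizes in the example are hypothetical.

```python
GB = 10**9


def max_threads_for_budget(memory_budget: int, shared_bytes: int,
                           per_thread_bytes: int, hw_threads: int) -> int:
    """Largest thread count whose thread-local data fits in the memory budget."""
    if per_thread_bytes <= 0:
        return hw_threads  # No thread-local data, so memory does not limit threads.
    available = memory_budget - shared_bytes
    if available < per_thread_bytes:
        return 0  # Not even one thread's local data fits.
    return min(hw_threads, available // per_thread_bytes)


# Hypothetical: 512 GB of RAM, a 64 GB shared graph, 8 GB per thread, 1024 hardware threads.
print(max_threads_for_budget(512 * GB, 64 * GB, 8 * GB, 1024))  # 56
```

Here only 56 of 1024 hardware threads can run, which illustrates the execution-time cost of this approach.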
Another way is to limit the amount of memory that each thread uses. This is applicable to graph algorithms that dispatch units of work, such as multisource breadth-first search (MS-BFS), in which each thread processes one batch of vertices at a time; a smaller batch needs less thread-local memory.
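A minimal single-threaded sketch of batched MS-BFS, assuming the commonly described bitset formulation in which each vertex holds one bit per source in the current batch. The function names are hypothetical. Because the `seen`, `visit`, and `visit_next` arrays hold one bit per batch source for every vertex, `batch_size` bounds the memory one unit of work needs. In a multithreaded system, each thread would run its own batch.

```python
from typing import Dict, List, Sequence


def ms_bfs_batch(adj: Sequence[Sequence[int]], batch: Sequence[int]) -> Dict[int, Dict[int, int]]:
    """BFS from every source in `batch` at once; bit i tracks source batch[i]."""
    n = len(adj)
    seen = [0] * n   # bit i set: source batch[i] has already reached this vertex
    visit = [0] * n  # current frontier, one bitmask per vertex
    dist: Dict[int, Dict[int, int]] = {s: {s: 0} for s in batch}
    for i, s in enumerate(batch):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
    level = 0
    while any(visit):
        level += 1
        visit_next = [0] * n
        for v in range(n):
            if visit[v]:
                for u in adj[v]:
                    visit_next[u] |= visit[v]  # expand all sources' frontiers together
        for u in range(n):
            new = visit_next[u] & ~seen[u]  # sources reaching u for the first time
            seen[u] |= new
            visit_next[u] = new
            i = 0
            while new:
                if new & 1:
                    dist[batch[i]][u] = level
                new >>= 1
                i += 1
        visit = visit_next
    return dist


def ms_bfs(adj: Sequence[Sequence[int]], sources: List[int], batch_size: int) -> Dict[int, Dict[int, int]]:
    """Process sources in batches; per-batch bitmask memory grows with batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    result: Dict[int, Dict[int, int]] = {}
    for start in range(0, len(sources), batch_size):
        result.update(ms_bfs_batch(adj, sources[start:start + batch_size]))
    return result


adj = [[1], [2], [0, 3], []]  # directed edges 0->1, 1->2, 2->0, 2->3
print(ms_bfs(adj, [0, 1, 3], batch_size=2)[1])  # {1: 0, 2: 1, 0: 2, 3: 2}
```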
Various combinations of the above two approaches may prevent memory exhaustion, but with different time-space tradeoffs. A user may apply either approach or both. However, choosing a good combination is challenging even for experienced users.
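To make the tradeoff concrete, the hypothetical sketch below lists which (thread count, batch size) pairs fit a memory budget. It assumes an MS-BFS-like cost model of three per-vertex bitmasks per thread (seen, frontier, next frontier), each one bit per batch source. It deliberately says nothing about which feasible pair runs fastest, which is the hard part of tuning. All names and sizes are illustrative.

```python
from typing import List, Tuple

GB = 10**9


def msbfs_thread_bytes(num_vertices: int, batch_size: int, bitsets: int = 3) -> int:
    """Assumed per-thread memory: `bitsets` per-vertex bitmasks, one bit per source."""
    bytes_per_mask = (batch_size + 7) // 8  # round up to whole bytes
    return bitsets * num_vertices * bytes_per_mask


def feasible_configs(num_vertices: int, memory_budget: int, shared_bytes: int,
                     thread_options: List[int], batch_options: List[int]) -> List[Tuple[int, int]]:
    """All (threads, batch_size) pairs whose total memory fits within the budget."""
    fits = []
    for threads in thread_options:
        for batch in batch_options:
            total = shared_bytes + threads * msbfs_thread_bytes(num_vertices, batch)
            if total <= memory_budget:
                fits.append((threads, batch))
    return fits


# Hypothetical: 100 million vertices, 256 GB of RAM, a 32 GB shared graph.
print(feasible_configs(100_000_000, 256 * GB, 32 * GB, [16, 64, 256], [8, 64, 512]))
# [(16, 8), (16, 64), (64, 8), (64, 64), (256, 8)]
```

More threads fit only with smaller batches, so each feasible pair trades parallelism against work per batch.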
When too many threads are used, memory may be exhausted. When too few threads are used, throughput declines. A user might not know how much thread-local memory an analysis algorithm requires, because a user who runs the analysis on a dataset might lack the design knowledge of the person who developed the algorithm.
Optimal performance may therefore be elusive. A user may need several trial-and-error tuning iterations to find an acceptable balance between memory usage and execution speed. Because the risk of exhausting memory often outweighs the need for maximum speed, manual tuning tends to be conservative and suboptimal.