Big data analysis, which underpins applications such as machine learning, deep learning, and social network analysis, often operates on very large data sets with sparse connections. Indirect memory accesses, which are increasingly important in big data analysis, take the form A[B[i]]: the contents of one array (B) are used to index a second array (A). Accesses to the B-Array are sequential, so loading B[i] is handled well by conventional caches and prefetchers. However, the values stored in B are irregular, so the resulting access pattern to A is irregular as well.
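A minimal Python sketch of the pattern follows. It records the index stream seen by each array: the stream into B advances with a constant stride of one, whereas the stream into A simply replays whatever values B holds. The names `indirect_gather` and `is_constant_stride` are hypothetical, and the stride check is a deliberately simplified stand-in for a hardware stride prefetcher, not a model of any particular design.

```python
def indirect_gather(A, B):
    """Sum A[B[i]] over all i, recording the indices touched in each array."""
    b_stream = []  # indices read from B: 0, 1, 2, ... (sequential)
    a_stream = []  # indices read from A: the values stored in B (irregular)
    total = 0
    for i in range(len(B)):
        b_stream.append(i)
        j = B[i]            # sequential load from the B-Array
        a_stream.append(j)
        total += A[j]       # indirect load from the A-Array
    return total, b_stream, a_stream


def is_constant_stride(stream):
    """True if consecutive indices differ by a single fixed stride.

    Simplified model (assumption) of what a basic stride prefetcher can learn.
    """
    strides = {b - a for a, b in zip(stream, stream[1:])}
    return len(strides) <= 1


A = [10 * k for k in range(16)]  # A[k] = 10 * k
B = [7, 2, 13, 0, 9, 4]          # irregular indices into A

total, b_stream, a_stream = indirect_gather(A, B)
print(total)                         # 350
print(is_constant_stride(b_stream))  # True: B is walked with stride 1
print(is_constant_stride(a_stream))  # False: strides -5, 11, -13, 9, -5
```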
This access pattern is common in applications that operate on sparse data structures, e.g., sparse matrices and graphs, which are widely used in big data analysis. For example, the neighbors of a vertex in a graph are typically stored as an array of vertex IDs. A common operation in graph applications is to fetch data from all neighbors of a vertex and combine it to compute the vertex's own value; e.g., the PageRank of a vertex is computed from the PageRank values of its neighbors. The neighbor list therefore serves as the index array into the A-Array of per-vertex data, yielding an indirect memory access pattern.
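As an illustration, the sketch below stores a small graph in compressed sparse row (CSR) form and performs one pull-based PageRank iteration. The CSR layout (`row_ptr`, `col_idx`), the choice to store in-neighbors, the damping factor of 0.85, and the function name `pagerank_pull_step` are assumptions made for this example; the text does not prescribe a particular graph format. The neighbor array `col_idx` is read sequentially and plays the role of the B-Array, while `rank` and `out_degree` are indexed through it and play the role of the A-Array.

```python
def pagerank_pull_step(row_ptr, col_idx, rank, out_degree, damping=0.85):
    """One damped PageRank iteration over a CSR graph of in-neighbors (hypothetical helper)."""
    n = len(rank)
    new_rank = [0.0] * n
    for v in range(n):
        acc = 0.0
        # col_idx[row_ptr[v]:row_ptr[v + 1]] holds v's in-neighbors and is
        # read sequentially (the B-Array).
        for e in range(row_ptr[v], row_ptr[v + 1]):
            u = col_idx[e]
            # rank[u] and out_degree[u] are indirect accesses (the A-Array).
            acc += rank[u] / out_degree[u]
        new_rank[v] = (1.0 - damping) / n + damping * acc
    return new_rank


# Edges u -> v: 0->1, 0->2, 1->2, 2->0, 3->0, 3->2 (no dangling vertices).
out_degree = [2, 1, 1, 2]
row_ptr = [0, 2, 3, 6, 6]      # in-neighbor ranges for vertices 0..3
col_idx = [2, 3, 0, 0, 1, 3]   # in-neighbors: v0 <- {2, 3}, v1 <- {0}, v2 <- {0, 1, 3}
rank = [0.25] * 4

new_rank = pagerank_pull_step(row_ptr, col_idx, rank, out_degree)
print(new_rank)  # approximately [0.35625, 0.14375, 0.4625, 0.0375]
```

Because every vertex has at least one outgoing edge, the ranks still sum to 1 after the step; in a large graph, the `rank[u]` loads jump across the whole array in the order dictated by `col_idx`.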
Because the accesses to the A-Array are irregular and sparse, existing prefetchers cannot predict this address stream. Furthermore, the A-Array is often too large to fit in the cache, so a fetched cache line is likely to be evicted before it is accessed again. This makes inefficient use of the caches and wastes memory bandwidth, since typically only one element of each fetched cache line is actually used. As a result, threads and cores frequently stall while waiting for data to arrive from main memory.
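To make the bandwidth argument concrete, the following sketch models a fully associative LRU cache and measures the ratio of requested bytes to fetched bytes. The 64-byte line, 8-byte element, cache capacity, and array size are illustrative assumptions, and real caches are set-associative and backed by hardware prefetchers, so only the trend is meaningful, not the exact numbers.

```python
import random
from collections import OrderedDict

LINE_BYTES = 64                             # assumed cache-line size
ELEM_BYTES = 8                              # assumed element size (e.g., a double)
ELEMS_PER_LINE = LINE_BYTES // ELEM_BYTES   # 8 elements per line


def count_misses(indices, capacity_lines):
    """Simulate a fully associative LRU cache; return the number of line fetches."""
    cache = OrderedDict()
    misses = 0
    for idx in indices:
        line = idx // ELEMS_PER_LINE
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: fetch the whole line
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def utilization(indices, capacity_lines):
    """Requested bytes divided by bytes fetched from memory."""
    misses = count_misses(indices, capacity_lines)
    return (len(indices) * ELEM_BYTES) / (misses * LINE_BYTES)


N_A = 1 << 20        # A-Array elements (8 MiB), far larger than the modeled cache
CACHE_LINES = 512    # 32 KiB of cache in this model
n = 10_000

sequential = list(range(n))                     # like walking the B-Array
rng = random.Random(0)
B = [rng.randrange(N_A) for _ in range(n)]      # irregular contents of B

print(utilization(sequential, CACHE_LINES))     # 1.0
print(utilization(B, CACHE_LINES))              # roughly 0.125, i.e., about 1/8
```

With sequential indices every fetched line is fully consumed, while the irregular indices use roughly one 8-byte element per 64-byte line, so about seven eighths of the fetched bandwidth is wasted in this model.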