Field of the Disclosure
This disclosure relates generally to accessing large data sets, and more specifically to non-sequential accesses to large in-memory data sets.
Description of the Related Art
Non-sequential access to large data sets may cause significant performance issues, for example due to poor load balancing of data accesses across multiple memory channels. In general, various workloads, such as in-memory graph analytics, may make large volumes of non-sequential memory accesses, which may cause “hot spots” in the memory system if frequently accessed data are located near each other. For instance, according to one example, a single iteration of a network mapping and ranking algorithm, such as the PageRank™ algorithm, on a benchmark input of size 540 GB may iterate sequentially over approximately 540 GB of data and may perform a further 512 GB of non-sequential accesses into around 12.5 GB of data. The non-sequential accesses may be skewed disproportionately toward hot data: some of the 12.5 GB may be accessed many times, while some may be accessed only once, if at all.
Such hot spots may cause a disproportionate number of accesses to be directed to the same memory channel and may further result in frequent misses in the last-level cache. If one memory channel becomes saturated before the others, progress of the entire workload may slow. Additionally, translation lookaside buffer (TLB) capacity may be limited, and it may be advantageous to reduce the number of TLB entries used. Furthermore, after TLB translation, each physical page of memory is typically located on a single socket (and hence on a specific memory channel or set of memory channels on that socket). For instance, on a SPARC system, physical memory for a given TLB mapping is always held on a single socket as a contiguous block aligned to a multiple of the page size. Traditionally, systems attempting to distribute memory system load more evenly have used smaller page sizes, which may increase TLB pressure; have used hardware support for per-cache-line interleaving, which may preclude the use of other optimizations on multi-socket machines; or have explicitly randomized data layouts on loading, which frequently requires additional processing time and/or temporary storage and may require different layouts to be selected for different computations and/or workloads.
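By way of illustration only, the effect described above may be sketched with a simplified model. The following Python sketch is not taken from this disclosure; it assumes a hypothetical system with four memory channels, 4096-byte pages, and 64-byte cache lines, and it assumes that a channel is selected by a simple modulo of the page number or of the cache-line number. Actual hardware may use different sizes and more complex address hashing. All names, such as channel_by_page and skewed_accesses, are hypothetical.

```python
from collections import Counter

# Assumed parameters for this sketch; real systems may differ.
PAGE_SIZE = 4096      # bytes per page
LINE_SIZE = 64        # bytes per cache line
NUM_CHANNELS = 4      # number of memory channels


def channel_by_page(addr):
    """Assumed mapping: each whole page resides on one channel."""
    return (addr // PAGE_SIZE) % NUM_CHANNELS


def channel_by_line(addr):
    """Assumed mapping: consecutive cache lines rotate across channels."""
    return (addr // LINE_SIZE) % NUM_CHANNELS


def channel_load(addresses, mapping):
    """Count how many accesses each channel receives under a mapping."""
    counts = Counter(mapping(a) for a in addresses)
    return [counts.get(c, 0) for c in range(NUM_CHANNELS)]


def skewed_accesses(n_hot=1000, n_cold=200):
    """Build a skewed access stream: many accesses to hot data that all
    lie within page 0, plus a few accesses spread over other addresses."""
    lines_per_page = PAGE_SIZE // LINE_SIZE
    hot = [(i % lines_per_page) * LINE_SIZE for i in range(n_hot)]
    # Stride of one page plus one line, so cold accesses spread evenly
    # under both mappings and do not alias onto a single channel.
    cold = [i * (PAGE_SIZE + LINE_SIZE) for i in range(n_cold)]
    return hot + cold


if __name__ == "__main__":
    accesses = skewed_accesses()
    print("per-page placement:", channel_load(accesses, channel_by_page))
    print("per-line interleave:", channel_load(accesses, channel_by_line))
```

Under these assumptions, per-page placement directs all accesses to the hot page to a single channel, whereas per-cache-line interleaving spreads the same accesses evenly. As noted above, however, per-cache-line interleaving may preclude the use of other optimizations on multi-socket machines.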