The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Conventional methods for traversing a graph in a breadth-first search typically use a small auxiliary structure, such as a bit-vector, to check whether a vertex has already been assigned a depth. The bit-vector is assumed to fit in a last level cache (LLC), so that these checks reduce external memory traffic. Further, conventional methods typically employ atomic operations to avoid race conditions when multiple threads update the bit-vector concurrently. However, as the graph size increases, the assumption that the bit-vector fits in the LLC may no longer hold. Once the bit-vector is larger than the LLC, the performance of conventional methods tends to degrade. Additionally, the use of atomic operations may increase computation latency.
Further, with advances in integrated circuit technology, an increasing number of processor cores are being integrated into a single processor, offering a substantial increase in computing capability. In turn, powerful computing systems with multiple multi-core processors are being built. Typically, these multi-core processors are distributed over a number of sockets. As a result, performance gains from parallel execution on multiple processor cores may be offset by the latency of inter-socket communication.