Graph applications are now ubiquitous. Social networking, online purchasing, and map programs, for example, use graph applications to provide search, recommendation analytics, and the like. One search algorithm used in graph applications is Breadth-First Search (BFS).
BFS is a fundamental primitive used in several graph applications, and hence accelerating it may be useful. The conventional BFS traversal has only one degree of freedom: the traversal algorithm, which can be either top-down or bottom-up. These search algorithms may be performed on a processor, such as a central processing unit (CPU) or a graphics processing unit (GPU), and each algorithm has advantages and disadvantages depending on the type of graph being searched.
For example, the bottom-up algorithm works well for graphs with a large average degree, where the degree of a vertex may be defined as the number of edges incident to that vertex. The bottom-up algorithm may also be efficient for the intermediate iterations of a BFS, when the number of visited vertices is substantially large. The reverse is true for the top-down algorithm. Hence, the optimal algorithm and platform for a BFS traversal may depend on the characteristics of the input graph.
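As a small illustration of the degree definition above, the following Python sketch computes the per-vertex degree and the average degree of an undirected graph. The adjacency-list representation and the names `degrees` and `average_degree` are assumptions made for illustration only; they do not come from any particular BFS implementation.

```python
from typing import Dict, List

# Assumed representation: an undirected graph without self-loops, stored as
# an adjacency list that maps each vertex to the list of its neighbors.
Graph = Dict[int, List[int]]


def degrees(graph: Graph) -> Dict[int, int]:
    """Return, for each vertex, the number of edges incident to it."""
    return {vertex: len(neighbors) for vertex, neighbors in graph.items()}


def average_degree(graph: Graph) -> float:
    """Return the mean vertex degree, or 0.0 for an empty graph."""
    if not graph:
        return 0.0
    return sum(degrees(graph).values()) / len(graph)
```

A graph with a high value of `average_degree` is the kind of input for which the bottom-up algorithm is said to work well.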
GPUs have gained popularity as an accelerator platform in recent years, but they have not conventionally been used to perform BFS because the traditional top-down BFS algorithm often suffers from poor locality, irregular memory access patterns, and load imbalance. Recently, however, a bottom-up BFS algorithm has been developed that mitigates the challenges of the top-down algorithm on GPUs. The bottom-up algorithm proceeds by finding the parents of unvisited vertices, whereas the top-down algorithm finds the children of visited vertices in a graph. Current techniques have implemented the BFS algorithm, with top-down and bottom-up characteristics, on homogeneous processors such as CPUs and GPUs. Such methods, however, do not utilize the heterogeneous capabilities that are becoming increasingly important to maximize performance under restrictive thermal budgets.
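The difference between the two traversal directions can be made concrete with a minimal, sequential Python sketch. It assumes an undirected graph stored as an adjacency list, and the names `top_down_step`, `bottom_up_step`, and `bfs` are hypothetical. It is intended only to show which vertices each direction iterates over, not how a CPU or GPU implementation would be written.

```python
from typing import Dict, List, Optional, Set

# Assumed representation: undirected graph as an adjacency list.
Graph = Dict[int, List[int]]
Parents = Dict[int, Optional[int]]


def top_down_step(graph: Graph, frontier: Set[int], parent: Parents) -> Set[int]:
    """Visited frontier vertices find their unvisited children."""
    next_frontier: Set[int] = set()
    for u in frontier:
        for v in graph[u]:
            if parent[v] is None:  # v has not been visited yet
                parent[v] = u
                next_frontier.add(v)
    return next_frontier


def bottom_up_step(graph: Graph, frontier: Set[int], parent: Parents) -> Set[int]:
    """Unvisited vertices search their neighbors for a parent in the frontier."""
    next_frontier: Set[int] = set()
    for v in graph:
        if parent[v] is None:
            for u in graph[v]:
                if u in frontier:
                    parent[v] = u
                    next_frontier.add(v)
                    break  # one parent is enough; stop scanning neighbors
    return next_frontier


def bfs(graph: Graph, source: int, use_bottom_up: bool) -> Parents:
    """Run BFS from source in a single, fixed direction; return the parent map."""
    parent: Parents = {v: None for v in graph}
    parent[source] = source  # the source is recorded as its own parent
    frontier = {source}
    step = bottom_up_step if use_bottom_up else top_down_step
    while frontier:
        frontier = step(graph, frontier, parent)
    return parent
```

Both directions reach the same vertices at the same BFS depth, although the particular parent recorded for a vertex may differ when it has several neighbors in the frontier. The early `break` in `bottom_up_step` is what allows the bottom-up direction to skip edge checks when many neighbors are already in the frontier.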
Accelerated processing units (APUs) include both a CPU and a GPU, and accordingly either processing unit may be utilized to perform the BFS algorithm. It would therefore be beneficial to provide a method and apparatus for performing a BFS that can partition the execution between the top-down and bottom-up algorithms and select the appropriate processing unit for each iteration of the BFS.