The present invention relates to the field of digital computer systems, and more specifically, to a method for searching a graph to identify cliques.
The traditional way of scaling up the performance of “Big Data” analytics applications is to deploy them on a MapReduce cluster. MapReduce was designed with scalability, fault tolerance and ease of programming in mind, and it achieves near-linear scaling in performance for tasks that require a brute-force scan of the input. However, its raw performance for service analytics applications is a matter of debate. Approaches that use application-specific data access schemes often outperform the brute-force scan strategy of MapReduce.
Large data sets can be viewed as very large graphs, since each data entry relates to a subset of the other data entries, much as webpages link to related webpages. In analyzing such “Big Data”, one is often interested in finding highly linked subsets, or hot spots. When random graphs, for example, are processed within a standard processor environment, they require random access to the main memory. This ultimately limits attempts to optimize algorithms such as the Bron-Kerbosch algorithm within a standard processor environment. The irregular memory accesses and the limited single instruction multiple data (SIMD) parallelism exhibited by these algorithms, combined with a need for dynamic parallelization and load balancing, also create a significant mismatch with the computation, memory access, and communication capabilities of the graphics processing unit (GPU) architecture.
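For orientation only, the following is a minimal sketch of the Bron-Kerbosch algorithm named above, in its well-known pivoting variant. It is not the method of the present invention. The adjacency-set graph representation and the names `bron_kerbosch` and `expand` are assumptions chosen for illustration. The sketch shows where the irregular memory accesses arise: each recursion step intersects the candidate set with the neighbor set of an arbitrary vertex.

```python
from typing import Dict, List, Set

# Assumed representation: vertex id -> set of neighboring vertex ids.
Graph = Dict[int, Set[int]]


def bron_kerbosch(graph: Graph) -> List[Set[int]]:
    """Return all maximal cliques of an undirected graph (pivoting variant)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates that may extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(set(r))
            return
        # Choose the pivot with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            # graph[v] may reside anywhere in memory: an irregular access.
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    # Hypothetical example: a triangle 0-1-2, an edge 2-3, and an isolated vertex 4.
    example: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
    print(sorted(sorted(c) for c in bron_kerbosch(example)))
```

Because the recursion depth and the branching factor depend on the data, the work per branch is uneven, which is consistent with the need for dynamic parallelization and load balancing noted above.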