The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
As social networks have gained in popularity, maintaining and processing social network graph information with graph algorithms has become an essential means of discovering potential features of the graph. In general, a graph is a mathematical structure comprising an ordered pair G=(V, E), where V is the set of vertices or nodes, which represent objects, and the elements of the set E are edges or lines, which represent relationships among different objects. Many real-world problems, such as those involving social networks and traffic networks, can be abstracted into graph problems. The great increase in the size and scope of social networks and other similar applications has made it virtually impossible to process huge graphs on a single machine at a “real-time” level of execution.
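By way of illustration only, the ordered pair G=(V, E) may be sketched in Python as a set of vertices and a set of undirected edges. The vertex names and the helper function `neighbors` are hypothetical and are introduced here solely for the example; they do not appear in any particular graph processing system.

```python
# Hypothetical toy social graph: vertices are people, edges are relationships.
V = {"alice", "bob", "carol", "dave"}
E = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}

# The graph is the ordered pair of the vertex set and the edge set.
G = (V, E)


def neighbors(vertex, edges):
    """Return every vertex that shares an undirected edge with `vertex`."""
    result = set()
    for u, v in edges:
        if u == vertex:
            result.add(v)
        elif v == vertex:
            result.add(u)
    return result
```

In this sketch, a vertex such as "dave" that appears in V but in no element of E simply has an empty neighbor set.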
Distributed computing techniques have been applied to graph computations in order to process graph data more efficiently. One example is Map-Reduce, a distributed computing model introduced by Google® that processes large data sets in parallel on clusters of computers using the principles of the map and reduce functions commonly used in functional programming. Although many real-world problems can be modeled using Map-Reduce, there are still many that cannot be represented well in this framework. Furthermore, the Map-Reduce model has certain weaknesses that limit its effectiveness for certain important applications, such as cloud computing and social network environments. For example, Map-Reduce cannot share information among different slave machines while they run map or reduce functions; not all graph-based algorithms can be mapped onto Map-Reduce; and for certain graph-related problems that can be solved by Map-Reduce, the solutions may not be optimal for certain applications (e.g., cloud computing). Scalability is another key concern in the development and application of graph processing systems.
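The map and reduce principle may be illustrated by a minimal single-process sketch in Python that counts the degree of each vertex from a list of edges. This is a simulation under stated assumptions and not the API of any actual Map-Reduce platform: the function names `map_edge`, `shuffle`, `reduce_degree`, and `vertex_degrees` are hypothetical, and the grouping step that a real cluster would perform across machines is performed here in memory.

```python
from collections import defaultdict


def map_edge(edge):
    """Map step: emit (vertex, 1) for each endpoint of a single edge.

    Each call sees only its own edge, mirroring the fact that map tasks
    do not share information with one another.
    """
    u, v = edge
    yield (u, 1)
    yield (v, 1)


def shuffle(pairs):
    """Group emitted values by key (done in memory here, by assumption)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(vertex, counts):
    """Reduce step: sum the counts emitted for one vertex."""
    return vertex, sum(counts)


def vertex_degrees(edges):
    """Run the map, shuffle, and reduce steps in sequence over an edge list."""
    mapped = (pair for edge in edges for pair in map_edge(edge))
    grouped = shuffle(mapped)
    return dict(reduce_degree(k, vs) for k, vs in grouped.items())
```

A computation such as vertex degree fits this model because each edge can be processed independently; algorithms that require intermediate state to be exchanged between tasks do not decompose as directly.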
What is needed is an effective and efficient way to decompose and reformulate the density-based clustering problem so that it can be solved efficiently on Map-Reduce platforms. Concurrent with this objective is the need to provide a scalable algorithm that performs faster when there are more machines in a Map-Reduce machine cluster; performs merging operations quickly, since results are calculated on multiple machines and the speed of merging them is critical; maintains low network traffic by keeping the number of messages generated low; maintains good load balance by ensuring that all machines in a cluster have similar workloads; and maintains result accuracy.