1. Field of the Invention
This invention relates generally to graph analysis and more particularly to bit-vector algorithms for graph processing.
2. Description of the Related Art
Graph analysis plays a critical role in data mining that involves transactional activity, social networks, and communications. Much of this analysis uses connectivity-metrics such as betweenness and closeness to characterize highly connected nodes, intermediaries, or hubs of activity. Calculating these metrics requires finding graph properties such as connected components or shortest paths. The processing required to find all connected components or shortest paths for a large graph is computationally expensive, making real-time (interactive) analysis infeasible.
One connectivity-metric is referred to as “connected components”, which may be determined using recursive functions in conventional systems. An unrestricted recursive process commences by setting all of the nodes in a graph to UNVISITED and selecting any UNVISITED node as a starting point. The selected node is then marked as VISITED, and a depth-first search of all connected nodes that are also UNVISITED is performed, marking each node encountered in the search as VISITED. This search continues until no more UNVISITED nodes can be found in the component. The starting point of the next component is found by searching the graph for any remaining UNVISITED node. The process is repeated until all nodes in the graph are marked VISITED. The depth-first search is implemented as a recursive function. This implementation runs very quickly but may cause a stack overflow if the graph is large enough that the recursion depth exceeds the available call stack.
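By way of illustration only, the unrestricted recursive process described above may be sketched as follows. The adjacency-list representation and the names connected_components, visit, and graph are assumptions made for this sketch and do not appear in any conventional system referred to herein.

```python
UNVISITED, VISITED = 0, 1


def connected_components(adjacency):
    """Return a list of components, each a list of node identifiers.

    `adjacency` is an assumed representation: a dict mapping each node
    to a list of its neighbors in an undirected graph.
    """
    # Set all of the nodes in the graph to UNVISITED.
    state = {node: UNVISITED for node in adjacency}
    components = []

    def visit(node, component):
        # Mark the node VISITED, then recurse depth-first into every
        # neighbor that is still UNVISITED.
        state[node] = VISITED
        component.append(node)
        for neighbor in adjacency[node]:
            if state[neighbor] == UNVISITED:
                visit(neighbor, component)

    # Each remaining UNVISITED node is the starting point of a new component.
    for start in adjacency:
        if state[start] == UNVISITED:
            component = []
            visit(start, component)
            components.append(component)
    return components


# Hypothetical graph with three components: {0, 1, 2}, {3, 4}, and {5}.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
components = connected_components(graph)
```

The recursion depth in this sketch grows with the length of the path followed by the search, so a sufficiently long chain of nodes exhausts the call stack. In Python, the analogous failure is a RecursionError raised once the interpreter's recursion limit is exceeded.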
An alternative implementation restricts the recursive depth of the algorithm in order to avoid stack overflow errors. This implementation is considerably slower than the unrestricted version but will produce the same results as long as the recursive depth allowed is greater than or equal to the graph's diameter. A graph's diameter is defined as the number of edges in the longest shortest path of the graph, that is, the greatest shortest-path length between any two connected nodes. Because only connected pairs of nodes are considered, this definition guarantees a finite diameter even in graphs with unconnected components. If the restricted recursive depth is shorter than the diameter, the algorithm will fail because it is unable to connect these components.
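The definition of diameter given above may be illustrated with the following sketch, which runs a breadth-first search from every node and considers only the nodes actually reached. The names eccentricity, diameter, and graph, and the adjacency-list representation, are assumptions made for illustration; the sketch is not the depth-restricted algorithm itself.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Greatest shortest-path length, in edges, from `source` to any node
    reachable from it. Unreachable nodes are ignored."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return max(distance.values())


def diameter(adjacency):
    """Longest shortest path over all connected pairs of nodes.

    Because unreachable pairs are never measured, the result is finite
    even when the graph has unconnected components.
    """
    return max((eccentricity(adjacency, node) for node in adjacency), default=0)


# Hypothetical graph: a four-node path, a two-node component, and an isolated node.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
graph_diameter = diameter(graph)
```

For this hypothetical graph the longest shortest path runs from node 0 to node 3 and has three edges, so the sketch reports a diameter of 3 despite the presence of unconnected components.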
Thus, existing algorithms for determining connected components are variously inefficient, particularly with regard to graphs of relatively large diameter. Other connectivity-metrics include shortest paths, betweenness, clustering, and Steiner Tree. Techniques are also known for determining these connectivity-metrics, but they too are variously deficient for larger graphs.
Accordingly, what is needed are techniques for determining connectivity-metrics that are less computationally expensive than existing techniques, and that are more readily extensible to graphs of relatively large diameter.