The present invention pertains to the field of data storage. Specifically, the present invention provides for the enumeration of components in a graph without explicitly defining the edges in the graph.
A graph is a collection of “vertices” (points or nodes) and “edges” (lines connecting points). A graph can represent any set of data, such as data related to travel, biological samples, or chip design, to name a few. Each point in the graph represents an individual collection of data, and an edge between two points can represent data that is shared between the two points. For instance, in the travel industry a graph may represent a grid of airline flights between numerous cities regardless of which airline is used. Each node in the graph can represent a city to which a flight may be directed. In one case, two connected points form an edge and are related in that the two points share the same flight. In another case, an edge may represent a flight between two cities for a particular airline.
In conventional techniques, a graph is typically represented in memory as a list of all pairs of vertices that share an edge. In addition, a “connected component” of a graph is a maximal subset of vertices in which every pair of vertices is connected by some sequence of edges. Enumerating the connected components of a graph is a classical problem in computer science. Traditional methods include Kosaraju's algorithm, Tarjan's algorithm, and Gabow's algorithm, each of which enumerates the strongly connected components of a directed graph.
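As an illustration of the conventional approach, the following minimal sketch stores an undirected graph as an explicit list of vertex pairs and enumerates its connected components with a breadth-first search. The breadth-first search is a stand-in chosen for brevity and is not one of the algorithms named above. The function name `connected_components` and the city and flight data are hypothetical and used only for illustration.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    # Build an adjacency map from the explicit edge list (every edge is read once).
    adjacency = {vertex: set() for vertex in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical data: each pair is a direct flight between two cities.
cities = ["SFO", "LAX", "JFK", "LHR", "CDG", "NRT"]
flights = [("SFO", "LAX"), ("LAX", "JFK"), ("LHR", "CDG")]
components = connected_components(cities, flights)
```

In this sketch every vertex is visited once and every edge is read a constant number of times, so the work grows with the number of vertices plus the number of edges.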
However, for each of these techniques for enumerating connected components, execution time and memory space are proportional to the total number of vertices and edges, or O(V+E). In simpler terms, the entire graph, including every edge, must be evaluated in order to enumerate the connected components. While this may seem like a straightforward technique, as the number of points in the graph increases, the time to enumerate the components also increases. For graphs whose points are heavily connected, the number of edges approaches the square of the number of points, so the execution time may grow as the square of the number of points in the graph. As such, for large amounts of data, traditional techniques for component enumeration fall short of providing real-time analysis of the graph data.
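The quadratic growth can be checked with simple arithmetic. The sketch below assumes an undirected graph with no self-loops and no duplicate edges, in which every pair of points is connected; the helper names `max_undirected_edges` and `traversal_cost` are hypothetical, and the "cost" is only the quantity V+E, not a measured running time.

```python
def max_undirected_edges(num_vertices):
    """Number of edges when every pair of distinct vertices shares an edge."""
    return num_vertices * (num_vertices - 1) // 2


def traversal_cost(num_vertices, num_edges):
    """Quantity V + E to which a conventional enumeration is proportional."""
    return num_vertices + num_edges


for v in (10, 1_000, 100_000):
    e = max_undirected_edges(v)
    # For v = 1_000 this prints: 1000 499500 500500
    print(v, e, traversal_cost(v, e))
```

Under these assumptions, increasing the number of points by a factor of 100 (from 1,000 to 100,000) increases V+E by a factor of roughly 10,000.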