The present invention relates generally to the field of data structures, and more particularly to probabilistically finding the connected components of an undirected graph.
A graph is a representation of a set of objects where some objects are connected by links, which can be used to model real-life networks, such as social media interactions. In a graph environment, a vertex represents a node, and an edge is a link between two vertices that may be directed or undirected (i.e., oriented in a certain direction or no orientation, respectively). A connected component of an undirected graph is a maximal set of connected nodes, in which any two nodes are connected to each other by a set of interactions, and are not connected to any additional nodes in the graph. Computing the connected components of a graph is a well-known task in graph theory and may be applied to real world tasks, for example, finding all the connected groups in a social network.
Multiple algorithms have been created to determine such events as when and how components are formed over time. Some algorithms may be implemented with systems in which the interactions are persistent, and there are no insertions and deletions of an edge or a node. Such algorithms usually require at least |E| space, where |E| is the number of edges, making the algorithm more suitable for smaller graph structures. Other algorithms, such as the union find algorithm, are suited for implementation in a growing graph environment in which edges are being added, and the graph must be continually updated.
A bloom filter is a probabilistic data structure that is designed to determine whether an element is likely present in a set, or not present in the set. Elements may be added to a set, but may not be removed unless an additional structure is implemented with the bloom filter, such as a counting bloom filter, which is a variant of a bloom filter that allows for deletions of elements from the bloom filter. A bloom filter is a space and time efficient structure for determining membership in a set; however, because of its probabilistic nature, there is a possibility that a false positive match may be returned.