In large and distributed computer networks, there are various applications for which it is useful to have some subset of networked nodes cooperate to perform a common task. For example, nodes may cooperate to implement a distributed web caching protocol, to form a distributed file system, to coordinate routing or forwarding of network traffic, to provide a shared memory or data storage, or to perform some distributed computation. As a first step toward cooperation, the nodes generally must have some knowledge of each other, because, notwithstanding special cases like broadcast or multicast, a node generally cannot communicate with a network node that it does not know about. If the cooperating nodes do not initially have information about all the other cooperating nodes, as is often the case for distributed cooperation, they typically attempt to “discover” the other cooperating nodes.
Frequently, each cooperating node has knowledge about one or more other cooperating nodes, but not about all of them. The problem of finding out about the other cooperating nodes is referred to as “resource discovery.” It is useful for resource discovery techniques to be efficient both in the time needed to complete discovery and in the use of communications resources, where communications resources include network bandwidth as well as node processing and memory. Generally, resource discovery is considered efficient in time if the nodes learn about each other quickly, and efficient in resources if the nodes learn about each other without using an inordinate amount of the network's or the nodes' communications resources. Efficiency is particularly important in applications where a discovery method may be run repeatedly to obtain updated information about the cooperating nodes.
For descriptive purposes, resource discovery methods are frequently said to be performed in “rounds,” where a round is the time that it takes for each cooperating node to complete a task. For example, a round can be the time that it takes for each cooperating node to contact some number of cooperating nodes. For the purposes of explanation, a resource discovery method is said to be “complete” when every cooperating node has information about every other cooperating node. In practice, resource discovery may never be complete, because the cooperating node information may change with such parameters as user load, as well as with the addition and removal of cooperating nodes from the network or the cooperation scheme.
One example of a resource discovery technique involves use of the “flooding” algorithm. In the flooding algorithm, each node is initially configured to communicate with a fixed set of cooperating nodes, and direct communication is only allowed with nodes in this set. In each round, each node contacts all of the nodes in its fixed set and transmits to them updates of cooperating node information. The updates are the cooperating node information that has changed since the last time the node provided information to its set of cooperating nodes. A cooperating node that receives updates then passes any new information on to its own fixed set of communication partners.
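The flooding rounds described above can be illustrated with a small synchronous simulation. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `flood` and `neighbors`, the lock-step rounds, and the rule that each node counts one message per fixed partner per round (even when it has nothing new to send) are all illustrative choices that do not come from the text. Each node starts out knowing itself and its fixed partners; after the first round it forwards only what it learned in the previous round.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Graph = Dict[Node, Set[Node]]


def flood(neighbors: Graph, max_rounds: int = 100) -> Tuple[Graph, int, int]:
    """Simulate lock-step flooding rounds (hypothetical illustrative sketch).

    neighbors[v] is v's fixed set of communication partners; every node
    must appear as a key. Returns (knowledge, rounds, messages).
    """
    nodes = set(neighbors)
    # Each node initially knows itself and its fixed partners.
    known = {v: {v} | neighbors[v] for v in nodes}
    # Information not yet passed on; in round 1 that is everything known.
    pending = {v: set(known[v]) for v in nodes}
    rounds = messages = 0
    while any(known[v] != nodes for v in nodes) and rounds < max_rounds:
        rounds += 1
        inbox = {v: set() for v in nodes}
        for v in nodes:
            # Contact every partner in the fixed set, and only those.
            for w in neighbors[v]:
                inbox[w] |= pending[v]
                messages += 1
        for v in nodes:
            new = inbox[v] - known[v]
            known[v] |= new
            # Only newly learned information is forwarded next round.
            pending[v] = new
    return known, rounds, messages
```

For example, on a directed ring of eight nodes where each node's fixed set is just its successor (`ring = {i: {(i + 1) % 8} for i in range(8)}`), `flood(ring)` reports 6 rounds and 48 messages: each node learns exactly one new identifier per round, and every round costs one message per node.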
Generally, the flooding algorithm is efficient in neither time nor resources, as each node repeatedly contacts each of the nodes in its fixed set. As the number of nodes in the fixed set grows, the flooding algorithm can become somewhat efficient in terms of time, but it will be inefficient in terms of resources. The flooding algorithm is used by Internet routers today, with the variation that Internet routers are designed with the capability of opening connections to all machines they have information about, not just machines in the “initial set.”
Another example of a resource discovery technique involves use of the “swamping” algorithm. The swamping algorithm is similar to the flooding algorithm except that, in each round, each node opens connections with all of the cooperating nodes that it has information about and communicates the updates to each of them. An advantage of the swamping algorithm is that resource discovery is completed quite quickly compared to other techniques, so that it is efficient in terms of time. A disadvantage is that the swamping algorithm is inefficient in terms of resources, because the communications resources it requires are significant. The speed of the swamping algorithm comes at the cost of wasted communications resources, as many nodes are sent information about nodes they already know about. In the final rounds, each node communicates with almost every other cooperating node.
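A comparable simulation of swamping changes only whom each node contacts: every node it currently knows about, rather than a fixed set. This is a minimal sketch, not a definitive implementation. As a simplifying assumption, each node sends its entire current list rather than only the changes since its last contact, so that a newly discovered node receives everything the sender knows. The name `swamp` and the message-counting rule (one message per contacted node per round) are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Graph = Dict[Node, Set[Node]]


def swamp(initial: Graph, max_rounds: int = 100) -> Tuple[Graph, int, int]:
    """Simulate lock-step swamping rounds (hypothetical illustrative sketch).

    initial[v] is the set of nodes v knows about at the start; every node
    must appear as a key. Returns (knowledge, rounds, messages).
    """
    nodes = set(initial)
    known = {v: {v} | initial[v] for v in nodes}
    rounds = messages = 0
    while any(known[v] != nodes for v in nodes) and rounds < max_rounds:
        rounds += 1
        # Freeze start-of-round knowledge so all sends happen "simultaneously".
        snapshot = {v: set(known[v]) for v in nodes}
        for v in nodes:
            # Contact every node v currently knows about, not a fixed set.
            for w in snapshot[v] - {v}:
                # Assumption: v sends its whole list, not just recent changes.
                known[w] |= snapshot[v]
                messages += 1
    return known, rounds, messages
```

On an eight-node directed ring where each node initially knows only its successor (`ring = {i: {(i + 1) % 8} for i in range(8)}`), `swamp(ring)` finishes in 3 rounds using 56 messages (8, then 16, then 32), because each node's known set roughly doubles per round while its number of contacts grows with it. Flooding restricted to the fixed ring would instead need six rounds at eight messages per round (48 messages) under the same counting rule, which mirrors the time-versus-resources trade-off described above.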