Throughout the instant disclosure, numerals in brackets—[ ]—are keyed to the list of numbered references towards the end of the disclosure.
Scalable and robust distributed data dissemination requires scalable control networks to efficiently handle control information. Distributed data indexing techniques normally partition the data index set, and assign partitions to a set of nodes distributed across a network. Examples of applications which use distributed data indexing are: distributed directories, content distribution networks and web caching.
Overlay networks provide the application-level data routing and control functionality needed to support distributed data indexing. Overlays offer the manageability and adaptability needed by applications that dynamically scale distributed data; for such applications they can be more efficient, despite the overhead introduced by providing network functionality at the application level.
Several interconnection topologies have the scalability and resiliency properties required to support distributed data indexing. While designed primarily for parallel computing systems, interconnection topologies can be used as routing networks by providing an efficient mapping of the interconnection topology to the network overlay. After overlay construction, a distributed hashing method assigns keys to network overlay nodes so as to optimize the computation and communication cost of data retrieval. Typical metrics used for optimal distribution of data indexes include the average key look-up delay and the average forwarding overhead of look-up queries.
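By way of illustration only, the assignment of keys to overlay nodes may be sketched as follows. The sketch assumes a ring-shaped identifier space and a successor rule of the kind used by consistent hashing; the identifier width, the SHA-1 hash, the node names, and the function names `ring_id` and `assign_key` are assumptions made for the example and are not taken from the disclosure.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=16):
    """Hash a node name or data key onto a ring of 2**bits identifiers."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def assign_key(key, node_names, bits=16):
    """Return the node responsible for `key`: the first node at or after
    the key's identifier on the ring, wrapping around at the end."""
    ring = sorted((ring_id(n, bits), n) for n in node_names)
    ids = [node_id for node_id, _ in ring]
    pos = bisect_left(ids, ring_id(key, bits))
    return ring[pos % len(ring)][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
owner = assign_key("group/42", nodes)             # hypothetical key
```

Because each key depends only on its own hash and on the sorted node identifiers, adding or removing a node in such a scheme reassigns only the keys adjacent to that node on the ring.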
An application of distributed hashing is scalable control of group communication. The control topology for scalable group communication has several desirable properties: the control node degree (number of connections per control node) grows logarithmically with the size of the control topology; the average number of hops required to locate any item (communication group) is logarithmic in the size of the network; the per-node control overhead is load-balanced; and the overhead of control topology dynamics scales with the logarithm of its size. Historical methods of constructing network overlays for distributed indexing applications do not optimize network quality of service metrics; the overhead metric is computed as the average number of hops required for key look-up, regardless of the network delay. The routing cost on the underlying network can be quite large if the topology of the constructed overlay network is not correlated with the underlying network distances. Providing an optimal (delay-minimized) configuration of the network overlay substantially increases the quality of service offered to delay-sensitive distributed indexing applications.
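The difference between a hop-count metric and a delay metric may be illustrated with a minimal sketch. The node labels, the delay values, and the function name `path_cost` are hypothetical and serve only to show that two overlay paths with the same number of hops can incur very different delays on the underlying network.

```python
def path_cost(path, delay_ms):
    """Return (hop count, total delay in ms) for an overlay path."""
    links = list(zip(path, path[1:]))
    return len(links), sum(delay_ms[link] for link in links)


# Hypothetical one-way delays (ms) between overlay neighbours.
delay_ms = {("A", "B"): 10, ("B", "D"): 15, ("A", "C"): 80, ("C", "D"): 90}

near = path_cost(["A", "B", "D"], delay_ms)  # (2, 25)
far = path_cost(["A", "C", "D"], delay_ms)   # (2, 170)
```

Both paths score identically under a hop-count metric, yet the second incurs nearly seven times the delay of the first, which is the cost that a delay-aware overlay construction seeks to avoid.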
Several methods of constructing overlay topologies for distributed indexing applications have been proposed recently, including Chord, CAN, hypercube, and de Bruijn structures [6,7]. Each of these exhibits a subset of the features mentioned above. However, the construction of the network overlay for the above distributed hash tables (DHTs) does not take into account the quality of service offered by the underlying network to the distributed indexing application.
In view of the foregoing, an evolving need has been recognized in connection with improving upon the shortcomings and disadvantages encountered in historical efforts towards distributed data dissemination and indexing.