A network overlay is an abstraction of a physical network that identifies a subset of network nodes and maintains a set of logical links between them. The software that implements a network overlay must maintain node membership and communication, mapping the logical links to actual physical connections. In peer-to-peer overlays, all nodes participate in overlay management, maintaining a number of the logical links between them; structured peer-to-peer overlays in particular have clearly defined and enforced link topologies and are often used to support distributed data structures, such as distributed hash tables. A number of structured overlay designs exist, of which the most cited include CHORD, CAN, Pastry, and Tapestry.
Following the concept of peer-to-peer, the overlays above have flat designs that do not distinguish between nodes or between links. For nodes, this is the ideal from the point of view of a peer-to-peer design, and it is advantageous because distinguished or specialized nodes are potential bottlenecks or single points of failure.
However, since the realization of logical links depends on the underlying physical connections between nodes, ignoring differences between links can lead to poor performance. For example, two nodes that are neighbors (share a link) in the overlay may in fact be separated by a large geographical distance, a large number of network hops, or both.
Introducing a hierarchy into an overlay design is a way to incorporate the differences between links. For example, nodes that are close according to some locality metric can be grouped or clustered at a low level so that interactions between local nodes do not leave the cluster. Clusters can then be linked at higher levels so that nodes in different clusters can interact. It can be proven that a two-level overlay has better average search times than a flat overlay, as follows.
Let the average communication latency between two nodes in the same cluster be t, and that between two nodes in different clusters be T, such that t≪T. Let f(x) be the average number of overlay hops to resolve a query in an overlay with x nodes. Assuming n nodes per cluster and k clusters (N=nk total nodes), that all clusters are connected, and that the query can be resolved in only one cluster, the average search times in a one-level and a two-level overlay are obtained as follows.
In the flat (one-level) overlay, the probability that a hop is to a node in the same cluster is (n−1)/(nk−1)≈1/k, given that there are n nodes per cluster. Based on this probability, the average hop latency (h) is the sum of the time for each kind of hop, weighted by the probability of that hop, namely, h=(1/k)t+(1−1/k)T.
Thus, the average search latency for the flat overlay is given by the product of the average hop latency and the average number of hops per search, which in this case is taken over the full overlay of N nodes:
$$h \cdot f(N) = \frac{t \cdot f(N)}{k} + \frac{(k-1) \cdot T \cdot f(N)}{k}.$$
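The flat-overlay expression can be evaluated numerically with a short Python sketch. The function names are hypothetical, and the default f(x)=log2(x) is only an illustrative assumption (the text leaves f unspecified); any other hop-count function can be passed in.

```python
import math

def same_cluster_probability(n, k):
    """Exact probability (n-1)/(nk-1) that a hop stays in the sender's cluster."""
    return (n - 1) / (n * k - 1)

def flat_search_latency(n, k, t, T, f=math.log2):
    """Average search latency h*f(N) of a flat overlay with N = n*k nodes.

    Uses the 1/k approximation of the same-cluster probability.
    f defaults to log2, which is an assumption made for illustration only.
    """
    h = t / k + (1 - 1 / k) * T   # average latency of a single hop
    return h * f(n * k)           # times the average hops over all N nodes

if __name__ == "__main__":
    # Illustrative values, not taken from the text.
    print(same_cluster_probability(100, 10), 1 / 10)
    print(flat_search_latency(n=128, k=8, t=1.0, T=100.0))
```

With these sample values the exact probability is already within about one percent of 1/k, which is why the approximation is used in the formula.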
In the two-level overlay, a sequential search is one where each cluster is queried in turn until the required data is found. In this case, the probability that a query is resolved in the jth cluster is needed. Since for this analysis any cluster is equally likely to contain the result, this probability is 1/k. The search latency if the query is resolved in the jth cluster (lj) is given by lj=j·t·f(n)+(j−1)T.
This expression accounts for the j searches within local clusters and the j−1 long-distance jumps between them. Finally, the probability of each latency is used to obtain the average search latency:
$$\frac{1}{k} \cdot \sum_{j=1}^{k} l_j = \frac{t(k+1) \cdot f(n)}{2} + \frac{T(k-1)}{2}.$$
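As a check on this averaging step, the following Python sketch computes the mean of lj over j=1..k directly and compares it with the closed form. The function names are hypothetical, and f(x)=log2(x) is again only an assumed hop-count function.

```python
import math

def sequential_latency_by_sum(n, k, t, T, f=math.log2):
    """Average of l_j = j*t*f(n) + (j-1)*T over j = 1..k, each with weight 1/k."""
    total = 0.0
    for j in range(1, k + 1):
        total += j * t * f(n) + (j - 1) * T
    return total / k

def sequential_latency_closed_form(n, k, t, T, f=math.log2):
    """Closed form t*(k+1)*f(n)/2 + T*(k-1)/2 of the same average."""
    return t * (k + 1) * f(n) / 2 + T * (k - 1) / 2

if __name__ == "__main__":
    # Illustrative values, not taken from the text.
    args = dict(n=128, k=8, t=1.0, T=100.0)
    print(sequential_latency_by_sum(**args))
    print(sequential_latency_closed_form(**args))
```

The two functions agree because the sum of j over 1..k is k(k+1)/2 and the sum of (j−1) is k(k−1)/2; dividing each by k gives the two coefficients in the closed form.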
If the number of clusters k is constant, the search latency given by

$$h \cdot f(N) = \frac{t \cdot f(N)}{k} + \frac{(k-1) \cdot T \cdot f(N)}{k}$$

is dominated by the product of the large time T and the average search time for the total number of nodes N, whereas for

$$\frac{1}{k} \cdot \sum_{j=1}^{k} l_j = \frac{t(k+1) \cdot f(n)}{2} + \frac{T(k-1)}{2}$$

the term for T is linear and the search latency is dominated by the search time within clusters given by t·f(n).
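The difference in how the T term scales can be illustrated with a small Python sketch that holds k fixed and grows the cluster size n. All names and sample values are hypothetical, and f(x)=log2(x) is an assumption; the comparison only illustrates the trend for these inputs and is not a general proof.

```python
import math

def t_terms(n, k, T, f=math.log2):
    """Contribution of the inter-cluster latency T to each average latency."""
    flat_T = (k - 1) * T * f(n * k) / k   # grows with f(N)
    two_level_T = T * (k - 1) / 2         # independent of n
    return flat_T, two_level_T

def total_latencies(n, k, t, T, f=math.log2):
    """Return (flat, two_level) average search latencies."""
    flat = (t / k + (1 - 1 / k) * T) * f(n * k)
    two_level = t * (k + 1) * f(n) / 2 + T * (k - 1) / 2
    return flat, two_level

if __name__ == "__main__":
    k, t, T = 8, 1.0, 100.0               # illustrative values only
    for n in (2**7, 2**10, 2**14):
        flat, two_level = total_latencies(n, k, t, T)
        print(f"n={n:6d}  flat={flat:9.2f}  two-level={two_level:8.2f}")
```

For these inputs the T contribution of the flat overlay increases with every row, while that of the two-level overlay stays fixed, so the remaining growth in the two-level latency comes only from t·f(n).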
The designs of most structured peer-to-peer overlays are flat and do not reflect the underlying physical or logical organization or grouping of nodes. Ignoring this underlying infrastructure leads to inefficient or unwanted performance because there is no control over the communication of nodes between different groups, which is considered to be more expensive than intra-group communication.
Hierarchical overlay designs incorporate this knowledge into the overlay by clustering nodes based on some locality metric and interconnecting clusters at a higher level. These hierarchical overlays have been constructed using specialized nodes from the lower-level clusters, which connect to form the higher-level overlay. Designs exist for hierarchical overlays that connect clusters without cluster heads, but these designs still use converging inter-cluster paths that are potential bottlenecks and do not fully exploit locality for inter-cluster searches.
The above analysis does not consider how clusters are interconnected. Most existing designs for hierarchical overlays use the concept of a cluster-head, so that one node or a subset of nodes in each cluster is used to form the higher-level overlay(s) that connect the lower-level clusters. Cluster-heads are not necessarily unique or static. Different techniques, such as voting, rotation, and replication, can be used to reduce the potential of the cluster-head to become a bottleneck or single point of failure.
For example, links between nodes in different clusters can be constructed as if constructing a single layer cluster, but keeping intra-cluster and inter-cluster links separate and limiting the number of inter-cluster links to bound the average number of links per node. In this example, the inter-cluster paths still converge so that inter-cluster paths are potential bottlenecks.
It is desirable to provide a two-level design that has less storage overhead while allowing all nodes access to remote clusters, and that better exploits locality for search optimization.