Computer networking continues to evolve. The earliest computer networks connected dumb terminals to monolithic centralized computers. Each terminal was limited to displaying only those services provided by the centralized computer. Later, personal computers revolutionized computing by enabling individual users to execute applications independently. Local area networks formed from interconnected personal computers facilitated intercomputer communication and resource sharing. Wide area networks combining diverse computing platforms, ranging from personal computers to legacy mainframes, have enabled worldwide access to information and computing services through interconnection with internetworks, such as the Internet.
Conventional local area and wide area network services typically include a centralized server to manage and coordinate network service activities for subscribing peer nodes. The use of such centralized servers results in the creation of two de facto “classes” of computer systems, whereby computational services are provided by one or more powerful server nodes, while various capable, but underutilized, client nodes are relegated to consuming information and services. Recent advances in peer-to-peer networking design attempt to rebalance these computational inequities by better utilizing the idle computational, storage and bandwidth resources found in the client nodes. When coupled with the services provided by conventional server nodes, peer-to-peer networking seeks to provide a higher level of network services at a lower overall cost.
Certain types of network services that are generally provided in a server-centric fashion, however, must be redefined when moving from the conventional client-server network model to a peer-to-peer network model. For example, information discovery and retrieval provided through on-line searching tools, such as those offered by MSN and Google, have become increasingly popular among Internet users. These tools rely on a centrally located and managed indexing database, and information requests are resolved by performing a query against that database. Building and efficiently accessing the indexing database remains essential to these tools, although recent efforts to distribute indexing databases among peer nodes have suffered in terms of scalability, success rate and real-time performance.
Providing remote access to distributed indexing information, both in conventional IP subdomains and within peer-to-peer networks, poses challenges with respect to availability and scalability. Peer nodes frequently include local file storage, and locally stored information can be made available to other nodes over the network through various types of network file systems and file sharing arrangements. However, access to such information requires that the storing node be available. Most file access schemes fail when the storing node is unavailable, whether because it is off-line or inactive.
In a peer-to-peer system, indexing information can be stored as key and value pairs, and the key can be used to select the node that stores each pair. Preferably, the key maps to the node in a deterministic fashion, so that any node in possession of the key can readily find the node storing the value. Popular or frequently recurring keys, however, tend to create a logical "hotspot" within the network that overtaxes the node associated with the key. That node receives a disproportionate amount of query traffic and must supply additional processing, network bandwidth and storage capacity. Hotspots can be minimized through non-deterministic key assignments, but ensuring that every node in possession of a potential key applies those assignments consistently can be difficult or impracticable to manage in a distributed computing environment.
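To make the deterministic mapping and the hotspot effect concrete, the sketch below uses consistent hashing, one common way to map keys to nodes; it is not necessarily the scheme contemplated here. It assumes SHA-1 truncated to a 32-bit ring position, and the names `ring_position`, `HashRing`, `peer-N` and the keyword `"mp3"` are hypothetical. Every node that knows the key computes the same owner, which is exactly why a popular key concentrates its query load on one node.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(name: str) -> int:
    """Hypothetical helper: map a string onto a 2**32 ring using SHA-1 (an assumed hash choice)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical consistent-hash ring: a key is owned by the first node at or after its position."""

    def __init__(self, nodes):
        # Sort by (position, name) so the ring is identical regardless of input order.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def owner(self, key: str) -> str:
        # bisect_left finds the first node position >= the key's position;
        # the modulo wraps keys past the last node around to the first node.
        i = bisect.bisect_left(self._points, ring_position(key)) % len(self._ring)
        return self._ring[i][1]


nodes = [f"peer-{i}" for i in range(8)]
ring = HashRing(nodes)

# Deterministic: a node that learned the membership in a different order finds the same owner.
other_view = HashRing(list(reversed(nodes)))
same_owner = ring.owner("mp3") == other_view.owner("mp3")

# Skewed query stream: one popular key dominates, so its single owner becomes a hotspot.
queries = ["mp3"] * 900 + [f"term-{i}" for i in range(100)]
load = Counter(ring.owner(q) for q in queries)
print(same_owner, load.most_common(1))
```

The determinism that lets any peer locate a value without a central index is the same property that funnels all traffic for a popular key to one node.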
There is a need for an approach that provides deterministic storage of indexing information for key and value pairs in a distributed peer-to-peer network. Preferably, such an approach would scale to support indexing at wide area network scale, including the Internet. To support such high scalability, the approach would distribute information so as to avoid hotspots and offer close-to-real-time performance. Preferably, such an approach would also ensure the accessibility of indexing information at all levels through a combination of neighboring peer nodes and duplication.
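One way to picture "neighboring peer nodes and duplication" is successor replication on a hash ring: each key and value pair is copied to the key's owner and the next few nodes around the ring, and a lookup may be answered by any reachable replica. The sketch below is a hypothetical illustration under that assumption, not the approach the text goes on to define; `ReplicatedRing`, `replica_set`, `lookup_node`, the replica count of 3 and the SHA-1 ring are all assumed for the example. Keys still map deterministically to a fixed replica set, while reads for a popular key are spread across that set and survive an off-line primary.

```python
import bisect
import hashlib
import random
from collections import Counter


def ring_position(name: str) -> int:
    """Hypothetical helper: map a string onto a 2**32 ring using SHA-1 (an assumed hash choice)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:4], "big")


class ReplicatedRing:
    """Hypothetical ring that duplicates each key onto its owner and the following neighbors."""

    def __init__(self, nodes, replicas=3):
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]
        self.replicas = min(replicas, len(self._ring))

    def replica_set(self, key: str) -> list:
        # The owner is the first node at or after the key; its successors hold duplicates.
        start = bisect.bisect_left(self._points, ring_position(key))
        return [self._ring[(start + j) % len(self._ring)][1] for j in range(self.replicas)]

    def lookup_node(self, key: str, online: set, rng: random.Random) -> str:
        # Any reachable replica can answer, which spreads load and tolerates off-line nodes.
        candidates = [n for n in self.replica_set(key) if n in online]
        if not candidates:
            raise LookupError(f"no reachable replica for {key!r}")
        return rng.choice(candidates)


nodes = [f"peer-{i}" for i in range(8)]
ring = ReplicatedRing(nodes, replicas=3)
rng = random.Random(0)
replicas = ring.replica_set("mp3")

# The primary owner goes off-line, yet the key is still found on a neighboring replica.
online = set(nodes) - {replicas[0]}
fallback = ring.lookup_node("mp3", online, rng)

# With every node on-line, 900 queries for the popular key are shared by the replica set.
load = Counter(ring.lookup_node("mp3", set(nodes), rng) for _ in range(900))
print(replicas, fallback, load)
```

The replica count trades extra storage and update traffic for availability and load spreading; a larger set dilutes a hotspot further but multiplies the copies that must be kept consistent.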