The number of information systems that can store and retrieve large amounts of data and other resources continues to grow. For such systems, the architecture and methodologies employed have a significant impact on performance and reliability.
In a non-networked system, resource lookup and addressing are simple. For example, to locate a printer, the system simply checks its configuration for a directly connected printer. In networks of computer systems, resource location becomes much more difficult.
A common model for resource lookup is one in which a requesting computer system asks all computer systems on the network whether they hold the required resource (where “resource” encompasses data, programs, hardware, and so on). For example, a computer requiring a printer on a local area network (LAN) may broadcast a request to all nodes on the LAN, following which systems offering printing services reply to the originating node. However, this approach does not scale to large networks: it would be impractical, for example, to ask every computer on the Internet for the location of a particular data file.
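The cost of the broadcast model can be illustrated with a minimal Python sketch. The names `Node` and `broadcast_lookup` are hypothetical and do not correspond to any particular protocol; the sketch assumes one request message per queried node and one reply per node that holds the resource.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical network node and the resources it offers."""
    name: str
    resources: set = field(default_factory=set)


def broadcast_lookup(nodes, requester, resource):
    """Ask every other node for `resource`.

    Returns the names of the nodes that hold it and the total number
    of messages exchanged (requests plus replies).
    """
    holders = []
    messages = 0
    for node in nodes:
        if node is requester:
            continue
        messages += 1  # one request is sent to each other node
        if resource in node.resources:
            holders.append(node.name)
            messages += 1  # each holder sends one reply
    return holders, messages


# Illustrative four-node LAN in which two nodes offer a printer.
lan = [Node("a"), Node("b", {"printer"}), Node("c"), Node("d", {"printer"})]
holders, messages = broadcast_lookup(lan, lan[0], "printer")
```

Under these assumptions the number of requests grows linearly with the number of nodes for every single lookup, which is why the model becomes impractical on a network the size of the Internet.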
Many information systems utilize a centralized storage model in which all the available resources are listed in a single index, which is stored on a single centralized storage system (“server”), with which all other systems (“clients”) must communicate in order to access the information. For example, to locate a printer in a LAN, a client may contact the master server, which has knowledge of all of the printing resources within the LAN.
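By way of illustration only, the centralized model might be sketched as follows in Python. The class `CentralIndex`, its methods, and the `requests_served` counter are hypothetical names introduced for this sketch; the index is assumed to be a simple in-memory mapping from resource name to a list of locations.

```python
class CentralIndex:
    """A hypothetical single-server index of resource locations."""

    def __init__(self):
        self._locations = {}      # resource name -> list of locations
        self.requests_served = 0  # every client lookup lands here

    def register(self, resource, location):
        """Record that `resource` is available at `location`."""
        self._locations.setdefault(resource, []).append(location)

    def lookup(self, resource):
        """Return a copy of the known locations of `resource`."""
        self.requests_served += 1
        return list(self._locations.get(resource, []))


# Illustrative use: two printers registered, three clients looking one up.
index = CentralIndex()
index.register("printer", "node-b")
index.register("printer", "node-d")
results = [index.lookup("printer") for _client in range(3)]
```

Each lookup costs the client a single request, but every request from every client is served by the same system, so that system's capacity and availability bound the whole network.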
The use of such a centralized model is not, however, desirable under all circumstances. If the centralized storage system is networked to the client systems by a small number of relatively narrow communication channels, the amount of data being transferred to and from the storage system may exceed the capacity of the communication channels. Another difficulty often encountered in a network environment is low network performance (a high “latency”, or information transit time) as data traverses the network between the client systems and the centralized storage system. Another difficulty arises from the need to provide a storage system having sufficient capacity to store all of the resource locations. Yet another difficulty arises from the decreased reliability which results from storing all of the resource locations on a single system; that is, the central system is a single point of failure.
Such deficiencies of resource lookup in large networks, such as the World Wide Web (WWW), have led to the creation of search engines, web indexes, portals, and so forth. Web indexes and search engines operate much as the previously described central index, and rely on a process whereby resource locations are inserted (manually or automatically) into the index. However, they remain central indexes and therefore suffer from the deficiencies described earlier. In summary, the centralized model is not scalable to large networks.
Current approaches to solving this problem for networks such as the Internet involve replicating the centralized index across a plurality of servers. This has the deficiency that the replicated indexes must be kept synchronized, and such synchronization is not scalable to vast resource sets or large networks. In addition, the replication approach typically entails replication of hardware sufficient to host the entire index at each replicated location. For large indexes this may imply a significant additional cost burden that further impairs scalability.
Although the inadequacies of existing resource lookup methods have been previously recognized, and various solutions have been attempted, there has been and continues to be a need for improved resource lookup systems. Of particular interest here is a solution enabling efficient location and retrieval of an item from a resource set which is vastly distributed.