1. Field of the Invention
This invention relates to peer-to-peer networking, and more particularly to distributed indexes in peer-to-peer networks.
2. Description of the Related Art
The Internet has three valuable fundamental assets (information, bandwidth, and computing resources), all of which are vastly underutilized, partly due to the traditional client-server computing model. No single search engine or portal can locate and catalog the ever-increasing amount of information on the Web in a timely way. Moreover, a huge amount of information is transient and not subject to capture by techniques such as Web crawling. For example, research has estimated that the world produces two exabytes or about 2×10^18 bytes of information every year, but only publishes about 300 terabytes or about 3×10^12 bytes. In other words, for every megabyte of information produced, only one byte is published. Moreover, Google claims that it searches only about 1.3×10^8 web pages. Thus, finding useful information in real time is increasingly difficult.
Although miles of new fiber have been installed, the new bandwidth gets little use if everyone goes to one site for content and to another site for auctions. Instead, hot spots just get hotter while cold pipes remain cold. This is partly why most people still experience congestion on the Internet even though a single fiber's bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months.
New processors and storage devices continue to break records in speed and capacity, supporting more powerful end devices throughout the network. However, computation continues to accumulate around data centers, which have to increase their workloads at a crippling pace, thus putting immense pressure on space and power consumption.
Finally, computer users in general are accustomed to computer systems that are deterministic and synchronous in nature, and think of such a structure as the norm. For example, when a browser issues a URL (Uniform Resource Locator) request for a Web page, the output is typically expected to appear shortly afterwards. It is also typically expected that everyone around the world will be able to retrieve the same page from the same Web server using the same URL.
The term peer-to-peer networking or computing (often referred to as P2P) may be applied to a wide range of technologies that greatly increase the utilization of information, bandwidth, and computing resources in the Internet. Frequently, these P2P technologies adopt a network-based computing style that neither excludes nor inherently depends on centralized control points. Apart from improving the performance of information discovery, content delivery, and information processing, such a style can also enhance the overall reliability and fault tolerance of computing systems.
FIGS. 1A and 1B are examples illustrating the peer-to-peer model. FIG. 1A shows two peer devices 104A and 104B that are currently connected. Either of the two peer devices 104 may serve as a client of or a server to the other device. FIG. 1B shows several peer devices 104 connected over the network 106 in a peer group. In the peer group, any of the peer devices 104 may serve as a client of or a server to any of the other devices.
An inverted index is an index into a set of documents using one or more of the terms or other content in the documents. An inverted index may be accessed by some search method. Each index entry gives a term and a list of texts, possibly with locations within the text, where the term occurs. A full inverted index is an inverted index that includes the exact location within documents, in addition to the document(s) in which the word appears.
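As an illustration only, a full inverted index of the kind described above might be sketched in Python as follows. The function name `build_full_inverted_index`, the sample documents, and the choice to record word positions (rather than character offsets) are assumptions made for this sketch and are not taken from any particular implementation.

```python
import re
from collections import defaultdict


def build_full_inverted_index(documents):
    """Map each term to {document id: [word positions of the term]}.

    Hypothetical helper for illustration: terms are lowercased runs of
    word characters, and positions are zero-based word offsets.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        terms = re.findall(r"\w+", text.lower())
        for position, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


# Sample documents (invented for this sketch).
documents = {
    "doc1": "peers share content with peers",
    "doc2": "content is indexed by peers",
}

index = build_full_inverted_index(documents)

# Each entry gives a term, the documents containing it, and the
# locations of the term within each document.
print(index["peers"])    # {'doc1': [0, 4], 'doc2': [4]}
print(index["content"])  # {'doc1': [2], 'doc2': [0]}
```

Dropping the position lists and keeping only the document identifiers would yield a plain (non-full) inverted index in the sense used above.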
XPath is a language that describes a way to locate and process items in Extensible Markup Language (XML) documents by using an addressing syntax based on a path through the document's logical structure or hierarchy. This makes writing programming expressions easier than if each expression had to understand typical XML markup and its sequence in a document. XPath also allows the programmer to deal with the document at a higher level of abstraction. The key difference between XPath and earlier languages is that XPath specifies a route, rather than pointing to a specific set or sequence of characters, words, or other elements.
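The path-based addressing described above may be illustrated with a short Python sketch. It uses the standard-library `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the XML document, its element names, and its attribute values are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are assumptions.
xml_text = """
<peergroup name="demo">
  <peer id="104A"><name>alpha</name></peer>
  <peer id="104B"><name>beta</name></peer>
</peergroup>
""".strip()

root = ET.fromstring(xml_text)

# A path through the document hierarchy: every <name> under a <peer>
# that is a direct child of the root element.
names = [element.text for element in root.findall("./peer/name")]

# A path with a predicate: the <name> of the <peer> whose id is "104B".
beta = root.find("./peer[@id='104B']/name").text

print(names)  # ['alpha', 'beta']
print(beta)   # beta
```

In both expressions the items are selected by their route through the logical structure, not by their character position in the markup.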
A content-addressable network is a peer-to-peer network of nodes that implement a distributed hash table. Each node in the content-addressable network stores a piece (a zone) of the hash table, and routing information for one or more other nodes in the content-addressable network. Given a key, the key may be mapped to a node in the content-addressable network.
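The mapping of a key to a node may be sketched as follows. This is a deliberately simplified illustration and not a complete content-addressable network: it assumes a two-dimensional unit square divided into four fixed rectangular zones, uses SHA-256 to hash keys to points, and omits node-to-node routing, joining, and departure entirely. The names `key_to_point`, `owner_of`, `ZONES`, and the node labels are hypothetical.

```python
import hashlib


def key_to_point(key):
    """Hash a string key to a point (x, y) in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits per coordinate so the float result is always below 1.0.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return x, y


# Assumed static zone layout: node -> (x_min, x_max, y_min, y_max).
ZONES = {
    "node_a": (0.0, 0.5, 0.0, 0.5),
    "node_b": (0.5, 1.0, 0.0, 0.5),
    "node_c": (0.0, 0.5, 0.5, 1.0),
    "node_d": (0.5, 1.0, 0.5, 1.0),
}


def owner_of(key, zones=ZONES):
    """Return the node whose zone contains the point the key hashes to."""
    x, y = key_to_point(key)
    for node, (x_min, x_max, y_min, y_max) in zones.items():
        if x_min <= x < x_max and y_min <= y < y_max:
            return node
    raise LookupError("zones do not cover the point for key %r" % key)


# Each node stores only the piece of the hash table that falls in its zone.
tables = {node: {} for node in ZONES}
tables[owner_of("presence:alice")]["presence:alice"] = "online"

# Any peer can recompute the owner from the key alone and read the value.
print(tables[owner_of("presence:alice")]["presence:alice"])  # online
```

In an actual content-addressable network, a request would be forwarded from node to node using each node's routing information until it reaches the zone that owns the key; the direct dictionary lookup here stands in for that step.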
Typically, presence information is stored centrally on a single server or a cluster of servers. Prior art architectures for presence typically have the problems commonly associated with centralized architectures—lack of scalability and lack of reliability. Therefore, a mechanism for distributing presence information among a plurality of nodes in a network is desired.