Peer-to-peer (P2P) internet applications have recently been popularized through file sharing applications such as Napster™ (Napster is a registered trademark of Napster, Inc. and/or Roxio, Inc.), Gnutella, and Freenet. Peer-to-peer systems have many interesting technical aspects, such as decentralized control, self-organization, and adaptation. They can be characterized as distributed systems in which all network nodes have identical capabilities and responsibilities and all communication is symmetric. Treating all network nodes as peers results in a completely flat network structure.
Peer-to-peer designs harness huge amounts of resources: the content advertised through Napster™ has been observed to exceed 7 TByte of storage on a single day, without requiring centralized planning or huge investments in hardware, bandwidth, or rack space. Peer-to-peer file sharing may therefore lead to new content distribution models for applications such as software distribution, document sharing, and static web content delivery.
Just as important as storing data is the mechanism for searching it, e.g. for searching for a document. In the method described in the publication “Querying the Internet with PIER” by Ryan Huebsch, Joseph Hellerstein, Nick Lanham, Boon Loo, Scott Shenker and Ion Stoica, Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003, database tables are distributed amongst multiple hosts and then retrieved in parallel. However, this method does not integrate the database into a distributed hash table (DHT) environment. As a result, the described method requires considerable storage capacity.
IBM patent application EP 03405516.0, filed on Jul. 10, 2003, describes a method for locating documents in a network with distributed storage nodes. With this method, for example, a query can be performed for a document showing a thumbnail picture of van Gogh's Sunflowers painting. To this end, the following keywords are generated at a requesting entity:
(“Artist=Van”, “Artist=Gogh”, “Title=Sunflowers”, “Thumbnail”),
wherein
keyword1=“Artist=Van”,
keyword2=“Artist=Gogh”,
keyword3=“Title=Sunflowers”, and
keyword4=“Thumbnail”.
When a document is inserted into the database of the distributed storage system, a hash function h( ) is applied to each keyword of the document. The keywords of a document may be extracted by methods known to those skilled in the art. The result of the hash function h( ) is also called an index identifier, or indexID. Such an index identifier may already be present in the DHT system, in particular when other documents stored in the distributed storage system contain the same keyword. A document identifier (also called docID) of the document to be inserted is either added to an existing index identifier or assigned to a new index identifier. A list mapping document identifiers to index identifiers is also called an index document. Such an index document may also comprise more than one index identifier and thus deliver information about documents containing one or more other keywords.
A set of index identifiers extracted from a query for n keywords might look like:
indexID1=h(keyword1),
indexID2=h(keyword2),
. . .
indexIDn=h(keywordn)
An individual keyword can apply to one or more docIDs. As a consequence, each indexID is associated with a set of docIDs to which new docIDs representing documents comprising this keyword are added.
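The indexing scheme described above (hashing each keyword to an indexID and adding the docID to that indexID's set) can be sketched with an in-memory dictionary standing in for the DHT. This is a minimal sketch under stated assumptions: SHA-1 as the hash function h( ), a plain dict instead of distributed storage nodes, and the hypothetical helper names `index_id`, `insert_document`, and `lookup` and docIDs `doc1`/`doc2`; none of these are taken from the referenced application.

```python
import hashlib
from collections import defaultdict


def index_id(keyword: str) -> str:
    """h(keyword): SHA-1 is an assumed choice of hash function."""
    return hashlib.sha1(keyword.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the DHT: maps each indexID to its set of docIDs.
dht: dict[str, set[str]] = defaultdict(set)


def insert_document(doc_id: str, keywords: list[str]) -> None:
    """Add doc_id to the docID set of every keyword of the document."""
    for kw in keywords:
        # A new indexID entry is created if absent; otherwise the
        # existing set of docIDs is extended.
        dht[index_id(kw)].add(doc_id)


def lookup(keyword: str) -> set[str]:
    """Atomic query: return the docIDs stored under h(keyword)."""
    return set(dht.get(index_id(keyword), set()))


# Usage with the example keywords (docIDs are hypothetical).
insert_document("doc1", ["Artist=Van", "Artist=Gogh", "Title=Sunflowers", "Thumbnail"])
insert_document("doc2", ["Artist=Van", "Artist=Gogh", "Type=Letter"])
print(sorted(lookup("Artist=Gogh")))  # ['doc1', 'doc2']
print(sorted(lookup("Thumbnail")))    # ['doc1']
```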
According to the patent application referenced above, a complex query, such as a query for several keywords, is split into several subqueries. Each subquery, which can also be called an atomic query, is a request for the document IDs that match a given keyword. For a complex query, several atomic queries are combined in a distributed fashion using boolean operators. An example of a complex query might look as follows:
“Artist=Van” and “Artist=Gogh” and not (“Title=Sunflower” or “Type=Letter”)
The resulting complex query is assembled from the corresponding subqueries linked by set operators:
V intersect G intersect complement (S union L)
wherein V, G, S and L are atomic queries for the following keywords:
V=“Artist=Van”;
G=“Artist=Gogh”;
S=“Title=Sunflower”;
L=“Type=Letter”;
and “intersect”, “union”, and “complement” are set operations.
This expression is then preferably encoded, e.g. in reverse polish notation (RPN), as instructions for a stack machine.
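In reverse polish notation, the example query V intersect G intersect complement (S union L) can be written as `V G intersect S L union complement intersect` and evaluated with a stack. The sketch below assumes that each atomic query has already returned a set of docIDs and that `complement` is taken relative to a known universe of docIDs; the token format, the helper `eval_rpn`, and the example docIDs are hypothetical and not the encoding defined in EP 03405516.0.

```python
def eval_rpn(tokens: list[str], atoms: dict[str, set[str]], universe: set[str]) -> set[str]:
    """Evaluate an RPN-encoded set query on a stack machine."""
    stack: list[set[str]] = []
    for tok in tokens:
        if tok == "intersect":
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        elif tok == "union":
            b, a = stack.pop(), stack.pop()
            stack.append(a | b)
        elif tok == "complement":
            # Assumption: complement relative to all known docIDs.
            stack.append(universe - stack.pop())
        else:
            # Operand: push the result set of the named atomic query.
            stack.append(set(atoms[tok]))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack.pop()


# Hypothetical atomic query results.
atoms = {
    "V": {"d1", "d2", "d3", "d4"},
    "G": {"d1", "d2", "d3"},
    "S": {"d1"},
    "L": {"d3"},
}
universe = {"d1", "d2", "d3", "d4", "d5"}
query = "V G intersect S L union complement intersect".split()
print(sorted(eval_rpn(query, atoms, universe)))  # ['d2']
```

Because each operator only consumes results already on the stack, the atomic queries can be resolved independently (e.g. in parallel) before the expression is evaluated.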
However, exact queries for a keyword, or queries for text containing given words or matching keywords, are often not sufficient. It may also be necessary to query for value ranges, e.g.: search all files smaller than 100 Kbytes; search all documents newer than Jan. 1st, 2001; search articles offered at prices between 10 and 20 Euro; or search cities located between 45 and 49 degrees northern latitude and between 50 and 55 degrees eastern longitude. The last example shows that a search for a two-dimensional range may also be necessary: the range “between 45 and 49 degrees northern latitude” is searched in a first dimension, and the range “between 50 and 55 degrees eastern longitude” in a second dimension. More generally, an n-dimensional range may be the subject of a query.
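To make the semantics of an n-dimensional range query concrete, the sketch below treats it as the intersection of one-dimensional range results, one per dimension, analogous to the "intersect" operation used for boolean queries. It is a centralized reference sketch only: it assumes inclusive bounds, scans every stored value instead of performing a distributed lookup, and uses hypothetical helper names (`range_1d`, `range_nd`) and hypothetical docIDs and coordinates.

```python
Point = tuple[float, ...]


def range_1d(values: dict[str, Point], dim: int, lo: float, hi: float) -> set[str]:
    """docIDs whose coordinate in dimension `dim` lies in [lo, hi] (inclusive)."""
    return {doc for doc, p in values.items() if lo <= p[dim] <= hi}


def range_nd(values: dict[str, Point], bounds: list[tuple[float, float]]) -> set[str]:
    """n-dimensional range query as the intersection of per-dimension ranges."""
    result = set(values)
    for dim, (lo, hi) in enumerate(bounds):
        result &= range_1d(values, dim, lo, hi)
    return result


# Hypothetical documents: (northern latitude, eastern longitude).
cities = {
    "doc-a": (47.0, 52.0),
    "doc-b": (47.0, 40.0),
    "doc-c": (51.0, 53.0),
    "doc-d": (45.0, 55.0),
}
print(sorted(range_nd(cities, [(45, 49), (50, 55)])))  # ['doc-a', 'doc-d']
```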
When searching ranges in general, two different approaches are possible:
In a first approach, it is envisaged to allow only one shot, i.e. access to a single network entity storing an index document (a set of docIDs linked to an indexID) within the distributed storage system. To allow arbitrary range queries, this approach results in up to V*(V−1)/2 index documents, each containing a subset of the docIDs, wherein V is the size of the value space, e.g. 2^32.
In another approach, it is envisaged to store only one single copy of each docID in the distributed storage system. This results in (U−L+1) requests for index documents, where L and U are the lower and upper bound of the searched range, respectively. This mechanism can be run on top of the mechanism described in document EP 03405516.0.
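The costs of the two approaches can be made concrete with a small sketch. It assumes integer values and inclusive bounds; for the second approach it assumes one index document per distinct value, keyed by that value. The names `approach1_index_documents` and `PerValueIndex` and the example items are hypothetical. The first function simply evaluates the count V*(V−1)/2 stated above; the class performs the U−L+1 lookups of the second approach and counts them.

```python
from collections import defaultdict


def approach1_index_documents(v: int) -> int:
    """Index documents needed by the first approach, as stated: V*(V-1)/2."""
    return v * (v - 1) // 2


class PerValueIndex:
    """Second approach: each docID is stored once, under its value's index document."""

    def __init__(self) -> None:
        self.docs: dict[int, set[str]] = defaultdict(set)
        self.lookups = 0  # number of index documents requested so far

    def insert(self, doc_id: str, value: int) -> None:
        self.docs[value].add(doc_id)

    def range_query(self, lower: int, upper: int) -> set[str]:
        """Request every index document in [lower, upper]: U-L+1 lookups."""
        result: set[str] = set()
        for value in range(lower, upper + 1):
            self.lookups += 1  # one request per value, even if it is empty
            result |= self.docs.get(value, set())
        return result


print(approach1_index_documents(2**32))  # 9223372034707292160
idx = PerValueIndex()
idx.insert("item-1", 12)
idx.insert("item-2", 25)
print(sorted(idx.range_query(10, 20)), idx.lookups)  # ['item-1'] 11
```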
These two approaches show that a range search requires either a large amount of storage space or a large number of queries. This is disadvantageous in many environments, especially when the searched ranges have arbitrary granularity.
One optimization criterion is that the amount of required storage space should be as small as possible; this is a static requirement. Another is that the number of network entities to visit, or of index documents to retrieve, should be as small as possible for typical queries, i.e. a range search should require as few queries as possible; this is a dynamic requirement. Depending on the usage scenario, these two requirements may have different priorities.