The storing of information in a network has traditionally followed the client-server model, i.e. the information is stored centrally in servers which are accessible by a number of clients. Typical examples are web servers that are accessible over the Internet from clients (home computers, mobile devices etc.) located all over the world. The client-server model has increasingly been challenged by the peer-to-peer (P2P) model. In contrast to the client-server model, the peer-to-peer model makes no distinction between clients and servers in the network. A node (also called a peer) can be both a client and a server at the same time and can access information stored in other nodes and store information accessible by other nodes. A network comprising these nodes is consequently called a peer-to-peer (P2P) network. P2P networks are usually overlay networks on top of an existing IP network such as the Internet. A well-known example of a P2P network is the set of nodes (such as personal computers) connected to each other using the P2P protocol BitTorrent.
One advantage of P2P networks is that information (here also called objects) can be distributed and is not located in a single point of failure such as the server in a client-server network. P2P networks are also more scalable than client-server networks. On the other hand, a search for an object in a client-server network is relatively easy whereas a search for an object in a P2P network is more complex. The problem is to find out in which node the requested object is located. For this reason, the BitTorrent network also comprises a centralized server called a BitTorrent tracker. This tracker keeps information about where (in which nodes) the objects are located. However, if only one tracker is used, it again becomes a single point of failure. This means that these trackers need to be very reliable.
To overcome this, a flat structured overlay network has been proposed where the algorithm to locate objects in the network is based on key-based routing, also called Distributed Hash Tables (DHT). In DHT the nodes are organized in a ring, a so-called identifier circle. Different DHT algorithms have been devised, such as Chord, Pastry and Kademlia. Chord is for example described in more detail in the paper ‘Chord: A scalable Peer-to-peer Lookup Protocol for Internet Applications’ by Ion Stoica et al., published in 2001 in relation to the SIGCOMM '01 conference. One overlay network that relies on the Chord DHT algorithm is the Peer-to-Peer Session Initiation Protocol (P2PSIP) as suggested by the IETF papers draft-ietf-p2psip-concepts-02, Jul. 7, 2008 and draft-ietf-p2psip-base-02 (RELOAD), Mar. 7, 2009. P2PSIP/RELOAD allows data to be stored on peers and retrieved in an efficient manner.
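As an illustration of key-based routing on an identifier circle, the following minimal sketch maps node names and object keys onto the same ring and assigns each key to the first node at or after it in the clockwise direction. The identifier width `M`, the use of SHA-1, the node names and the helper names `ident` and `successor` are assumptions made for this example only; they are not taken from Chord or RELOAD as specified.

```python
import hashlib
from bisect import bisect_left

M = 16  # number of identifier bits (assumption for this sketch)


def ident(name: str, m: int = M) -> int:
    """Hash a node name or object key onto the identifier circle [0, 2**m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids: list, key_id: int) -> int:
    """Return the first node identifier at or after key_id, wrapping around.

    node_ids must be sorted in ascending order.
    """
    index = bisect_left(node_ids, key_id)
    return node_ids[index % len(node_ids)]


# Eight hypothetical nodes placed on the circle.
nodes = sorted({ident(f"node-{i}") for i in range(8)})

# The object with key "Ericsson" is stored on the successor of its identifier.
owner = successor(nodes, ident("Ericsson"))
```

In this sketch every node that hashes the same key arrives at the same owner, which is why no central tracker is needed for an exact lookup.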
US patent application 2005/0080858 discloses a system and a method for searching in an unstructured P2P network. In this application multicast request messages are sent to the neighboring peers that in turn may multicast the request messages to other peers until a search radius is reached.
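The general idea of forwarding a request to neighboring peers until a search radius is reached can be sketched as a hop-limited breadth-first traversal. This is a generic illustration and not the method of the cited application; the function name `flood_search`, the adjacency dictionary and the object table are hypothetical.

```python
from collections import deque


def flood_search(neighbours, objects, start, wanted, radius):
    """Return the peers within `radius` hops of `start` that hold `wanted`.

    neighbours: dict mapping a peer to a list of its neighboring peers
    objects:    dict mapping a peer to the set of objects it stores
    """
    seen = {start}
    hits = []
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if wanted in objects.get(peer, set()):
            hits.append(peer)
        if hops == radius:
            continue  # search radius reached, do not forward further
        for nxt in neighbours.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


# A chain of four peers A-B-C-D where only D stores the object.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
stored = {"D": {"song.mp3"}}
```

With a radius of 2 the request from A never reaches D, whereas a radius of 3 finds it, which illustrates that results in an unstructured network depend on the chosen radius.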
The paper ‘Scalable blind search and broadcasting over Distributed Hash Tables’ published Aug. 15, 2007 discloses a framework named Recursive Partitioning Search (RPS) for blind search over structured P2P networks. Here, the node sends queries to all of its fingers, where each query comprises a tag that contains a value specifying the endpoint of the recipient's search region.
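The principle of sending a query to each finger together with the endpoint of the recipient's search region can be sketched as follows. This is a simplified illustration of recursive partitioning over a Chord-like ring and is not claimed to reproduce RPS as published; the finger rule (successor of n + 2^i), the helper names and the example ring are assumptions.

```python
from bisect import bisect_left


def successor(nodes, ident):
    """First node at or after ident on the ring (nodes sorted ascending)."""
    return nodes[bisect_left(nodes, ident) % len(nodes)]


def fingers(node, nodes, m):
    """Distinct fingers of node, in clockwise order, excluding node itself."""
    result = []
    for i in range(m):
        f = successor(nodes, (node + 2 ** i) % (2 ** m))
        if f != node and f not in result:
            result.append(f)
    return result


def in_region(x, start, end, size):
    """True if x lies strictly between start and end going clockwise.

    start == end denotes the whole ring except start itself.
    """
    if start == end:
        return x != start
    return 0 < (x - start) % size < (end - start) % size


def broadcast(node, limit, nodes, m, visited):
    """Deliver the query to node, then delegate sub-regions to its fingers."""
    visited.append(node)
    size = 2 ** m
    inside = [f for f in fingers(node, nodes, m) if in_region(f, node, limit, size)]
    for i, f in enumerate(inside):
        # Each finger is responsible up to the next finger (or up to limit).
        next_limit = inside[i + 1] if i + 1 < len(inside) else limit
        broadcast(f, next_limit, nodes, m, visited)


ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # hypothetical 6-bit ring
reached = []
broadcast(8, 8, ring, 6, reached)
```

Because the regions handed to the fingers do not overlap, each node in this sketch receives the query exactly once.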
The paper ‘Efficient broadcast in P2P grids’ published in May 2005 discloses an algorithm to perform broadcast in P2P grids and to reach as many nodes as possible by regular non-redundant distribution.
Structured overlay networks using DHT provide an efficient way of performing exact searches, for example: ‘do you have an object corresponding to the key “Ericsson”?’. A problem with structured overlay networks is however that they are not well suited for wild card searches. An example of a wild card search is: ‘do you have an object corresponding to the key “Eri*”?’. Many applications, and in particular users of the P2PSIP protocol, would benefit from being able to perform wild card searches.
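The difference between the two kinds of search can be illustrated with a small sketch. Hashing places keys that share a prefix at unrelated identifiers, so hashing the pattern “Eri*” itself leads nowhere, and a wild card query has to inspect every stored key. The single dictionary standing in for the whole overlay, the key set and the helper names are assumptions made for this example.

```python
import hashlib
from fnmatch import fnmatchcase


def key_id(key: str) -> str:
    """Identifier under which an object is stored (SHA-1, an assumption)."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# One dictionary stands in for the objects spread over all nodes.
store = {key_id(k): k for k in ["Ericsson", "Erica", "Erlang", "Nokia"]}


def exact_lookup(key: str):
    """Exact search: a single probe at the hashed identifier."""
    return store.get(key_id(key))


def wildcard_scan(pattern: str):
    """Wild card search: every stored key has to be examined."""
    return sorted(k for k in store.values() if fnmatchcase(k, pattern))
```

Here `exact_lookup("Ericsson")` succeeds with one probe, `exact_lookup("Eri*")` finds nothing, and only the exhaustive `wildcard_scan("Eri*")` returns the matching keys.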
The paper ‘Wildcard Search in Structured P2P Networks’ published November 2007 discloses a keytoken-based index and search scheme for wildcard searches in structured P2P networks. In this scheme each keyword is tokenized and hashed into an r-bit vector representing a node in an r-dimensional hyper-cube. This scheme does however require very high-dimensional hyper-cubes, and to overcome this problem additional measures need to be taken that increase the complexity.
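The idea of tokenizing a keyword and hashing it into an r-bit vector can be sketched roughly as below. The details are assumptions made purely for illustration and are not taken from the cited paper: tokens are taken to be character bigrams, each token sets one bit chosen by SHA-1, and `R` is kept small.

```python
import hashlib

R = 16  # vector length / hyper-cube dimension (assumption, kept small here)


def tokens(keyword: str, n: int = 2) -> set:
    """Split a keyword into overlapping character n-grams (assumed token form)."""
    return {keyword[i:i + n] for i in range(len(keyword) - n + 1)}


def bit_vector(toks: set, r: int = R) -> int:
    """Hash every token to one of r bit positions and set that bit."""
    vec = 0
    for tok in toks:
        digest = hashlib.sha1(tok.encode("utf-8")).digest()
        vec |= 1 << (int.from_bytes(digest, "big") % r)
    return vec


def may_match(query_vec: int, index_vec: int) -> bool:
    """A stored keyword can only match if it has every bit of the query set."""
    return (query_vec & index_vec) == query_vec


indexed = bit_vector(tokens("Ericsson"))
query = bit_vector(tokens("Eri"))  # literal part of the wild card "Eri*"
```

With a small `R`, many unrelated keywords collide on the same bits, which gives an intuition for why such a scheme tends towards very high-dimensional hyper-cubes.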