1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and system for multi-computer data transferring. Still more particularly, the present invention provides a method, apparatus, and computer implemented instructions for restricting a fan-out search in a peer-to-peer network based on accessibility of nodes.
2. Description of Related Art
The amount of Internet content continues to grow rapidly, outpacing the ability of search engines to index it. Even the largest search engines cannot keep up with the growth; it has been estimated that search engines index only about 5% to 30% of the information content on the Web. Hence, at the current time, the majority of Web content is not classified or indexed by any search engine.
There are currently two broad categories of systems that provide the service of categorizing and locating information on the Web: (1) search engines, such as AltaVista, that return direct hits to sites containing data that match input queries; and (2) Web portals, such as Yahoo!, that organize the information into categories and directories. These systems operate using a traditional client-server model with packet-switched data interchange.
Recently, the traditional Web client-server paradigm has been challenged by distributed content-sharing or file-sharing systems that support a peer-to-peer model for exchanging data. In peer-to-peer networks, each computer platform, or node, can operate as a hub, i.e., each node has both client functionality and server functionality. Each node has a list of addresses, most commonly Internet Protocol (IP) addresses, of several other nodes, or “peer nodes”. These nodes can directly communicate with each other without a central or intermediate server.
Nodes within a peer-to-peer network form a distributed file-sharing system in which the nodes act cooperatively to form a distributed search engine. When a user at a node enters a search query, the search query is copied and sent to each node in that node's list of peer nodes. Each peer node searches its own databases in an attempt to satisfy the search query. Each peer node also copies the query to each node in its own list of peer nodes while observing a time-to-live value in the query message, which limits how far the query propagates. If a node finds a match for the query, then that node returns query results to the originating node. The search quickly fans out amongst a large number of nodes, which provides a useful manner for finding new content that has not yet been indexed by the large search engines.
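The fan-out search described above can be illustrated with a minimal sketch. The sketch is an assumption-laden simulation, not an implementation of any particular peer-to-peer protocol: the function name `flood_search`, the dictionaries `peers` and `contents`, and the node labels are all hypothetical, the time-to-live value is treated as a hop count, and a shared `seen` set stands in for whatever duplicate suppression a real network might use.

```python
from collections import deque


def flood_search(peers, contents, origin, query, ttl):
    """Simulate a fan-out search from `origin` with a hop-count time-to-live.

    peers    -- dict mapping each node to its list of peer nodes (hypothetical)
    contents -- dict mapping each node to the set of items it holds (hypothetical)
    Returns the nodes, in the order reached, that hold `query`.
    """
    hits = []
    seen = {origin}                 # assumption: each node handles a query once
    queue = deque([(origin, ttl)])  # (node, remaining hops)
    while queue:
        node, remaining = queue.popleft()
        # Every node other than the originator searches its own data.
        if node != origin and query in contents.get(node, set()):
            hits.append(node)
        # The query is not forwarded once the time-to-live is exhausted.
        if remaining == 0:
            continue
        for peer in peers.get(node, []):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return hits


# Illustrative topology: A -> B, C; B -> D; C -> D; D -> E.
peers = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
contents = {"D": {"song.mp3"}, "E": {"song.mp3"}}

print(flood_search(peers, contents, "A", "song.mp3", ttl=2))  # ['D']
print(flood_search(peers, contents, "A", "song.mp3", ttl=3))  # ['D', 'E']
```

In this illustrative topology, raising the time-to-live from 2 to 3 lets the query reach node E, which shows how the time-to-live value bounds the portion of the network that a search can cover.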
In a peer-to-peer data sharing network, each node participates in a process of connecting and disconnecting with other nodes. When a connection is established with another node, a user cannot quickly determine whether or not it is worth browsing the content of the newly connected peer node. Since the search might fan out within a widely distributed network, the search can often reach nodes that do not contain any content that would be of interest to the user.
In a peer-to-peer network, a search originated by one node fans out to other nodes. This fan-out may occur in an exponential manner depending on the connectivity of the nodes that the search message reaches. In some cases, a target node reached in a search may be behind a firewall. Such a target node is not of any use to the node originating the search request because, even if the target node contains data corresponding to the search, the originating node may be unable to pull that data from the target node behind the firewall. Data is pulled when a first node requests the data from a second node, causing the second node to send the data to the first node.
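The scale of the problem can be sketched with simple arithmetic. The following is an illustration under stated assumptions only: it assumes that every node forwards the query to the same number of previously unreached peers (`fanout`) and that the time-to-live is a hop count, so the result is an upper bound rather than a measurement. The helper `usable_hits` and the `firewalled` set are hypothetical and merely show that hits on nodes behind a firewall contribute nothing to the originating node.

```python
def max_nodes_reached(fanout, ttl):
    """Upper bound on nodes reached when each node forwards to `fanout` new peers.

    Hop 1 reaches fanout nodes, hop 2 reaches fanout**2, and so on up to `ttl`.
    """
    return sum(fanout ** hop for hop in range(1, ttl + 1))


def usable_hits(hits, firewalled):
    """Keep only the hits from which the originating node could pull data.

    `firewalled` is a hypothetical set of nodes known to be behind a firewall.
    """
    return [node for node in hits if node not in firewalled]


print(max_nodes_reached(fanout=4, ttl=7))          # 21844
print(usable_hits(["D", "E"], firewalled={"E"}))   # ['D']
```

Under these assumptions, a fan-out of 4 with a time-to-live of 7 can reach as many as 21,844 nodes, and every query sent to a node behind a firewall is effort spent on a result that the originating node may be unable to retrieve.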