1. Field of Technology
The field of technology relates generally to distributed resource searching.
2. Description of Related Art
In many operational environments, resources are distributed rather than being contained at one central depository.
For example, several developments in the computing field are increasing the demand for more efficient information retrieval mechanisms: the increasing preference for sharing computer resources and information, the decreasing cost of powerful computers and workstations, the widespread use of networks and the Internet, and the maturity of software technologies.
In general, the handling of queries with respect to specified topics is inefficient. For example, “Peer-to-Peer” (P2P) communication as a form of networking is becoming increasingly popular because P2P offers significant advantages in simplicity, ease of use, scalability, and robustness. P2P systems are communications networks where any currently connected computing device (also referred to as an Internet “edge node” or “fringe node”) can take the role of both a client and a server. Generally, P2P systems are networks of personal computing devices (e.g., personal computers (PCs), personal digital assistants (PDAs), Internet-capable wireless telephones, and the like), where each network node has no fixed Internet Protocol (IP) address and therefore is outside the Internet's Domain Name System (DNS; viz., the system by which an IP address like “232.452.120.54” can be reached by a name like “xyz.com”). P2P is a way of decentralizing not just features, but costs and administration as well.
P2P computer applications are a class of applications that takes advantage of resources (e.g., storage, cycles, content, human presence, and the like) available on the fringe of the Internet. However, accessing such decentralized resources means operating in an environment of unstable connectivity and unpredictable location since the nodes operate outside the DNS, having significant or total autonomy from central servers. At the same time, it is an advantage of such systems that communications can be established while tolerating and working with the variable connectivity of hundreds of millions of such fringe nodes. There is therefore a requirement for P2P system designers to solve connectivity problems. A true P2P system must (1) treat variable connectivity and temporary network addresses as the norm, and (2) give the Internet fringe nodes involved in the network some significant autonomy.
One known P2P network protocol, the GNUTELLA™ network protocol, is a file sharing technology that offers an alternative to the web search engines used on the Internet, with a fully distributed mini-search engine and a file serving system for media and archive files that operates on an open-source policy of file sharing. Another commercial example is the MORPHEUS™ file sharing system. FIG. 1 (Prior Art) illustrates a P2P structure and searching in the GNUTELLA™ P2P network. In FIG. 1, each circle symbol represents a node, i.e., a computing device. Individual host nodes 101, 102, 103, and the like, store resources, e.g., a database of documents or other content. Moreover, each peer uses its own local directory structure to store its copy of each of the resources. Any peer can propagate a search request, or “query,” illustrated in FIG. 1 by arrows as broadcast by a first “Querying Peer” 101 to all of its “Neighbor Peers” 102. Note that a neighbor peer becomes the querying peer when it passes a search request on to its own neighbors that are not in direct communication with the first Querying Peer 101, e.g., node 103. In other words, each peer not only searches its own directory for the resource-of-interest of the query, but also broadcasts the query to each of its neighbor peers. While individual hosts are generally unreliable with respect to availability at any given moment, the resources themselves, i.e., the content being sought, tend to be highly available because resources are replicated and widely distributed in proportion to demand. Generally, however, individual resources are identified only by file name, and file names are subject to the individual preferences of each node for its local directory structure. Thus, one specific problem is how to search intelligently and efficiently for relevant resources in a P2P network.
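The query propagation described above can be sketched in a few lines of Python. This is only an illustrative model, not the GNUTELLA™ protocol itself: the `Peer` class, the `connect` and `flood_query` helpers, the hop limit `ttl`, and the `seen` set used to avoid revisiting a peer are all assumptions introduced for the example and are not taken from the text. The peer labels loosely follow FIG. 1.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    """Hypothetical node that acts as both client and server."""
    name: str
    files: list = field(default_factory=list)      # local directory: file names only
    neighbors: list = field(default_factory=list)  # directly connected peers

    def search_local(self, term):
        # Resources are matched by file name only, as the text describes.
        return [f for f in self.files if term.lower() in f.lower()]


def connect(a, b):
    """Make two peers neighbors of each other."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def flood_query(origin, term, ttl=3):
    """Broadcast `term` outward from `origin`; return unranked (peer, file) hits.

    `ttl` (maximum number of hops) and `seen` are assumptions of this sketch.
    """
    hits = []
    seen = {origin.name}
    queue = deque([(origin, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        # Each peer searches its own directory ...
        hits.extend((peer.name, f) for f in peer.search_local(term))
        if hops_left == 0:
            continue
        # ... and passes the query on to each neighbor not yet reached.
        for neighbor in peer.neighbors:
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                queue.append((neighbor, hops_left - 1))
    return hits


# Small network: 101 is the querying peer, 102a/102b its neighbors, 103 two hops away.
p101 = Peer("101")
p102a = Peer("102a", ["jazz_live.mp3"])
p102b = Peer("102b", ["notes.txt"])
p103 = Peer("103", ["JAZZ_live_copy.mp3"])
connect(p101, p102a)
connect(p101, p102b)
connect(p102a, p103)

print(flood_query(p101, "jazz"))
```

In this sketch the querying peer receives one hit per stored copy, in the order the peers happened to be reached, with nothing to indicate which responding peer is the better source.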
Again, it is common to store content data files in each peer's local directory structure simply by the given file name. For example, web sites such as the NAPSTER™ music download site simply store data by a file name associated with the artist or specific song title, e.g., “artist name”, to facilitate searching. Simple descriptor queries thus return a very large number of unranked results. In fact, even a web search engine in a non-P2P system, such as the commercial GOOGLE™ and ALTA VISTA™ Internet search engines and the like, returns all links potentially relevant to a query, namely each and every file found that matches the query, which the user must then study for relevance to the actual interest intended and then visit serially those that may actually be authoritative. Web search engines may provide ranking algorithms by which they measure the degree to which a web page answers a query (the authority of a given web page). All of these web search engines rely on the existence of user information to measure the authority of a given web page: for example, web pages containing links to the given web page, terms occurring within the content of the web page, or links contained in the web page. This form of evaluation will not work for P2P systems that, due to the transient nature of the P2P network, do not support the concept of a link.
Another method, storing files at a given node under random names in order to hide actual file identity, creates a need for some form of central index that can be searched.
Another method is collaborative filtering, in which patterns of searches by like-minded searchers are analyzed and leveraged to produce allegedly more relevant results in response to a specific query. Such analysis inherently introduces real-time delays in providing an answer message to the query.
In general, existing solutions focus on locating every specific instance of each of the resources that is a potential match to the query. Thus, a replicated resource is likely to appear multiple times in responses to a specific query.
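As a small illustration of this point, the following Python sketch shows a replicated resource surfacing once per stored copy. The response tuples, the peer and file names, and in particular the `content_id` field are hypothetical; `content_id` exists only so the example can show which hits are replicas, and the text states that resources are in practice identified by file name alone.

```python
from collections import Counter

# Hypothetical query responses: (peer, local file name, content_id).
responses = [
    ("peer_a", "artist_name-song.mp3", "c1"),
    ("peer_b", "Song (Artist Name).mp3", "c1"),
    ("peer_c", "song.mp3", "c1"),
    ("peer_d", "song_cover_version.mp3", "c2"),
]


def instance_hits(responses):
    """What the querying peer sees: one unranked entry per stored copy."""
    return [(peer, name) for peer, name, _ in responses]


def copies_per_resource(responses):
    """How many of those entries are replicas of the same content."""
    return Counter(content_id for _, _, content_id in responses)


print(len(instance_hits(responses)))
print(copies_per_resource(responses))
```

Here four hits are returned although only two distinct resources exist, and because each peer names its copy differently, the file names alone do not reveal the duplication.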
Moreover, none of these methods provides any ranking of the resources. In other words, there is no measure of how authoritative any particular peer is with respect to the resource-of-interest, e.g., what the peer's resource capability is with respect to the topic of “jazz music.”