In recent years, computer search systems have become heavily utilized and various search systems compete to provide relevant and rapid results. Since user satisfaction depends upon both speed and relevance, search system developers strive to improve search system speed and performance.
Currently, search engines formulate an estimate of a document's relevance to any arbitrary query. Search engines strive to show relevant documents and eliminate irrelevant documents. Ordering documents by relevance in a searchable index improves the performance of the search system. When such an index is built in currently implemented search systems, the search engine assumes that documents beyond a certain point in the ordering are less relevant.
One known technique for determining the relevance of an information source involves counting the number of links or citations contained within the information source. This technique may be useful in a network containing relatively uniform types of information sources. In such a uniform system, it may be reasonable to assume that an information source often cited by other information sources is of greater relevance than a less frequently cited information source.
This technique may be implemented by incorporating all information sources in a network in a graph. If the graph represents information sources, such as documents on the World Wide Web, a node may be provided to represent each document and an edge may represent each hyperlink between two documents. Initially, every node may be assigned an equal weight. The weights then shift according to how many links connect one node to another. After multiple iterations, the shifting of weights is complete and the prior relevance of each node can be determined. When an edge points to a node having no outlinks, that node's weight is redistributed back into the system of linked documents as a whole by a junk vector or reset vector. The default junk vector may assign a weight equal to (1/number of sources in the system) to each node.
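The iteration described above can be illustrated with a minimal sketch. The function name propagate_weights, the fixed iteration count, and the four-node example graph are assumptions made for illustration only. The sketch applies no damping factor, since none is described above, and it is not presented as the implementation used by any particular search system.

```python
def propagate_weights(links, iterations=50):
    """Iteratively shift weights along links (hypothetical helper).

    links maps each node to the list of nodes it links to.
    """
    # Collect every node that appears as a source or as a link target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)

    # Initially, every node is assigned an equal weight.
    weights = {node: 1.0 / n for node in nodes}

    # Default junk (reset) vector: 1 / number of sources for each node.
    junk = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        shifted = {node: 0.0 for node in nodes}
        for node in nodes:
            outlinks = links.get(node, [])
            if outlinks:
                # Split this node's weight evenly across its outlinks.
                share = weights[node] / len(outlinks)
                for target in outlinks:
                    shifted[target] += share
            else:
                # No outlinks: return the weight to the system as a whole
                # in the proportions given by the junk vector.
                for target in nodes:
                    shifted[target] += weights[node] * junk[target]
        weights = shifted
    return weights


# Assumed example graph: node "d" has no outlinks.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
example_weights = propagate_weights(example_links)
```

In this assumed graph, node "c" is linked to by both "a" and "b" and ends with the largest weight, while the weight reaching "d" is spread back over all four nodes on each pass, so the total weight in the system stays constant.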
The above-identified algorithm does not consider document content in its relevance determination. Accordingly, in the context of the World Wide Web, due to such factors as spam and web page proliferation, the algorithm has become less effective. Web page proliferation has included a large increase in category-specific pages. Accordingly, in order to improve results and to account for the proliferation of category-specific web pages, a system has been developed that pre-seeds category-specific pages before running the page rank algorithm. For instance, the system might initially rank pages in some categories, for example sports, news, or politics, higher than other pages and subsequently execute the above-identified algorithm. Such a system can determine the prior rank of a given document based on its category.
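The pre-seeding step can be sketched as building a non-uniform set of starting weights. The function name seed_weights, the boost values, and the page and category labels below are all hypothetical. How a given system feeds the seeded weights into the subsequent ranking pass is not specified here, so the sketch covers only the construction of the seed.

```python
def seed_weights(page_categories, category_boost, default_boost=1.0):
    """Build normalized starting weights that favor chosen categories.

    page_categories maps each page to its category (or None).
    category_boost maps a favored category to a relative boost (assumed values).
    """
    if not page_categories:
        raise ValueError("at least one page is required")

    # Pages in a favored category get that category's boost; all others
    # get the default boost.
    raw = {
        page: category_boost.get(category, default_boost)
        for page, category in page_categories.items()
    }

    # Normalize so that the seeded weights sum to 1.
    total = sum(raw.values())
    return {page: value / total for page, value in raw.items()}


# Assumed example: sports and news pages start higher than other pages.
example_pages = {"p1": "sports", "p2": "news", "p3": "recipes", "p4": None}
example_seed = seed_weights(example_pages, {"sports": 3.0, "news": 2.0})
```

With these assumed boosts, the sports page starts with three times the weight of an unboosted page and the news page with twice the weight, before any link-based iteration is run.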
A problem with these existing solutions is their purely forward-looking nature. Existing solutions move forward and consider outgoing links from a node, but do not look backwards in the linked network or consider incoming links. Furthermore, existing solutions fail to take advantage of known information in order to categorize documents. For example, existing solutions fail to consider whether links move from one domain to another. Existing solutions also fail to filter out undesirable items belonging to pre-selected categories, such as pornography and hate information sources. Thus, a solution is needed for determining the initial relevance of a document with respect to a given category while considering contextual information such as category and domain.