Searching is the most popular way to get useful information from the web and from enterprise networks. For web page search, the most famous and effective algorithm is Google's PageRank method, which calculates the importance of each web page from the hyperlinks among the huge set of pages on the web. The main principle of the PageRank algorithm is that a page pointed to by many pages is likely to be a good page, and that a page referred to by an important page is itself likely to be important. The PageRank method is used in Google's search engine, which is widely regarded as one of the most effective search engines available. The PageRank method was invented by Google's founders Larry Page and Sergey Brin while at Stanford University in 1998, and has been patented as U.S. Pat. No. 6,285,999.
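The principle above can be illustrated with a minimal power-iteration sketch. It assumes the commonly used damping factor of 0.85 and spreads the rank held by pages with no outgoing links evenly over all pages; the function name pagerank, the link dictionary and the page names are hypothetical, and this is not the implementation used in Google's search engine.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for p in pages:
            out = set(links.get(p, ()))
            for q in out:
                new[q] += damping * rank[p] / len(out)
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical four-page web: C is pointed to by A, B and D; nothing points to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this example page C, which is pointed to by three pages, receives the highest score, and page A ranks second because the important page C links to it, while page D, which no page points to, keeps only the baseline score (1 - 0.85) / 4 = 0.0375.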
An alternative to the PageRank algorithm is the HITS (Hyperlink-Induced Topic Search) algorithm proposed by Jon Kleinberg. HITS distinguishes two types of web pages: a hub page, which contains links to many web pages on the same subject, and an authority page, whose content is relevant to that subject. The HITS algorithm presumes that a good hub page points to many good authority pages, and that a good authority page is pointed to by many good hub pages. Hub pages and authority pages therefore reinforce each other: a page's hub score is computed from the authority scores of the pages it points to, and its authority score from the hub scores of the pages that point to it.
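The mutual reinforcement between hubs and authorities can be sketched as an iterative update with normalization after each step. This is a minimal illustration under stated assumptions: a fixed number of iterations (50), Euclidean normalization, and hypothetical names (hits, _normalize, and the page names in the example).

```python
import math


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Iterative HITS over a dict {page: [pages it links to]}; returns (hub, authority)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in set(targets):
                auth[q] += hub[p]
        auth = _normalize(auth)
        # Hub update: sum the authority scores of the pages linked to.
        hub = _normalize({p: sum(auth[q] for q in set(links.get(p, ()))) for p in pages})
    return hub, auth


# Hypothetical pages: hub1 links to three authorities, a1 is linked from every hub.
links = {
    "hub1": ["a1", "a2", "a3"],
    "hub2": ["a1", "a2"],
    "x": ["a1"],
}
hub, auth = hits(links)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

Here hub1, which points to the most authorities, gets the highest hub score, and a1, which is pointed to by every hub, gets the highest authority score. Normalizing after each update keeps the scores bounded so that only their relative sizes matter.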
Both of the above algorithms depend critically on the hyperlinks between web pages, which poses a serious problem for enterprise internal search. Unlike web-based documents, a plurality of documents such as enterprise internal documents are usually not interlinked, so search engine technology based on link analysis is not applicable to them. This is one of the reasons why enterprise internal document search is inefficient.
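Why link analysis breaks down can be seen by running a link-based score over documents that have no links at all. The following is a minimal sketch, assuming a standard power-iteration PageRank with damping factor 0.85; the function name link_scores and the document names are hypothetical.

```python
def link_scores(docs, links, damping=0.85, iterations=50):
    """Power-iteration PageRank over docs with links {doc: [docs it links to]}."""
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Documents without out-links spread their rank evenly over all documents.
        dangling = sum(rank[d] for d in docs if not links.get(d))
        new = {d: (1 - damping) / n + damping * dangling / n for d in docs}
        for d in docs:
            out = links.get(d, [])
            for t in out:
                new[t] += damping * rank[d] / len(out)
        rank = new
    return rank


# Hypothetical enterprise documents with no hyperlinks between them.
docs = ["policy.docx", "budget.xlsx", "minutes.pdf", "design.pptx"]
scores = link_scores(docs, links={})
print(scores)  # every document gets the same score, about 0.25
```

With an empty link graph every document ends up with the same score, 1/n, so the link-based ranking gives no basis for ordering the search results.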
Therefore, there is a need for a method and system for conducting document searches efficiently, and in particular for enterprise internal document search.