With the continuous development of computer technologies, people increasingly rely on computer systems (including computer networks) to store huge amounts of information. Search engines, which are widely used nowadays, aim to help users retrieve helpful information conveniently and rapidly from a large amount of information. In the field of information retrieval, search engines have achieved great success, and many useful technologies have been developed and adopted. Among them, various technical improvements and optimizations of search engines are directly reflected in search ranking.
An important search ranking technology is to rank web pages by using the hyperlinks between them, as in the PageRank algorithm proposed in 1998 by Sergey Brin and Lawrence Page, the founders of Google, and the HITS algorithm proposed by J. Kleinberg in the same year. The basic principle of PageRank is to utilize the link relationships between web pages so as to calculate the importance of the web pages, i.e., their authority scores. The PageRank algorithm follows two primary premises: a web page that has been cited many times might be very important; and a web page that has not been cited many times but has been cited by one or more important web pages might also be very important. The Google search engine calculates the PageRank score of a web page with the PageRank algorithm, and accordingly determines the position at which the web page appears in a set of search results. The higher the PageRank score of a web page, the higher the position of this web page in the results. With the application of the PageRank algorithm, conventional search ranking methods are improved, the accuracy of search results is increased, and the average time a user spends finding the web page(s) he or she actually wants is significantly shortened.
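The two premises above can be illustrated with a minimal power-iteration sketch. It is not the implementation used by any particular search engine. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the even redistribution of score from pages without outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical in-memory link graph).
    damping and iterations are illustrative defaults, not values
    taken from the text.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution of score.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score equally to the pages it cites.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a page with no outgoing links spreads
                # its score evenly over all pages.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Toy graph: "c" is cited by three pages; "d" is cited only by "c".
example_links = {"a": ["c"], "b": ["c"], "e": ["c"], "c": ["d"], "d": ["a"]}
example_scores = pagerank(example_links)
for page, score in sorted(example_scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, page "c" is cited three times and receives the highest score, while page "d" is cited only once, by the important page "c", and still outranks the uncited pages "b" and "e".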
However, such a search ranking method based on link analysis is not suitable for a file system. The main reason is that, in a file system, no association similar to a web link exists among files. In addition, although the dataset of a file system is not as large as that of the web, the variety of data types in a file system is much greater than that in the web.
Currently, search ranking in a file system is mainly implemented with a keyword-based search method. The basic principle of a conventional keyword-based search method is as follows. The search engine first analyzes the contents of each document, extracts the keywords in the document, counts the frequency and positions at which a specific keyword appears in the document as well as the number of documents containing this keyword in the entire set of documents, and creates an index for this information. After a user inputs a query, the search engine first analyzes the query request, finds the documents containing each keyword in the index, then calculates a final relevance score with respect to the query for each document, and finally ranks the documents in accordance with the magnitude of the final relevance scores and returns the ranked result to the user. The drawback of such a method is that, in most cases, a user's query request cannot be precisely described with very simple keywords, and the accuracy of the search results is relatively low due to the limitations of natural language understanding techniques and of the method for calculating the final relevance score.
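The indexing and ranking steps described above can be sketched as follows. This is only one plausible reading of a conventional keyword-based method. The helper names (`tokenize`, `build_index`, `search`), the simple alphanumeric tokenizer, and the TF-IDF style weighting (term frequency multiplied by a logarithmic inverse document frequency plus one) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric keywords (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: keyword -> {doc_id: [positions]}.

    The length of each position list is the keyword's frequency in that
    document; the number of doc_ids under a keyword is the number of
    documents containing it.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(query, index, total_docs):
    """Rank documents by a TF-IDF style relevance score (an assumption)."""
    scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        # Keywords found in fewer documents weigh more.
        idf = math.log(total_docs / len(postings)) + 1.0
        for doc_id, positions in postings.items():
            scores[doc_id] += len(positions) * idf
    # Highest score first; ties are broken by document name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical file contents, used only for illustration.
example_docs = {
    "a.txt": "budget report budget",
    "b.txt": "travel report",
    "c.txt": "meeting notes",
}
example_index = build_index(example_docs)
for doc_id, score in search("budget report", example_index, len(example_docs)):
    print(doc_id, round(score, 3))
```

In this example, "a.txt" ranks ahead of "b.txt" because it contains both query keywords and repeats the rarer one, while "c.txt" matches no keyword and is not returned. The sketch also makes the stated drawback visible: the score depends only on literal keyword matches, not on what the user actually meant.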
Besides the conventional keyword-based search method, a search log analysis method can also be utilized to further improve search results in the search ranking of a file system. Based on the user's feedback on and operations on search results, including the query words input and the click history, the search log analysis method can further analyze a user's search interests and track the user's search characteristics so as to improve search effectiveness.
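As a rough illustration of how click history might feed back into ranking, the sketch below boosts the keyword-based score of documents that the user previously clicked for the same query. The text does not specify a formula, so the function name `rerank_with_clicks`, the log format of (query, clicked document) pairs, and the multiplicative boost with a `weight` parameter are all hypothetical.

```python
from collections import Counter


def rerank_with_clicks(ranked, click_log, query, weight=0.5):
    """Re-rank results using past clicks for the same query.

    ranked:    list of (doc_id, relevance_score) from a keyword search.
    click_log: list of (query, clicked_doc_id) pairs (assumed log format).
    weight:    illustrative strength of the click-based boost.
    """
    # Count how often each document was clicked for this exact query.
    clicks = Counter(doc for logged_query, doc in click_log if logged_query == query)
    total_clicks = sum(clicks.values())

    adjusted = []
    for doc_id, score in ranked:
        # Fraction of this query's clicks that went to the document.
        click_share = clicks[doc_id] / total_clicks if total_clicks else 0.0
        adjusted.append((doc_id, score * (1.0 + weight * click_share)))

    # Highest adjusted score first; ties are broken by document name.
    return sorted(adjusted, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical keyword-based scores and click history.
example_ranked = [("a.txt", 1.0), ("b.txt", 0.9)]
example_log = [("report", "b.txt"), ("report", "b.txt"), ("other", "a.txt")]
for doc_id, score in rerank_with_clicks(example_ranked, example_log, "report"):
    print(doc_id, round(score, 3))
```

Here "b.txt" moves ahead of "a.txt" because every logged click for the query "report" went to it. With an empty log, the original keyword-based order is left unchanged.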
However, both the conventional keyword-based search method and the search log analysis method neglect the impact of the tree structure of a file system on search ranking, and with both methods, potential relations between files are not reflected in the ranking.
It can be seen that the prior art fails to provide users with a search ranking scheme which is suited to the structural characteristics of a file system and which can dynamically adapt to users' interactions.