1. Field of the Invention
The invention relates to a method for quickly searching and ranking related documents in a database based on user input queries.
2. Related Art
A search engine accepts a plurality of keywords and finds documents that contain those keywords. Binary searching provides an efficient way to match a keyword in a sorted list, such as finding the name of a person in a phonebook. However, binary searching is not directly applicable to ordinary documents, whose contents are not sorted. Ranking is a method for scoring documents based on matched keywords and then displaying the documents in order of their scores; extensive computation would be required to produce scores that closely reflect the relevance of each document. Since the results must be presented to a user within a limited time (i.e., without a long delay), there exists a need for a method that provides quick searching and ranking of relevant documents according to user queries.
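By way of illustration only, the phonebook lookup mentioned above may be sketched as follows. The function name `contains` and the sample names are hypothetical and are not part of the described method; the sketch relies on Python's standard `bisect` module and assumes the list has been sorted in advance.

```python
from bisect import bisect_left

def contains(sorted_names, name):
    """Return True if name occurs in sorted_names, using binary search."""
    # bisect_left returns the leftmost position where name could be inserted
    # while keeping the list sorted; name is present only if it is already there.
    i = bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

# Hypothetical phonebook; binary search requires the entries to be sorted.
phonebook = sorted(["Virginia", "Amy", "Brian", "Zoe", "Carol"])

found = contains(phonebook, "Amy")
missing = contains(phonebook, "Dave")
```

Each lookup inspects on the order of log2(n) entries rather than all n, which is why the list must be sorted before it is searched.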
An object of the invention is to provide a method for scoring relevant files more accurately. Another object is to utilize the advantages of binary searching for quickly eliminating unrelated documents and then scoring each remaining document based on the scores of its paragraphs that are most relevant to the queries. The resulting scores are then used by the ranking process to present the documents on a display in order from the highest score to the lowest score.
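A minimal sketch of the eliminate, score and rank sequence stated in these objects is given below for illustration. It rests on several assumptions that are not taken from the description: a document is treated as unrelated when none of the query keywords appears in its sorted word list, a paragraph score is simply the number of query keywords the paragraph contains, and a document score is the score of its best paragraph. The names `has_word`, `rank` and `docs` are hypothetical.

```python
from bisect import bisect_left

def has_word(sorted_words, word):
    """Binary search for word in a sorted list of words."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

def rank(documents, keywords):
    """documents maps a name to a list of paragraph strings.

    Returns (name, score) pairs ordered from highest to lowest score.
    """
    ranked = []
    for name, paragraphs in documents.items():
        # Assumed sorted word list; in practice it would be built in advance.
        vocab = sorted({w for p in paragraphs for w in p.lower().split()})
        # Assumption: a document matching no keyword at all is unrelated.
        if not any(has_word(vocab, k) for k in keywords):
            continue
        # Assumption: paragraph score = number of query keywords it contains;
        # document score = score of its most relevant paragraph.
        best = max(
            sum(k in p.lower().split() for k in keywords) for p in paragraphs
        )
        ranked.append((name, best))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical sample data.
docs = {
    "a": ["binary search on sorted lists", "ranking of documents"],
    "b": ["cooking recipes"],
    "c": ["binary search and ranking together"],
}
result = rank(docs, ["binary", "ranking"])
```

In this sample, document "b" is eliminated before any scoring takes place, so the scoring work is spent only on documents "a" and "c".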
While the method described below would be most suitable for entities or organizations such as the IEEE, patent offices or the like, where only a very small portion of a database is added or updated daily and the whole database is used extensively on a daily basis, the method would also be applicable to web crawlers and to libraries searching old archive files that have been sorted in advance, noting that these archive files would grow substantially in size with time, thereby increasing searching time.
In the following, the query, content, word or keyword as described would refer either to a number, character, symbol or a combination thereof. The distance between two keywords would refer to the difference in index locations of the two keywords. For example, the distance between two keywords, “Amy” and “Virginia”, in the phrase “Amy is living in Virginia” would be 4.
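The distance defined above may be illustrated by the following sketch, which assumes that words are separated by whitespace, that index locations are counted per word, and that the first occurrence of each keyword is used. The function name `keyword_distance` is hypothetical.

```python
def keyword_distance(text, first, second):
    """Return the difference in word index locations of two keywords."""
    words = text.split()  # assumption: whitespace-separated words
    # list.index returns the position of the first occurrence of each keyword.
    return abs(words.index(second) - words.index(first))

distance = keyword_distance("Amy is living in Virginia", "Amy", "Virginia")
```

Here “Amy” is at index 0 and “Virginia” is at index 4, so the distance is 4, in agreement with the example above.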