As the use of computers to create and manage all types of data continues to grow, the task of accessing and retrieving information on a particular topic rapidly becomes unmanageable. This phenomenon is evident within organizations of all sizes, and is particularly noticeable in environments where the sources of information are potentially infinite. With its vast array of interconnected computers around the world, the internet best exemplifies this problem. To permit users to easily locate sources of relevant information, therefore, the use of search engines has become almost ubiquitous.
In general, two types of search engines are employed to find relevant data that could be located in a variety of places. One type of search engine analyzes and indexes the contents of the various information sources before a search is conducted. When a user requests a search on a particular topic, the search engine only needs to refer to the index in order to quickly locate relevant documents and the like. The other type of search engine analyzes the contents of the information sources at the time that the search is being conducted. Although this type of search engine exhibits slower performance, because the available information is not preprocessed, it has the capability to return more current information in an environment where the data is being updated on a relatively frequent basis.
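The trade-off between the two types of search engine can be illustrated with a minimal sketch. All names here (`build_index`, `search_index`, `search_live`, and the sample `sources`) are hypothetical and chosen only for illustration; the sketch assumes simple whole-word matching and is not a description of any particular search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(sources):
    """First type: analyze every source ahead of time (word -> source names)."""
    index = defaultdict(set)
    for name, text in sources.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search_index(index, term):
    """Answer a query from the prebuilt index alone (fast, possibly stale)."""
    return set(index.get(term.lower(), set()))

def search_live(sources, term):
    """Second type: scan every source at query time (slower, always current)."""
    term = term.lower()
    return {name for name, text in sources.items() if term in tokenize(text)}

# Hypothetical information sources.
sources = {
    "a": "search engines index documents",
    "b": "weather report",
}
index = build_index(sources)

# Source "b" is updated after the index was built.
sources["b"] = "weather report on search traffic"

indexed_hits = search_index(index, "search")  # reflects the state at indexing time
live_hits = search_live(sources, "search")    # reflects the current contents
```

In this sketch the indexed search misses the updated source until the index is rebuilt, whereas the live search finds it at the cost of reading every source for each query.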
Typically, either type of search engine functions to retrieve all documents or files that match criteria specified by the user. Depending upon the capabilities of the search engine, it may return only those documents which exactly match the criteria that have been specified, or it may return a larger collection of documents which are equivalent to, or otherwise related to, the documents that exactly match the search criteria.
In many situations, the number of documents which meet the user's request can be quite large. For instance, a typical search that is conducted on the internet might return hundreds, or even thousands, of "hits", i.e., documents which match the user's search criteria. To assist the user in reviewing these documents, therefore, many search engines attempt to rank them according to their relevance.
One particular technique that is commonly used to rank documents relies upon the frequency of occurrence of criteria-matching information within a document. For instance, the number of times that a user-specified term appears in a given document can be compared to the total number of words in the document, to determine a relevance ratio. Using this approach, a single-page document in which the user-specified term appears several times will have a much higher ranking than a multi-page document in which the user-specified term appears only once or twice. After the relevance of each of the retrieved documents is determined using such an approach, the documents are presented to the user in a manner indicative of their respective relevance rankings.
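The relevance ratio described above can be sketched as follows. The function names (`relevance_ratio`, `rank_documents`) and the sample documents are assumptions made for illustration, and the word-splitting rule is a simplification; actual search engines may weight terms quite differently.

```python
import re

def relevance_ratio(text, term):
    """Occurrences of `term` divided by the total number of words in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

def rank_documents(documents, term):
    """Return (name, ratio) pairs ordered from most to least relevant."""
    scored = [(name, relevance_ratio(text, term)) for name, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents: a short note and a much longer manual.
documents = {
    "short_note": "engine tuning tips: engine oil, engine timing",
    "long_manual": "engine " + "filler " * 200 + "engine",
}
ranking = rank_documents(documents, "engine")
```

Here the short note contains the term 3 times in 7 words, while the long manual contains it twice in 202 words, so the short note is ranked first.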
While this approach to the searching and presentation of documents assists the user in sorting through vast amounts of information, it does not always present the user with the information that is most relevant to his or her request. For example, a large document might contain an entire section that is devoted to the specific topic in which a user is interested. However, if that section forms only a small portion of the overall document, its relevance ranking could end up being relatively low. If the user operates on the assumption that only the most highly ranked documents presented by the search engine are likely to be of real interest, he or she may never get to the document which is, in fact, right on point.
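The dilution effect described in this paragraph can be made concrete with a small worked example. The word lists, sizes, and the `term_ratio` helper below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def term_ratio(words, term):
    """Fraction of `words` that equal `term`."""
    return words.count(term) / len(words) if words else 0.0

# A 100-word section devoted to the topic: 40 of its words are "turbine".
focused_section = ["turbine", "blade", "cooling", "turbine", "design"] * 20

# The section sits in the middle of 9,900 words of unrelated material.
unrelated = ["general"] * 9900
large_document = unrelated[:5000] + focused_section + unrelated[5000:]

# A 50-word document that mentions the term once in passing.
brief_mention = ["turbine"] + ["misc"] * 49

section_ratio = term_ratio(focused_section, "turbine")  # 40 / 100
large_ratio = term_ratio(large_document, "turbine")     # 40 / 10000
brief_ratio = term_ratio(brief_mention, "turbine")      # 1 / 50
```

Ranked as a whole, the large document scores below the document with a single passing mention, even though the section it contains is, taken by itself, far more concentrated on the topic than either whole document.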
Typically, users view the results of a search through some form of browser, which enables them to navigate among the documents that were located during the search. When the user selects a particular document from the list of those which have been retrieved, the browser displays the beginning of the document, normally the top half of the first page. Based upon the information contained in this displayed portion, the user may decide to look further into the document, or proceed to the next document that turned up in the search.
As is very often the case, the portion of a document which is relevant to a user's inquiry may not be evident from viewing the top half of the first page. This situation is particularly evident in an example of the type described above, in which a large document may contain an entire section devoted to the particular topic of interest. If that section is buried deep in the document, the user may never take the time to scan far enough into the document to discover this fact. Consequently, the user may end up missing the very document which is most relevant to his or her inquiry.
Accordingly, it is desirable to provide a system for searching and retrieving documents which is capable of identifying the portions of documents that are truly relevant to a user's request, regardless of the size of the document. Further along these lines, it is desirable to provide such a system in which the relevant portion of the document is immediately displayed to the user, thereby relieving the user of the need to take the time to scan through voluminous documents to determine their possible relevance.