Due to rapid advances in electronic storage technology, it is becoming ever more convenient and economically attractive to store information electronically as a series of digital bits. As such, "texts" from magazines, newspapers, journals, encyclopedias, books, and other printed materials are increasingly being classified and grouped together into various databases. These texts can consist of miscellaneous strings of characters, sentences, or documents of indeterminate or varied length, and can contain a wide variety of data classes, such as words, numbers, and graphics. Computers are then used to access these databases in order to store new texts and to retrieve previously stored texts. One added advantage of electronically storing information is that computers can be programmed to search for and retrieve the specific texts in a database that are of special interest to the user. In essence, a computer can perform indexing functions, much as a card catalog does. A user can retrieve a particular text by inputting the title, author, date of publication, or some other description specific to that text. In response, the computer can automatically search for, retrieve, and display the desired text.
However, if the user does not know of a specific text or wishes to conduct research on a general subject matter, the computer can be programmed to select texts that might be of significance to the user. Prior art search and retrieval systems have typically accomplished this by focusing on "keywords" or query terms. A user who wishes to find texts of a particular nature first specifies one or more keywords that might be contained in the desired texts. Typically, each text in the database is assigned a unique reference number. All words in the text, except for trivial words such as "a" and "the," are tagged with the unique reference number and placed in an alphabetical index. Hence, all texts in the database containing a given keyword are located by searching for that keyword in the alphabetical index and returning the set of reference numbers tagged to it. The texts corresponding to those reference numbers are thereby known to contain the keyword and are accessed via the computer.
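The keyword index described above can be illustrated with a minimal Python sketch. It is not taken from any particular prior art system: the names `STOP_WORDS`, `build_index`, and `lookup`, the sample texts, and the short list of trivial words are all hypothetical, and a dictionary stands in for the alphabetical index.

```python
import re
from collections import defaultdict

# Hypothetical list of "trivial" words excluded from the index.
STOP_WORDS = {"a", "an", "the", "on", "of", "and"}

def build_index(texts):
    """Map each non-trivial word to the reference numbers of the texts containing it."""
    index = defaultdict(set)
    for ref_number, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(ref_number)
    return index

def lookup(index, keyword):
    """Return the sorted reference numbers of all texts containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))

# Hypothetical database: reference number -> text.
texts = {
    1: "The cat sat on the mat",
    2: "A dog chased the cat",
    3: "Stock prices rose sharply",
}
index = build_index(texts)
print(lookup(index, "cat"))   # [1, 2]
print(lookup(index, "the"))   # [] because trivial words are not indexed
```

Sorting the keys of `index` would yield the alphabetical ordering mentioned in the text; the dictionary is used here only because it makes the lookup a single step.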
In order to provide the user with greater flexibility, many prior art search and retrieval systems provide for "Boolean" searches. A Boolean search combines more than one keyword in a single query. This is typically accomplished by joining the keywords with logical operators such as the restrictive "AND" function and/or the inclusive "OR" function. If two or more keywords are joined by an AND, only those texts which contain all of the joined keywords are retrieved. If two or more keywords are joined by an OR, all texts which contain at least one of the joined keywords are retrieved. For example, given that a user specifies a search for (keyword 1 AND keyword 2) OR keyword 3, the computer retrieves all texts containing keyword 3 plus those texts containing both keyword 1 and keyword 2. Two examples of this type of text retrieval system are the LEXIS™ and Dialog™ systems.
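As a rough illustration, the AND and OR functions correspond to set intersection and set union over the reference numbers returned for each keyword. The sketch below assumes a small hand-written index; the keywords, the reference numbers, and the helper names `postings`, `search_and`, and `search_or` are hypothetical and do not reflect how LEXIS or Dialog are implemented.

```python
# Hypothetical keyword index: keyword -> reference numbers of texts containing it.
index = {
    "solar": {1, 2, 5},
    "battery": {2, 3, 5},
    "turbine": {4},
}

def postings(index, keyword):
    """Reference numbers for one keyword (empty set if the keyword is not indexed)."""
    return index.get(keyword, set())

def search_and(index, *keywords):
    """Texts containing ALL of the keywords (set intersection)."""
    sets = [postings(index, k) for k in keywords]
    return set.intersection(*sets) if sets else set()

def search_or(index, *keywords):
    """Texts containing AT LEAST ONE of the keywords (set union)."""
    return set().union(*(postings(index, k) for k in keywords))

# (solar AND battery) OR turbine
result = search_and(index, "solar", "battery") | postings(index, "turbine")
print(sorted(result))                                  # [2, 4, 5]
print(sorted(search_or(index, "solar", "turbine")))    # [1, 2, 4, 5]
```

The example also shows why OR tends to enlarge a result set and AND tends to shrink it: a union can only add reference numbers, while an intersection can only remove them.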
Even though computerized search and retrieval systems greatly help a user locate relevant texts, many disadvantages remain with these systems. One disadvantage of this type of prior art search and retrieval method is that the user is required to anticipate one or more keywords that identify and distinguish relevant texts. In other words, the user must guess the words used by the author of a desired text. This problem arises because a user typically does not have advance knowledge of how the texts of interest are worded. If a user fails to guess appropriate keywords, highly relevant texts might be missed.
Another disadvantage of typical prior art search and retrieval systems is that picking significant keywords is a tricky and delicate operation. If a keyword is too common and/or if a user utilizes an inclusive OR function to join multiple keywords, a search request can potentially result in the retrieval of hundreds of texts satisfying the broadly defined search criteria. Often, only a small handful of the hundreds of retrieved texts are of actual interest to the user. The user must then expend much time and energy tediously scanning each text to winnow out the truly relevant texts from the vast pool of retrieved texts. Conversely, if the keyword is too specific or if the restrictive AND function is used to join multiple keywords, the search might be too restrictive. Highly relevant texts which do not meet the specific keyword criteria will not be retrieved. Hence, a user frequently chooses different keywords and operators in a costly and time-consuming iterative process to tailor the search request. Consequently, operating typical prior art search and retrieval systems requires skill, training, and expertise.
Therefore, what is needed is an apparatus and method for determining and ranking the significance of each retrieved document so that a user can broaden the scope of a search to catch any relevant text without being unduly burdened by having to wade through inconsequential texts. It would be highly preferable for the same apparatus and method to also provide a mechanism to easily and naturally navigate between texts dealing with related subject matter.