1. Field of the Invention
The present invention is directed toward the field of search and retrieval systems, and more particularly to a knowledge base search and retrieval system.
2. Art Background
In general, search and retrieval systems permit a user to locate specific information from a repository of documents, such as articles, books, periodicals, etc. For example, a search and retrieval system may be utilized to locate specific medical journals from a large database that consists of a medical library. Typically, to locate the desired information, a user enters a "search string" or "search query." The search query consists of one or more words, or terms, composed by the user. In response to the query, some prior art search and retrieval systems match words of the search query to words in the repository of information to locate information. Additionally, Boolean prior art search and retrieval systems permit a user to specify a logic function to connect the search terms, such as "stocks AND bonds" or "stocks OR bonds."
In response to a query, a word match based search and retrieval system parses the repository of information to locate a match by comparing the words of the query to words of documents in the repository. If there is an exact word match between the query and words of one or more documents, then the search and retrieval system identifies those documents. These types of prior art search and retrieval systems are thus extremely sensitive to the words selected for the query.
The terminology used in a query reflects each individual user's view of the topic for which information is sought. Thus, different users may select different query terms to search for the same information. For example, to locate information about financial securities, a first user may compose the query "stocks and bonds", and a second user may compose the query "equity and debt." For these two different queries, a word match based search and retrieval system would identify two different sets of documents (i.e., the first query would return all documents that contain the words stocks and bonds, and the second query would return all documents that contain the words equity and debt). Although both of these queries seek to locate the same information, a word match based search and retrieval system generates different responses because the queries use different terms. Thus, the contents of the query, and consequently the response from word based search and retrieval systems, are highly dependent upon how the user expresses the query. Consequently, it is desirable to construct a search and retrieval system that is not highly dependent upon the exact words chosen for the query, but one that generates a similar response for different queries that have similar meanings.
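The sensitivity of word matching to the chosen query terms can be illustrated with a minimal sketch. The code below is not taken from any particular prior art system; the toy repository, the document identifiers, and the names `tokenize` and `word_match_search` are assumptions made purely for illustration. It shows exact word matching with a Boolean AND or OR connective, and how two queries with similar meanings return disjoint sets of documents.

```python
import re

# Hypothetical toy repository; the texts and ids are illustrative assumptions.
DOCUMENTS = {
    "doc1": "Stocks and bonds are common investments.",
    "doc2": "Equity and debt financing for companies.",
    "doc3": "Cattle and other stock on the ranch.",
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_match_search(terms, documents, mode="AND"):
    """Return sorted ids of documents containing all (AND) or any (OR) terms."""
    wanted = {term.lower() for term in terms}
    combine = all if mode == "AND" else any
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        # Exact word comparison only: "stock" does not match "stocks".
        if combine(term in words for term in wanted):
            hits.append(doc_id)
    return sorted(hits)


first = word_match_search(["stocks", "bonds"], DOCUMENTS)
second = word_match_search(["equity", "debt"], DOCUMENTS)
print(first, second)
```

In this sketch the two queries, although similar in meaning, share no documents in their responses, which is the behavior the preceding paragraph describes.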
Prior art search and retrieval systems do not draw inferences about the true content of documents available. If the search and retrieval system merely compares words in a document with words in a query, then the content of a document is not really being compared with the subject matter identified by the query term. For example, a restaurant review article may include words such as food quality, food presentation, service, etc., without expressly using the word restaurant because the topic, restaurant, may be inferred from the context of the article (e.g., the restaurant review article appeared in the dining section of a newspaper or travel magazine). For this example, a word comparison between a query term "restaurant" and the restaurant review article may not generate a match. Although the main topic of the restaurant review article is "restaurant", the article would not be identified. Accordingly, it is desirable to infer topics from documents in a search and retrieval system in order to truly compare the content of documents with a query term.
Some words in the English language have more than one sense, and each sense connotes a different meaning. Typically, prior art search and retrieval systems do not differentiate between the different senses. For example, the query "stock" may refer to a type of financial security or to cattle. In prior art search and retrieval systems, a response to the query "stock" may include displaying a list of documents, some about financial securities and others about cattle. Without any further mechanism, if the query term has more than one sense, a user is forced to review the documents to determine the context of the response to the query term. Therefore, it is desirable to construct a search and retrieval system that displays the context of the response to the query.
Some prior art search and retrieval systems include a classification system to facilitate the location of information. For these systems, information is classified into several pre-defined categories. For example, Yahoo!™, an Internet directory guide, includes a number of categories to help users locate information on the World Wide Web. To locate information in response to a search query, Yahoo!™ compares the words of the search query to the word strings of the pre-defined categories. If there is a match, the user is referred to web sites that have been classified for the matching category. However, similar to the word match search and retrieval systems, words of the search query must match words in the category names. Thus, it is desirable to construct a search and retrieval system that utilizes a classification system, but does not require matching words of the search query with words in the name strings of the categories.
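The limitation of matching a query against category name strings can be sketched in the same way. The category names, the site identifiers, and the function `category_search` below are hypothetical, and the rule that a category matches when it shares at least one word with the query is an assumption; the sketch does not reproduce how any actual directory performs its matching.

```python
# Hypothetical directory: category names mapped to classified web sites.
CATEGORIES = {
    "Business and Economy": ["site-a", "site-b"],
    "Recreation and Sports": ["site-c"],
}


def category_search(query, categories):
    """Return sites of every category whose name shares a word with the query."""
    query_words = set(query.lower().split())
    sites = []
    for name, members in categories.items():
        name_words = set(name.lower().split())
        # Assumed matching rule: at least one word in common.
        if query_words & name_words:
            sites.extend(members)
    return sites


matched = category_search("economy", CATEGORIES)
unmatched = category_search("finance", CATEGORIES)
print(matched, unmatched)
```

Under these assumptions, the query "finance" returns nothing even though its subject falls within the "Business and Economy" category, because no word of the query appears in any category name.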