The present invention is directed to a method and system for searching on a computer. More particularly, the present invention is directed to a system and method for deriving an evidence set from a knowledge base.
The field of search engines is known. Known search engines include those developed by Verity, Inc., AltaVista, and Lycos. By using a search engine, a user can express with precision a focused area of interest in order to retrieve needed information. Typically, a search engine retrieves documents satisfying the exact terms in a search query. For example, if the search query includes the term “PDA,” the search will not retrieve occurrences of “personal digital assistant,” “pocket device,” or other related terms. This produces under-inclusive results, meaning that documents containing relevant information are not retrieved. Often, however, it is difficult for a user to formulate a query capable of producing appropriately inclusive results without existing knowledge of a subject area. This difficulty is especially prevalent when a lay user searches in subject areas containing technical terminology or jargon that is unfamiliar to the lay user. For instance, when searching in the subject area of medical terminology, the lay user is more likely to employ everyday names for terms rather than the technical terms used by medical professionals. Even medical professionals may have difficulty in correctly spelling or recalling a proper medical term. Under-inclusive results also occur when relatively inexperienced users attempt to use search engines. For example, inexperienced users may fail to appreciate that certain search engines are case sensitive or require specific syntax.
Three approaches have been adopted to address under-inclusive results. The first approach employs manual query expansion. As noted above, if a search query is “PDA,” the search will not retrieve occurrences of “personal digital assistant,” “pocket device,” or other related terms. Users familiar with these related terms may manually expand the query by replacing “PDA” in the search query with “‘PDA’ OR ‘personal digital assistant’ OR ‘pocket device’”. This query uses the logical OR operator and would retrieve those documents containing at least one of these terms. Manual query expansion, however, requires user knowledge of related terms. In addition, manual query expansion requires excessive user input. For instance, if a user manually expands a query term and wishes to conduct the search repeatedly, the user must reenter the same related terms each time the query is submitted. Finally, users must have working knowledge of the search engine syntax and the controlled vocabulary of the subject matter that is being searched.
The second approach to address under-inclusive results employs meta tagging. To implement meta tagging, the author of a document inserts metadata, also known as metainformation, into the contents of the document itself or otherwise associates it with the document. Metadata is data that describes other data. For example, an author of a web page on the Internet's World Wide Web may insert meta tags into the source code of the web page. Typically, the meta tag is invisible to those viewing the web page with a traditional browser, such as Netscape Navigator, but is present in the source code and visible to search engines. Meta tags are usually words and phrases that are related to the content of the web page but do not appear in the text of the web page visible to the user. For example, when a search engine searches for “PDA” on the World Wide Web, the search engine retrieves documents containing “PDA” if “PDA” is either in the meta tag or the contents of the document. One disadvantage to meta tagging, however, is the investment required by authors to insert meta tags in each document. Moreover, once a document is created, it is time-consuming to modify the meta tags; each document must be reopened to edit the meta tags. Also, since meta tag information is inserted into each document, there is an increased likelihood of a data entry error in the spelling or format of the meta tag information. In addition, the meta tag vocabulary might change, thus requiring a modification to all documents containing the meta tag information. Finally, meta tagging requires knowledge of the content of the web page. In many instances the author of a web page is a web page developer, who is developing the web page for others that are familiar with the content. Thus, meta tagging often requires coordination between a web page developer and those familiar with the content of the web page.
The third approach to address under-inclusive results employs evidence sets. An evidence set contains evidence, which consists of phrases or terms. The evidence is organized into topics. This knowledge is organized, typically in a hierarchical structure or taxonomy, and made available as a shared resource to users. An evidence set is employed by an application, such as a search engine, by incorporating knowledge about topics and associated phrases. One company, Sageware, Inc., has developed a number of KnowledgeSets, which are functionally similar to evidence sets, for specific subject areas. See SAGEWARE, INC., Our Products: Sageware KnowledgeSets (accessed on Mar. 21, 1998; copyright 1997) <http://www.sageware.com/products.html>. One use of evidence sets is for query expansion. In contrast to manual query expansion, query expansion with evidence sets does not require a manual substitution of related terms for each query. Rather, the search engine may automatically access the contents of the evidence set to expand the search query.
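By way of illustration only, query expansion with an evidence set may be sketched as follows. The sketch assumes a deliberately simple evidence set in which each topic maps to a flat list of phrases; the names EVIDENCE_SET, expand_query, and search are hypothetical and do not correspond to any particular search engine or product.

```python
# Hypothetical evidence set: each topic maps to its associated phrases (evidence).
EVIDENCE_SET = {
    "PDA": ["PDA", "personal digital assistant", "pocket device"],
}


def expand_query(term, evidence_set):
    """Build an OR-expanded query string for a term.

    A term with no matching topic is returned as a single quoted phrase.
    """
    phrases = evidence_set.get(term, [term])
    return " OR ".join(f"'{phrase}'" for phrase in phrases)


def search(term, documents, evidence_set):
    """Return documents containing at least one phrase associated with the term.

    Matching is a case-insensitive substring test, an assumption made
    only to keep the sketch short.
    """
    phrases = [p.lower() for p in evidence_set.get(term, [term])]
    return [doc for doc in documents if any(p in doc.lower() for p in phrases)]


documents = [
    "A review of the newest personal digital assistant.",
    "PDA sales rose last quarter.",
    "Gardening tips for spring.",
]
expanded = expand_query("PDA", EVIDENCE_SET)
matches = search("PDA", documents, EVIDENCE_SET)
```

In this sketch the user enters only the term PDA, and the related phrases are supplied from the shared evidence set rather than being retyped for each query.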
Known methods for creating evidence sets require extensive user input. Other methods for learning evidence sets exist; however, it is known that learning algorithms applied to training data typically produce evidence sets of inferior quality. In addition, known methods for creating evidence sets often produce evidence sets that are difficult to modify. Typically, methods for creating evidence sets include the use of either a standard text editor or a graphical user interface (GUI). An evidence set may be created with a text editor by inputting text and symbols in accordance with a known evidence set format. As evidence sets generally require a specific syntax, text editor creation has the disadvantage that minor inadvertent input errors may create an improperly formatted or non-working evidence set. For instance, a misplaced symbol or term may inadvertently change the relationship between evidences or topics in an evidence set. Because the syntax of evidence sets is often cumbersome, a user cannot readily apprehend when mistakes have occurred. Moreover, once an evidence set has been created with a text editor, it is relatively difficult to modify its structure. A text-edited modification requires reentry of evidences in the evidence set to comport with the newly modified structure. Also, creating an evidence set with a text editor requires a user with working knowledge of the syntax of the evidence set. In addition, a user may create an inconsistent evidence set. For instance, a user may create a text-edited evidence set with multiple occurrences of the same topic. Moreover, when a text editor is used to create an evidence set, each occurrence of the topic may have a different set of evidences. This could create an internal inconsistency in the evidence set and result in an evidence set that is non-functioning or, at the very least, capable of producing inconsistent results.
Finally, when making changes to a text-edited evidence set, a regression test must often be performed to fully understand the impact of changes to the evidence set.
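The kind of inconsistency described above, in which the same topic occurs more than once with differing evidences, may be illustrated with the following sketch. It assumes that a text-edited evidence set has already been parsed into a list of (topic, evidences) pairs; that representation and the function names are hypothetical, since no particular evidence set syntax is specified here.

```python
from collections import Counter


def find_duplicate_topics(entries):
    """Return topics that occur more than once in the parsed entries."""
    counts = Counter(topic for topic, _ in entries)
    return sorted(topic for topic, n in counts.items() if n > 1)


def find_conflicting_topics(entries):
    """Return topics whose repeated occurrences list different evidences."""
    first_seen = {}
    conflicts = set()
    for topic, evidences in entries:
        current = frozenset(evidences)
        if topic in first_seen and first_seen[topic] != current:
            conflicts.add(topic)
        # Remember only the first occurrence as the reference definition.
        first_seen.setdefault(topic, current)
    return sorted(conflicts)


# Hypothetical parsed entries from a text-edited evidence set.
entries = [
    ("PDA", ["PDA", "personal digital assistant"]),
    ("laptop", ["laptop", "notebook computer"]),
    ("PDA", ["PDA", "pocket device"]),
    ("laptop", ["notebook computer", "laptop"]),
]
duplicates = find_duplicate_topics(entries)
conflicts = find_conflicting_topics(entries)
```

Here both topics are duplicated, but only PDA is defined with two different sets of evidences, which is the case that could yield inconsistent search results.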
A second known method for creating evidence sets employs GUIs. One such tool, developed by Verity, Inc., is topicEditor. VERITY, INC., Introduction to Topics Guide V2.0 (copyrighted Sep. 23, 1996; visited Mar. 21, 1998) <http://www.verity.com/support/s97dk/topic20/topcover.htm> discloses the use of topicEditor. In topicEditor, users create topics and evidences in a hierarchical GUI environment, which allows users to expand and collapse topics, copy or move topics using drag and drop, and re-use topics by selecting them from a drop-down list. Once a topic is created in topicEditor, a user may generate topic sets, which are functionally similar to evidence sets. These topic sets may be stored in a knowledge base. Typically, these types of knowledge bases only include information that is represented in the GUI environment. For instance, a GUI-created knowledge base typically contains only information that relates to the hierarchical structure of the topics and evidences. Typically, for any given GUI-created knowledge base there exists only one corresponding evidence set. Finally, modification of a GUI-created knowledge base requires excessive manipulation of the GUI environment.
The present invention is directed to a method for searching on a computer. In accordance with the method of the present invention, a knowledge base that includes information is generated, an evidence set is specified to include a proper subset of the information, and an evidence set is derived from the knowledge base.
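The three steps of this method may be pictured with the following minimal sketch, which is illustrative only and is not the claimed implementation. It assumes a knowledge base held as a list of records; the Entry type, its author and visible fields, and the derive_evidence_set function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical knowledge base record."""
    topic: str
    phrase: str
    author: str    # assumed metainformation, not carried into the evidence set
    visible: bool  # assumed visibility flag used by the specification below


# Step 1: generate a knowledge base that includes information.
KNOWLEDGE_BASE = [
    Entry("PDA", "PDA", "alice", True),
    Entry("PDA", "personal digital assistant", "alice", True),
    Entry("PDA", "pocket device", "bob", False),
]


def derive_evidence_set(knowledge_base, include):
    """Step 3: derive an evidence set (topic -> phrases) from the knowledge base.

    `include` is the specification from step 2: a predicate choosing which
    entries belong in the evidence set. Only topic and phrase are kept, so
    the result holds a subset of the knowledge base's information.
    """
    evidence_set = {}
    for entry in knowledge_base:
        if include(entry):
            evidence_set.setdefault(entry.topic, []).append(entry.phrase)
    return evidence_set


# Step 2: specify the evidence set, here as "visible entries only".
visible_only = derive_evidence_set(KNOWLEDGE_BASE, lambda e: e.visible)
by_alice = derive_evidence_set(KNOWLEDGE_BASE, lambda e: e.author == "alice")
by_bob = derive_evidence_set(KNOWLEDGE_BASE, lambda e: e.author == "bob")
```

Because the specification is separate from the knowledge base, different specifications applied to the same knowledge base yield different evidence sets in this sketch.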
In accordance with another aspect of the invention, the knowledge base further includes a first entity and a second entity with a description logic relationship existing between the first entity and the second entity.
In accordance with one other aspect of the invention, the knowledge base further includes a first class expression and a second class expression with a rule-based relationship existing between the first class expression and the second class expression.
In accordance with yet another aspect of the invention, the knowledge base includes a role, which defines authorship.
In accordance with another aspect of the invention, the knowledge base includes a class, which includes metainformation.
In accordance with another aspect of the invention, the knowledge base includes a class, which defines visibility.
In accordance with another aspect of the invention, the knowledge base includes necessary and sufficient conditions.