1. Field of the Invention
The present invention relates to automating the solution of inventor and user problems, and more particularly, to using semantic methods of information and knowledge representation and processing for solving such problems.
2. Related Art
Solving inventor problems and technical problems of a user may require, first of all, good information support, i.e. operative access to information or knowledge. Good information support can answer how to solve the problem or can facilitate providing information related to the solution of the problem (for example, information in another knowledge domain, in the same knowledge domain but in another type of system, etc.) that is able to point the inventor or user in the needed direction for the solution search. Conventionally, computer-based information retrieval may be performed by means of a search engine.
In unsophisticated information retrieval systems, a search may be performed by searching for the presence of key words (inputted by the user) in documents contained in a database. This kind of search may be characterized by low precision and recall. Modern information retrieval systems should allow the user to formulate a query in natural language, i.e. the systems should have a natural language user interface. An automatic linguistic analysis of the query is then performed and a formal representation of the query is created. The linguistic analysis can be performed at different levels of depth of natural language. This analysis, in an ideal case, should include the semantic level. It is important to recognize not only relations between different elements of the query (usually, the most informative elements), but also the relations between query elements and the corresponding components from an outer world model or a certain knowledge domain. Thus, it has become desirable to use semantic relations between concepts, described in models of knowledge representation such as a thesaurus or ontology, to improve information retrieval system performance in different manners in various applications.
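The limitation of plain keyword matching described above can be illustrated with a minimal sketch. The document collection, the relevance judgments and the function names below are hypothetical and are chosen only to show how precision and recall may be computed for a keyword search; they are not taken from any of the systems discussed.

```python
def keyword_search(query_words, documents):
    """Return the ids of documents that contain every query keyword."""
    wanted = {w.lower() for w in query_words}
    hits = set()
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        if wanted <= tokens:  # all keywords are present in the document
            hits.add(doc_id)
    return hits


def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection and relevance judgments for a finance-related query.
DOCS = {
    "d1": "the stock market fell sharply today",
    "d2": "share prices on wall street dropped",
    "d3": "chicken stock recipe from the farmers market",
}
RELEVANT = {"d1", "d2"}

retrieved = keyword_search(["stock", "market"], DOCS)
precision, recall = precision_recall(retrieved, RELEVANT)
print(sorted(retrieved), precision, recall)
```

In this toy collection the keyword search returns one irrelevant document (the recipe) and misses one relevant document that uses different wording, which is the kind of loss in precision and recall that semantic analysis is intended to reduce.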
An ontology is a hierarchical lexical structure in which concepts expressed by words or word combinations are defined and linked by semantic relations. Ontologies can be domain-specific or general, depending on the terms they describe, and attempt to reflect a human's knowledge about the specific domain or the surrounding world. Because an ontology represents a valuable and extensive set of data, it can be successfully used in information retrieval to improve the precision and the recall of search results.
Some information retrieval systems, like the system described in U.S. Pat. No. 6,675,159 B1 (“the '159 patent”), the contents of which are incorporated herein by reference in their entirety, use ontologies to index collections of documents with ontology-based predicate structures. The system of the '159 patent extracts the concepts behind user queries to return only the documents that match these concepts. The system has the capabilities of an ontology-based search system and it can search for logically structured groupings of items from the ontology. For example, from an exemplary query “What is the current situation of the stock market?” an attribute extractor extracts direct attributes “current”, “situation”, “stock”, and “market” from the query. The attribute extractor can also, e.g., expand attribute “stock” to “finance”, “banks”, “brokerages”, “Wall Street”, etc. by using an ontology that contains hierarchically-arranged concepts.
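The attribute extraction and expansion example above can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the one-entry ontology fragment and the function names are hypothetical, and the sketch does not reproduce the actual attribute extractor of the '159 patent.

```python
# Assumed stop-word list; a real system would rely on linguistic analysis.
STOP_WORDS = {"what", "is", "the", "of", "a", "an"}

# Hypothetical ontology fragment: attribute -> related concepts.
ONTOLOGY = {"stock": ["finance", "banks", "brokerages", "Wall Street"]}


def extract_attributes(query):
    """Return the direct attributes of a query: its words minus stop words."""
    words = [w.strip("?.,!").lower() for w in query.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def expand_attributes(attributes, ontology):
    """Map each attribute to the related concepts found in the ontology."""
    return {a: ontology.get(a, []) for a in attributes}


attributes = extract_attributes("What is the current situation of the stock market?")
expanded = expand_attributes(attributes, ONTOLOGY)
print(attributes)
print(expanded["stock"])
```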
The knowledge base search and retrieval system described in U.S. Pat. No. 5,940,821 (“the '821 patent”) and the related document knowledge base research and retrieval system described in U.S. Pat. No. 6,460,034 B1 (“the '034 patent”), the contents of both of which are incorporated herein by reference in their entireties, use a knowledge base, which stores associations among terminology and categories that have a lexical, semantic or usage association, for document theme vector identification (by inferring topics from the terminology of a document) and for document classification into categories. These systems are also able to retrieve a relevant document in response to a query by expanding the query terms and the theme with the help of the knowledge base. The '034 system includes factual knowledge base queries as well as concept knowledge base queries. The factual knowledge base queries identify, in response to a query, the relevant themes, and the documents classified for those themes. In contrast, the concept knowledge base queries do not identify specific documents in response to a query, but identify the potential existence of a document by displaying associated categories and themes.
The content processing system of the '821 and '034 patents includes a linguistic engine, a knowledge catalog processor, a theme vector processor, and a morphology section. The linguistic engine, which includes a grammar parser and a theme parser, processes the document set by analyzing the grammatical or contextual aspects of each document, as well as analyzing the stylistic and thematic attributes of each document. Specifically, the linguistic engine generates, as part of the structured output, contextual tags, thematic tags, and stylistic tags that characterize each document.
The knowledge base of the '821 and '034 patents is used to generate an expanded set of query terms, and the expanded query term set is used to select additional documents. To expand a query term using the knowledge base, the levels or tiers of the classification hierarchy as well as the knowledge base associations are used to select nodes within predefined criteria. In one embodiment, the query term strength is decreased based on the distance weight (e.g. query term weights are decreased by 50% for each point of semantic distance when expanding either to a more general category, such as a parent category, or to an association), and all nodes with a resultant query term weight greater than one are selected. All child categories and terms beneath a node are selected.
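The weight-decay expansion described above can be sketched as follows. The hierarchy, the association, the starting weight of 8 and the distance of one point per parent link are assumptions made for illustration only; the sketch applies the 50% decrease per point of semantic distance and the "greater than one" selection criterion, and then collects all child categories beneath the selected nodes.

```python
from collections import deque

# Hypothetical classification hierarchy (child -> parent) and associations.
PARENT = {
    "blue_chips": "stocks",
    "stocks": "investing",
    "bonds": "investing",
    "investing": "finance",
    "banking": "finance",
    "finance": "economy",
}
ASSOCIATIONS = {"stocks": [("brokerages", 2)]}  # (associated node, semantic distance)

CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def expand_query_term(term, weight=8.0, decay=0.5, threshold=1.0):
    """Expand a term to parents and associations, halving the weight per distance point."""
    weights = {term: weight}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        neighbours = list(ASSOCIATIONS.get(node, []))
        if node in PARENT:
            neighbours.append((PARENT[node], 1))  # assumed: one point per parent link
        for neighbour, dist in neighbours:
            new_weight = weights[node] * decay ** dist
            # Keep only nodes whose resulting weight is greater than the threshold.
            if new_weight > threshold and new_weight > weights.get(neighbour, 0.0):
                weights[neighbour] = new_weight
                queue.append(neighbour)
    # All child categories and terms beneath the selected nodes are also selected.
    descendants = set()
    stack = [c for node in weights for c in CHILDREN.get(node, [])]
    while stack:
        child = stack.pop()
        if child not in weights and child not in descendants:
            descendants.add(child)
            stack.extend(CHILDREN.get(child, []))
    return weights, descendants


weights, descendants = expand_query_term("stocks")
print(weights)
print(sorted(descendants))
```

With these assumed values, "economy" would receive a weight of exactly one and is therefore not selected, which shows how the threshold bounds the expansion toward more general categories.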
However, the system of the '821 and '034 patents is oriented mainly to theme vector identification. The '034 system requires the documents in the searched database to be indexed with special contextual, thematic and stylistic tags, and the ontology-based query term expansion is used to retrieve additional documents, taking their theme vectors into consideration.
Ontology is also conventionally applied in database management systems. In International patent application publication No. WO-2003/030025A1 (“the '025 publication”), the contents of which are incorporated herein by reference in their entirety, the database management system uses ontology to solve the problems of semantic heterogeneity, semantic mismatch, and query integration against distributed resources. The proposed solution to the problems of semantic heterogeneity is to formally specify the meaning of the terminology of each system using ontologies (shared and personal ones). Thus, the system of the '025 publication provides a distributed query solution for a network having a plurality of database resources. The network helps users to make queries which retrieve and join data from more than one resource, which may be of more than one type, such as an SQL or XML database.
Consequently, ontology is used in the system of the '025 publication to resolve terminological ambiguity while retrieving information from different heterogeneous information resources.
In U.S. patent application Publication No. 2002/0107844 A1 (“the '844 publication”), the contents of which are incorporated herein by reference in their entirety, ontology is described as being used in an information generation and retrieval system as an instrument that helps to build a semantic representation of a sentence in the form of a conceptual graph. During the information request procedure, a natural language query of a user is transformed into a conceptual graph by analyzing the sentence structure and the semantic structure. The database is then searched for the conceptual graph that is nearest in sense to the conceptual graph of the query, and semantic appropriateness is computed in order to display to the user the information indexed by the conceptual graph that was found.
Thus, the application of ontology in information retrieval in the '844 publication implies building conceptual graphs as well as comparing the conceptual graph of the query with the conceptual graphs of the database.
The method and apparatus for active information discovery and retrieval described in U.S. Pat. No. 6,498,795 B1 (“the '795 patent”), the contents of which are incorporated herein by reference in their entirety, use an active network framework and an ontology-based information hierarchy for semantic structuring and automated information binding, and provide a symmetrical framework for information filtering and binding in the network. Queries from information requesters are directly routed to relevant information sources, and contents from information providers are distributed to the destinations that expressed an interest in the information.
The method of the '795 patent implies creating content ontology instance trees and query ontology instance trees on each of the active network nodes. An active network architecture and an ontology-based information hierarchy are used as the network framework and the semantic framework, respectively. The system uses simple hypertext markup language (HTML) ontology extensions (SHOE). When a SHOE instance makes specific claims based upon a particular ontology, a software agent can draw on that particular ontology to infer knowledge that is not directly stated. The ontology provides context as implicit knowledge. SHOE tags allow new ontologies to be defined based on existing ones. The search operational model is applied to any part of a sub-hierarchy of the ontology instance tree. Special coefficients are calculated to determine the probability that child nodes of the ontology are accessed together with their parent node.
Hence, ontology in the '795 patent is used for semantic structuring of the retrieved information, which implies prior annotation of the information resources with ontological tags (using SHOE, either automatically or manually); only then is it possible to retrieve information based on the ontology relations represented by SHOE tags.
In U.S. patent application Publication No. 2002/0116169 A1 (“the '169 publication”), the contents of which are incorporated herein by reference in their entirety, a method and apparatus for generating normalized representations of strings are described. Ontologies, thesauri, and terminological databases are used therein as means for normalizing the semantic representation of a string.
The described method of the '169 publication attempts to increase the retrieval performance of information retrieval systems by suggesting use of ontology to semantically normalize query and database strings.
An ontology-based information management system and method, described in U.S. patent application Publication No. 2003/0177112 (“the '112 publication”), the contents of which are incorporated herein by reference in their entirety, uses ontology to provide semantic mapping between entries in a structured data source and concepts in an unstructured data source, and includes processes for creating, validating, augmenting, and combining ontologies for life sciences, informatics and other disciplines. The system of the '112 publication proposes to use an ontology to enable effective syntactic and semantic mapping between entities discovered using concept-based text searching and those derived from data warehousing and mining in a plurality of disciplines.
The system of the '112 publication may evaluate the distance between a pair of terms in a given information space using an information retrieval engine capable of categorizing large document sets.
Nevertheless, the proposed method of the '112 publication is mainly oriented to managing information sources based on ontologies, which help to integrate structured and unstructured data. The information sources serve as the basis for creating new ontologies, combining them, etc. The information retrieval engine is based on the categorization of the data.
Ontology is also used for query expansion. In U.S. Pat. No. 5,822,731 (“the '731 patent”), the contents of which are incorporated herein by reference in their entirety, a semantic network is applied to maximize the number of relevant documents identified during a query search by semantically expanding the search in response to the part of speech associated with each query term in the search.
In U.S. patent application Publication No. 2001/0003183 A1 (“the '183 publication”), the contents of which are incorporated herein by reference in their entirety, a method and apparatus for knowledgebase searching are described. Ontologies are an integral part of this system. A library of query templates and a dictionary that relates keywords to more abstract concepts are first prepared on a computer system. Each template contains one or more typed variables. A query is then generated by entering one or more keywords into the system. Each keyword is abstracted to a concept (using different thesauri and ontologies). Each concept may be further refined by additional abstraction, or by picking one concept from several candidates, or by successive abstraction and rejection of different keywords until an acceptable concept is found. Next, for the concepts that are obtained, the system finds all applicable query templates, which are then instantiated with those concepts or with the keywords used to form the concepts. The user then selects the most appropriate query from among the instantiated query templates. The system of the '183 publication may be applied in formulating queries to access any set of information sources. The '183 publication system is particularly useful for accessing distributed, heterogeneous databases which do not have a single standardized vocabulary or structure.
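The template-based query generation described above can be sketched as follows. The keyword-to-concept dictionary, the templates and the function names are hypothetical, and the sketch simplifies the procedure by using one keyword per typed variable and omitting the interactive refinement of concepts; it is not the implementation of the '183 publication.

```python
from string import Formatter

# Hypothetical dictionary relating keywords to more abstract concepts.
CONCEPT_OF = {"aspirin": "Drug", "ibuprofen": "Drug", "headache": "Symptom"}

# Hypothetical query templates; each placeholder is a typed variable.
TEMPLATES = [
    "What are the side effects of {Drug}?",
    "Does {Drug} relieve {Symptom}?",
    "Which {Gene} is associated with {Symptom}?",
]


def template_types(template):
    """Return the variable types (placeholder names) used by a template."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]


def instantiate(keywords):
    """Abstract keywords to concepts and fill every template whose types are covered."""
    by_concept = {}
    for keyword in keywords:
        concept = CONCEPT_OF.get(keyword)
        if concept:
            by_concept.setdefault(concept, []).append(keyword)
    queries = []
    for template in TEMPLATES:
        types = template_types(template)
        if all(t in by_concept for t in types):
            # Simplification: use the first keyword found for each type.
            queries.append(template.format(**{t: by_concept[t][0] for t in types}))
    return queries


candidates = instantiate(["aspirin", "headache"])
for query in candidates:
    print(query)
```

The user would then pick the most appropriate query from the instantiated candidates; the third template is skipped because no keyword was abstracted to the assumed "Gene" type.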
In fact, the latter three above-mentioned methods represent variations of key word search expansion by means of ontology.
The method and device for supporting information retrieval by using ontology, and the storage medium recording an information retrieval support program, described in JP-2000222436 (“the '436 publication”), the contents of which are incorporated herein by reference in their entirety, are designed to provide an information retrieval supporting method. The method is capable of dynamically preparing a database selection menu for selecting a database suited for retrieving the information required by a user. The solution suggested by the author of the '436 publication is an ontology that describes the concept system of the information managed by a database as a tree structure of information concepts, from a higher degree of abstraction to a lower degree of abstraction. A database selection menu for specifying the concept of the information to be retrieved by a user is dynamically generated by presenting the concepts registered in the ontology stepwise, from the higher degree of abstraction to the lower degree of abstraction.
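The stepwise menu generation described above can be sketched as follows. The concept tree and the function name are hypothetical; the sketch only shows how the menu entries at each step may be derived from the concepts already chosen, moving from higher to lower degrees of abstraction.

```python
# Hypothetical ontology: nested dictionaries from abstract to specific concepts.
ONTOLOGY_TREE = {
    "Information": {
        "Science": {"Physics": {}, "Chemistry": {}},
        "Business": {"Finance": {}, "Marketing": {}},
    }
}


def menu_at(tree, path):
    """Return the menu entries shown after the user has chosen the concepts in path."""
    node = tree
    for concept in path:
        node = node[concept]  # descend one level of abstraction
    return list(node)


print(menu_at(ONTOLOGY_TREE, []))
print(menu_at(ONTOLOGY_TREE, ["Information"]))
print(menu_at(ONTOLOGY_TREE, ["Information", "Science"]))
```

An empty menu indicates that the most specific concept has been reached, at which point a database registered for that concept could be selected.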
Briefly, the method of the '436 publication suggests using ontology, reflecting database content, to help a user specify the concept being searched for by refinement or generalization of the concept.
A method and system for query reformulation for searching of information are described in U.S. patent application Publication No. 20020147578 A1 (“the '578 publication”), the contents of which are incorporated herein by reference in their entirety. The method includes reformulating the query by eliminating one or more non-interesting terms using semantic and syntactic information for one or more of the terms, and querying a database of information based upon the reformulated query. Numerous interrelated dictionaries, thesauri and ontologies are used in the course of processing each question.
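The elimination of non-interesting terms described above can be sketched as follows. The part-of-speech tags are supplied by hand, and the tag set, the list of generic terms and the function name are assumptions made for illustration; the '578 publication relies on its own dictionaries, thesauri and ontologies, which are not reproduced here.

```python
# Assumed syntactic criterion: parts of speech treated as non-interesting.
NON_INTERESTING_POS = {"DET", "PRON", "AUX", "ADP"}

# Assumed semantic criterion: generic terms that carry little search value.
GENERIC_TERMS = {"information", "thing", "kind"}


def reformulate(tagged_query):
    """Drop terms that are non-interesting by part of speech or by generic meaning."""
    return [
        word
        for word, pos in tagged_query
        if pos not in NON_INTERESTING_POS and word.lower() not in GENERIC_TERMS
    ]


# Hand-tagged query; a real system would obtain the tags from a parser.
tagged = [
    ("What", "PRON"), ("is", "AUX"), ("the", "DET"), ("boiling", "ADJ"),
    ("point", "NOUN"), ("of", "ADP"), ("ethanol", "NOUN"),
]
reformulated = reformulate(tagged)
print(reformulated)
```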
Hence, ontology in the system of the '578 publication is part of a system that reformulates a query by eliminating non-informative terms.
Ontology is also applied in information retrieval systems to rank query feedback terms, as is described in U.S. Pat. No. 6,363,378 B1 (“the '378 patent”), the contents of which are incorporated herein by reference in their entirety. The information retrieval system processes the queries, identifies topics related to the query as well as query feedback terms, and then links both the topics and feedback terms to nodes of the knowledge base with corresponding terminological concepts. At least one focal node is selected from the knowledge base based on the topics to determine a conceptual proximity between the focal node and the query feedback nodes. Hierarchical relations from the ontology are used to calculate semantic proximity between focal categories and query feedback terms. The query feedback terms are ranked based on conceptual proximity to the focal node.
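The ranking of feedback terms by conceptual proximity described above can be sketched as follows. The hierarchy is hypothetical, and the number of hierarchical links on the path through the nearest common ancestor is used as an assumed stand-in for conceptual proximity; the '378 patent's actual proximity computation is not reproduced here.

```python
# Hypothetical hierarchy: node -> parent (None marks the root).
PARENT = {
    "business": None,
    "finance": "business",
    "marketing": "business",
    "stocks": "finance",
    "bonds": "finance",
    "advertising": "marketing",
}


def ancestors(node):
    """Return the node followed by its chain of parents up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def distance(a, b):
    """Count hierarchical links between a and b via their nearest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for i, n in enumerate(ancestors(a)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    return float("inf")  # no common ancestor


def rank_feedback_terms(focal, terms):
    """Order feedback terms from conceptually nearest to farthest from the focal node."""
    return sorted(terms, key=lambda t: (distance(focal, t), t))


ranked = rank_feedback_terms("stocks", ["advertising", "bonds", "finance", "marketing"])
print(ranked)
```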
Thus, in the '378 patent's information retrieval system, ontology is used for topic identification in the knowledge base and in the query, and then for calculating the semantic proximity between query feedback terms and the node chosen from the database on the basis of the determined topic.
Therefore, the idea of using ontology to improve information retrieval system performance is not new, and it is disclosed in different manners in various patents. For example, some of the different manners disclosed include searching in structured and unstructured databases, document theme or topic identification, normalization of the semantic representation of a string, search and integration of different types of data, query expansion, etc. As far as ontology use in query expansion is concerned, ontology is generally applied to expand keyword-based and concept-based searches using hierarchical relations from the ontology, and may be used predominantly within a certain knowledge domain.