1. Field of the Invention
The present application generally relates to data and query processing and, more particularly, to a method for supporting semantic matching queries in a database management system (DBMS) that stores the transitive relationships of an ontology in the DBMS, expresses semantic matching queries on the transitive relationships and instance data, and processes those semantic matching queries.
2. Background Description
Database management systems (DBMSs) have been used with great success to manage and manipulate huge amounts of structured data. However, a substantial gap remains between manipulating the data stored in a DBMS and manipulating the semantic or domain knowledge that describes that data.
For example, assume a given database that lists various companies and, for each company, the locations of its branches and the specific names of its fields of business. A “semantic query” is a query that cannot be answered using the information contained in that database alone, without additional domain knowledge. As an illustrative example, a user may wish to search for companies that have branches in countries with a per-capita income of under X dollars. Because the example database of companies, branch locations, and fields of business contains no information about the per-capita income of countries, the user cannot successfully search the database for the desired information. That information may exist in a further knowledge source known to the user, but it resides in a separate database.
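As a minimal sketch of why such a query is "semantic," the following Python fragment answers the per-capita income question only when a separate knowledge source is supplied alongside the company data. All company names, country names, income figures, and the function `companies_in_low_income_countries` are hypothetical and purely illustrative; the fragment is not the method claimed here.

```python
# Hypothetical company database: names, branch countries, and fields of business only.
companies = {
    "Acme": {"branches": {"Freedonia", "Sylvania"}, "field": "logistics"},
    "Globex": {"branches": {"Sylvania"}, "field": "energy"},
    "Initech": {"branches": {"Latveria"}, "field": "software"},
}

# Hypothetical separate knowledge source (e.g., an ontology) holding per-capita income.
per_capita_income = {"Freedonia": 4_000, "Sylvania": 30_000, "Latveria": 8_000}


def companies_in_low_income_countries(companies, income, threshold):
    """Return companies with at least one branch in a country whose per-capita
    income is below threshold. Countries absent from income cannot be judged
    and are skipped."""
    result = set()
    for name, info in companies.items():
        for country in info["branches"]:
            value = income.get(country)
            if value is not None and value < threshold:
                result.add(name)
                break  # one qualifying branch is enough
    return result


# With the external knowledge, the query can be answered.
print(companies_in_low_income_countries(companies, per_capita_income, 10_000))
# With the company database alone (no income data), nothing can be concluded.
print(companies_in_low_income_countries(companies, {}, 10_000))
```

With the income data, the first call yields Acme and Initech. Without it, the second call returns an empty set, because the company database by itself carries no information about income.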
One particular place where such information may exist is an “ontology,” which is an explicit specification of a conceptualization of a universe, or domain. Ontologies are used in various information fields and endeavors in the context of the Semantic Web. The use of ontologies and ontology languages, such as the OWL Web Ontology Language, has attracted attention in the data processing arts. For many database applications, ontologies appear to be the best way to represent the domain knowledge about the data instances stored in the database. A substantial problem arises, however, because the information in the ontology (e.g., the per-capita income of countries) is held in a form separate from the given XML database, such as the example database identified above, which stores the names of companies, the locations of their branches, and the specific fields of business in which each company operates.
Various systems have been developed for building and manipulating ontologies. For example, the Protégé ontology editor is a knowledge-based editor that allows the user to construct a domain ontology, customize data entry forms, and enter data. RStar is a resource description framework (RDF) storage and query system for enterprise resource management. Other ontology building systems include OntoEdit, OntoBroker, OntologyBuilder, OntologyServer, and the KAON (KArlsruhe ONtology and semantic web tool suite) ontology management infrastructure. Some systems use a file system to store the ontology (e.g., OntoEdit), while others (e.g., RStar and KAON) store the ontology in a relational database management system (RDBMS). In these systems, however, queries on an ontology are typically processed by middleware (a wrapper) built on top of the DBMS engine. The two main disadvantages of this loosely-coupled approach are that (1) ontology data cannot be accessed directly inside the DBMS and (2) the query processing and optimization power of the DBMS is lost when manipulating ontology data.
One approach to ontology management proposed in the database arts is a tightly-coupled solution by Oracle, described by S. Das, E. I. Chong, G. Eadon, and J. Srinivasan in “Supporting Ontology-based Semantic Matching in RDBMS”, VLDB 2004, pp. 1054-1065 (“the S. Das et al. approach”). The S. Das et al. approach proposes a method to support ontology-based semantic matching directly in an RDBMS using the Structured Query Language (SQL). In this approach, ontology data are preprocessed and stored in a set of system-defined tables, a set of special operators is introduced to query and access the ontology, and a new index scheme is introduced to optimize query processing. A database user can thus reference the ontology data directly using the new operators. Because it is tightly coupled, the S. Das et al. approach offers, compared with loosely-coupled approaches, some ability to combine ontology query operators with existing SQL operators (such as joins with other data stored in relational tables).
The present inventors have identified, though, that there is an inherent “mismatch” between the relational schema employed by SQL and the hierarchical model of ontology data. The present inventors have also identified that, because of this mismatch, the S. Das et al. relational-model approach likely has inherent, substantial shortcomings in query processing efficiency.
For example, inferencing is one of the most fundamental and also most computationally expensive operations on ontology data. Previous approaches, including the S. Das et al. approach, typically require precomputing and materializing a significant quantity of inferencing results (i.e., transitive closures) to achieve reasonable performance at query time. Precomputing and storing these results in turn imposes substantial processing and storage burdens.
Stated more specifically, materializing inferencing results involves explicitly computing and storing information that can be derived from the ontology and database tables by logical inferencing. The transitive closure of a collection of instances of a relation R is the collection of all instances of R that can be derived by repeatedly applying the transitivity rule. Computing and storing such closures incurs significant time and storage overhead at the preprocessing step. More significantly, with these approaches, updating the ontology data may become practically impossible once it has been preprocessed.
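To make the definition of transitive closure concrete, the following minimal Python sketch computes every pair derivable from a set of relation instances by repeatedly applying the transitivity rule, using a depth-first traversal from each node. It is a generic illustration only, not the method of the S. Das et al. approach or of any other system, and the function name `transitive_closure` and the sample subclass pairs are hypothetical.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """Return every (a, c) derivable from pairs by repeatedly applying the
    transitivity rule: (a, b) and (b, c) imply (a, c)."""
    successors = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)

    closure = set()
    for start in list(successors):
        # Depth-first traversal collecting every node reachable from start.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            # .get avoids inserting new keys into the defaultdict.
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure


# Hypothetical subclass relation: three stored instances of R.
subclass_of = [("Sedan", "Car"), ("Car", "Vehicle"), ("Vehicle", "Thing")]
print(sorted(transitive_closure(subclass_of)))

# A chain of n nodes (n - 1 stored edges) materializes n(n - 1)/2 pairs.
chain = [(i, i + 1) for i in range(99)]
print(len(transitive_closure(chain)))  # 100 nodes -> 4950 pairs
```

The three stored subclass instances expand into six materialized pairs, and a 100-node chain with only 99 stored edges expands into 4,950. This quadratic growth illustrates the preprocessing cost described above. It also shows why updates are difficult: inserting or deleting a single edge can change many materialized pairs, so a precomputed closure must be recomputed or carefully maintained.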
Accordingly, the present inventors have identified a need for processing semantic queries on a combined resource of an XML database and an ontology that holds additional information pertaining to elements stored in the XML database but that is not itself stored in a native XML system.