With the rapid development of information technology, information integration has inevitably become an important task. Moreover, “loosely coupled” information integration over heterogeneous data sources benefits enterprises by greatly reducing deployment cost and by allowing them to respond much more quickly to changing business requirements. To enable interoperability and integration of information systems, explicitly attaching semantics to data has become common practice. An ontology, a hierarchically structured set of terms for describing a domain, is at present considered the most efficient approach to imposing semantics on data. In fact, the W3C recently recommended the Resource Description Framework (RDF) as a core data model and the Web Ontology Language (OWL) as an ontology representation language for the Semantic Web.
An ontology schema, such as a Resource Description Framework Schema (RDFS), is a kind of shared domain knowledge generalized by domain experts. Generally speaking, it can be widely applied to, or populated by, a variety of data in a given application domain. For example, FIG. 1 shows a simple ontology example in the securities business. The upper part of the figure shows the ontology schema (RDFS, in Semantic Web terms), which describes the terms used in the domain and the hierarchical structure (or taxonomy) among them. The lower part shows a general example in which data populates the ontology schema.
However, there is no guarantee that an ontology schema generalized by domain experts will fully satisfy the variety of application requirements in a given domain. Suppose an application scenario needs to query the number of times shares were exchanged and the amount of money each trade involved; it would then require data different from that in the general example above. FIG. 2 shows a conflict between the shared ontology schema and application data in the securities domain. There are three independent share transactions between a shareholder with “ID1”, named “David Johnson”, and a listed company with “ID2”, named “IBM”. Although each of the three transactions certainly represents a “Shareholding” relationship owned by “ID1”, they cannot populate the ontology schema with their complete context. It is an intrinsic feature of ontologies that only the classes defined in the ontology schema may carry properties. Therefore, a property itself cannot be populated with a variety of data instances.
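The conflict can be illustrated with a minimal Python sketch. It relies only on the fact that an RDF graph is a set of (subject, predicate, object) triples; the trading dates and amounts are invented for illustration and do not come from FIG. 2.

```python
# Hypothetical rows mirroring the three transactions of FIG. 2.
# The dates and amounts are illustrative assumptions, not source data.
transactions = [
    {"shareholder": "ID1", "company": "ID2", "time": "2005-01-10", "amount": 1000},
    {"shareholder": "ID1", "company": "ID2", "time": "2005-03-22", "amount": 2500},
    {"shareholder": "ID1", "company": "ID2", "time": "2005-07-05", "amount": 400},
]

# An RDF graph is a set of triples, so identical triples collapse into one.
graph = set()
for t in transactions:
    graph.add((t["shareholder"], "Shareholding", t["company"]))

print(len(transactions))  # 3 source rows
print(len(graph))         # 1 triple: the time and amount have nowhere to go
```

Because the "Shareholding" property cannot carry properties of its own, the three rows become indistinguishable once they are expressed against the original schema.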
To resolve the conflict between an existing ontology schema and application data, e.g., the scenario between “ID1” and “ID2” in FIG. 2, three approaches come to mind intuitively.
The first approach is to ignore the ontology schema entirely and rely fully on the data schemas. A few data integration tools exist that extract RDFS from data schemas such as those of relational databases, the Extensible Markup Language (XML), and so on. With them, an ontology schema can be extracted completely from a database, although it may be very different from the well-known schemas in the domain. Many limitations of this method have been identified. For example, the data schemas must be very well organized to suit the application requirements; otherwise, redundant and faulty classes and properties are extracted. This demand is too rigid to be satisfied in most cases. Furthermore, the well-known ontology schemas have been thoroughly tested and refined in their domains. Even when they cannot fully cover all requirements of a particular application, it is still reasonable to continue using and extending them instead of replacing them.
The second approach is to ignore detailed information, such as trading time and amount of money, contained in the data. Without this detailed information, the data becomes compatible with the existing ontology schema. All that is needed is to build a mapping between the “Shareholding” property in the ontology schema and the join expression from the table storing shareholder information to the table recording listed companies in a real system. No additional effort is needed to resolve the conflict. However, some specific application requirements, e.g., querying the number of times shares were exchanged, cannot be supported by the existing ontology schema.
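A minimal sketch of such a mapping, using Python's standard sqlite3 module, is given below. The table names, column names, the intermediate trade table, and the sample rows are all assumptions made for illustration; the actual layout of a real system is not specified here.

```python
import sqlite3

# Hypothetical relational layout; all names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shareholder (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE listed_company (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE trade (shareholder_id TEXT, company_id TEXT,
                        trade_time TEXT, amount REAL);
    INSERT INTO shareholder VALUES ('ID1', 'David Johnson');
    INSERT INTO listed_company VALUES ('ID2', 'IBM');
    INSERT INTO trade VALUES ('ID1', 'ID2', '2005-01-10', 1000.0);
    INSERT INTO trade VALUES ('ID1', 'ID2', '2005-03-22', 2500.0);
    INSERT INTO trade VALUES ('ID1', 'ID2', '2005-07-05', 400.0);
""")

# The "Shareholding" property is mapped to a join expression.
# DISTINCT reflects that the property only records that a pair is related.
SHAREHOLDING_MAPPING = """
    SELECT DISTINCT s.id, c.id
    FROM shareholder AS s
    JOIN trade AS t ON t.shareholder_id = s.id
    JOIN listed_company AS c ON c.id = t.company_id
"""

shareholding_triples = [
    (s_id, "Shareholding", c_id)
    for s_id, c_id in conn.execute(SHAREHOLDING_MAPPING)
]
print(shareholding_triples)  # [('ID1', 'Shareholding', 'ID2')]
```

The three trade rows yield a single "Shareholding" statement, so a question such as how many trades took place cannot be answered from the ontology side.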
In contrast to the above two approaches, the third is to rebuild or refine the ontology schema by enriching its semantics with the detailed information. By enriching the ontology schema with the specific features contained in the application data, existing application requirements such as the one presented above, e.g., querying the number of times shares were exchanged, can continue to be satisfied in an ontology-based application environment.
FIG. 3 shows an example of a refined ontology schema obtained by transforming the property “Shareholding” into a class “Transaction”, which can carry properties such as trading time and amount of money. In this example, the semantic content of the data populates the refined ontology schema without any loss. In accordance with the intrinsic feature of ontologies described above, each transaction between “ID1” and “ID2” is represented as an instance of the new class “Transaction” rather than being subsumed under the property “Shareholding” as in the older ontology schema. The “Shareholding” property pointing from “Shareholder” to “Listed Company” is transformed into an indirect relationship in which a class defines the properties semantically contained in the relationship data. In this way, the query that the second approach cannot support can be answered.
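The refinement of FIG. 3 can be sketched in Python as follows. The property names (hasShareholder, hasListedCompany, tradingTime, amount), the transaction identifiers, and the sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical transaction records: (id, trading time, amount).
records = [
    ("T1", "2005-01-10", 1000),
    ("T2", "2005-03-22", 2500),
    ("T3", "2005-07-05", 400),
]

# Each transaction becomes an instance of the class "Transaction",
# which can carry its own properties.
graph = set()
for tx_id, time, amount in records:
    graph.add((tx_id, "rdf:type", "Transaction"))
    graph.add((tx_id, "hasShareholder", "ID1"))
    graph.add((tx_id, "hasListedCompany", "ID2"))
    graph.add((tx_id, "tradingTime", time))
    graph.add((tx_id, "amount", amount))


def objects(g, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in g if s == subject and p == predicate]


# Query: how many trades between ID1 and ID2, and for how much in total?
tx_ids = sorted(s for s, p, o in graph if p == "rdf:type" and o == "Transaction")
matching = [
    t for t in tx_ids
    if objects(graph, t, "hasShareholder") == ["ID1"]
    and objects(graph, t, "hasListedCompany") == ["ID2"]
]
trade_count = len(matching)
total_amount = sum(objects(graph, t, "amount")[0] for t in matching)
print(trade_count, total_amount)  # 3 3900
```

Since every transaction has its own identifier as subject, no two triples coincide, and the time and amount of each trade remain available to queries.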
The context-based ontology schema refinement illustrated in FIG. 3 is well known as the most intuitive and common way to resolve the problem (or “conflict”) that occurs when a property defined as a relationship between two classes is too general to capture the variety of semantics that the relationship contains. Unfortunately, most such refinements are done by hand. In that case, whether the refined ontology schema is adequate for the application depends on the knowledge and experience of the users. Reaching the goal in this way is undoubtedly costly.
Besides the huge cost at design time, it is even more difficult to monitor such inconsistencies dynamically at runtime. As the data evolves, even when the data schema is stable or only slightly revised, the content itself may require the ontology schema to change. For example, suppose that no person has so far served as CEO of a company in more than one term of office; users can then simply define a “serving” property to describe the relationship between a CEO and the company. At some point, however, a person who previously served as CEO of the company returns and is again appointed CEO. This scenario does not affect the data schema, but it breaks the ontology schema defined earlier. Refining the ontology at runtime must therefore rely entirely on an automatic tool.
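One simple way to notice this situation at runtime is to check whether the same subject and object pair occurs more than once in the relationship data. The Python sketch below is only an illustrative heuristic under that assumption, not the refinement method itself; the function name, person and company names, and years are hypothetical.

```python
from collections import Counter


def pairs_needing_refinement(rows):
    """Return (subject, object) pairs that occur in more than one row.

    Each row is (subject, object, *context). A repeated pair means a plain
    property between the two can no longer tell the rows apart.
    """
    counts = Counter((subject, obj) for subject, obj, *_ in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical tenure data: (person, company, start year, end year).
tenures_before = [
    ("Alice", "ACME", 2000, 2004),
    ("Bob", "ACME", 2004, 2008),
]
# The former CEO returns; the data schema is unchanged.
tenures_after = tenures_before + [("Alice", "ACME", 2008, 2012)]

print(pairs_needing_refinement(tenures_before))  # []
print(pairs_needing_refinement(tenures_after))   # [('Alice', 'ACME')]
```

In the first data set a simple “serving” property is sufficient; after the return of the former CEO, the repeated pair signals that the property would lose the distinction between the two terms.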
To reduce these costs, it is desirable to develop an intelligent agent to do the job. Even if the automatically refined ontology schema is not the final one that users prefer to employ, this does not matter, since most of the features will already have been discovered. Users can arrive at the design they prefer with only minor revisions to the automatically refined ontology schema. The problem the present invention addresses is the automatic refinement of an ontology schema based on context hidden in data.