Ontology provides a way to model things that exist. Basic constructs of an ontology model are classes, properties thereof and inheritance. Classes are sets, the elements of which are referred to as instances of the class. For example, a class People is a set of instances that represent specific people. A property, p, of a class is a function p: C→D from a class C, referred to as the source of p, to a class D, referred to as the target of p. The classes C and D may be the same class or different classes. When it is important to distinguish between properties defined on different classes, the notation C.p is used to denote a property, p, defined on C.
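The definitions of classes and properties can be illustrated with a minimal sketch. It assumes that a class is simply a named set of instances and that a property is a total function stored as an explicit mapping. The helper names OntClass and Property, the class Countries and the property nationality are hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntClass:
    """Hypothetical helper: an ontology class modeled as a named set of instances."""
    name: str
    instances: frozenset = frozenset()


@dataclass
class Property:
    """Hypothetical helper: a property p: C -> D stored as a mapping from C's instances to D's."""
    name: str
    source: OntClass  # the class C
    target: OntClass  # the class D
    mapping: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumption: p is a total function, so every instance of C must have a value ...
        if set(self.mapping) != set(self.source.instances):
            raise ValueError(f"{self.name} must be defined on every instance of {self.source.name}")
        # ... and every value must be an instance of the target class D.
        for value in self.mapping.values():
            if value not in self.target.instances:
                raise ValueError(f"{value!r} is not an instance of {self.target.name}")

    def __call__(self, instance):
        return self.mapping[instance]

    @property
    def qualified_name(self) -> str:
        # The C.p notation distinguishes properties defined on different classes.
        return f"{self.source.name}.{self.name}"


people = OntClass("People", frozenset({"john", "mary"}))
countries = OntClass("Countries", frozenset({"US", "UK"}))  # hypothetical target class
nationality = Property("nationality", people, countries, {"john": "US", "mary": "UK"})
print(nationality("john"), nationality.qualified_name)  # US People.nationality
```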
Properties may be composed, so that if p: C→D and q: D→E, then the composition q∘p: C→E has source C and target E. The composition is denoted by C.p.q.
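Composition can be sketched as follows, assuming that each property is given by an explicit mapping between instances. The names Prop, compose and dotted, as well as the classes Companies and Cities and the properties employer and headquarters, are hypothetical. The sketch applies p first and then q, so (q∘p)(x) = q(p(x)), and it refuses to compose unless the target of p equals the source of q.

```python
from dataclasses import dataclass


@dataclass
class Prop:
    """Hypothetical helper: a (possibly composed) property from class `source` to class `target`."""
    path: tuple    # property names in order of application, e.g. ("p",) or ("p", "q")
    source: str    # name of the source class
    target: str    # name of the target class
    mapping: dict  # instance of source -> instance of target


def compose(p: Prop, q: Prop) -> Prop:
    """Return q∘p: C -> E, given p: C -> D and q: D -> E."""
    if p.target != q.source:
        raise TypeError(f"target of p ({p.target}) differs from source of q ({q.source})")
    # (q∘p)(x) = q(p(x)): apply p first, then q.
    return Prop(p.path + q.path, p.source, q.target,
                {x: q.mapping[y] for x, y in p.mapping.items()})


def dotted(prop: Prop) -> str:
    """Render a property in the C.p.q notation."""
    return ".".join((prop.source,) + prop.path)


employer = Prop(("employer",), "People", "Companies", {"john": "acme"})
headquarters = Prop(("headquarters",), "Companies", "Cities", {"acme": "Boston"})
city_of_employer = compose(employer, headquarters)
print(dotted(city_of_employer), city_of_employer.mapping)
# People.employer.headquarters {'john': 'Boston'}
```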
A class C is said to be a subclass of D if C⊂D, in which case D is also said to be a superclass of C. In this case, every instance of C is also an instance of D, and properties defined on D are also defined on C by inheritance. For example, a class named Passengers may be a subclass of a class named People. A property such as firstName, defined on People, is inherited by Passengers.
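Subclassing and inheritance can be sketched as follows, under the simplifying assumption that each class has at most one direct superclass. Adding an instance to a class also adds it to every superclass, which keeps C⊂D true, and a class's properties include those defined on its superclasses. The helper OntClass and the property ticketNumber are hypothetical.

```python
class OntClass:
    """Hypothetical helper: a class with one direct superclass, its instances and its own properties."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.instances = set()
        self.own_properties = set()

    def lineage(self):
        """Yield this class followed by each of its superclasses."""
        cls = self
        while cls is not None:
            yield cls
            cls = cls.superclass

    def add_instance(self, x):
        # Since C ⊂ D, an instance of C is also an instance of every superclass D.
        for cls in self.lineage():
            cls.instances.add(x)

    def properties(self):
        # Properties defined on a superclass are inherited by its subclasses.
        return set().union(*(cls.own_properties for cls in self.lineage()))


people = OntClass("People")
people.own_properties.add("firstName")
passengers = OntClass("Passengers", superclass=people)
passengers.own_properties.add("ticketNumber")  # hypothetical Passengers-only property
passengers.add_instance("john")
print("john" in people.instances, sorted(passengers.properties()))
# True ['firstName', 'ticketNumber']
```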
A special class named Being is defined in an ontology model as a universal class that contains all classes as subclasses thereof. Properties defined on Being are thus inherited by all classes in the ontology model.
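The role of Being can be sketched with a small registry in which every new class defaults to Being as its superclass, so that Being contains all classes and its properties reach every class. The Ontology helper, its method names and the property identifier defined on Being are hypothetical, and single inheritance is assumed.

```python
BEING = "Being"


class Ontology:
    """Hypothetical registry in which every class descends from the universal class Being."""

    def __init__(self):
        self.parent = {BEING: None}       # class name -> direct superclass name
        self.properties = {BEING: set()}  # class name -> properties defined directly on it

    def add_class(self, name, superclass=BEING):
        # Defaulting the superclass to Being guarantees that Being contains every class.
        if superclass not in self.parent:
            raise KeyError(f"unknown superclass {superclass!r}")
        self.parent[name] = superclass
        self.properties[name] = set()

    def define_property(self, cls, prop):
        self.properties[cls].add(prop)

    def ancestors(self, cls):
        """Return cls and all of its superclasses, ending at Being."""
        chain = []
        while cls is not None:
            chain.append(cls)
            cls = self.parent[cls]
        return chain

    def all_properties(self, cls):
        # Inherited properties: the union over cls and all of its superclasses.
        return set().union(*(self.properties[c] for c in self.ancestors(cls)))


onto = Ontology()
onto.add_class("People")
onto.add_class("Passengers", "People")
onto.define_property(BEING, "identifier")  # hypothetical property on Being
onto.define_property("People", "firstName")
print(onto.ancestors("Passengers"))  # ['Passengers', 'People', 'Being']
```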
Certain properties, referred to as “representations,” take on concrete fundamental alphanumeric values. The significance of representations is that they are properties one can reason about using arithmetic, logical and string operators, since their type corresponds to the types of mathematical expressions and programming language expressions.
In order to accommodate and provide values for representations, a special class Values is preferably created, so as to include all possible fundamental values a property may have. In addition, a special class Formats is also created, to include formats in which instances of Values can be expressed. Formats include inter alia conventional integer formats, real number formats, character string formats and date and time formats. A function representation: Values×Formats→Alphanumerics, converts a value into an alphanumeric string according to a specific format. For example, if lastName: People→Values, then representation(person.lastName, titleCase)=“Smith” (a character string), for an instance, person, of People corresponding to John Smith. Observe that lastName is a representation, and titleCase is a format.
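The function representation: Values×Formats→Alphanumerics can be sketched as a lookup in a table of formats, each of which converts a fundamental value into a string. The format table below, including every format name other than titleCase, is hypothetical, as is modeling the property lastName as a plain dictionary.

```python
from datetime import date

# Hypothetical format table: each format turns a fundamental value into an alphanumeric string.
FORMATS = {
    "titleCase": lambda v: str(v).title(),
    "upperCase": lambda v: str(v).upper(),
    "decimalInteger": lambda v: str(int(v)),
    "twoDecimalReal": lambda v: f"{float(v):.2f}",
    "isoDate": lambda v: v.isoformat(),
    "usDate": lambda v: v.strftime("%m/%d/%Y"),
}


def representation(value, fmt):
    """representation: Values x Formats -> Alphanumerics."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}")
    return FORMATS[fmt](value)


# lastName: People -> Values, sketched as a mapping from an instance of People to a value.
last_name = {"person": "smith"}
print(representation(last_name["person"], "titleCase"))  # Smith
print(representation(date(2002, 1, 15), "usDate"))        # 01/15/2002
```

Keeping the formats in a table separates the stored value from its presentation, so the same value can be rendered in several formats without changing the property that produces it.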
Alternatively, various formats can be modeled as properties on the class Values, or subclasses thereof. With respect to this alternative model design choice, the last name of John Smith represented as a character string in title case is denoted person.lastName.titleCase. Observe that lastName and titleCase are both representations in this alternative model.
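In the alternative design, each format is itself a property of a subclass of Values, so a formatted value is reached by ordinary property composition. The sketch below assumes a hypothetical StringValue subclass with two format properties; the camelCase names deliberately mirror the notation person.lastName.titleCase rather than usual Python naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringValue:
    """Hypothetical subclass of Values for character strings; each format is a property."""
    raw: str

    @property
    def titleCase(self) -> str:
        return self.raw.title()

    @property
    def upperCase(self) -> str:
        return self.raw.upper()


@dataclass(frozen=True)
class Person:
    """An instance of People; lastName is a property whose target is a subclass of Values."""
    lastName: StringValue


person = Person(lastName=StringValue("smith"))
print(person.lastName.titleCase)  # Smith
```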
Applicant's co-pending application U.S. Ser. No. 10/053,045, filed on Jan. 15, 2002 and entitled “Method and System for Deriving a Transformation by Referring Schema to a Central Model” describes mapping data schema, including inter alia relational database schema and XML schema, into a central ontology model. Basic constructs of the data schema are mapped to classes, properties and compositions of properties in the central ontology model. Thus, for relational database schema, tables are generally mapped to ontology classes, and fields of tables are generally mapped to ontology properties or compositions of properties, and more specifically to properties or compositions of properties with target Values. Similarly, for XML schema, complex types are generally mapped to ontology classes, and elements and attributes within complex types are generally mapped to ontology properties or compositions of properties.
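The general shape of such a relational mapping can be sketched as follows. This is not the method of the co-pending application; it only checks the constraint stated above, namely that a table maps to a class and each field maps to a property or composition of properties ending in Values. The ontology fragment, the table EMPLOYEE, its fields LNAME and CITY, and the function map_fields are all hypothetical.

```python
def map_fields(ontology, table_to_class, field_to_path):
    """Translate each table field into a dotted ontology expression such as C.p.q.

    ontology:       (class, property) -> target class
    table_to_class: table name -> ontology class
    field_to_path:  (table, field) -> tuple of property names
    """
    result = {}
    for (table, column), path in field_to_path.items():
        cls = table_to_class[table]
        target = cls
        for prop in path:
            target = ontology[(target, prop)]  # KeyError if the property is not defined
        # Fields carry concrete data, so the path must end in the class Values.
        if target != "Values":
            raise ValueError(f"{table}.{column} maps to a path ending in {target}, not Values")
        result[f"{table}.{column}"] = ".".join((cls,) + path)
    return result


ontology = {
    ("People", "lastName"): "Values",
    ("People", "address"): "Addresses",
    ("Addresses", "city"): "Values",
}
tables = {"EMPLOYEE": "People"}
fields = {
    ("EMPLOYEE", "LNAME"): ("lastName",),
    ("EMPLOYEE", "CITY"): ("address", "city"),  # a field mapped to a composition
}
print(map_fields(ontology, tables, fields))
# {'EMPLOYEE.LNAME': 'People.lastName', 'EMPLOYEE.CITY': 'People.address.city'}
```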
Enterprise data systems, especially for large enterprises, typically include multiple data sources that may conform to different data schemas. Indeed, as a result of several generations of IT and/or mergers and acquisitions, several databases with different schemas may contain information on the same functional area of the business. For example, enterprise employment data may be stored in relational databases conforming to a first relational database schema, enterprise accounting data may be stored in relational databases conforming to a second relational database schema, enterprise sales forecasts may be stored in relational databases conforming to a third relational database schema, enterprise inventory data may be stored in XML documents conforming to a first XML schema, and enterprise bill of materials data may be stored in XML documents conforming to a second XML schema. These data sources often overlap, and it is difficult for a user to query across them. For example, a user may want to know the bills of materials for items that need to be replenished in inventory based on demand forecasts.
There is thus a need for a unified querying tool that enables a user to query across data sources conforming to disparate data schemas.
The need for a unified querying tool also arises with a single data source as well as with multiple data sources, when engineers involved in application development, enterprise application integration or data warehousing are not aware of the precise semantics of a database and are therefore unable to use its data appropriately.
The need for a unified querying tool also arises when a single question crosses multiple data sources.
Another difficulty faced by enterprises is being able to locate data within multiple data sources. With reference to the example above, a user may want to locate data sources containing employee stock option data. Such data may be distributed over multiple data sources, and may involve joining relational database tables that conform to different data schema.
There is thus a need for a data locator tool that enables a user to specify data of interest, and receive a list of constructs corresponding to the data of interest, and the various data sources containing data for such constructs.
The need for a data locator tool also arises when trying to locate overlaps, where a single aspect of enterprise information is stored in multiple locations. Such overlaps signal the potential for bad data quality, as they generally lead to inconsistencies.