1. Field of the Invention
The present invention relates generally to database management systems and more particularly to techniques for defining constraints on columns in database management systems.
2. Description of Related Art
Relational database systems have become increasingly robust with many features that insure the integrity of the data stored within the columns of tables of a database instance. The means of ensuring the integrity of data are a set of business rules defined by the application developer and are referred to as Integrity Constraints in the database system.
There are many types of integrity constraints; the most relevant type for the present discussion is referential integrity constraints. A referential integrity constraint is a constraint which maintains consistency between coupled tables. A value from a key column in a first one of the coupled tables is used as a value in a non-key column of a second one of the coupled tables. The value from the key column is termed the parent key; the key value in the non-key column of the other table is termed the foreign key. A referential integrity constraint requires that every foreign key value in the non-key column of the second table have a value which is equal to the value of one of the keys in the key column. A consequence of the rule is that when keys are added to or deleted from the key column in the first table, there may be effects on the foreign key values in the second table. The rules for dealing with these effects are:                Restrict: Disallows the update or deletion of parent keys.        Set to Null: When a parent key is updated or deleted, all of the foreign keys which have that value are set to NULL.        Set to Default: When a parent key is updated or deleted, the foreign keys which have the value are all set to a default value.        Cascade: When a parent key is updated, all foreign keys which had the old value are correspondingly updated. When a row containing a parent key is deleted, all rows having foreign keys with the parent key's value are deleted.        No Action: Disallows the update or deletion of parent keys. This differs from RESTRICT in that it is checked at the end of the execution of the SQL statement that makes the change statement, or at the end of the transaction to which the execution of the SQL statement belongs if enforcement of the constraint is deferred until then.        
FIG. 1 shows a set of tables which will be used in the following to demonstrate referential integrity constraints. Table EMP 105 has five columns that define the employee within the context of his employment.
Tables 103 and 105 are coupled by a referential constraint, as shown by arrow 104. The primary key values in column DNAME 109(i) are parent keys. The foreign keys which have the values of the parent keys are values in column DNAME 109(ii). The referential integrity constraint guarantees that every value in the column 109(ii) already exists as a value in the column 109(i). Because of the constraint, the relational database system handles inserts in a manner that ensures data integrity. Insert 107 violates the referential integrity constraint in that the value ‘MARKETING’ for the column DNAME 109(ii) cannot be found in the rows of column 109(i). The referential integrity constraint further enforces rules concerning what occurs in EMP table 105 when a row containing a DNAME 109(ii) has the corresponding row deleted from DEPT table 103.
What a referential constraint does in general terms is constrain DNAME column 109(ii) such that all of the values in DNAME column 109(ii) must be selected from the set of values defined by the values of DNAME column 109(i). There are many contexts other than parent keys and foreign keys where it would be desirable to constrain a column such that the values in the column had to be selected from a dynamic set of values, that is, a set of values whose member values were not known until the time the constraint was enforced. In the following, such constraints will be generically termed dynamic set constraints. At present, the only dynamic set constraints which are supported in relational database systems are referential constraints. It is an object of the invention disclosed herein to provide a relational database system which supports other kinds of dynamic set constraints.
One kind of dynamic set constraint which is particularly useful is one in which the dynamic set is a set of terms that belong to a domain in an ontology. For purposes of the following discussion, ontology and domain are defined as follows:                An ontology defines the terms used to describe and represent an area of knowledge. Ontologies are used by people, databases, and applications that need to share domain information (a domain is just a specific subject area or area of knowledge, like medicine, tool manufacturing, real estate, automobile repair, financial management, etc.). Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them. They encode knowledge in a domain and also knowledge that spans domains. In this way, they make that knowledge reusable (From OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 Feb. 2004). Jeff Heflin, editor)        
FIG. 2 shows a graph depicting an illustrative ontology that represents the relationships between departments of a company. Each node of the graph represents a department. The arcs connecting the nodes represent relationships between the nodes. Here, all of the arcs represent the relationship “subDepartmentOf” 204 (modeled using the subClassOf property) i.e., the node at the lower end of the arc is a subdepartment of the node at the upper end. Department_Ontology 201 begins with a Management department node 203. Engineering 205, Sales 207, and Manufacturing 209 represent subdepartments of the Management department node 203. These nodes have further sub-nodes which represent lower level departments. The ontology defines various sets of departments. For example, the set of all the departments is all the departments which have nodes in the graph. The set of engineering departments is Engineering 205 and its subdepartments Software and Hardware.
As will be explained in more detail below, a set of departments defined in the ontology may be used as a constraint on column DNAME 109(i) or on column DNAME 109(ii). For example, if EMP table 105 were a table of engineering employees, column DNAME 109(ii) could be constrained such that a row for an employee could be added to the table only if his or her department was Engineering, Software, or Hardware.
Within the RDBMS, an ontology can be represented using Resource Description Framework (RDF). RDF is a language that was originally developed for representing information (metadata) about resources in the World Wide Web. It may, however, be used for representing information about absolutely anything, including ontologies. U.S. Ser. No. 11/108,204, Integrating RDF data into a relational database system, discloses how this may be done.
U.S. Ser. No. 11/108,204 discloses the integration of RDF into SQL by means of a set of tables and user objects that represent RDF data sets and a table function RDF_MATCH that takes a specification of an RDF data set and an RDF pattern as parameters and returns a set of result rows of triples from the RDF data set that match the RDF pattern. The solution of the RDF pattern may include inferencing based on RDFS and user-defined rules.
The signature of RDF_MATCH is as follows:
RDF_MATCH (PatternVARCHAR,ModelsRDFModels,RuleBasesRDFRules,AliasesRDFAliases,)RETURNS AnyDataSet;
FIG. 3 shows an SQL SELECT statement that includes an RDF_MATCH function invocation at 305. The RDF data from which the function invocation will return triples is shown at 223 in FIG. 2. The first parameter is one or more character strings 307 that indicate the RDF pattern to be used for the query. Each character string specifies a template which the set of RDF triples returned by the query must match. The notation ?<name> indicates that the template will be matched by any RDF triple which satisfies the rest of the character string. Thus, ?r rdf:type STUDENT matches any triple having the type STUDENT. The triples returned are the ones that satisfy the AND of the templates; thus the pattern at 307 specifies that triples be returned for students of any age that are reviewers of papers for any conference. This set of triples is shown at 316. The remaining parameters specify the RDF data set to be queried. Models specifies the data set's RDF models, in the case of invocation 305, the Reviewers model, as indicated at 309, RuleBases specifies rule bases that contain the RDF rules that apply to the models, and Aliases specify any aliases that apply to the RDF data set. These parameters are set to NULL in invocation 307. As is true with any table function, RDF_MATCH returns a set of result rows. Here, the result rows describe an RDF graph that is a sub-graph of the graph represented by the RDF dataset (including rulebases) against which the query has been posed.
It should be noted that the contents of the result rows returned by RDF_MATCH will depend on the RDF pattern used in the query and the RDF data against which the query is run. For this reason, the return type for RDF_MATCH has been defined as AnyDataSet, which is a collection of tuples of a generic type called AnyData. When an SQL query employs the RDF_MATCH table function, components of the query such as its SELECT, WHERE, ORDER BY, etc., clauses can reference the variables present in the RDF pattern simply by the variable names.
An advantage of using the RDF_MATCH table function in a SELECT statement to query RDF data is that any SQL construct that can be used with a SELECT statement can be used to further process the result rows 316 returned by RDF_MATCH. These constructs include iterating over the result rows, aggregating values contained in the result rows, constraining the result rows using WHERE clause predicates, sorting the result rows using ORDER BY clauses, and limiting the result rows by using the ROWNUM clause. Also, the SQL set operations can be used to combine result sets of two or more invocations of RDF_MATCH. In SELECT statement 303, the WHERE clause limits the triples to those for students whose age is less than 25. The output of the SELECT statement is shown at 314.
An Ontology can be accessed using the new SQL operators disclosed in U.S. Ser. No. 10/916,547, System for ontology-based semantic matching in a relational database system. The new operators are named ONT_RELATED, ONT_EXPAND, ONT_DISTANCE, and ONT_PATH. These operators may be used directly in SQL statements, and thus allow database users to combine these semantic matching operators with other conventional SQL operations such as joins to make use of the full expressive power of SQL while performing semantic based matching. Prior to executing a query containing the semantic matching operator, the specified ontology is expressed in RDF and represented in the database system as described in U.S. Ser. No. 11/208,204. The operators, explained using ontology 201 of FIG. 2, are the following:
The ONT_RELATED operator performs ontology-based semantic matching and is expressed within an SQL statement using an expression of the form: “ONT_RELATED (term1, reltype, term2, ontology).” When executed, the ONT_RELATED operator determines whether the two input terms (term1 and term2) are related by the specified relationship type “reltype” by consulting the ontology. Thus, ONT_RELATED (Software, subDepartmentOf, Management, Department_Ontology) will return TRUE, because the subDepartmentOf relationship (modeled using subClassOf property) is transitive, and consequently, if Software is a subdepartment of Engineering and Engineering is a subdepartment of Management, then Software is a subdepartment of Management as well.
Two ancillary operators, ONT_PATH and ONT_DISTANCE, are employed to determine additional measures for pairs of terms belonging to the ontology, namely, the shortest path connecting two terms and the length of the shortest path, respectively.
The ONT_EXPAND Operator.
This operator returns rows representing a set of terms in the ontology. The terms to be returned are indicated by parameters that specify a first term, a second term, and a relationship between the first and second term. The specified relationship may be either a simple relationship or combination of them. The following is an example of the operator's declaration:
CREATE TYPE ONT_TermRelType AS OBJECT (Term1Name VARCHAR(32),PropertyName VARCHAR(32),Term2Name VARCHAR(32),TermDistance NUMBER,TermPath VARCHAR(2000) );CREATE TYPE ONT_TermRelTableType AS TABLE OFONT_TermRelType;ONT_EXPAND (Term1, RelType, Term2, OntologyName ) RETURNS ONT_TermRelTableType;
Typically, non-NULL values for RelType and Term2 are specified as input and then the operator computes all the appropriate <Term1, RelType, Term2> tuples in the closure taking into account the characteristics (transitivity and symmetry) of the specified RelType. In addition, it also computes the relationship measures in terms of distance (TermDistance) and path (TermPath). For cases when a term is related to input term by multiple paths, one row per path is returned. It is also possible that ONT_EXPAND invocation may specify input values for any one or more of the three parameters or even none of the three parameters. In each of these cases, the appropriate set of <Term1, RelType, Term2> tuples is returned.
The term1, reltype, and term2 can have either a specific input value or NULL value. The NULL means all possible values. For example, ONT_EXPAND (NULL, subDepartmentOf, Engineering, Department_Ontology) will generate all nodes that are related by the subDepartmentOf relationship to the node Engineering, namely the nodes Software and Hardware.
Objects of the invention include providing techniques for defining column constraints in terms of arbitrary sets of values and providing techniques for dealing with the effects of changes in the set of values used to define the constraint on the values in the column subject to the constraint, as well as providing techniques for defining column constraints in terms of sets of values obtained by queries on ontologies.