1. Field of the Invention
This invention relates to database management systems that can handle composite objects, e.g., Abstract Data Types (ADTs), for complex data objects, and more specifically where attributes of the composite objects can be changed in queries, trigger statements, insert statements, and update statements by a mutator function in a syntactical environment such as SQL3 where an order of processing can not create different results.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on random access storage devices (DASD) such as magnetic or optical disk drives for semi-permanent storage.
A DBMS is structured to accept commands to store, retrieve, update, and delete data. One widely used and well known set of commands is called the Structured Query Language (SQL). The current SQL standard is known informally as SQL/92. A new proposed standard is referred to as SQL3. Tables of a relational database management system are created by means of a Data Definition Language (DDL), and the tables are manipulated by means of a Data Manipulation Language (DML). Both of these languages are defined in the SQL standards. The definitions for SQL also provide that a DBMS should respond to a particular query with a particular set of data given a specified database content, but the method that the DBMS uses to actually find the required information in the tables on the disk drives is left up to the DBMS. Typically there will be more than one method that can be used by the DBMS to access the required data. The DBMS will optimize the method used to find the data requested in a query in order to minimize the computer time used and, therefore, the cost of doing the query.
In object-oriented databases (OODB), the database is organized into objects having members that can be pointers to other objects. An object can have parent-child hierarchical relationships. The objects contain references, and collections of references, to other objects in the database, thus leading to databases with complex nested structures.
The integration of object technology and database systems has been an active area of research for the past decade. One area of research has included extending relational database systems to handle data in an object-oriented fashion. It should be noted that SQL has object-relational queries; and IBM, Oracle, Informix and Illustra relational database systems have object oriented features in them.
Along this area of technology enhancements, SQL3 has proposed use of an abstract data type referred to as an ADT. The ADT is a feature of the next generation of object-relational databases. The purpose of introducing abstract data types is to deploy a structure type in order to model complex data easily.
An abstract data type (ADT) is a general mechanism supported by a database engine to model any kind of complex object in an object-relational DBMS. In more general terms an ADT is a type of "composite" object for handling complex data objects. The term "composite" object is used herein to mean an object having subcomponents wherein each subcomponent can be separately changed (i.e., mutated, etc.). Other kinds of SQL types of mechanisms that are also composite objects include arrays, lists, multi-sets, etc. A composite object can handle an infinite number of different kinds of complex objects including, but not limited to, multi-media objects, geographical data, time-series data, web data (e.g., URL documents), etc.
Many applications such as multimedia extenders, spatial extenders, and time-series extenders can be implemented using the ADT facility. An extender is an application that is built on top of a database server. An end-user or application developer does not have to worry about all of the details of the language features in a database supported by the database engine. The extender provides another interface outside of the database engine. An extender is also referred to as a data blade or cartridge.
For example, a multi-media extender can store images, audio, and video in a database. In addition to the object itself being stored, many other attributes associated with the object including format, title, caption, duration, color, actresses, actors, etc., can also be stored in the database. This type of object may be modeled by a time structured mechanism. Time may be important because different types of video formats will determine different ways to express the duration in time of a video. For example, the time interval of each sampling and the number of samples can determine the length of the video.
As other examples, a spatial extender is capable of storing geographical data or location data which could have mapped or coordinated data, including a reference system. A Time-series extender is capable of storing a series of data which is associated with time information. For example, this type of complex object can be used for financial applications such as for storing the value of stocks over a time interval showing a history of the stock.
It should be noted that although the problem that the invention herein overcomes is being explained in terms of an ADT, the problem is also associated with other types of SQL mechanisms including array type, list, multi-sets, etc. Again, all of these meet the definition herein of composite objects.
The usage of ADT is shown through the following example. A CREATE ADT statement in a data definition language (DDL) is shown to create an ADT type called "mediatype" that has seven attributes. For each attribute the first component is an attribute name and the second component is a type name.
______________________________________ CREATE ADT mediaType( title varchar(30), description varchar(256), rated char(4), type char(1), format char(4), duration float, content BLOB(1M)); ______________________________________
The first attribute has an attribute name of "title" and a type name of "varchar(30)". The "title" attribute is the actual title of the work; the "description" attribute is an abstract or summary which describes the contents of the object; the "rated" attribute is the rating of the work such as "C", "PG", "R", etc.; the "type" attribute indicates whether it is audio, video, or image; the "format" attribute indicates the type of format, e.g., if the type attribute is an image, the format attribute indicates whether it is "mpeg", "gif", or "postscript"; the "duration" attribute indicates the length of time of the work; and the "content" attribute contains the complete content of the work whether it be video, image, audio, etc. The type name for the "content" attribute is "BLOB", i.e., a binary large object type.
______________________________________ CREATE TABLE mediaTbl( name varchar(20), publisher varchar(40), unit.sub.-- price float, rental.sub.-- price float, media mediaType); ______________________________________
In the above CREATE table statement, the media table has 5 columns or attributes including name, publisher, unit price, rental price and media. The "media" attribute has a type name of "mediatype" which is the ADT created above. The "mediatype" column encapsulates all of the attributes of the ADT.
When the ADT is created or defined, the database engine not only creates a new type called mediatype, but it also creates a set of methods to manipulate the ADT object. These methods are compiler generated. The following is a minimal set of methods needed to manipulate a mediaType ADT. It should be noted that this minimal set is being used to describe the problem and to later describe the invention. It should be noted that the problem described herein, and the invention later described, are not limited to just these methods; especially since, over time, new methods may be introduced into the SQL standard which may be just as applicable. Basically, any method that acts upon a composite object is applicable.
Constructor: mediaType()-&gt;mediaType PA1 Copy constructor: copy(mediaType)-&gt;mediaType PA1 Observers: title(mediaType)-&gt;varchar(30), . . . PA1 Mutators: content(mediaType, blob(1M))-&gt;mediaType, PA1 Update tbl.T of SET media..rated=PG
The first category of methods is called "constructor". The name of the constructor is the name of the ADT. A constructor creates an ADT with attributes that are set to a default value, e.g., NULL. A second category of methods is called "copy constructor". The name is "copy" and it takes as input the ADT and returns another copy of the ADT object as output. There is one "constructor" and one "copy constructor" per ADT. A third category is called "observer". There is one observer method for each attribute. For example, for the title attribute, there is an observer method called title; and it takes the ADT as input and returns the attribute value as a result. In this example, it is of type varchar(30). Essentially, an observer is a partial read since it reads only a specific attribute of the ADT and not the whole ADT object. The mutator method updates the value of attributes. For example, for the attribute content, the ADT is taken as input and whatever new value for that attribute is the second argument. It mutates that attribute with the new value and returns the mutated object as the result. The mutator method is essentially a partial write since it only updates an attribute within the ADT. The above minimal set of methods are automatically generated by the compiler when the ADT is created. For the above example, the compiler would have generated one constructor, one copy constructor, seven observers, and seven mutators.
The syntax of how these methods can be invoked is as follows. The methods are actually user-defined functions. The methods can be used in any kind of context that a user-defined function can be used. As shown below, the function notation is shown on the right hand side of the "=" sign. For example, "title(media)" means that one wants to get the title attribute value out of the corresponding media; "copy(media)" makes a copy of the input; and "rated(media, `PG13`)" is a mutator function which takes the media as input and mutates the rated attribute value with "PG13".
______________________________________ Double-Dot Notation Function Notation ______________________________________ media..title = title(media) media..copy = copy(media) media..rated(`PG13`) = rated(media, `PG13`) SET media..rated = PG = SET media = rated(media, `PG`) ______________________________________
As shown above, the ADT methods above can be invoked through function notation. Also shown above is an equivalent double-dot notation for each of the above methods. The double-dot notation reflects an object-oriented approach for implementing the methods. Nevertheless, the internal semantics are equivalent whether the function notation or the double-dot notation is used. The double-dot notation for the mutator function for the example "media..rated(`PG13`)" means that the object media is sent a rated message along with the argument `PG13`. The message will tell the object to mutate itself with the new value.
The SET clause is currently used in two different contexts. One is in an update statement as follows:
which means that the rated attribute of the media column of table T is to be updated with the value `PG`. The second context where the SET clause is used is in triggers to update transition variables and transition tables. Regardless of the context used, when the double-dot notation is used on the left hand side of the equality sign in a SET clause, it actually means a mutation function.
It should be noted that the above double-dot notation is not limited to ADTs. It is proposed in the SQL3 standard that the double dot notation can be used in a more general way for general function invocation. For example, 3..smallint is equivalent to smallint (3).
Since double-dot notation and function notation are equivalent, they can be used interchangeably as in the following examples:
______________________________________ UPDATE mediaT SET media..rated = `PG` media..title = `Snow White` media..description = media..title .vertline..vertline. media..description ______________________________________
is equivalent to
______________________________________ UPDATE mediaT SET media = description(title(rated(media, `PG`), `Snow White`, title(media) .vertline..vertline. description(media)); ______________________________________
The above two update statements are equivalent. The first uses double-dot notation which clearly states the purpose of the update statement. However, the second update statement uses function notation which may not be as clear as the previous double-dot notation because the second statement uses nested function invocation. The innermost nest is the mutation function for the rated attribute, rated(media, `PG`), which takes the media input and mutates the rated attribute with `PG`, and returns a mutated object as output. Based on that mutated output, the title attribute is mutated with `Snow White`. Then, the description attribute is mutated with the title attribute value and the description value attribute concatenated together as indicated by ".vertline..vertline.". Regardless of how the sequence of mutations are carried out, at the end of all of the mutations there should be a media object with the designated attributes mutated accordingly, i.e., the media object should reflect an accumulated effect.
As can be seen from the above example, if there were ten attributes in an ADT, there would be ten levels of nested function invocation. Essentially, using function notation to mutate an ADT requires mutating the ADT in a nested fashion.
The following examples show ways to use the methods described above using data manipulation language (DML).
______________________________________ INSERT INTO mediaTbl values (`Family film #1`, `Walt Disney, Inc.` 19.99, 2.5, mediaType( ) ); UPDATE mediaTbl SET media = title(rated(description(...content(media, :hv), ... `A story in Africa...`), `PG13`) `Lion King`); UPDATE mediaTbl SET media..title = `Lion King`, media..rated = `PG13`, media..description = `A story in Africa ...`, ... media..content = :hv; SELECT publisher, unit.sub.-- price, media..title, media..description FROM mediaTbl WHERE media..type = `VIDEO` AND media..rated =`PG13` AND media..duration &lt; 100; ______________________________________
As shown above, an ADT value can be inserted into a table. In the above example, an "empty" object (with all its attributes set to default values) is inserted for the ADT. The UPDATE statement populates the ADT column using a nested function invocation. In the innermost label, . . . content(media, :hv), "media" is referenced which maintains the content of the media column with whatever is represented by the host variable. The result of the mutation is a mutated ADT. The other attributes are mutated through a nested function invocation: the description attribute is mutated with `A story in Africa . . . `; the rated attribute is mutated with `PG13`; and the title attribute is mutated with `Lion King`.
The third example above is an equivalent UPDATE statement using double-dot notation.
The last example above is a query. After a table is populated through the above UPDATE statement, one can use a SELECT statement as shown above to find attributes-e.g., publisher, unit.sub.-- price, title, description-from the media table that meet the criteria in the WHERE clause.
ADT mutators update attribute values in place. That is, the ADT compound structure object is only updated in parts, i.e., the whole object is not updated, only the applicable part of the object is modified, in effect performing piecemeal mutation. ADT mutators are needed to construct ADT values. This is done by using INSERT and UPDATE statements as shown above.
The SQL language is declarative. This means that the order of evaluation selected by the optimizer should not affect the result of the query. For example, if there are multiple items in a select list, for example, SELECT expression1, expression2 FROM table T, it should not matter whether expression1 or expression2 is evaluated first. Likewise, for conjuncts in a WHERE clause having predicate 1 AND predicate 2, it should not matter which predicate is evaluated first. The same should hold true for SET clauses in an UPDATE statement for SQL2 where there are no provisions for ADTs, and as provided otherwise herein for SQL3 which supports ADTs.
Before describing the problem which the invention herein addresses, it should be noted that it is believed that the problem itself has not been previously discerned.
The problem that is being identified and solved by this invention is as follows. Although a mutation is needed to construct an ADT, it will create a side effect that makes the order of evaluation significant. As such, the mutation of an ADT will cause an anomaly in the context of the declarative SQL language. The order of evaluation will indeed matter. This will create a Read/Write conflict and a Write/Write conflict.
An example of a Read/Write conflict is as follows:
______________________________________ SELECT for.sub.-- children(media..rated, media..duration), rated(media, :hv) FROM mediaTbl Where publisher = `Walt Disney, Inc.`; ______________________________________
In the select-list of the SELECT statement the rated attribute of the media column is referenced and a function, for.sub.-- children, is applied to it. In the second select-list item the rated attribute of the media column is mutated with a new value represented by a host variable. Depending on the order of evaluation, the value of the first rated attribute will be either the updated value (if the second select list item is evaluated first since it will reflect a mutated value), or it will be a previous value prior to being mutated (if the first select-item is evaluated first). Consequently, the function `rated` in the second select item has side affects because its purpose is to perform a mutation. This is contrary to SQL being a declarative language since there should be no effect on the order of evaluation. The SQL language does not even differentiate on the order that the SELECT statement is written. For example, writing the above statement as:
______________________________________ SELECT rated(media, :hv) for.sub.-- children (media..rated, media..duration) FROM mediaTbl Where publisher = `Walt Disney, Inc.`; ______________________________________
should have no effect on the values returned since the order of evaluation is not determinative on the order in which lists in statements are written. (It should be noted that the query compiler will give the answers in the order specified; so the values themselves will be the same although the order presented to the user will be different.) Internally, the query optimizer will determine the best access plan to evaluate the SELECT statement. The query optimizer should not be concerned with the order of evaluation.
The following UPDATE statement also illustrates a Read/Write conflict:
______________________________________ UPDATE mediaTbl SET media..title = :hv, media..description = media..title .vertline..vertline. media ______________________________________ description;
In the second line of the UPDATE statement, the title attribute of media is updated; and in the third line on the right hand side of the equality, the title attribute is read. Depending upon which item is evaluated first will determine the value that is read thereby creating different results.
An example of a Write/Write conflict is as follows:
______________________________________ SELECT rated(media, :hv1) FROM mediaTbl WHERE publisher = `Walt Disney, Inc.` AND for.sub.-- children(rated(media, :hv2)..rated, media..duration)=1; ______________________________________
In the SELECT list, the media column is mutated by host variable1, hv1. In the WHERE clause there is another mutation function to mutate the media column by host variable2, hv2. There are two mutations of the same rate attribute value of the media column in a single statement. However, according to the SQL language, it should not matter which function is evaluated first. The result should always be the same. The above illustrates that this is not the case, and thereby creates a problem because the principles of SQL language can not be adhered to. Since the mutation function "rated" has side affects, there is a conflict. Since both functions are writing, it is called a Write/Write conflict.
An example where there is no conflict is as follows:
______________________________________ UPDATE mediaTbl SET media..title = :hv, media..rated = `PG13`; ______________________________________
The media table is updated at two different attribute values. There is no write/write conflict or read/write conflict.