The present invention relates to database systems and, more specifically, to the use of user-defined data types within database systems.
Typically, data items are stored on non-volatile memory in a different way than they are stored in volatile memory. Thus, when a data item is loaded into volatile memory from non-volatile memory, a conversion operation must be performed. Similarly, when the data item is stored back onto non-volatile memory from volatile memory, another conversion operation must be performed. For the purpose of explanation, the process of converting data from a volatile format to a non-volatile format is referred to herein as "pickling" the data item, and the process of converting data from a non-volatile format to a volatile format is referred to herein as "unpickling" the data item.
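For the purpose of illustration only, the pickling and unpickling operations described above may be sketched in C as follows. The record type, the byte layout (a 4-byte identifier, a 4-byte length, then the name bytes), and the routine names pickle and unpickle are assumptions made for this sketch and do not correspond to any particular database system. The layout uses the byte order of the host machine, so it is not portable between machines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory (volatile) record. The name is reached through a
   pointer, and a pointer value is meaningless once written to disk. */
typedef struct {
    int32_t id;
    char   *name;   /* heap-allocated, NUL-terminated */
} record;

/* Pickle: flatten the record into one contiguous buffer laid out as
   [id: 4 bytes][name length: 4 bytes][name bytes, no terminator].
   Returns a malloc'd buffer and stores its size in *out_len,
   or returns NULL if allocation fails. */
unsigned char *pickle(const record *r, size_t *out_len)
{
    assert(r != NULL && r->name != NULL && out_len != NULL);
    uint32_t n = (uint32_t)strlen(r->name);
    size_t header = sizeof(int32_t) + sizeof(uint32_t);
    unsigned char *buf = malloc(header + n);
    if (buf == NULL) return NULL;
    memcpy(buf, &r->id, sizeof(int32_t));
    memcpy(buf + sizeof(int32_t), &n, sizeof(uint32_t));
    memcpy(buf + header, r->name, n);
    *out_len = header + n;
    return buf;
}

/* Unpickle: rebuild the in-memory record from the flat buffer.
   Returns 0 on success, -1 on malformed input or allocation failure.
   On success the caller owns out->name and must free it. */
int unpickle(const unsigned char *buf, size_t len, record *out)
{
    assert(buf != NULL && out != NULL);
    size_t header = sizeof(int32_t) + sizeof(uint32_t);
    uint32_t n;
    if (len < header) return -1;
    memcpy(&n, buf + sizeof(int32_t), sizeof(uint32_t));
    if (len - header != n) return -1;          /* length field disagrees */
    char *name = malloc((size_t)n + 1);
    if (name == NULL) return -1;
    memcpy(name, buf + header, n);
    name[n] = '\0';
    memcpy(&out->id, buf, sizeof(int32_t));
    out->name = name;
    return 0;
}
```

In this sketch the pointer held in the volatile form is replaced by an explicit length and the bytes it pointed to, which is the kind of conversion that makes the two formats differ.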
Many conventional relational database systems perform automatic pickling and unpickling for only a few types of data. The types of data supported by such systems typically include scalar types, such as numbers and dates. Relative to other programming environments, such as "C" and "Java", the set of data types supported by typical database systems is extremely limited. Thus, difficulties arise when the database systems are used to store data that is created and used by computer programs that were written in those other environments.
For example, a user may create a set of routines that represent and manipulate data using a complex user-defined data type ("TYPE1"). For the purpose of illustration, the user-implemented routines that use TYPE1 are collectively referred to herein as APP1.
The structure of TYPE1, or any of the attributes thereof, may be significantly different than the structure of any data type supported by a database system ("DBS1"). To pass the data used by APP1 to a database managed by DBS1, every TYPE1 data item must be converted to one or more instances of the data types that are supported by DBS1. Once the data is converted to data types that DBS1 understands and supports, DBS1 can store and retrieve the data from disk. Likewise, for APP1 to use data from DBS1, the data must be converted from the structure associated with the data types supported by DBS1 into the structure and format associated with TYPE1.
Referring to FIG. 1, it is a block diagram illustrating the conversion operations that must be performed to allow APP1 to store its data within DBS1. Specifically, a data item generated within APP1 is stored according to the structure and format of user TYPE1. To pass the data item into DBS1 for storage, the data item is converted to a data type supported by DBS1 (DBTYPE1). While in volatile memory within DBS1, the data item is stored as an unpickled DBTYPE1. DBS1 pickles the DBTYPE1 data item to store it on disk.
To supply APP1 with a data item stored on disk, DBS1 unpickles the data item to create an unpickled DBTYPE1 data item. The unpickled DBTYPE1 data item is then converted to the user TYPE1 data type before being supplied to the routines within APP1 that manipulate the data item.
An example of a user-defined type is a type declared as follows:
struct TYPE1
{
    int i;
    char *s;
};
This declaration may occur, for example, within the source code of APP1. The source code of APP1 also includes one or more methods used to manipulate data that is stored in a TYPE1 data structure. An example of the interface to such a method is:
my_method(TYPE1 *me, int i);
A TYPE1 data item may be passed into DBS1 by mapping the attributes of TYPE1 to data types that are supported by DBS1, such as Number and Date. An example of a statement to create a database object for storing data from a TYPE1 data item is:
create type DBTYPE1 as OBJECT
(
    a Number;
    b Date;
    member procedure set_date();
)
To convert data between the TYPE1 structure used by APP1 and the DBTYPE1 structure used within DBS1 to store TYPE1 data, the following structure may be used:
struct
{
    OCINumber n;
    OCIDate d;
};
In this example, it was assumed that the attributes of TYPE1 could be adequately represented by data types supported by DBS1. However, data types designed and implemented in common programming languages (such as C or Java) are not easily captured by the database system because their internal structures are modeled using the particular constructs of the language, and are not understood by the database system.
Object-oriented databases are tightly coupled to a particular programming language and, even though they enable modeling of data types in that language, the flexibility of language neutrality in the database system is lost. For example, if DBS1 is designed around the same language as was used to create APP1, then DBS1 may support the TYPE1 data type. But if, on the other hand, DBS1 is designed around a different language than was used to create APP1, complicated conversions may still be necessary.
To reduce the burden associated with converting user-defined types whose attributes do not closely correspond to data types supported by a database system, some database systems support a "RAW" data type. From the perspective of the database system, a RAW data item is simply a dump of bytes with no structure. As with other database-supported data types, RAW data items may be stored in the columns of relational tables. Because the database system does not assume any structure to a RAW data item, the RAW data item may be used to store the data for complex user-defined data types that have attributes that are not easily converted to any data type supported by the database system.
The following statement creates a table with a RAW column that can be used, for example, for storing data from TYPE1 data items:
create table t
(coll raw(20), . . . );
The following statement creates a routine that is internal to the database for invoking the external my_method routine:
create procedure mymethod(a IN RAW)
The input to this internal routine is a RAW data item, while the external my_method routine expects a TYPE1 data item. Consequently, the implementation of the mymethod procedure must take the form:
mymethod(a)
{
    raw-to-struct(a)    /* convert the RAW data item to a TYPE1 data item */
    manipulate          /* calls to user routines that operate on TYPE1 */
    struct-to-raw(a)    /* convert the TYPE1 data item back to RAW */
}
In this example, the mymethod routine receives a RAW data item "a". The raw-to-struct(a) statement invokes a user-supplied routine that converts the data item from the RAW format used by the database to store the data item to the TYPE1 format used by APP1. The "manipulate" statement generally represents calls to user-supplied routines that manipulate the TYPE1 data item. After the desired operations have been performed on the data item, the call to struct-to-raw(a) converts the data item from the TYPE1 structure back to the RAW format used by the database.
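For the purpose of illustration only, the convert, manipulate, and convert-back pattern described above may be sketched in C as follows. The RAW layout (a 4-byte integer followed by a NUL-terminated string, padded to 20 bytes to match the raw(20) column), the routine names raw_to_struct and struct_to_raw, the return codes, and the body of my_method are all assumptions made for this sketch. The source does not specify how a type implementor actually encodes TYPE1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RAW_SIZE 20   /* matches the raw(20) column of the example */

typedef struct TYPE1 {
    int   i;
    char *s;
} TYPE1;

/* Assumed RAW layout: bytes 0..3 hold i, bytes 4..19 hold s including
   its terminating NUL, so s may be at most 15 characters long. */

/* Convert a RAW data item to a TYPE1 data item. Returns 0 on success,
   -1 if the string is unterminated or allocation fails. */
int raw_to_struct(const unsigned char *raw, TYPE1 *out)
{
    assert(raw != NULL && out != NULL);
    int32_t v;
    memcpy(&v, raw, sizeof v);
    const unsigned char *text = raw + sizeof v;
    const unsigned char *end = memchr(text, '\0', RAW_SIZE - sizeof v);
    if (end == NULL) return -1;
    size_t n = (size_t)(end - text);
    out->s = malloc(n + 1);
    if (out->s == NULL) return -1;
    memcpy(out->s, text, n + 1);               /* copies the NUL as well */
    out->i = v;
    return 0;
}

/* Convert a TYPE1 data item back to RAW. Returns 0 on success,
   -1 if the string does not fit in the RAW item. */
int struct_to_raw(const TYPE1 *in, unsigned char *raw)
{
    assert(in != NULL && in->s != NULL && raw != NULL);
    int32_t v = (int32_t)in->i;
    size_t n = strlen(in->s);
    if (n >= RAW_SIZE - sizeof v) return -1;
    memset(raw, 0, RAW_SIZE);
    memcpy(raw, &v, sizeof v);
    memcpy(raw + sizeof v, in->s, n);
    return 0;
}

/* Placeholder for the external APP1 routine; the real body is unknown. */
void my_method(TYPE1 *me, int i)
{
    me->i += i;
}

/* Database-internal wrapper: both conversions run on every call. */
int mymethod(unsigned char *a)
{
    TYPE1 t;
    if (raw_to_struct(a, &t) != 0) return -1;  /* RAW to TYPE1 */
    my_method(&t, 1);                          /* manipulate */
    int rc = struct_to_raw(&t, a);             /* TYPE1 back to RAW */
    free(t.s);
    return rc;
}
```

Even in this small sketch, each invocation of mymethod allocates, copies, and frees the string solely to cross the boundary between the database format and the native format.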
Referring to FIG. 2, it is a block diagram illustrating the conversion operations that must be performed to allow APP1 to store its data within a database (DBS1) that supports the RAW data type. Specifically, a data item generated within APP1 is formatted according to user TYPE1. To pass the data item into DBS1 for storage, the data item is converted to the RAW data type. While in volatile memory within DBS1, the data item is stored as unpickled RAW data. DBS1 pickles the RAW data to store it on disk.
To supply APP1 with a data item stored in the database, DBS1 unpickles the RAW data item to create unpickled RAW data. The unpickled RAW data is then converted to the user TYPE1 data type before being supplied to the routines within APP1 that manipulate the data item.
As illustrated by the example, even with database systems that support the RAW data type, the user that creates the user-defined type (the "type implementor") is responsible for providing routines for converting RAW entities back and forth into their appropriate structured equivalents every time the control is handed over to user routines from the database system. Specifically, in the example given above, the type implementor is responsible for writing the raw-to-struct and struct-to-raw routines.
There are various drawbacks associated with storing data from user-defined types as RAW data items within the database. For example, this technique does not support strong typing. That is, data items associated with different user-defined types are stored in the database as the same database type. Thus, the database system and other database users cannot differentiate one of these types from another, as they are all treated as raw entities by the database management system. Consequently, the database system would not be able to detect situations in which one user erroneously stores data from one kind of user-defined data type in a RAW column that is supposed to hold data from a different kind of user-defined data type.
In addition, the technique of storing user-defined types as RAW data provides poor modeling. It is very cumbersome for a type implementor to work around the database system's inability to store user-defined data types. Further, this technique provides relatively poor performance because performing conversions every time data moves back and forth between the database system and the user application is computationally expensive.
To provide support for strong typing as well as take advantage of database support for the RAW data type, the data for user-defined types may be stored in database object types that have RAW attributes. For example, assume that a type implementor has defined two types TYPE1 and TYPE2. Data from the TYPE1 user type may be stored in database objects created by the following statements:
create type DBTYPE1 as OBJECT
(
    a RAW(20);
    mymethod1(a IN DBTYPE1);
)
Similarly, data from the TYPE2 user type may be stored in database objects created by the following statements:
create type DBTYPE2 as OBJECT
(
    a RAW(20);
    mymethod2(a IN DBTYPE2);
)
By using database-defined objects in combination with the RAW data type in this manner, the data associated with TYPE1 and TYPE2 user-defined data types may be distinguished from each other within the database system. However, the type implementor is still responsible for supplying raw-to-native-format conversion routines. In addition, the overhead associated with invoking the conversion routines is still incurred every time data for the user-defined types passes between the database system and the user-supplied routines.
Specifically, if the mymethod1 routine is to perform any data manipulation, then the RAW attribute of the input data item "a" must be converted from a RAW data item to a TYPE1 data item. After the manipulation, the TYPE1 data item must be converted back to the RAW data attribute. Similarly, mymethod2 would involve converting a RAW attribute to a TYPE2 data item, calling an external routine, and then converting the TYPE2 data item back to a RAW data attribute.
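For the purpose of illustration only, the effect of wrapping RAW data in distinct database object types may be modeled in C as follows. The type identifiers, the db_object and db_column structures, and the routines db_object_make and db_column_store are hypothetical names introduced for this sketch. They merely model a database system that records which object type a column is declared to hold.

```c
#include <assert.h>
#include <string.h>

#define RAW_SIZE 20

/* Hypothetical identifiers for the two database object types. */
typedef enum { DBTYPE1_ID = 1, DBTYPE2_ID = 2 } db_type_id;

/* A database object: the same RAW payload as before, plus a type identity
   that the database system can check. */
typedef struct {
    db_type_id    type;
    unsigned char a[RAW_SIZE];
} db_object;

/* A column declared to hold exactly one object type. */
typedef struct {
    db_type_id declared;
    int        has_value;
    db_object  value;
} db_column;

/* Wrap up to RAW_SIZE bytes in an object of the given type. */
db_object db_object_make(db_type_id type, const unsigned char *raw, size_t len)
{
    assert(raw != NULL && len <= RAW_SIZE);
    db_object obj;
    obj.type = type;
    memset(obj.a, 0, RAW_SIZE);
    memcpy(obj.a, raw, len);
    return obj;
}

/* Store an object in a column. Returns 0 on success, or -1 if the object's
   type differs from the column's declared type. */
int db_column_store(db_column *col, const db_object *obj)
{
    assert(col != NULL && obj != NULL);
    if (obj->type != col->declared) return -1;   /* strong typing check */
    col->value = *obj;
    col->has_value = 1;
    return 0;
}
```

The payload in this model is still an unstructured byte array, so the sketch gains type checking only. The conversion routines and their per-call cost remain.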
Based on the foregoing, it is clearly desirable to provide a mechanism that allows a type-implementor to construct data types in the programming language of the type-implementor's choice (C, Java, etc.). It is further desirable to have a database system store and index those data types even though it does not understand the internal structure of such types. In addition, it is desirable to provide a mechanism that allows data of user-defined types to appear in their native language environment in their native form (as C structures or Java classes) while continuing to be accessible from other language environments in the database management system. It is also desirable to reduce or eliminate the need to perform conversions every time a set of data passes between the database environment and its native language environment.
A method and apparatus are provided for handling within a database system data items that are associated with data types whose native structure is not known to the database system. The data items are stored within the database system in their native structure, even though it is not understood by the database system. To store the data items, the database system calls a pickling routine that is provided by the user, or by the runtime subsystem of the programming environment that is native to the data item. To retrieve the data items from storage, the database system calls an unpickling routine, also provided by the user or the appropriate runtime subsystem. Because the database maintains the data items in their native format, no conversions are required as the data items are passed between the database system and external routines that manipulate the data items.
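For the purpose of illustration only, the arrangement in which the database system calls user-supplied pickling and unpickling routines may be sketched in C as follows. The opaque_type_ops structure, the disk_slot stand-in for storage, the routines db_store and db_load, and the TYPE1 byte layout are assumptions made for this sketch. They are not the interface of any actual database system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Callbacks registered by the type implementor or a language runtime. */
typedef struct {
    /* Write the native object into buf (capacity cap).
       Returns the number of bytes written, or 0 on failure. */
    size_t (*pickle)(const void *native, unsigned char *buf, size_t cap);
    /* Build a new native object from len bytes. Returns NULL on failure. */
    void  *(*unpickle)(const unsigned char *buf, size_t len);
} opaque_type_ops;

/* Stand-in for one stored value on disk. */
typedef struct {
    unsigned char bytes[64];
    size_t        len;
} disk_slot;

/* Database side: never inspects the object, only calls the callbacks. */
int db_store(const opaque_type_ops *ops, const void *native, disk_slot *slot)
{
    assert(ops != NULL && native != NULL && slot != NULL);
    size_t n = ops->pickle(native, slot->bytes, sizeof slot->bytes);
    if (n == 0) return -1;
    slot->len = n;
    return 0;
}

void *db_load(const opaque_type_ops *ops, const disk_slot *slot)
{
    assert(ops != NULL && slot != NULL);
    return ops->unpickle(slot->bytes, slot->len);
}

/* Type implementor side: the native type and its two routines. */
typedef struct TYPE1 {
    int   i;
    char *s;
} TYPE1;

static size_t type1_pickle(const void *native, unsigned char *buf, size_t cap)
{
    const TYPE1 *t = native;
    size_t n = strlen(t->s) + 1;               /* include the NUL */
    if (sizeof t->i + n > cap) return 0;
    memcpy(buf, &t->i, sizeof t->i);
    memcpy(buf + sizeof t->i, t->s, n);
    return sizeof t->i + n;
}

static void *type1_unpickle(const unsigned char *buf, size_t len)
{
    if (len <= sizeof(int) || buf[len - 1] != '\0') return NULL;
    TYPE1 *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    size_t n = len - sizeof t->i;
    t->s = malloc(n);
    if (t->s == NULL) { free(t); return NULL; }
    memcpy(&t->i, buf, sizeof t->i);
    memcpy(t->s, buf + sizeof t->i, n);
    return t;
}

const opaque_type_ops type1_ops = { type1_pickle, type1_unpickle };
```

In this sketch db_load returns a pointer to a TYPE1 in its native form, so an external routine such as my_method could operate on it directly. The conversions occur only at the storage boundary and not on each call.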
Techniques are also provided for declaring attributes of the data item that can be accessed within the database system. The user provides routines for the database system to call in order to access the declared attributes, which may be different than the actual attributes that the data item has in its native environment.
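For the purpose of illustration only, declared attributes and their user-supplied access routines may be sketched in C as follows. The declared_attribute structure, the attribute names "i" and "s_length", and the routine db_get_number_attribute are hypothetical names introduced for this sketch. The restriction to numeric attributes is a simplification, not something stated in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct TYPE1 {
    int   i;
    char *s;
} TYPE1;

/* User-supplied routine that returns one declared attribute as a number. */
typedef double (*number_getter)(const void *native);

/* One attribute that the type implementor declares to the database. */
typedef struct {
    const char   *name;
    number_getter get;
} declared_attribute;

static double get_i(const void *native)
{
    return (double)((const TYPE1 *)native)->i;
}

/* A declared attribute need not be a native field: this one is derived. */
static double get_s_length(const void *native)
{
    return (double)strlen(((const TYPE1 *)native)->s);
}

const declared_attribute type1_attributes[] = {
    { "i",        get_i },
    { "s_length", get_s_length },
};
const size_t type1_attribute_count =
    sizeof type1_attributes / sizeof type1_attributes[0];

/* Database side: look the attribute up by name and call the user routine.
   Returns 0 on success, or -1 if no such attribute was declared. */
int db_get_number_attribute(const declared_attribute *attrs, size_t count,
                            const void *native, const char *name, double *out)
{
    assert(attrs != NULL && native != NULL && name != NULL && out != NULL);
    for (size_t k = 0; k < count; k++) {
        if (strcmp(attrs[k].name, name) == 0) {
            *out = attrs[k].get(native);
            return 0;
        }
    }
    return -1;
}
```

Here the database system sees an attribute named "s_length" that has no counterpart among the native fields of TYPE1, which reflects the point that declared attributes may differ from the actual attributes of the data item.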