The present invention relates to database systems and, more specifically, to the use within database systems of data types that are not supported by the database language of the database systems.
The data types supported by a database system are generally dictated by the database language through which the database system communicates with external applications. The data types that are supported by the database language of a database system are referred to herein as xe2x80x9csupported data typesxe2x80x9d, while data types that are not supported by the language used by the database are referred to as xe2x80x9cunsupported data typesxe2x80x9d. Instances of supported and unsupported data types are respectively referred to as supported and unsupported data items.
A popular database language is known as the Structured Query Language (SQL). Numerous versions of SQL have been developed, including SQL92 and more recently SQL3. The types of data supported by SQL generally include scalar types, such as numbers and dates. Relative to other programming environments, such as xe2x80x9cCxe2x80x9d and xe2x80x9cJavaxe2x80x9d, the set of data types supported by SQL is extremely limited. For example, SQL does not provide support for multimedia objects, time series, and spatial data. Thus, difficulties arise when the database systems are used to store data that is created and used by computer programs that were written in those other environments, but which cannot be expressed in the database language understood by the database system.
One environment in which complex data types are prevalent is the electronic commerce (xe2x80x9ce-commercexe2x80x9d) environment. In the e-commerce environment, numerous standards attempt to dictate the format of complex objects that should be used for exchanging business data. One such standard, referred to as EDI (Electronic Data Interchange), was developed by the Data Interchange Standards Association and has been established as ANSI X12.
An EDI message contains a string of data elements, each of which represents a singular fact, such as a price, product model number, and so forth, separated by delimiters. The entire string is called a data segment. One or more data segments framed by a header and trailer form a transaction set, which is the EDI unit of transmission (equivalent to a message). A transaction set often consists of what would usually be contained in a typical business document or form. The parties who exchange EDI transmissions are referred to as trading partners.
In addition to EDI, certain industries have developed their own industry-specific data format standards. For example, SWIFT (Society for Worldwide Interbank Financial Telecommunication) specifies the data formats and protocol for transferring funds electronically. HL7 (Health Level 7) specifies the data formats and protocol for transferring patient records electronically.
Data definition languages, such as XML (Extensible Markup Language), allow individuals, groups or industries to define data formats that can be used to share data on the World Wide Web, intranets, and elsewhere. For example, a set of companies might agree on a standard or common way to describe the information about a particular type of product that they manufacture. The companies can then use XML to describe a data format for storing the product description information. If the companies store data in the agreed-upon format, then a user may send an intelligent agent (a program) to each company""s Web site, gather data, and make a valid comparison.
Difficulties arise when a database system is used to store data that, in its native environment, has a format that is not understood by the database system. One approach to using a database system under these circumstances involves converting each element of the unsupported data type to a data type that is supported by the database system.
For example, a business application (APP1) may expect data to be formatted according to a complex EDI data type (xe2x80x9cTYPE1xe2x80x9d). The structure of TYPE1, or any of the attributes thereof, may be significantly different than the structure of any data type supported by a database system (xe2x80x9cDBS1xe2x80x9d). To pass the data used by APP1 to a database managed by DBS1, every attribute of a TYPE1 data item must be transformed to one or more instances of the data types that are supported by DBS1.
Once the data is transformed to data types that DBS1 understands and supports, DBS1 can store and retrieve the data from disk. Likewise, for APP1 to use data from DBS1, the data must by transformed from the structure associated with the data types supported by DBS1 into the structure and format associated with TYPE1.
Referring to FIG. 1, it is a block diagram illustrating the transformation operations that must be performed to allow APP1 to store its data within DBS1. Specifically, a data item generated within APP1 is organized according to the structure and format of TYPE1. To pass the data item into DBS1 for storage, the data item is transformed to data types supported by DBS1 (dbtype1 . . . dbtypeN). While in volatile memory within DBS1, the data item is stored as unpickled instances of dbtype1 . . . dbtypeN. DBS1 pickles the instances to store them on disk.
To supply APP1 with data currently stored on disk, DBS1 unpickles the instances of dbtype1 . . . dbtypeN to create unpickled instances of dbtype1 . . . dbtypeN. The unpickled data is then transformed to the structure of the TYPE1 data type before being supplied to the routines within APP1 that manipulate the data item.
To reduce the burden associated with transforming unsupported types whose attributes do not closely correspond to data types supported by a database system, some database systems support a xe2x80x9cRAWxe2x80x9d data type. From the perspective of the database system, a RAW data item is simply a dump of bytes with no structure. As with other database-supported data types, RAW data items may be stored in the columns of relational tables. Because the database system does not assume any structure to a RAW data item, the RAW data item may be used to store the data for complex unsupported data types that have attributes that are not easily transformed to any data type supported by the database system.
The following statement creates a routine that is internal to the database for invoking an external xe2x80x9cmanipulatexe2x80x9d routine:
create procedure my method(a IN RAW)
The input to this internal routine is a RAW data item, while the external manipulate routine expects a TYPE1 data item. Consequently, the implementation of the my method procedure must take the form:
my method(a)
{
raw-to-struct(a)
manipulate
struct-to-raw(a)
}
In this example, the my method routine receives a RAW data item xe2x80x9caxe2x80x9d. The raw-to-struct(a) statement invokes a user-supplied routine that transforms the data item from the RAW format used by the database to store the data item to the TYPE1 format used by APP1. The xe2x80x9cmanipulatexe2x80x9d statement generally represents calls to user-supplied routines that manipulate the TYPE1 data item. After the desired operations have been performed on the data item, the call to struct-to-raw(a) transforms the data item from the TYPE1 structure back to the RAW format used by the database.
Referring to FIG. 2, it is a block diagram illustrating the transformation operations that must be performed to allow APP1 to store its data within a database (DBS1) that supports the RAW data type. Specifically, a data item generated within APP1 is formatted according to xe2x80x9cuser type1xe2x80x9d. To pass the data item into DBS1 for storage, the data item is transformed to the RAW data type. While in volatile memory within DBS1, the data item is stored as unpickled RAW data. DBS1 pickles the RAW data to store it on disk.
To supply APP1 with a data item stored in the database, DBS1 unpickles the RAW data item to create unpickled RAW data. The unpickled RAW data is then transformed to the user TYPE1 data type before being supplied to the routines within APP1 that manipulate the data item.
As illustrated by the example, even with database systems that support the RAW data type, the user that creates the unsupported type (the xe2x80x9ctype implementorxe2x80x9d) is responsible for providing routines for transforming RAW entities back and forth into their appropriate structured equivalents every time the control is handed over to user routines from the database system. Specifically, in the example given above, the type implementor is responsible for writing the raw-to-struct and struct-to-raw routines.
There are various drawbacks associated with storing data from unsupported types within the database using the techniques described above. Specifically, whether or not the database supports the RAW data type, performing transformations every time data moves back and forth between the database system and external applications is computationally expensive. Further, critical data may be lost during such transformations, for example, when attributes have higher resolution in their native format than they do in the format to which they are transformed for storage in the database system.
Performing transformations every time data moves back and forth between the database system and external applications also raises legal problems. For example, it may be critical to establish that a particular electronic message is an xe2x80x9coriginalxe2x80x9d. However, after undergoing transformations going into and out of the database environment, the resulting message is no longer identical to the xe2x80x9coriginalxe2x80x9d message, and therefore may no longer qualify as being the xe2x80x9coriginalxe2x80x9d message.
Based on the foregoing, it is clearly desirable to provide a mechanism that reduces or eliminates the need to perform transformations every time a set of data passes between the database environment and its native environment. It is further desirable to provide a mechanism that allows messages retrieved from a database to be identical to messages originally delivered to the database so that storage within a database does not disqualify a message from being a legal xe2x80x9coriginalxe2x80x9d.
A method and system are provided for handling within a database system data items that are associated with data types that are not supported by the database language used to the database system. The data types are registered with the database system by storing, in the database, metadata that indicates the structure, physical layout, and constraints that apply to the data types.
The database system uses the structure and physical layout information to access individual elements of data items. The database system validates data items using the constraint information, which includes constraints that dictate what values are valid for certain elements based on what values are supplied for other elements. Data items associated with many different data types may be stored in the same table.
Applications can access the data items through transformation views that transform the data items from their native format to a format expected by the applications.