Database systems are computer programs optimized for creating, storing, manipulating, and reporting on information stored in tables. The tables are organized as an array of rows and columns. The values in the columns of a given row are typically associated with each other in some way. For example, a row may store a complete data record relating to a sales transaction, a person, or a project. Columns of the table define discrete portions of the rows that have the same general data format. For example, columns define fields of the records.
Modern computer programming languages permit information to be defined by an "abstract data type" (ADT), object type or class. Object types provide a way to model real world information. In computer programs written using a language that supports object types or classes, every constant, variable, expression, function, or combination has a certain type. Thus, an object type is a representation of the structure of the data and the operations (or behavior) associated with the data. An object type is made into an explicit representation of the type using a declaration of the constant, variable, or function. The thing that is declared is called an information object or simply an object. In object-oriented programming languages such as C++ and Java, objects may store a combination of data and methods for acting on the data. An object is "instantiated" or created when the program is run, based upon the declaration of the object.
For example, in the C language a programmer can define an object type as:
    struct Employee {
        char Name[20];
        Date Hired;
        OCIArray *Supervises;
        OCITable *Dept_Name_Table;
        int Salary;
        char Position[16];
        int Employee_Number;
    };
This example assumes that earlier in the same program, other types such as Date have been defined. The programmer can then define an explicit object of the type Employee. Thereafter, the programmer can use the object in expressions that refer to values of the abstract data type. A detailed discussion of object types and data structures is provided in N. Wirth, "Algorithms + Data Structures = Programs" (Englewood Cliffs, N.J.: Prentice-Hall, 1976). In this context, the term "ADT" refers broadly to the concepts of abstract data types, object types, and classes.
ADTs may be considerably more complex than the type Employee shown above. An ADT may comprise scalar data types such as integers and other numeric types, characters or strings, pointers, other ADTs, database tables, or arrays defined and stored in association with each other. Each such component of the ADT is called an attribute. Object-relational database systems, in addition to storing scalar data (like integers, strings), can store objects as well. The reason for storing objects is to minimize mismatch between applications and the database system. However, database systems are known to operate fastest and with greatest efficiency when simple data types are stored in the database tables. Accordingly, storing objects defined by complex ADTs in a database table presents a difficult problem.
One approach to this problem is to separate the attributes of an ADT and store each attribute in a column of a database table. In this approach, each row represents an instantiated object, and the object is said to be stored in "unpacked" format. This approach is used in the commercial products known as Illustra and UNI-SQL. This approach has the advantage that standard database operations such as indexing, sorts and filters can be carried out on the stored object.
However, a disadvantage of this approach is that the attributes of the object must be identified, separated, and individually written to database columns whenever the object is stored. When the database table is created, each column of the database must be declared to have a data type compatible with the data type of the attribute of the object to be stored in that column. This is difficult or impossible in the case of attribute data types that are not supported or recognized by the database.
Also, when the object is retrieved using the database table, each stored attribute of the object must be read from columns of the table, and the attributes then must be assembled into an object. This is inefficient because it requires processing operations to be carried out on attributes that are not necessarily needed by the program that is storing or retrieving the object to or from the database. Past systems such as Illustra and UNI-SQL do not provide a way to retrieve the information object from the database table as a single object assembled as defined in its ADT. Rather, in these past approaches, the information of the object can be retrieved only as a series of discrete items stored in columns.
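The unpacked approach can be sketched in C as follows. This is only an illustration under stated assumptions: the Employee type is reduced to three attributes, and the names EmployeeRow, employee_unpack, and employee_assemble are hypothetical and do not come from Illustra, UNI-SQL, or any other product. The sketch shows that every attribute is copied on both storage and retrieval, whether or not the calling program needs it.

```c
#include <string.h>

/* Reduced form of the Employee type, for illustration only. */
struct Employee {
    char Name[20];
    int  Salary;
    int  Employee_Number;
};

/* Hypothetical table row: one column per attribute of the object. */
struct EmployeeRow {
    char name_col[20];
    int  salary_col;
    int  number_col;
};

/* Storing: each attribute is separated and written to its own column. */
void employee_unpack(const struct Employee *e, struct EmployeeRow *row)
{
    memcpy(row->name_col, e->Name, sizeof row->name_col);
    row->salary_col = e->Salary;
    row->number_col = e->Employee_Number;
}

/* Retrieving: each column is read and the object is assembled again. */
void employee_assemble(const struct EmployeeRow *row, struct Employee *e)
{
    memcpy(e->Name, row->name_col, sizeof e->Name);
    e->Salary = row->salary_col;
    e->Employee_Number = row->number_col;
}
```

A program that needs only Salary still pays for copying Name and Employee_Number in both directions.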
Another disadvantage of this approach is that it is awkward to use in a complex, distributed system or network that interconnects different types of computers and program processes. Data are not universally transportable from one computer to any other computer. Different computers, operating systems, programming languages, and application software often use different native forms or formats for representing data. For example, several different formats can be used to represent numbers in a computer memory. Some processors represent a numeric value in memory as a string of bits in which the least significant bit is at the lowest memory location. Other processors represent values with the most significant bit at the lowest memory location. One type of processor cannot directly access and use values stored in a memory that were created by the other type of processor. This is known as a format representation problem. Examples of such incompatible processors are the SPARC and VAX processors.
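One common instance of the format representation problem is byte order. The following C sketch writes a 32-bit value into a byte buffer with the most significant byte first and reads it back, so that the buffer contents are the same regardless of the order used by the host processor. The function names put_u32_be and get_u32_be are hypothetical and are chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store v with the most significant byte first,
   independent of how the host processor lays out integers in memory. */
void put_u32_be(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from the fixed byte order. */
uint32_t get_u32_be(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the shifts operate on the value rather than on its memory image, the same code produces the same bytes on either type of processor.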
Incompatibilities also exist among different programming languages that are usable on the same platform. For example, such modern programming languages as C and Pascal enable a programmer to express a set of information in a complex abstract data type such as a record or structure, but there is no universal protocol for representing such abstract data types in a computer memory. This incompatibility increases the complexity of computer systems and makes data interchange difficult and inefficient. In addition, such abstract data types may include pointers or addresses that direct a compiler or processor to another memory location where a portion of the data of the abstract data type is located. Not all programming languages use or understand pointers. Some programming languages permit a pointer to reference the same abstract data type that contains the pointer. Such "circular references" are not compatible with all languages or platforms and cannot easily be transported over a network.
Further, different processors may represent a data type of a programming language in different ways. One processor may represent a floating-point number in four bytes while another processor may represent it in eight bytes. Thus, data created in memory by the same program running on different processors is not necessarily interchangeable. This is known as a layout representation incompatibility.
Alignment representation presents yet another problem in data interchange. With some processors, particular values or data types must be aligned at a particular memory location. When data is interchanged, there is no assurance that the inbound information uses the alignment required by the computer receiving the information.
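The layout and alignment problems can be illustrated with a short C sketch. The in-memory size of a structure may include padding that differs between processors and compilers, so copying the raw memory image is not portable. Writing each field into a buffer at a fixed offset avoids this. The type Sample, the constant SAMPLE_PACKED_SIZE, and the functions below are hypothetical and serve only as an example.

```c
#include <stddef.h>
#include <stdint.h>

struct Sample {
    char     tag;    /* 1 byte */
    uint32_t value;  /* many compilers insert padding before this field */
};

/* Size of the in-memory form, which may include padding. */
size_t sample_memory_size(void)
{
    return sizeof(struct Sample);
}

/* Hypothetical packed encoding: 1 byte of tag, then 4 bytes of value. */
enum { SAMPLE_PACKED_SIZE = 5 };

void sample_pack(const struct Sample *s, uint8_t out[SAMPLE_PACKED_SIZE])
{
    out[0] = (uint8_t)s->tag;
    out[1] = (uint8_t)(s->value >> 24);
    out[2] = (uint8_t)(s->value >> 16);
    out[3] = (uint8_t)(s->value >> 8);
    out[4] = (uint8_t)s->value;
}

void sample_unpack(const uint8_t in[SAMPLE_PACKED_SIZE], struct Sample *s)
{
    s->tag = (char)in[0];
    s->value = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
               ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
}
```

The packed form is always five bytes, while the value returned by sample_memory_size depends on the platform.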
Still another problem is inheritance representation. Certain object-oriented programming languages, such as C++, support the concept of inheritance, whereby an abstract data type may inherit properties of a previously defined abstract data type. Languages that support inheritance provide extra pointer fields in memory representations of abstract data types or classes that use base classes and functions defined at runtime. The value of an inheritance pointer is not known until runtime, and is not persistent. Therefore, transmission from one system to another of an instance of an abstract data type that inherits properties from another abstract data type is not generally practical.
Character representation is another problem. Computers used in different nations of the world also may use incompatible character sets. Data formatted in one character set cannot be directly used or interpreted by a system that uses a different character set.
In a networked computer environment, these problems are more acute. A network may comprise several different types of computers, platforms, or application programs. A programmer writing software for use in a widely distributed network has no assurance that a destination or target computer can understand information sent from a source machine. Moreover, many network communication protocols are best suited to the transmission of simple, linear strings of values or characters. Complex abstract data types, especially those with pointers, generally cannot be transmitted reliably over such a network in the same form used to represent the data types in memory. Also, when a pointer points to a large or complex collection of data values, such as a table of a database system, it may be impractical or inefficient to convert the entire table to a universal form for transmission over the network.
When the unpacked storage format is used in such a heterogeneous environment, objects retrieved from the database must be transported around the computing environment in object form. As a result, the objects may need to undergo transformation or conversion at each different machine.
Another problem arising in the storage of complex information objects in a database system is that database tables often have a maximum column size that is well below the amount of storage needed to store an attribute of an object. For example, one known database allows a maximum of 4096 bytes to be stored in a single column of a database table. Complex information objects often have attributes that are far larger than this, such as nested tables or arrays of a megabyte or more in size.
Maintaining the state of objects during the interval between successive executions of a program that uses the objects, known as object persistence, is another problem that arises in using computer programs that create complex information objects. When a large, complicated program runs, it creates numerous inter-related objects in main memory. If the main memory becomes full, some of the objects must be extinguished, or temporarily stored using a non-volatile storage device such as a disk storage device. This is known as a virtual memory approach. Another approach for using large objects in systems with limited volatile memory is to store a "graph" or description of the objects and their interrelationships, and recreate the objects when needed. Either approach causes degradation in program performance because of the relatively slow response time of disk storage devices. In addition, neither approach provides a way to permanently save the state of the objects, because the approaches are intended only to store objects temporarily during program execution. If the program or computer system crashes, the state of the objects may be lost. This can result in catastrophic data loss.
Still another problem arising in storage of objects in databases is the representation of objects that are defined but contain no information. The Structured Query Language (SQL) used in many databases permits a "null" value to be assigned to any column that has a scalar data type. The null value indicates that the value of the column is undefined. Generally, the null value is stored in the column as a reserved bit pattern. Another approach is to store a null value or sentinel value in the column adjacent to a data value. The disadvantage of these approaches is that they remove a potentially desirable bit pattern from the set of all usable bit patterns for that column. For example, some systems define the value zero, or a negative number, as representing a null value. This is undesirable because the value used to represent null may represent a useful value for that data type, or an application program or user may need that bit pattern for a particular application.
Moreover, in this approach each data type potentially has a different bit pattern that indicates null. For example, a database can define the value -1 as null. This value can be used for columns of type Age, where Age represents a person's age in years, because a person's age is not expressed in negative integers. But if the same database has a column of type Account_Balance, representing the monetary value in an account, the value -1 is potentially needed and cannot be used to represent null. Accordingly, application programs or the database server must track all the different bit patterns that mean null, and translate such bit patterns into a null indication. This is cumbersome and error-prone. It is virtually impossible to define a reserved null value (or "sentinel value") that is usable in any data type and that is not potentially needed by any type to represent a legitimate value.
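The collision described above can be shown in a few lines of C. The sketch assumes that -1 is the reserved null value, as in the example; the names NULL_SENTINEL, encode_column, and column_is_null are hypothetical.

```c
#include <stdbool.h>

#define NULL_SENTINEL (-1)  /* assumed reserved value meaning "null" */

/* Store the value, or the sentinel when no value is present. */
int encode_column(bool has_value, int value)
{
    return has_value ? value : NULL_SENTINEL;
}

/* A stored value is read as null when it equals the sentinel. */
bool column_is_null(int stored)
{
    return stored == NULL_SENTINEL;
}
```

For an age column this works, because no legitimate age equals -1. For an account balance column, encode_column(true, -1) stores a legitimate balance that column_is_null then reports as null.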
In addition, this approach is not easily adapted to columns defined to store information objects declared using complex ADTs. The traditional approach provides no way to mark the entire object as null. In previous approaches, the only way to mark an object as null is to store a null value in each attribute of the object. When such an object is retrieved, the retrieving process must check each attribute for a null value and can conclude that the object is null only if all attributes are set to null. This is slow and inefficient.
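The cost of the per-attribute check can be sketched in C as follows. For simplicity, the sketch assumes that each stored attribute carries its own null flag rather than a reserved bit pattern; the type Attribute and the function object_is_null are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stored attribute: a value and its own null flag. */
struct Attribute {
    bool is_null;
    int  value;
};

/* The object is null only if every attribute is null, so the check
   must visit all n attributes before it can report a null object. */
bool object_is_null(const struct Attribute *attrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!attrs[i].is_null) {
            return false;
        }
    }
    return true;
}
```

The loop can stop early for a non-null object, but confirming that an object is null always requires examining every attribute.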
Further, it is desirable to retain the ability to apply queries in the Structured Query Language (SQL) to objects that are stored in a database. Past object storage approaches have been best suited for use in conventional computer programming environments in which a program or compiler is accessing the stored objects. These approaches have not provided a way in which stored objects can be queried and processed using SQL.
Thus, there is a need for a system or process that permits complex information objects to be stored efficiently in a table of a database system.
There is also a need for a system or process that permits complex information objects to be retrieved efficiently from a table of a database system without unnecessary operations to reconstruct the object.
There is a need for a system or process that can enable storage of complex information objects in database table columns that have limited size.
There is also a need for a system or process that can efficiently store null values in information objects defined by complex abstract data types, and that are stored in columns of a database table.
There is also a need for a system that supports performing SQL operations on objects stored in a database in these forms.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.