Database systems are computer programs optimized for creating, storing, manipulating, and reporting on information stored in tables. Each table is organized as an array of rows and columns. The values in the columns of a given row are typically associated with each other in some way. For example, a row may store a complete data record relating to a sales transaction, a person, or a project. Columns of the table define discrete portions of the rows that have the same general data format. For example, columns define fields of the records.
Many computer programming languages permit information to be defined by an object type, "class," or "abstract data type" (ADT). ADTs provide a way to classify information. In computer programs written using a language that supports ADTs, every constant, variable, expression, function, or combination has a certain type. Thus, an ADT is an abstract representation of data that serves as a template for an information object. An ADT is made into an explicit representation of the type using a declaration of the constant, variable, or function. The thing that is declared is called an information object or simply an object. In object-oriented programming languages such as C++ and Java, the term "class" refers to the ADT of an object. Objects may store a combination of data and methods for acting on the data. An "instance" of an object matching a class is "instantiated" or created when the program is run, based upon the class definition of the object.
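As a minimal illustration of the distinction between a class and an instance described above, the following Python sketch defines a class and instantiates it at run time. The class name `SalesRecord`, its fields, and the `total` method are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class SalesRecord:
    """Hypothetical ADT: a template that combines data with a method."""
    customer: str
    quantity: int
    unit_price: float

    def total(self) -> float:
        # A method acting on the object's own data.
        return self.quantity * self.unit_price


# An instance is created ("instantiated") at run time from the class definition.
record = SalesRecord(customer="Acme", quantity=3, unit_price=2.5)
```

Here `SalesRecord` plays the role of the ADT, and `record` is one information object of that type.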
A detailed discussion of ADTs and data structures is provided in N. Wirth, "Algorithms + Data Structures = Programs" (Englewood Cliffs, N.J.: Prentice-Hall, 1976), and D. Knuth, "The Art of Computer Programming" (Reading, Mass.: Addison-Wesley Pub. Co., 2d. ed. 1973), ch. 2, "Information Structures". In the context of this document, the term "ADT" refers broadly to a type definition or object class for an information object and is intended to include the meaning of the term "class" as that term is used in object-oriented programming environments.
ADTs may be complex. An ADT may comprise a combination of one or more scalar data types, such as integers and other numeric types, characters or strings, together with non-scalar components such as arrays (including large arrays), pointers, other ADTs, or database tables. Each component of an ADT is called an attribute. However, database systems are known to operate fastest and with greatest efficiency when simple data types are stored in the database tables. Accordingly, storing objects defined by complex ADTs in a database table presents a difficult problem. A number of effective approaches to this problem are disclosed in co-pending U.S. patent application Ser. No. 08/962,409, "Object Representation and Storage in a Database System," attorney docket number 3018-093 (OID 1997-10-02), which is hereby incorporated by reference as if fully set forth herein.
A related problem is communicating complex information objects in a complex, distributed system or network that interconnects different types of computers and program processes. Data are not universally transportable from one computer to any other computer. Different computers, operating systems, programming languages, and application software often use different native forms or formats for representing data. For example, several different formats can be used to represent numbers in a computer memory. Some processors represent a multi-byte numeric value in memory as a sequence of bytes in which the least significant byte is at the lowest memory location. Other processors represent values with the most significant byte at the lowest memory location. One type of processor cannot directly access and use values stored in a memory that were created by the other type of processor. This is known as a format representation problem. Examples of such incompatible processors are the SPARC and VAX processors.
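The format representation problem can be sketched in a few lines of Python. The ordering difference described above is normally observed at the granularity of bytes, so this sketch uses the standard `struct` module to lay out the same 32-bit value in both byte orders and then reads one layout using the other convention. The variable names and the sample value are assumptions chosen for illustration.

```python
import struct

value = 0x12345678

# Most significant byte at the lowest address (big-endian, as on SPARC).
big = struct.pack(">I", value)
# Least significant byte at the lowest address (little-endian, as on VAX).
little = struct.pack("<I", value)

# Reading the big-endian bytes with the little-endian convention
# produces a different number from the one that was written.
misread = struct.unpack("<I", big)[0]
```

The two byte strings are mirror images of each other, and `misread` equals `0x78563412` rather than the original `0x12345678`.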
Incompatibilities also exist among different programming languages that are usable on the same platform. For example, such modern programming languages as C and Pascal enable a programmer to express a set of information in a complex abstract data type such as a record or structure, but there is no universal protocol for representing such abstract data types in a computer memory. This incompatibility increases the complexity of computer systems and makes data interchange difficult and inefficient. Further, different processors may represent a data type of a programming language in different ways. One processor may represent a floating-point number in four bytes while another processor may represent it in eight bytes. Thus, data created in memory by the same program running on different processors is not necessarily interchangeable. This is known as a layout representation incompatibility.
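The four-byte versus eight-byte example can be made concrete with a short Python sketch. It assumes the two representations are the IEEE 754 single and double precision formats exposed by the standard `struct` module. The source does not name specific formats, so that choice is illustrative only.

```python
import struct

x = 0.1

single = struct.pack("<f", x)   # 4-byte IEEE 754 single precision
double = struct.pack("<d", x)   # 8-byte IEEE 754 double precision

# The two layouts differ in size, and the shorter one cannot hold
# exactly the same value as the longer one.
round_trip = struct.unpack("<f", single)[0]
```

A receiver that expects eight bytes cannot simply consume the four-byte form, and converting between the two may change the value slightly, as `round_trip` shows for 0.1.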
Alignment representation presents yet another problem in data interchange. With some processors, particular values or data types must be aligned at a particular memory location. When data is interchanged, there is no assurance that the inbound information uses the alignment required by the computer receiving the information. Still another problem is inheritance representation. Certain object-oriented programming languages, such as C++, support the concept of inheritance, whereby an abstract data type may inherit properties of a previously defined abstract data type. Languages that support inheritance provide extra pointer fields in memory representations of abstract data types or classes that use base classes and functions defined at runtime. The value of an inheritance pointer is not known until runtime, and is not persistent. Therefore, transmission from one system to another of an instance of an abstract data type that inherits properties from another abstract data type is not generally practical.
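The alignment issue can be observed with the standard `struct` module, which can compute the size of a record with or without the padding that the local machine requires. The record layout below (one character followed by one unsigned integer) is a hypothetical example, and the native size depends on the platform running the code.

```python
import struct

# A record holding one char followed by one unsigned int.
packed_size = struct.calcsize("=cI")   # standard sizes, no alignment padding
native_size = struct.calcsize("@cI")   # this machine's native sizes and alignment

# Any difference is padding inserted so the int starts at an aligned address.
padding = native_size - packed_size
```

The unpadded form always occupies 5 bytes. On many common platforms the native form occupies 8 bytes, so the same record written by one system may not match the layout another system expects.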
Character representation is another problem. Computers used in different nations of the world also may use incompatible character sets. Data formatted in one character set cannot be directly used or interpreted by a system that uses a different character set.
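A brief Python sketch shows how the same text yields different bytes under two character encodings, and how bytes read under the wrong encoding produce the wrong text. The sample string and the choice of Latin-1 and UTF-8 are assumptions made for illustration.

```python
text = "caf\u00e9"   # "cafe" with an acute accent on the final letter

latin1_bytes = text.encode("latin-1")   # the accented letter is 1 byte
utf8_bytes = text.encode("utf-8")       # the accented letter is 2 bytes

# Interpreting UTF-8 bytes as Latin-1 silently yields different text.
garbled = utf8_bytes.decode("latin-1")
```

Here `garbled` contains five characters instead of four, because the two-byte UTF-8 sequence is read as two separate Latin-1 characters.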
In a networked computer environment, these problems are more acute. A network may comprise several different types of computers, platforms, or application programs. A programmer writing software for use in a widely distributed network has no assurance that a destination or target computer can understand information sent from a source machine. Moreover, many network communication protocols are best suited to the transmission of simple, linear strings of values or characters. Complex abstract data types, especially those with pointers, generally cannot be transmitted reliably over such a network in the same form used to represent the data types in memory. Also, when a pointer points to a large or complex collection of data values, such as a table of a database system, it may be impractical or inefficient to convert the entire table to a universal form for transmission over the network.
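To illustrate why pointer-based structures must be converted before transmission, the following Python sketch flattens a small linked list into a linear byte string with a fixed byte order and rebuilds it on the receiving side. It is only a sketch of the general idea under stated assumptions. The `Node` class, the `flatten` and `rebuild` functions, and the wire layout (a count followed by 32-bit integers) are hypothetical and are not the method of the referenced applications.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    # An in-memory reference; it has no meaning on another machine.
    next_node: Optional["Node"] = None


def flatten(head: Optional[Node]) -> bytes:
    """Encode the list as a count followed by the values, as big-endian 32-bit ints."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next_node
    return struct.pack(f">I{len(values)}i", len(values), *values)


def rebuild(data: bytes) -> Optional[Node]:
    """Recreate the list on the receiving side, with fresh references."""
    (count,) = struct.unpack_from(">I", data, 0)
    values = struct.unpack_from(f">{count}i", data, 4)
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head
```

For example, `rebuild(flatten(Node(1, Node(2, Node(3)))))` yields a new three-node list with the same values. Only the values cross the network, and the references are regenerated locally.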
Approaches to solving these problems are presented in the above-referenced co-pending U.S. patent application, and in co-pending U.S. patent application Ser. No. 08/961,795, entitled "Apparatus and Method for Pickling Data," attorney docket number 3018-092 (OID 1997-10-01).
Still another problem encountered in these systems is the storage and internal representation of information objects that are defined by ADTs that have collection attributes. In this context, "collection attribute" refers to a non-scalar attribute comprising a related set of scalar attributes, arranged in one or two dimensions. For example, collection attributes include nested database tables and varying length arrays.
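The notion of a collection attribute can be sketched as follows. The `Project` class and its field names are hypothetical. The sketch only illustrates an ADT that mixes scalar attributes with a varying length array (one dimension) and a nested table (two dimensions).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Project:
    name: str        # scalar attribute
    budget: float    # scalar attribute
    # Varying length array: one dimension, any number of elements.
    milestones: List[str] = field(default_factory=list)
    # Nested table: two dimensions, rows of (task, hours).
    tasks: List[Tuple[str, int]] = field(default_factory=list)


project = Project(name="Apollo", budget=10.0)
project.milestones.append("design")
project.tasks.append(("draft", 8))
```

Each collection can grow independently of the scalar attributes, which is what makes such objects hard to fit into fixed-size table columns.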
Yet another problem of these systems is the storage and representation of information objects that are collections in columns of tables managed by database systems, including relational and object-relational database systems. For example, present database systems do not provide a mechanism by which an information collection, such as a database table or a varying length array, can be stored in a column of a database table.
Present systems do not provide a convenient, efficient way to store information objects defined by an ADT in which attributes are collections. In one approach, when an application needs to have a collection associated with an information object, generally the collection has to be stored apart from the information object, e.g., in a separate object. In this approach, the system also must provide a mechanism to relate the original information object to the collection object. This is awkward and imposes overhead. Having methods for storing collections as attributes of an ADT is highly desirable, because it provides programmers and application developers with greater freedom in defining and manipulating information structures.
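The separate-storage approach described above can be sketched with Python's standard `sqlite3` module. The table names, column names, and sample rows are hypothetical. The point is only that the collection lives in its own table and must be related back to the owning object through a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    -- The collection is stored apart from the object and points back to it.
    CREATE TABLE project_task (
        project_id INTEGER REFERENCES project(id),
        task TEXT,
        hours INTEGER
    );
""")
conn.execute("INSERT INTO project VALUES (1, 'Apollo')")
conn.executemany(
    "INSERT INTO project_task VALUES (?, ?, ?)",
    [(1, "draft", 8), (1, "review", 3)],
)

# Reassembling the object requires an extra lookup through the relating key.
tasks = conn.execute(
    "SELECT task, hours FROM project_task WHERE project_id = ? ORDER BY rowid",
    (1,),
).fetchall()
```

The extra table, the relating key, and the second query are the overhead that the paragraph above refers to.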
In another approach, the contents of a collection are stored in a database column associated with an information object. For example, in a method of storage in which each attribute of the ADT of an information object is stored in a separate column of a database table, each collection attribute also is stored in a separate column. However, in most database systems this approach is impractical because the systems impose a limit on the size of an element in a column. For example, in one known database system each element in a column must be no larger than 4096 bytes. This is too small to accommodate a large nested table or large varying length array.
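The effect of a per-element size limit can be shown with a short Python sketch. It uses the 4096-byte figure quoted above and assumes, for illustration only, that a varying length array is encoded as consecutive 4-byte integers. The function name `fits_in_column` and the sample arrays are hypothetical.

```python
import struct

MAX_COLUMN_BYTES = 4096   # limit quoted in the text for one known system


def fits_in_column(values):
    """Return True if the array, encoded as 4-byte ints, fits in one column element."""
    encoded = struct.pack(f"<{len(values)}i", *values)
    return len(encoded) <= MAX_COLUMN_BYTES


small = list(range(1000))   # encodes to 4000 bytes
large = list(range(2000))   # encodes to 8000 bytes
```

Under these assumptions, `fits_in_column(small)` is true and `fits_in_column(large)` is false, so an array of only about a thousand integers already reaches the limit.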
Thus, there is a need for ways to store information collections integral with information objects.
There is also a need for a mechanism that permits information collections to be stored in association with information objects as an attribute of the objects.
There is a need for a mechanism providing for storage of information objects that have collection attributes in database tables.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.