The present invention relates to the fields of databases, object oriented databases, distributed computing systems, and object-oriented programming. More specifically, the present invention teaches methods, data structures, and apparatus for providing transparent persistent data support to data types which are foreign to a database or other persistent storage mechanism.
While the teachings of the present invention are suitable for a variety of computing environments, the background and detailed description of the invention present a number of specific examples drawn from several different computing environments. Because of their capabilities and current popularity, object oriented computing environments are particularly highlighted, and a number of embodiments of the present invention are well suited for use therein. However, the examples provided in both the background and the detailed description are intended to draw forth and clarify details of the present invention and should in no way be construed as limiting.
Object oriented programming methodologies have received increasing attention over the past several years in response to the growing tendency for software developed using traditional programming methods to be delivered late and over budget. This stems from the fact that traditional programming techniques, which emphasize procedural models and "linear" code, tend to be difficult to design and maintain in many circumstances. Generally, large programs created using traditional methods are "brittle"; that is, even small changes can affect numerous elements of the programming code. Thus, minor changes made to the software in response to user demands can require major redesign and rewriting of the entire program.
Object oriented programming strategies tend to avoid these problems because object methodologies focus on manipulating data rather than procedures, thus providing the programmer with a more intuitive approach to modeling real world problems. In addition, objects encapsulate related data and procedures, hiding that information from the remainder of the program by allowing access to the data and procedures only through the object's interface. Hence changes to the data and/or procedures of an object are relatively isolated from the remainder of the program. This yields code that is more easily maintained than code written using traditional methods, because changes to one object's code do not affect the code in other objects. In addition, the inherently modular nature of objects allows individual objects and interfaces to be reused in different programs. Thus, programmers can develop libraries of "tried and true" objects and interfaces that can be used again and again in different applications. This increases software reliability while decreasing development time, as reliable programming code may be reused.
A more recent advance in the field of object oriented methodologies has been the implementation of distributed object operating environments over computers interconnected via a computer network. As used herein, the term "distributed object" or "object" refers to an encapsulated package of code and data that can be manipulated by operations through an interface. Thus, those skilled in the art of object oriented programming (OOP) will recognize that distributed objects include the basic properties that define traditional programming objects. However, distributed objects differ from traditional programming objects by the inclusion of two important features. First, distributed objects are multilingual. That is, the interfaces of distributed objects are defined using an interface definition language (IDL) that can be mapped to a variety of different programming languages. One such interface definition language is the Object Management Group's IDL. Second, distributed objects are location-independent, i.e., distributed objects can be located anywhere in a network. This contrasts sharply with traditional programming objects, which typically exist in a single address space.
Elaborating further on the distributed object operating environment, distributed objects can be object clients or object servers, depending upon whether they are sending requests to other objects or replying to requests from clients. In a distributed object operating environment, requests and replies are made through an Object Request Broker (ORB) that is aware of the locations and status of the objects. One architecture suitable for implementing such an ORB is provided by the Common Object Request Broker Architecture (CORBA) specification. The CORBA specification was developed by the Object Management Group (OMG) to define the distributed computing environment in terms of objects in a distributed client-server environment, where server objects are capable of providing services to clients requesting them.
From the perspective of the distributed object life cycle, objects fall into one of two categories: transient or persistent. When discussing the transient or persistent nature of an object, what is being referred to is the transient or persistent nature of the object's state. As will be familiar to those skilled in the art of object oriented programming (OOP), an object may be described by two components: executable code and state. Executable code is essentially the instructions by which the object operates; it defines the "behavior" of the object. State is simply the remaining portion of the object, that is, its data rather than its code. Further, the nature of the state that an object will maintain is defined through the variables declared by the object developer. More specifically, each of these variables has a predetermined data type, as discussed below.
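As a minimal sketch of the code/state distinction, assuming Python as the illustration language and a hypothetical `Account` class, the behavior lives in the class's methods while the state is the set of instance variables. Persistence, in the sense used here, concerns only the latter.

```python
class Account:
    """Hypothetical example object: methods are its code, attributes its state."""

    def __init__(self, owner: str, balance: int) -> None:
        # State: the variables the developer declares for each instance.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: int) -> None:
        # Behavior: executable code shared by every instance of the class.
        self.balance += amount


def object_state(obj: object) -> dict:
    """Return a copy of the instance variables, i.e. the object's state."""
    return dict(vars(obj))


acct = Account("ann", 10)
acct.deposit(5)
print(object_state(acct))  # {'owner': 'ann', 'balance': 15}
```

Note that `deposit` never appears in the state dictionary: the method belongs to the class, so a persistence mechanism only needs to preserve the instance variables.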
Much of the prior art is directed toward transient objects. Transient objects typically have a short life span and are bound to a single host computer process. That is, when a host computer process ceases, all transient objects residing in that process cease. Therefore there is no continuity of identity of a transient object from one process to another. Because transient objects are bound to a single process, they inherently cannot change their location. Hence transient objects could also be referred to as "immobile" objects, as their addresses never change. A programmer limited to transient objects therefore has no built-in means of preserving object state from one instance to the next.
In contrast, persistent objects are not bound to a single process, and their address and memory location may change over time (e.g., they may have many "life" cycles). With a persistent object, there is continuity of identity from one process to another. In brief, persistent objects are objects whose state (i.e., the variables of persistent data type) can outlive a specific instance of the object. As will be appreciated, persistent objects may provide many advantages to the object developer. Unfortunately, prior strategies for implementing persistent objects fall short of providing a satisfactory solution to the object developer, as described below.
In what is perhaps the crudest strategy for providing persistent objects, the programmer codes within the object the reading and writing (i.e., the management) of data from a permanent storage medium such as a hard disk drive. While this strategy may work for simple scenarios, it has at least two defects. First, the object developer bears the burden of implementing this data management in each persistent object. Second, this strategy incurs unnecessary overhead. By way of example, each object must include the previously mentioned code for persistent data support. Furthermore, the structure of the data file may become quite intricate, demanding elaborate parsing just to read and write data. All of this data management must be performed by the object at execution time, resulting in a costly solution in terms of system resource utilization.
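To make this burden concrete, here is a minimal sketch, assuming a hypothetical `Account` class and a JSON file as the permanent storage medium, of an object that codes its own reading and writing. The method names `to_record`, `from_record`, `save`, and `load` are illustrative only.

```python
import json
from pathlib import Path


class Account:
    """Hypothetical object that manages its own persistence by hand."""

    def __init__(self, owner: str, balance: int = 0) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: int) -> None:
        self.balance += amount

    # --- Persistence code the developer must write for every such class ---

    def to_record(self) -> str:
        # The developer chooses and maintains the stored format manually.
        return json.dumps({"owner": self.owner, "balance": self.balance})

    @classmethod
    def from_record(cls, text: str) -> "Account":
        # ...and must parse it back, field by field, at execution time.
        data = json.loads(text)
        return cls(data["owner"], data["balance"])

    def save(self, path: Path) -> None:
        path.write_text(self.to_record(), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "Account":
        return cls.from_record(path.read_text(encoding="utf-8"))
```

If a field is later added to `Account`, both `to_record` and `from_record` must be updated by hand, and every other persistent class needs its own equivalent code.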
Another approach to providing persistent objects introduces persistence into existing object programming languages. In one variant, an extension to the existing programming language provides an interface to the functionality of a database. For example, a compiler for an extended language using this approach might accept a specialized set of commands geared for use with a database. The most common type of database used with these methods is a relational database. Because relational databases were not designed to store many of the data types in which object state is typically held, this approach may introduce its own set of difficulties. Object oriented databases are available, but objects created through object oriented databases are likewise limited to the data types available within the object oriented database. In any event, each of these prior art strategies places an undue burden on the programmer, who must still consciously manage the persistence of object data.
FIG. 1 illustrates one possible flow 100 for creating a persistent programming language object in accordance with the prior art, in which persistence is limited to the data types understood by the object oriented database utilized. As will be appreciated by those skilled in the art, an object developer will generate a variety of files comprising two distinct components: an object oriented (OO) program 101 and an OO schema definition 102. The OO program 101 is coded in a standard OO language such as C++ and defines the behavior and attributes of the object being developed. The OO schema definition 102 is coded in a data definition language (DDL) and defines and describes all the object variable structures and their natures. OO schema definition 102 thus includes variable declarations defining variable names and data types, as well as an indication of which variables require the persistence provided by the object oriented database (OODB).
The creation flow 100 is as follows. OO schema definition 102 is processed by the OODB schema tool 104 to generate the OODB schema 106 and the local headers 108. Essentially, the local headers 108 define the data classes available within the object. As will be appreciated, an object "class" is a template from which an object can be created; it specifies the behavior and attributes (e.g., persistent) available to all objects of the class. The OODB schema 106 is the schema utilized by the OODB to help provide persistence.
The local headers 108 are, figuratively, merged with the OO program 101, and the result is compiled by the OO compiler 114. In turn, the output of the OO compiler 114 is linked with standard libraries 118 by an OO linker 115 to generate the OO binary 116. As will be appreciated, the OO binary 116 is executable code from which object instances are generated. Finally, the OODB engine 110 utilizes the OODB schema 106 together with a particular object instance to generate a database which maintains persistent data in a persistent storage medium 120. Thus the types of data that can be maintained persistently are directly limited, on the one hand by the DDL and on the other by the capabilities of the OODB.
For any given data definition language (such as CORBA's DDL), a certain number of data types will be defined. As will be appreciated, the data types define the structural characteristics, features, and properties of data that may be directly specified by the software developer. In an object oriented computing environment, the given DDL will likely include the standard, well-known OOP data types. Examples of these include LONG INTEGER, SHORT INTEGER, FLOATING POINT, CHARACTER STRING, ARRAY, and STRUCTURE. Additionally, the given DDL will include data types that are specific to the particular object operating environment. For example, OMG's DDL includes the data types OBJECT REFERENCE, ANY, TYPE CODE, and PRINCIPAL.
While much of the preceding discussion has focused on object oriented computing environments, many of the same dilemmas arise within other computing environments. In general, when a computing environment provides a prior art persistent storage mechanism (e.g., a relational database or an object oriented database), that mechanism provides persistent storage only for known data types. Thus persistence is not provided for "foreign data types," which are not known to the persistent storage mechanism.
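The limitation can be pictured with a minimal sketch, assuming a hypothetical in-memory `SimpleStore` that stands in for a persistent storage mechanism and recognizes only integers, floats, strings, and byte strings. A `Point` dataclass plays the role of a STRUCT, which is foreign to relational databases.

```python
from dataclasses import dataclass

# Hypothetical persistent store that understands only a fixed set of types,
# loosely modeled on the column types of a relational database.
KNOWN_TYPES = (int, float, str, bytes)


class SimpleStore:
    def __init__(self) -> None:
        self._rows: dict = {}

    @staticmethod
    def accepts(value: object) -> bool:
        # Exact type match: anything else is a "foreign" data type here.
        return type(value) in KNOWN_TYPES

    def put(self, key: str, value: object) -> None:
        if not self.accepts(value):
            raise TypeError(f"foreign data type: {type(value).__name__}")
        self._rows[key] = value

    def get(self, key: str) -> object:
        return self._rows[key]


@dataclass
class Point:  # a STRUCT-like type the store knows nothing about
    x: int
    y: int


store = SimpleStore()
store.put("count", 3)                # known type: stored
print(store.accepts(Point(1, 2)))    # False: Point is foreign to the store
```

To persist a `Point` with such a store, the developer would have to break it into known types and reassemble it by hand, which is exactly the externalize/internalize work discussed next.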
To bring out a further defect of the prior art, attention is now focused on the data type CORBA::ANY. When a selected parameter must be able to receive data that is not constrained to a particular data type, the object developer declares that parameter with data type ANY. By way of analogy, a data variable of type ANY can be thought of as an envelope (the declared parameter of data type ANY) which can contain a letter (the data whose type is not known a priori). A distributed object operating environment conforming to CORBA will provide facilities for constructing a type ANY parameter from whatever declared data types subsequent operations require. In essence, the in-memory form of a type ANY parameter is a graph of non-contiguous elements linked together by pointers. The CORBA specification does not define any other form visible to the developer.
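The envelope analogy can be sketched in Python, assuming hypothetical `TypeCode` and `AnyEnvelope` classes that only loosely mirror CORBA's TypeCode and Any; they are not the CORBA API. The envelope carries a type code describing the letter plus a reference to the letter itself, and opening it safely means checking the type code first.

```python
from dataclasses import dataclass, field
from typing import Any as PyAny, List, Tuple


@dataclass
class TypeCode:
    """Hypothetical, simplified stand-in for a CORBA TypeCode."""
    kind: str  # e.g. "long", "string", "struct"
    # For structs: (member name, member TypeCode) pairs, themselves linked objects.
    members: List[Tuple[str, "TypeCode"]] = field(default_factory=list)


@dataclass
class AnyEnvelope:
    """The 'envelope': a type code describing the 'letter', plus a reference to it."""
    type_code: TypeCode
    value: PyAny  # held by reference; a struct value refers to further objects


def extract_long(envelope: AnyEnvelope) -> int:
    """Open the envelope case by case: check the type code before using the value."""
    if envelope.type_code.kind != "long":
        raise TypeError(f"ANY holds {envelope.type_code.kind}, not long")
    return envelope.value


point_tc = TypeCode("struct", [("x", TypeCode("long")), ("y", TypeCode("long"))])
point_any = AnyEnvelope(point_tc, {"x": 1, "y": 2})
count_any = AnyEnvelope(TypeCode("long"), 42)
print(extract_long(count_any))  # 42
```

Both the type code and the value are separate objects reached through references, which mirrors the non-contiguous, pointer-linked in-memory form described above.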
Because of the way CORBA defines the data type ANY, developers are not provided with externalize and internalize commands for data type ANY. As will be familiar, such "externalize" commands would convert a type ANY parameter into a form suitable for operations such as interprocess communication, network transmission, and storage in persistent memory. Conversely, "internalize" commands would perform the tasks involved in reconstructing the type ANY parameter in an active internal format. Since object developers are not provided with type ANY externalize/internalize commands, they have no direct mechanism for performing this conversion. Hence they must write their own externalize/internalize commands or, on a case-by-case basis, open the type ANY "envelope" to extract data and recreate the "envelope" to insert data.
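Writing one's own externalize/internalize commands amounts to flattening the pointer-linked graph into one contiguous byte string and rebuilding it later. The following is a minimal sketch under stated assumptions: a hypothetical `AnyValue` model supporting only `long`, `string`, and `struct` kinds, and an ad hoc tag/length encoding that is not CORBA's Common Data Representation (CDR).

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical minimal model of a type ANY value: a kind tag plus a value.
# "long" -> int, "string" -> str, "struct" -> list of (member name, AnyValue).
@dataclass
class AnyValue:
    kind: str
    value: Union[int, str, List[Tuple[str, "AnyValue"]]]


_TAGS = {"long": 1, "string": 2, "struct": 3}
_KINDS = {tag: kind for kind, tag in _TAGS.items()}


def _pack_str(s: str) -> bytes:
    data = s.encode("utf-8")
    return struct.pack(">I", len(data)) + data  # 4-byte length prefix


def externalize(a: AnyValue) -> bytes:
    """Flatten the linked ANY graph into one contiguous byte string."""
    out = bytes([_TAGS[a.kind]])
    if a.kind == "long":
        return out + struct.pack(">q", a.value)
    if a.kind == "string":
        return out + _pack_str(a.value)
    # struct: member count, then (name, nested ANY) pairs, depth first
    out += struct.pack(">I", len(a.value))
    for name, member in a.value:
        out += _pack_str(name) + externalize(member)
    return out


def _read_str(buf: bytes, pos: int) -> Tuple[str, int]:
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    return buf[pos:pos + n].decode("utf-8"), pos + n


def _internalize_at(buf: bytes, pos: int) -> Tuple[AnyValue, int]:
    kind = _KINDS[buf[pos]]
    pos += 1
    if kind == "long":
        (v,) = struct.unpack_from(">q", buf, pos)
        return AnyValue(kind, v), pos + 8
    if kind == "string":
        s, pos = _read_str(buf, pos)
        return AnyValue(kind, s), pos
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    members = []
    for _ in range(count):
        name, pos = _read_str(buf, pos)
        member, pos = _internalize_at(buf, pos)
        members.append((name, member))
    return AnyValue(kind, members), pos


def internalize(buf: bytes) -> AnyValue:
    """Rebuild the in-memory ANY graph from its externalized bytes."""
    value, end = _internalize_at(buf, 0)
    if end != len(buf):
        raise ValueError("trailing bytes after ANY value")
    return value
```

Even this toy version needs a separate case for every kind it supports, and each new kind requires matching code on both the externalize and internalize sides.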
Similar dilemmas are encountered with a variety of important data types within a number of computing environments. For example, the well-known data type "STRUCT" is a foreign data type within computing environments utilizing relational databases. In general, externalizing and internalizing data types such as OBJECT REFERENCE, ANY, TYPE CODE, PRINCIPAL (each foreign to OODBs and relational databases) and STRUCT (foreign to relational databases) requires difficult, time-consuming development with which software developers should not have to be concerned. However, externalizing and internalizing are basic steps in accomplishing any persistent data mechanism; thus, if persistence is desired, these steps cannot be avoided. What are needed are methods, apparatus and data structures that transparently (with respect to the developer) perform the processes of externalizing and internalizing a variety of data types while transparently managing the desired persistence.