1. Field of the Invention
The present invention relates generally to object-oriented software systems, and more particularly, to the storage and retrieval of objects.
2. Related Art
Object-oriented programming methods are the primary programming methods used by the computer software industry. Object-oriented programming methods allow programmers to create complex systems by reusing common mechanisms. Specifically, object-oriented programming is a method of implementation in which programs are organized as cooperative collections of objects, each of which represents an instance of an object model, and whose models are members of a hierarchy of models united via inheritance relationships. Object-oriented systems are designed according to these object models, commonly referred to as classes. Objects are created, or instantiated, by creating in memory an instance of a class. Object models encompass principles of abstraction, encapsulation, modularity, hierarchy, typing, concurrency and persistence. Examples of object-oriented programming languages include C++, Java, Smalltalk, Eiffel, Ada, and CLOS, among others. Objects and object-oriented programming concepts are well-known and are described in more detail in the reference entitled Object-Oriented Analysis and Design With Applications, Second Edition by Grady Booch, Benjamin/Cummings Publishing Company, 1994, incorporated herein by reference.
An object has both state and behavior. That is, an object stores data, interacts with other objects, and may include one or more methods associated with that object. An object method typically performs some function with respect to data. Primitive data for an object includes simple storage types such as integer or character data types. A simple object comprises simple storage types. A general compound object is a combination of simple objects, simple storage types and other compound objects. A fundamental concept of object-oriented programming is to define a construction mechanism that allows a compound object to be built from simpler objects. A compound object has all of the operational characteristics of the simpler existing object class plus any modifications to behavior or data.
An object generally has a size which defines an area of storage locations that an object could use to store information related to the object. The size of the object is typically known by a compiler for the programming language. The size of the object is the aggregation of immediate data, references to data, and access to a function table defined for the class. Immediate data is data that is defined within the structure of the object. References are generally pointers to data. The size of immediate data is known by the compiler while references are typically pointers to data located elsewhere. Generally, a pointer is an identifier that indicates a location of an item of data.
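As an illustrative sketch only, the following C++ fragment shows that the size known to the compiler covers immediate data and the pointer itself, but not the data to which the pointer refers. The type names (Immediate, WithReference, WithVirtual) and the helper function are hypothetical and are introduced here solely for illustration; exact sizes depend on the compiler and platform.

```cpp
#include <cassert>

// Hypothetical types, used only for illustration.
struct Immediate {
    int  id;       // immediate data: stored inside the object
    char tag[8];   // immediate data: fixed-size array stored inside the object
};

struct WithReference {
    int   id;      // immediate data
    char* text;    // reference: only the pointer is part of the object
};

struct WithVirtual {
    int id;
    virtual ~WithVirtual() = default;  // typically adds access to a function table
};

// Returns true when two objects that refer to blocks of different sizes
// nevertheless have the same object size.
bool size_ignores_referenced_data() {
    char small_block[4] = {'a', 'b', 'c', '\0'};
    char large_block[1024] = {};
    WithReference a{1, small_block};
    WithReference b{2, large_block};
    assert(a.text != b.text);          // the two objects refer to different blocks
    return sizeof(a) == sizeof(b);     // object size excludes the referenced data
}
```

In this sketch, a storage mechanism that copied only sizeof(WithReference) bytes would capture the pointer value but not the referenced block, which is why referenced data generally has to be accounted for separately.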
Referenced data may be fixed or variable in size. For an object with a reference to data, it is not always the object's responsibility to know whether the referenced data is fixed or variable in size. The referenced data may be a block of data of fixed size, a block of data of variable size, or some other object or an instance of some other object. Typically, a general storage mechanism accounts for storage elements including immediate data, referenced data, object references and object instances.
An object may be derived from another object. The instance data of a derived object has as a subset instance data of the base object and also a function table of the base object. This relationship is defined as "inheritance." From the derived object's point of view the instance data of the base object appears to be immediate data. The size of the base instance data is known a priori. The base object is cognizant of whether the base instance data is purely immediate data (no references) or whether there is a mix of references and/or immediate data. The derived object has no direct access to the base instance data. The derived object may access the base instance data by the base function table.
A derived object may override the functionality of a base object function. In this situation, the derived function is called if a request is made directly on the derived object. A "virtual" function is a special case of derivation. A class definition declares that a function is to be virtual. By so doing, the class includes a function that may be overridden by the derived class. If a base object reference is called with a function that is overridden by a derived class object instance, the derived function is called and the base object assumes the persona of the derived class. If the function is not virtual, the base function calls the function from the base class function table whether the derived object class provides an override function or not. A virtual function table is maintained by the computer system in memory as a result of code generated by the C++ compiler at compile time. An object that includes methods that are virtual has a compiler-generated table that is not visible to the user. The virtual function table is evolved as a derived object is constructed and devolved as a derived object is destructed.
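The difference between a virtual and a non-virtual function, as described above, may be sketched as follows. The class and function names (Base, Derived, virtual_name, plain_name, call_through_base, demo) are hypothetical and serve only as an illustration of standard C++ behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, used only for illustration.
struct Base {
    virtual ~Base() = default;
    virtual std::string virtual_name() const { return "Base"; }  // may be overridden
    std::string plain_name() const { return "Base"; }            // not virtual
};

struct Derived : Base {
    std::string virtual_name() const override { return "Derived"; }
    std::string plain_name() const { return "Derived"; }  // hides, does not override
};

// Calls both functions through a reference to the base class.
std::string call_through_base(const Base& b) {
    return b.virtual_name() + "/" + b.plain_name();
}

// Example use: a Derived object viewed through a Base reference.
void demo() {
    Derived d;
    assert(call_through_base(d) == "Derived/Base");  // virtual call reaches Derived
    assert(d.plain_name() == "Derived");             // direct call on the derived object
}
```

Through the base reference, only the virtual function reaches the derived implementation; the non-virtual function resolves to the base class version regardless of the derived definition.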
When an object is realized in memory, it is said to be instantiated. Instantiation of objects is performed by functions known as constructors. Constructors operate on volatile memory and define the way to initialize an object of their class. The content of an object has two major components: (1) data comprising the instance data of the class and (2) an object function table that defines methods of the class. The data comprising the object is generally in one of two forms: (1) immediate data and (2) a reference to data located elsewhere. An object class may have one or more constructors that define how an instance of the object is to be built in volatile memory.
Each constructor has a unique argument list which defines how the object is to be constructed. The user calls the constructor with an argument set that matches one of the defined sets of arguments to initialize an object with the desired constructor. Defining functions, such as constructors, with the same name that operate on different types is called overloading, and is well-known in C++. The constructor usually allocates memory for the object according to the size of the object and initializes all data that forms the immediate data area of the object. Each object class has a special function table entry for a function, referred to as a destructor, that is called when the object is destroyed. The destructor is utilized to release all system assets that the object utilizes. One of the main purposes of the destructor is to release memory that was allocated by the object on its behalf during some phase of computation.
A derived object is created by calling a constructor in a manner analogous to the original base class. The derived object calls a base class constructor by invoking a base constructor with a matching set of arguments from a base class constructor set. The base class is initialized before any instance data of the derived object is initialized or before the code in the constructor written by a user is called.
Object-oriented programming languages allow data to be abstract, that is, defined by the user. Further, object-oriented languages allow objects to inherit properties from other objects. In particular, inheritance permits a programmer to create objects that inherit data and methods from other objects. Objects in an inheritance relationship are said to have a hierarchical relationship. Generally, software systems include many objects in an object hierarchy.
Objects may include a number of different data types. Examples of these types are integer (int) and character (char). Some data types such as integer and character are built into a programming language such as C++. There are also types referred to as abstract data types that are user-defined, such as those complex data types defined by class definitions in an object-oriented programming language. Data types and the C++ language are described in more detail in the reference entitled C++ Programming Language, Third Edition by Bjarne Stroustrup, Addison-Wesley, 1997, incorporated herein by reference.
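By way of a minimal sketch, the following C++ fragment illustrates overloaded constructors, selection of a base class constructor by a matching argument list, and the order in which construction and destruction occur. All names (event_log, Base, Derived, build_and_destroy) are hypothetical and are not taken from any particular system.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper that records construction and destruction events.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

class Base {
public:
    Base() : value_(0) { event_log().push_back("Base()"); }            // overload 1
    explicit Base(int v) : value_(v) { event_log().push_back("Base(int)"); }  // overload 2
    ~Base() { event_log().push_back("~Base"); }
    int value() const { return value_; }
private:
    int value_;
};

class Derived : public Base {
public:
    // Selects the Base(int) overload by passing a matching argument list.
    explicit Derived(int v) : Base(v), extra_(v * 2) {
        event_log().push_back("Derived(int)");
    }
    ~Derived() { event_log().push_back("~Derived"); }
    int extra() const { return extra_; }
private:
    int extra_;
};

// Builds and destroys one Derived object and returns the recorded events.
std::vector<std::string> build_and_destroy() {
    event_log().clear();
    {
        Derived d(21);
        assert(d.value() == 21 && d.extra() == 42);
    }  // d is destroyed here: derived destructor first, then base destructor
    return event_log();
}
```

In this sketch the recorded events are, in order, Base(int), Derived(int), ~Derived and ~Base, which reflects the base class being initialized before the body of the derived constructor runs.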
Objects are said to be either dynamically or statically bound to a data type. Both static and dynamic terms refer to the times when names of objects are bound to types. Static binding means that the types of all variables and expressions are fixed at the time of compilation; dynamic binding means that the types of all variables are not known until runtime.
An object takes up some amount of space in memory and exists for a particular amount of time. Persistence in an object-oriented system allows a programmer to save state and class of an object across time and space. More specifically, persistence is defined as the property of an object through which its existence transcends time (i.e. the object continues to exist after its creator, such as a software program, ceases to exist) and/or space (i.e. the object's location moves from the address space in which it was created).
Programming languages such as C++ and object-oriented databases provide an ability to store and retrieve runtime objects from memory. Generally, objects are persisted from volatile to non-volatile types of memory. Volatile memory or storage is storage that loses its data when power is removed from the system. An example of a volatile memory device is a RAM device of a general purpose computer. In contrast, non-volatile memory or storage holds information when power is removed from a system. Examples of non-volatile memory include magnetic media such as hard disks or diskettes or optical media such as optical disks. Generally, media is defined as the means by which data is transmitted or stored. Other types of volatile and non-volatile media are available.
In C++, there are three fundamental ways of using volatile memory. In static memory, an object is allocated a predefined amount of memory for the duration of the software program. Automatic memory is an area in which function arguments and local variables are allocated. Automatic memory is automatically created and destroyed and is referred to in the art as "the stack." Free store is a type of memory which is explicitly requested by the program and which the program can free once the program is done with it (such as through using the C++ new and delete operators). Free store (also referred to as "the heap") grows throughout the lifetime of the program because no free store memory is returned to the operating system. Objects may be instantiated on the stack or heap, depending on the implementation.
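The three ways of using volatile memory may be illustrated by the following short sketch. The type Counter and the functions bump_static, use_automatic and use_free_store are hypothetical names chosen for this illustration only.

```cpp
#include <cassert>

// Hypothetical type, used only for illustration.
struct Counter {
    int value = 0;
};

// Static memory: one instance that exists for the duration of the program.
Counter g_static_counter;

// Each call observes the value left by the previous call.
int bump_static() {
    return ++g_static_counter.value;
}

// Automatic memory ("the stack"): created on entry, destroyed on return.
int use_automatic() {
    Counter local;
    ++local.value;
    return local.value;   // a fresh object on every call, so this is always 1
}

// Free store ("the heap"): explicitly requested and explicitly released.
int use_free_store() {
    Counter* p = new Counter;   // explicit request
    assert(p->value == 0);      // initialized by the default member initializer
    p->value = 5;
    int result = p->value;
    delete p;                   // explicit release
    return result;
}
```

In this sketch, repeated calls to bump_static return increasing values because the static object persists between calls, whereas use_automatic returns the same value on each call because its local object is recreated every time.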
One such programming language that provides persistence functions is the Visual C++ language available from the Microsoft Corporation. The Visual C++ language includes an archive function which allows a programmer to create persistent objects. Also, Microsoft provides what is known in the art as the Microsoft Foundation Class (MFC) programming library which is used by object-oriented programmers to create software programs. These and other objects may be persisted to non-volatile media by using the conventional archive function.
Basic steps involved in conventional recovery of objects from non-volatile media include (1) allocating memory for an object, (2) accessing non-volatile media (such as a file on disk storage), (3) copying data from non-volatile media to internal memory buffers of a volatile image of the file, and (4) copying elements of data from the volatile memory image of the file to the internal contents of the object. This process is performed for each individual object, and may take long periods of time for systems with large numbers of objects.
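The conventional recovery steps (1) through (4) above may be sketched roughly as follows. This is an assumed, simplified illustration and not the archive function of any particular library: the Point type, the file layout (two integers per record in the machine's native byte order) and the functions save_points and load_points are all hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object type; the assumed file layout is two native ints per record.
struct Point {
    int x;
    int y;
};

// Writes the points field by field so that there is something to recover.
void save_points(const std::string& path, const std::vector<Point>& pts) {
    std::ofstream out(path, std::ios::binary);
    assert(out.is_open());
    for (const Point& p : pts) {
        out.write(reinterpret_cast<const char*>(&p.x), sizeof p.x);
        out.write(reinterpret_cast<const char*>(&p.y), sizeof p.y);
    }
}

// Conventional recovery: allocation and copying are repeated for every object.
std::vector<std::unique_ptr<Point>> load_points(const std::string& path) {
    std::vector<std::unique_ptr<Point>> result;
    std::ifstream in(path, std::ios::binary);         // (2) access the medium
    char buffer[2 * sizeof(int)];
    while (in.read(buffer, sizeof buffer)) {          // (3) medium to memory buffer
        std::unique_ptr<Point> p(new Point{});        // (1) one allocation per object
        std::memcpy(&p->x, buffer, sizeof(int));      // (4) buffer to object contents
        std::memcpy(&p->y, buffer + sizeof(int), sizeof(int));
        result.push_back(std::move(p));
    }
    return result;
}
```

In this sketch every recovered object costs one memory allocation and one copy per data element, which is the per-object overhead that grows with the number of objects.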
In one aspect of the invention, a method is provided which eliminates steps of conventional archival of objects, thereby saving time in their recovery. Improved response time during recovery provides a competitive advantage for software systems that use large numbers of objects, such as computer-aided design systems and systems having complex graphical user interfaces, among others. In some complex software systems, thousands of objects are not uncommon.
According to various aspects, the number of memory allocations is reduced and the number of data transfers from non-volatile medium to the internals of the object is reduced. Also, additional processing of immediate data of objects is not required. Also, the need to recover each of the individual elements defined by the simple storage types is eliminated. As a result, individual allocation steps for each individual object are eliminated; a single allocation operation is performed which bypasses the need to make the individual allocation requests to memory allocators. In another aspect, a system and method are provided that conform with standard operations generated by commercially-available compilers.
In one aspect, a storage and retrieval system and method is provided that stores a collection of objects on a permanent medium in such a way as to allow for fast recovery of the object contents from the medium. A major objective is high performance in reconstitution of the objects. The objects that exist in a volatile memory state are stored on a non-volatile medium so that the objects may be recovered later in volatile memory. The objects that are reconstituted from the non-volatile medium have equivalent computational behavior as an original collection that resides in volatile memory. Alternatively, objects may be persisted to volatile memory as well. That is, the objects may be stored on volatile memory in the same format as the non-volatile memory format and moved electronically to some other location for reconstitution elsewhere without ever being persisted to non-volatile memory.
In another aspect, a data structure is provided that includes a plurality of objects. The plurality of objects may be serialized as a stream. The data structure may be used to represent a document composed of objects. Alternatively, the objects may be transmitted over a communications network.
In another aspect, a computer program product is provided that comprises a computer readable medium having computer program logic recorded thereon for enabling a processor in a computer system to store objects, the computer program being adapted to cause the computer to perform steps of a) storing, in a data stream, data of a first object having a reference to a second object; b) storing, in the data stream, a reference to the location of the second object in the data stream; and c) storing, in the data stream, data of the second object.
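Steps a) through c) may be pictured with the following minimal sketch, which is offered only as one assumed possibility and should not be read as the implementation of the invention. In the sketch, the reference to the location of the second object is assumed to be a byte offset within the stream; the types First, Second, Stream and Loaded and the functions append_value, read_at, store and load are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical object types: First holds a reference to Second.
struct Second { int payload; };
struct First  { int id; const Second* ref; };

using Stream = std::vector<unsigned char>;

// Appends the raw bytes of a trivially copyable value to the stream.
template <typename T>
void append_value(Stream& s, const T& v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    s.insert(s.end(), p, p + sizeof(T));
}

// Reads a trivially copyable value from the given byte offset.
template <typename T>
T read_at(const Stream& s, std::size_t offset) {
    T v;
    std::memcpy(&v, s.data() + offset, sizeof(T));
    return v;
}

Stream store(const First& first) {
    assert(first.ref != nullptr);
    Stream s;
    append_value(s, first.id);                        // a) data of the first object
    std::uint64_t second_at = s.size() + sizeof(std::uint64_t);
    append_value(s, second_at);                       // b) location of the second object
    append_value(s, first.ref->payload);              // c) data of the second object
    return s;
}

struct Loaded { int id; int payload; };

// Follows the stored offset instead of a memory pointer.
Loaded load(const Stream& s) {
    int id = read_at<int>(s, 0);
    std::uint64_t second_at = read_at<std::uint64_t>(s, sizeof(int));
    int payload = read_at<int>(s, static_cast<std::size_t>(second_at));
    return {id, payload};
}
```

The design choice assumed here is that a memory pointer, which is meaningless once the objects leave the original address space, is replaced in the stream by a position that remains valid wherever the stream is later read.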
Further features and advantages of the present invention as well as the structure and operation of various embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, like reference numerals indicate like or functionally similar elements. Additionally, the left-most one or two digits of a reference numeral identify the drawing in which the reference numeral first appears.