1. Field of the Invention
This invention is related to object-oriented programming environments, and more particularly to the calculation of the memory size required by an object.
2. Background
An object-oriented programming (“OOP”) environment is often described as a collection of cooperating objects, as opposed to a traditional programming environment in which a program may be seen as a group of tasks to compute (subroutines or functions). In OOP, a “class” is an abstract construct that provides a blueprint defining the nature of an object that can be created. Objects are therefore created based on the structure and definitions of a particular class. An “object” is an instance (or instantiation) of a class, and multiple objects (instances) can be created based on the same class. An object contains a combination of data and the instructions that operate on that data, making the object capable of receiving messages, processing data, and sending messages to other objects. A “cluster” is essentially a group of classes or objects.
In the context of data storage and transmission, serialization is the process of converting an object into a binary form that can be saved on a storage medium (such as a disk storage device) or transmitted across a network. The series of bytes in the serialized binary form of the object can be used to re-create an object that is identical in its internal state to the original object (actually, a clone). The opposite of serialization, known as deserialization, is the process by which the serialized binary form is reconstructed into an instance of the object.
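By way of illustration only, the round trip described above can be sketched in Python. The `Point` class and the `serialize` and `deserialize` helpers are hypothetical names introduced for this sketch, and JSON text encoded as UTF-8 bytes stands in for whatever binary format a given environment actually uses.

```python
import json


class Point:
    """Hypothetical example class; not taken from any particular system."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def serialize(point: Point) -> bytes:
    # Convert the object's internal state into a series of bytes that could
    # be written to disk or sent across a network.
    return json.dumps({"x": point.x, "y": point.y}).encode("utf-8")


def deserialize(data: bytes) -> Point:
    # Rebuild a new run-time instance from the serialized bytes.
    fields = json.loads(data.decode("utf-8"))
    return Point(fields["x"], fields["y"])
```

Calling `deserialize(serialize(p))` yields a distinct object whose internal state matches that of `p`, which is the "clone" referred to above.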
Each object created in memory requires a portion of the available memory capacity. Some of that portion may consist of “overhead” information imposed by the programming environment, and some may be consumed by the data contained in the object. Many related objects with references between each other are often used to represent a complex piece of data; such a group is known as a cluster. Each constituent object in the cluster can require its own portion of available memory.
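A minimal sketch of how the memory footprint of a cluster might be totalled is given below, assuming a CPython runtime in which `sys.getsizeof` reports the shallow size of a single object, overhead included. The function name `cluster_size` is hypothetical, and the traversal covers only common containers and instance attributes, so the result is an approximation rather than an exact figure.

```python
import sys


def cluster_size(root) -> int:
    """Sum the shallow sizes of every object reachable from root.

    Each object is counted once, even if several objects in the
    cluster hold references to it.
    """
    seen = set()      # ids of objects already counted
    stack = [root]    # objects still to be visited
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)  # per-object overhead plus its own data
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__"):
            # Follow instance attributes of user-defined objects.
            stack.append(vars(obj))
    return total
```

Note that the sketch must visit every constituent object of the cluster, and can only do so once those objects already exist in memory.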
The performance cost of reading data from disk storage devices such as hard disk drives is typically much higher than that of accessing the same data in random access memory (RAM). This is generally attributed to the slow mechanical nature of magnetic disk drives and the slower data transmission paths from a hard disk drive to the memory and microprocessor components. For this reason, higher-performance memory (such as RAM) is typically used as a memory cache or data buffer for data that is stored on the hard disk drive. This technique increases performance when the same data is accessed repeatedly and minimizes performance delays caused by retrieving data directly from the hard disk drive when it is required for a processing task. There is an added delay when retrieving objects from the hard disk drive because they are stored in a serialized binary form and must first be deserialized (i.e., “constructed”) into instantiated objects before processing. Processing performance is therefore significantly increased when run-time instances of deserialized objects are cached in memory before they are needed, as opposed to being stored only on the hard disk drive in a serialized representation of object clusters.
However, storage capacity on a hard disk drive is typically much cheaper and far greater than that of memory. There is therefore a limited amount of high-performance memory available to be used as a memory cache or data buffer. In determining how to best allocate available memory to maximize performance, it is generally necessary to first determine how much memory will be required by objects under consideration for being deserialized and instantiated in memory.
There is therefore a need to increase performance by deserializing and creating run-time objects in the memory cache. In order to optimize utilization of the limited memory capacity, however, there is a need to first determine how much memory will be required (and used) by the objects before any such creation takes place.
One simple solution is to calculate accurately the exact amount of memory required for each and every object, but the calculations and processing required to analyze each and every object to make such a determination can prove excessively expensive in performance terms and thus detract from the benefits of any subsequent performance gains.
By way of example, an Extensible Markup Language (XML) document that has been parsed can be represented as a cluster of related objects. The Document Object Model (DOM) application programming interface allows for navigation of such XML documents, wherein the “tree” of “node” objects represents the document's content. However, DOM implementations tend to be memory intensive because the entire document must be loaded (i.e., deserialized) into memory as a tree of objects before access is allowed. Thus, the exact amount of memory required for the tree (cluster) of objects cannot be determined without deserializing it into run-time objects in memory, at least temporarily. This is therefore an expensive operation in terms of performance.
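The DOM behaviour described above can be illustrated with Python's standard `xml.dom.minidom` module, used here only as one convenient DOM implementation. The sample document and the `count_nodes` helper are assumptions made for this sketch.

```python
from xml.dom import minidom


def count_nodes(node) -> int:
    # Count this node plus every node beneath it in the tree.
    return 1 + sum(count_nodes(child) for child in node.childNodes)


# Hypothetical serialized document: a short string of bytes on disk.
xml_text = "<order><item>pen</item><item>ink</item></order>"

# Parsing deserializes the whole document into a tree of node objects
# before any part of it can be navigated.
document = minidom.parseString(xml_text)
```

Even this small document becomes several distinct run-time objects (a document node, element nodes, and text nodes), each with its own memory requirement, and the tree must exist in memory before its nodes can be counted or measured.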
Another possible solution is to cache the object representations in memory in their serialized format, creating the run-time object instances only when they are needed and discarding them from memory as soon as they are no longer required. This strategy is essentially a “half way approach” because, by copying only the serialized object representations to memory, no calculations of the size of actual run-time instances of the objects are required. However, performance is only partially improved because each time objects need to be processed, the serialized object representations must first be deserialized into the actual run-time instances of objects, which decreases performance. Thus, while the calculation of the required memory cache for serialized objects is simple and this technique improves performance as opposed to not having objects in the memory cache at all, it will not perform as well as situations where the actual run-time instances of objects are created in the memory cache.
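The “half way approach” can be sketched as follows. The class name `SerializedCache`, its methods, and the use of JSON bytes as the serialized form are all assumptions made for illustration; the sketch tracks only the payload bytes held and ignores any overhead of the cache structure itself.

```python
import json


class SerializedCache:
    """Hypothetical cache that holds objects only in serialized form."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.deserializations = 0
        self._entries = {}

    def put(self, key, payload: bytes) -> bool:
        # The size of a serialized entry is known exactly: it is len(payload).
        old = self._entries.get(key)
        new_used = self.used_bytes + len(payload)
        if old is not None:
            new_used -= len(old)
        if new_used > self.capacity_bytes:
            return False  # would exceed the fixed cache capacity
        self._entries[key] = payload
        self.used_bytes = new_used
        return True

    def get(self, key):
        # Every access pays the deserialization cost again, because no
        # run-time instance is retained between calls.
        payload = self._entries[key]
        self.deserializations += 1
        return json.loads(payload.decode("utf-8"))
```

The capacity check is trivial because the serialized size is known in advance, but the `deserializations` counter grows with every access, which reflects the recurring cost described above.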
It is therefore the case that where the size of a piece of data on the hard disk drive is known (i.e., the serialized representation of an object), but the size of its representation in memory is much harder to compute (i.e., the run-time instance of the object), it becomes very hard to optimize the use of the memory cache where the cache has a fixed maximum capacity. There is a need for a method that increases performance and achieves an optimal utilization of available memory by deserializing actual run-time instances of objects into memory, but without having to first calculate the exact amount of memory required for each and every object.