1. Field of Invention
The invention relates to the organization of data objects on computers and other electronic devices. More particularly, it relates to composition of data objects and nested data objects, and insertion and removal of variable-length data in data-objects and nested data-objects at run-time.
2. Prior Art
A data-processing system, such as a computer or other electronic device, stores data-objects in its storage-units, such as memory-regions, files or file-regions, data-structures, or the like.
A data-object comprises a number of data-elements organized in a logical manner. A data element may be a primary data-object or a composite data-object. A data-object that is itself an element of another data-object is called a child or nested data-object, and the other data-object is called the parent data-object.
For the purposes of data-composition, a distinction is made between a data-object and a pointer to the data-object. Thus,
    i. A data-object may be stored within another data-object's storage-space. The data-object is then called a child data-object, or a nested data-object.
    ii. Alternatively, a data-object may be stored in its own storage-space. A pointer, reference, handle, or other address of the data-object is stored in another data-object. Thus, the data-object is not an element of the other data-object. Instead, the pointer, reference, handle, or other address is the child data-object, or nested data-object, in the other data-object.
Data-objects may be small, or large. A ‘struct’ in the C language is an example of a data-object. Data-structures and file structures may also be data-objects. A database, comprising a number of data-tables and indexes, is another example of a data-object. The amount of storage-space required or used by a data-object is called its size or length.
A storage-unit provides a number of sequential addresses in a logical address space, for storing data. Data-objects may be stored in storage-units such as memory-regions, files, or the like. Some data-structures (for example, an array) also provide a number of sequential logical addresses for storing data, and may be used as storage-units in which data-objects may be stored.
A storage-unit also has a physical address space, such as pages in memory, and sectors or clusters on disk. In some cases, the physical addresses may be the same as the logical addresses. However, in other cases, the physical addresses are distinct from the logical addresses. In such cases, an association is maintained between a physical address and a logical address in order to provide access to the data.
How Storage-Space Is Allocated at Run-Time
Dynamic composition of data-objects, including modification of existing data-objects, requires additional storage-space. Usable, or free, storage-space may be found in a number of places:
    1. Free storage-space may be allocated by creating a new storage-unit. Thus, a new memory-region, or a new file etc., may be created to provide storage-space.
    2. Free storage-space may be found within a data-object if spare-space is reserved for the purpose.
    3. Free storage-space may be provided by a heap, or other heap-like dynamic storage manager. The heap comprises a large memory-region from which it allocates a number of smaller sub-regions for storing variable-length data-objects. Thus,
        a. The C language provides a ‘malloc’ function to allocate a number of bytes from a heap.
        b. In many file-structures, heap-like memory-managers track unused storage-regions and are used to dynamically provide storage-space.
    4. Free storage-space may be allocated at the end of the logical address space. For example,
        a. A number of lines may be inserted in a text file by using storage-space at the end of a file.
        b. Many C compilers implement the ‘alloca’ function, which is used to allocate a variable-length memory region at the top of the process stack.
Each of these methods for allocating memory has its disadvantages. The choice of method used depends on the length of the data, the type of operations expected to be frequently performed on the data, and the need for speed.
Methods for Composition of Data-Objects
A number of methods for composition of data-objects exist. The choice of method usually depends on the characteristics of the data required to be stored, and the operations that are expected to be performed on the data. Each existing data-composition method has its strengths and weaknesses. No existing method is without its weaknesses. Further complications arise when data-objects are nested. Many of the advantages of a method may not be available when the data-object is nested.
The efficiency of a system depends on the methods used for composition of data objects, especially methods used for dynamic composition, including modification of existing data-objects.
The placement of a data element in relation to the parent data-object, or in relation to other data-elements in the parent data object, is of considerable importance. When data-objects are well-organized in a system, the system is able to perform at greater efficiency.
Some existing data-composition methods provide optimized placement of data-objects in the system, but provide rudimentary or no support for dynamic-composition or run-time modification of data-objects. Other methods provide efficient dynamic composition and run-time modification of data-objects, but fail to provide optimized placement of data-objects.
Dynamic composition and modification of data-objects is difficult for many reasons. Some of the reasons are:
    i. There is a need to maintain variable-length data-elements in a data-object.
    ii. There is a need to maintain a variable number of data-elements in a data-object.
    iii. There is a need to insert or remove data-elements in a data-object, including a nested data-object, at run-time.
    iv. There is a need to maintain a high level of efficiency of operations.
    v. There is a need to use the available storage-space judiciously.
    vi. There is a need to maintain the system free of errors.
    vii. There is a need to maintain order in the placement of the data after allocation of dynamic memory.
    viii. Each method for allocation of dynamic memory has its disadvantages.
Hence, there is a need for a new method for composition of data-objects.
A number of books describe methods for composition of data-objects. For example,
    i. The book “The C Programming Language” by Kernighan and Ritchie describes methods for composition of data-objects in the C language.
    ii. The book “Data Structures and Algorithms” by Aho, Hopcroft, and Ullman describes methods for composition of data-structures.
    iii. The book “File Structures, An Object-Oriented Approach with C++” by Folk, Zoellick, and Riccardi describes methods for composition of data-objects for storage in a disk or a file.
Some commonly used methods for data-composition are discussed below:
1. Fixed, Equal-Sized Data Objects:
In accordance with a commonly used method of data-object composition, a number of fixed and equal-sized data-objects are stored at successive logical addresses in the storage-unit. This method is useful for maintaining a list of data-objects.
An example of this method is an ‘array’ in the C programming language. Another example is a data table stored in a file, comprising a number of fixed-length records. Each record can be directly accessed by its record-number. The relative position of a record is determined by multiplying the count of preceding records by the record-size. This method allows direct access to the data-objects. Yet another example of this method is a b-tree node in which a number of fixed and equal-sized entries are stored.
As the lengths and relative-addresses of the elements are known or easily determined, this method does not require the lengths or addresses of the data-elements to be tracked.
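The address computation described above may be sketched as follows. This is a minimal illustration only; the `Record` type, its fields, and the function names `record_offset` and `record_at` are hypothetical and are not taken from any of the cited works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-length record; the fields are illustrative only. */
typedef struct {
    int  id;
    char name[28];
} Record;

/* Relative position of record number n (0-based): the count of
   preceding records multiplied by the record-size. */
size_t record_offset(size_t n, size_t record_size)
{
    return n * record_size;
}

/* Direct access to record n in a table holding count records.
   No lengths or addresses are stored anywhere; they are computed. */
Record *record_at(Record *table, size_t count, size_t n)
{
    assert(n < count);
    return (Record *)((char *)table + record_offset(n, sizeof(Record)));
}
```

Because every record has the same size, the offset is obtained by a single multiplication, which is why no per-element bookkeeping is needed.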
Insertion or removal of data in the data-object may be performed if the size of the storage-unit can be expanded, and the data-object is not bounded by another data-object in the storage-unit. Thus,
    i. A number of additional entries may be inserted in a b-tree node, but limited by the amount of spare-space reserved in the b-tree node for the purpose.
    ii. Additional records may be inserted in a data-table that is stored in its own file.
This method suffers from a number of disadvantages. Some of these are:
    i. Insertion of data, beyond any storage-space reserved for such purpose beforehand, is not possible. It is also not always possible or advantageous to provide an unbounded logical address-space to store a data-object.
    ii. This method is unsuitable if the data-objects are not of fixed and equal sizes.
    iii. This method does not establish a relationship (such as a parent-child relationship, or other relationship) between data-objects.
Hence, this method is unsuitable for composition of data-objects, especially nested data-objects.
2. Delimited Data:
In accordance with another method, a number of variable-length data-objects are stored at successive logical addresses in the storage-unit. Each data-object is separated, or delimited, from its neighboring data-object by a ‘separator’ or ‘delimiter’.
For example,
    i. A ‘line-feed’ character is used to separate one line from another line in a text file.
    ii. A comma or other delimiter is used to delimit fields in a comma-separated file.
    iii. A variable-length entry may be separated from other entries in a b-tree node by a delimiter.
    iv. An XML document uses the ‘<’ and ‘/>’ tags as separators.
Insertion of additional data may be performed if the data-object is stored in an unbounded address space, such as a file. Alternatively, insertion may be performed if spare storage-space is reserved within the data. When a character in the text file is inserted or removed, the succeeding lines in the file are moved to higher or lower logical addresses.
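Access to a delimited data-object may be sketched as follows, assuming a text held in memory with a line-feed as the delimiter. The function name `line_offset` and its conventions (0-based line numbers, a return value of -1 when the line does not exist) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first character of line n (0-based) in a
   text of len bytes, or -1 if the text has no such line. The position
   of a line is not recorded anywhere, so every preceding character
   has to be examined to find the delimiters. */
long line_offset(const char *text, size_t len, size_t n)
{
    assert(text != NULL);
    if (n == 0)
        return 0;
    size_t seen = 0;  /* number of line-feed delimiters passed so far */
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n' && ++seen == n)
            return (i + 1 < len) ? (long)(i + 1) : -1;
    }
    return -1;
}
```

The cost of locating a line grows with the amount of data that precedes it, and any insertion or removal shifts the offsets of all succeeding lines.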
This method suffers from a number of disadvantages. Some of these are:
    i. It is necessary to ensure that the ‘separator’ is not confused with the data-objects.
    ii. Insertion and removal of data in an unbounded data-object, beyond any storage-space reserved for such purpose beforehand, is not possible. In many cases, it is not possible or advantageous to provide unbounded address space to a data-object. Thus, insertion of entries in a b-tree node is limited by the size of the b-tree node.
    iii. The relative addresses or lengths of the data-elements are not tracked, nor are they easily known. As such, this method does not provide direct access to a data-object. For example, it is necessary to scan for line-break characters in a text file in order to access a predetermined line. Hence this method is not suitable if data is expected to be modified frequently, the amount of the data is large, or efficiency of the system is critical.
    iv. The method does not establish a relationship between the data-objects, even though such a relationship may be described within the data. Thus, the relationship of a data-object with another data-object (such as a parent-child relationship) is not readily inferred, and the entire data is required to be scanned before such relationship is determined.
3. Data-Type Based Organization:
In accordance with another method, the system maintains a ‘data-type’ or ‘data-template’ for a data-object. The data-type provides a storage-pattern for storing a number of data-elements in the data-object. The storage-pattern is used to determine the addresses of the data-elements.
This method is used in many programming languages. A struct in a ‘C’ language program is an example of this method. The method is also used in many file-structures. The storage-space allocated for a data-element may not be equal to the storage-space allocated for another data-element in the data-object. The system provides direct-access of a data-element. The address and the length of a data element are fixed, and as such there is no need to track the lengths or relative-addresses of the data-elements. This method allows composition of nested data-objects.
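A data-type acting as a storage-pattern may be sketched as follows. The `Student` and `Date` types, their fields, and the helper `element_at` are hypothetical and serve only to illustrate fixed offsets and nesting; the standard `offsetof` macro reports the offset that the compiler assigned to each element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested data-object. */
typedef struct {
    short day, month, year;
} Date;

/* Hypothetical parent data-object. The data-type fixes the address
   and length of every element, including the nested one. */
typedef struct {
    char name[30];     /* fixed-width field, whatever the actual name */
    int  roll_number;
    Date enrolled;     /* child data-object stored inside the parent */
} Student;

/* Direct access to an element from the parent's address and the
   offset given by the storage-pattern. */
void *element_at(void *object, size_t offset)
{
    assert(object != NULL);
    return (char *)object + offset;
}
```

For instance, `element_at(&s, offsetof(Student, enrolled))` yields the address of the nested `Date` without any scanning and without any stored length or address.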
This method provides the ability to manage a limited amount of variable-length data. Typically, a moderately large amount of storage-space for storing a variable-length data-element is reserved. For example, a ‘Student’ data-type may comprise a 30-character-wide field for storing the name of a student. However, this has many disadvantages. Some of these disadvantages are:
    i. The amount of storage-space is required to be estimated beforehand.
    ii. It results in wastage of storage-space if the storage-space reserved is too large.
    iii. It results in failure to store data if the amount of data required to be stored is larger than the reserved space.
4. Tracking Lengths or Addresses:
In accordance with another method, a number of variable-length data-elements in a data object are stored at a number of addresses in the data-object's storage-space.
Direct-access to the data-elements is provided by tracking the lengths of the data-elements. Alternatively, the addresses of the elements are tracked. As may be appreciated, these two methods are equivalent, and perform the same function—to provide access to a data-element without scanning the data. The addresses, offsets, lengths, cumulative lengths, data-offsets, or other relevant information may be stored in order to track the length or address of the data-elements.
For example, in a b-tree node, a number of variable-length entries may be stored. This may be done in a number of ways, some of which are:
    i. A length indicator is stored at the beginning of each entry.
    ii. Length-indicators of all the entries are stored in the node, followed by the entries.
    iii. The offsets of the entries are stored in the node, followed by the entries.
    iv. Cumulative-lengths of the entries are stored in the node, followed by the entries.
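The third arrangement listed above, in which offsets are stored in the node followed by the entries, may be sketched as follows. The `Node` layout, the constants `NODE_CAPACITY` and `MAX_ENTRIES`, and the function names are assumptions made for this illustration and do not describe any particular b-tree implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODE_CAPACITY 256   /* hypothetical size of the entry area */
#define MAX_ENTRIES   16    /* hypothetical limit on tracked entries */

typedef struct {
    unsigned short count;
    /* offsets[i] is where entry i starts; offsets[count] is the end
       of the used area, so the length of entry i is the difference. */
    unsigned short offsets[MAX_ENTRIES + 1];
    unsigned char  data[NODE_CAPACITY];
} Node;

void node_init(Node *node)
{
    node->count = 0;
    node->offsets[0] = 0;
}

/* Appends an entry. Returns 0 on success, or -1 when the spare
   space in the node is insufficient. */
int node_append(Node *node, const void *entry, size_t len)
{
    assert(node != NULL && entry != NULL);
    size_t used = node->offsets[node->count];
    if (node->count == MAX_ENTRIES || len > NODE_CAPACITY - used)
        return -1;
    memcpy(node->data + used, entry, len);
    node->count++;
    node->offsets[node->count] = (unsigned short)(used + len);
    return 0;
}

/* Direct access to entry i without scanning the preceding entries. */
const unsigned char *node_entry(const Node *node, size_t i, size_t *len)
{
    assert(i < node->count);
    *len = (size_t)(node->offsets[i + 1] - node->offsets[i]);
    return node->data + node->offsets[i];
}
```

The sketch also shows the limitation of the approach: once the reserved area is full, `node_append` fails, regardless of how much storage-space is free elsewhere.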
A limited amount of data may be inserted or removed in a data-object if spare storage-space is maintained for the purpose. Thus, a number of additional data-elements may be appended, inserted, or removed, in an existing data-object. Also, the length of a data-element can be increased or decreased, from time to time. This method is used in some b-trees to maintain a variable number of entries in a b-tree node.
There are several disadvantages to this method. Some of the disadvantages are:
    i. The amount of spare space required needs to be estimated beforehand.
    ii. It results in wastage of storage-space if the spare space is too large.
    iii. It results in failure to store data if the amount of data required to be stored is larger than the spare-space.
5. Array-Based Organization of Data-Objects
In accordance with another method, a number of data-objects are stored in a linear storage-space. A relationship is established between the data-objects, so that a data-object is related to a number of other data-objects.
For example, Aho, Hopcroft, and Ullman describe a method for composition of a tree data-structure in a linear array (see section ‘An Array Representation of Trees’ in the book ‘Data Structures and Algorithms’, Addison Wesley Publishing Company, 1983, page 84). The authors also describe methods to traverse the tree. Here, the relationship of a node in a tree with a number of child-nodes in the tree is represented.
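A tree held in a linear array may be sketched as follows, with each array element recording the index of the parent of the corresponding node. The 0-based indexing, the `NO_PARENT` marker for the root, and the function names are assumptions of this sketch and may differ in detail from the representation given in the cited book.

```c
#include <assert.h>

#define NO_PARENT (-1)   /* hypothetical marker for the root node */

/* parent[i] holds the index of the parent of node i. The depth of a
   node is found by following parent links up to the root. */
int node_depth(const int *parent, int count, int node)
{
    assert(node >= 0 && node < count);
    int depth = 0;
    while (parent[node] != NO_PARENT) {
        node = parent[node];
        depth++;
    }
    return depth;
}

/* Counts the children of a node. The whole array has to be scanned,
   since only the child-to-parent direction is stored. */
int child_count(const int *parent, int count, int node)
{
    int children = 0;
    for (int i = 0; i < count; i++) {
        if (parent[i] == node)
            children++;
    }
    return children;
}
```

With the array `{ NO_PARENT, 0, 0, 1, 1, 2 }`, node 0 is the root, nodes 1 and 2 are its children, nodes 3 and 4 are children of node 1, and node 5 is the child of node 2.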
In some cases, insertion and removal of data is provided by storing new data at the end of the file. Thus, for example, a ‘Student’ field in a record in a file may be replaced by a larger field by
    i. allocating a new data-region at the end of the file for storing the new larger field,
    ii. invalidating the existing field in the data-object in the file, and
    iii. in its place, storing a pointer to the new data-region at the end of the file.
However, insertion and removal of data using this method has a number of disadvantages:
    i. The method creates a number of small data-regions each time a modification is made to the data, leading to inefficiency in accessing data.
    ii. Locality of reference is not preserved as a result of the method, resulting in a large number of page-faults and cache-misses, and inefficiency of operations of the system.
    iii. A number of unused, invalidated data-regions in the file are created, when a modification is made to the data.
    iv. The data is not maintained in its most optimal manner. With each modification, more and more disorder is introduced in the data.
    v. The data must be periodically re-written to a new file, in proper order, in order to regain efficiency of operation.
6. Re-Creating the Data-Object
In accordance with this method, a data-object may maintain variable-length data, by re-creating the data-object when a new data-element is inserted, removed or modified. For instance,
    i. In the C language, an array may store a fixed number of elements. In order to insert an additional element, a new array is created by allocating new memory-space. A number of existing elements in the original array are then copied to the new array, along with the new element. The original array is then usually discarded, and the new array is used in its place.
    ii. In order to insert a data-element in a file, the contents of the original file may be copied to a new file, along with the new data-element. The original file is then usually discarded, and the new file is used in its place.
This method is inefficient as it involves accessing or copying large amounts of data.
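The array case may be sketched as follows. The function name `insert_by_recreating` and its conventions are assumptions of this illustration; it uses only the standard `malloc`, `memcpy`, and `free` functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Inserts value at index pos by re-creating the array. Returns the
   new array, or NULL if allocation fails (the original array is then
   left untouched). On success the original array is discarded. */
int *insert_by_recreating(int *old, size_t count, size_t pos, int value)
{
    assert(pos <= count);
    int *fresh = malloc((count + 1) * sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    if (pos > 0)
        memcpy(fresh, old, pos * sizeof *fresh);        /* elements before pos */
    fresh[pos] = value;                                 /* the new element */
    if (count > pos)
        memcpy(fresh + pos + 1, old + pos,
               (count - pos) * sizeof *fresh);          /* elements after pos */
    free(old);
    return fresh;
}
```

Every insertion copies all existing elements, so the cost of a single modification is proportional to the size of the whole data-object.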
As a consequence of the disadvantages listed above, each of the methods discussed above is unsuitable for dynamic composition of data-objects, especially nested or large data-objects.
7. Heap-Based Storage of Data
In accordance with another method, variable-length data is stored using storage-space allocated from a heap. The data-object does not store the data itself. Instead, the data-object stores a pointer, reference, or other handle, to the data. For the purposes of data-composition, the pointer, reference or other handle is the child-element in the data-object.
This method thus provides the ability to store variable-length data in a data-object. It does not require the length of the data to be known beforehand. This method is thus more flexible than the other methods discussed above.
This method is also used in many file-structures. The organization of a b-tree in a file is another example of this method. In a b-tree, a heap-like dynamic memory manager is used to track unused storage-regions and to provide dynamically allocated storage-space.
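Heap-based storage of a variable-length element may be sketched as follows. The `StudentRef` type and the functions `student_set_name` and `student_clear` are hypothetical; only the standard `malloc`, `free`, `strlen`, and `memcpy` functions are relied upon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical data-object. The child-element is the pointer; the
   characters of the name live in a separate region of the heap. */
typedef struct {
    int   roll_number;
    char *name;
} StudentRef;

/* Replaces the name with a heap-allocated copy of any length.
   Returns 0 on success, or -1 if allocation fails (the previous
   name is then kept). */
int student_set_name(StudentRef *s, const char *name)
{
    assert(s != NULL && name != NULL);
    size_t len = strlen(name) + 1;      /* include the terminator */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    free(s->name);                      /* release the old region */
    s->name = copy;
    return 0;
}

/* Releases the heap region; forgetting this step leaks storage. */
void student_clear(StudentRef *s)
{
    free(s->name);
    s->name = NULL;
}
```

The size of the `StudentRef` object itself never changes, whatever the length of the name, but the name is stored wherever the heap happens to place it, and the caller is responsible for tracking and releasing that region.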
However, this method also suffers from a number of disadvantages. Some of the disadvantages are:
    i. This method requires management of the unused storage-regions in the heap.
    ii. Management and tracking of data-elements required by this method is the cause of many failures.
    iii. This method results in fragmentation of storage-space, and increased number of page faults and cache-misses, thereby slowing down the system.
    iv. Locality of reference is not preserved, thereby causing inefficiencies.
    v. This method results in creation of regions of unusable storage-space.
    vi. It is usually not possible to store the data at their most optimal locations.