The invention relates to a method of reorganizing a data entry database. More particularly, the invention relates to a method of reorganizing selective units-of-work in a data entry database.
1.1 IMS
IMS is one of the oldest and most widely used database systems. It runs under the MVS operating system on large IBM 370 and 370-like machines. IMS is based on the hierarchical data model (discussed below). Queries on the IMS databases are issued through embedded calls in a host language. The embedded calls are part of the IMS database language DL/I.
Because performance is critically important in large databases, IMS allows the database designer a large number of options in the data definition language. The database designer defines a physical hierarchy as the database scheme. Several subschemes may be defined by constructing a logical hierarchy from the record types comprising the scheme. There are a variety of options available in the data definition language (block sizes, special pointer fields, etc.) that allow the database administrator to "tune" the system for improved performance.
1.2 Hierarchical Databases
A hierarchical database consists of a collection of records that are connected to each other with links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. For example, consider the database representing a customer-account relationship in a banking system that is shown in FIG. 1. There are two record types: customer and account. The customer record consists of three fields: name, street, and city. Similarly, the account record consists of two fields: number and balance.
The set of all customers and account records is organized in the form of a rooted tree where the root of the tree is a dummy node. A hierarchical database is a collection of such rooted trees.
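The customer-account hierarchy described above can be illustrated with a minimal sketch. The class name `Record`, the helper `walk`, and all sample field values are hypothetical and are used only for illustration; they are not part of IMS or DL/I.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One record: a type name, single-valued fields, and links to child records."""
    record_type: str
    fields: dict
    children: list = field(default_factory=list)

    def link(self, child):
        # A link associates exactly two records: this parent and one child.
        self.children.append(child)
        return child


def walk(record, depth=0):
    """Yield (depth, record) pairs in parent-before-child order."""
    yield depth, record
    for child in record.children:
        yield from walk(child, depth + 1)


# The root of the tree is a dummy node; sample values are made up.
root = Record("root", {})
customer = root.link(
    Record("customer", {"name": "Sample Name", "street": "Sample Street", "city": "Sample City"})
)
customer.link(Record("account", {"number": "A-1", "balance": 100}))
customer.link(Record("account", {"number": "A-2", "balance": 250}))

for depth, rec in walk(root):
    print("  " * depth + rec.record_type)
```

Each account record is reachable only through its parent customer record, which is the defining property of the hierarchical model.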
1.3 Data Entry Database
One well known IMS hierarchical database is the data entry database (DEDB). As shown in FIG. 2, a DEDB is a collection of a number of database records stored in a set of partitions called Areas. An Area contains a range of DEDB records. As shown in FIG. 3, an Area is divided into three parts: a root addressable part, an independent overflow part, and a sequential dependent part.
1.3.1 Root Addressable Part of an Area
As shown in FIG. 3, the root addressable part of an Area contains units-of-work (UOWs). A UOW consists of a user-specified number of physically contiguous control intervals. A control interval is the unit of transfer between a disk drive storing the DEDB and a computer. When a DEDB is created, the database administrator sets the size of the control intervals for the DEDB. For example, a 4k byte control interval may store up to 3976 bytes of data. (The remaining 120 bytes in the 4k byte control interval define various parameters of the control interval.) Empty data storage elements within a control interval are known as free space elements. The minimum length of a free space element is 4 bytes. Thus, in certain circumstances, storage locations in a control interval are not large enough for a free space element. These storage locations will not be utilized to store data. Such unutilizable storage locations are known in the art as scrap.
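The capacity arithmetic and the distinction between a free space element and scrap can be sketched as follows. The function names are hypothetical, and the sketch assumes that "4k" means 4096 bytes, which is consistent with the 3976-byte and 120-byte figures given above.

```python
CI_SIZE = 4096               # assumed size of a "4k byte" control interval
CI_OVERHEAD = 120            # bytes that define parameters of the control interval
MIN_FREE_SPACE_ELEMENT = 4   # minimum length of a free space element, in bytes


def usable_bytes(ci_size=CI_SIZE, overhead=CI_OVERHEAD):
    """Bytes of a control interval that are available for data."""
    return ci_size - overhead


def classify_gap(gap_bytes):
    """Classify an empty run of bytes inside a control interval."""
    if gap_bytes <= 0:
        return "none"
    if gap_bytes >= MIN_FREE_SPACE_ELEMENT:
        return "free space element"
    return "scrap"  # too small to hold a free space element


print(usable_bytes())
print(classify_gap(3), "/", classify_gap(4))
```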
A UOW is divided into a base section and an overflow section. The base section contains control intervals that are used for the storage of data. The overflow section of a UOW is used to store data after the base section control intervals of the UOW are full, i.e., unable to satisfy a request for space.
1.3.2 Independent Overflow Part of an Area
As shown in FIG. 3, the independent overflow part of an Area also contains control intervals. These control intervals may be used to extend a particular UOW. Thus, the independent overflow control intervals are logical extensions of the overflow section of a particular UOW. However, once a control interval has been used to extend the overflow section of a particular UOW, only data associated with that UOW may be stored therein. Thus, an independent overflow control interval that is allocated to a particular UOW may be considered to be "owned" by that UOW.
The first control interval in the independent overflow part contains a space map. This space map indicates which UOW owns the first 120 control intervals in the independent overflow part. There is another space map for every 120 independent overflow control intervals, i.e., the 1st, 121st, 241st, etc. control interval in the independent overflow part is a space map control interval.
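The placement of space map control intervals follows a simple pattern that can be expressed arithmetically. The sketch below assumes 1-based positions within the independent overflow part; the function names are hypothetical.

```python
SPACE_MAP_SPAN = 120  # one space map for every 120 independent overflow control intervals


def is_space_map_ci(position):
    """True if the control interval at this 1-based position holds a space map."""
    return (position - 1) % SPACE_MAP_SPAN == 0


def space_map_for(position):
    """1-based position of the space map control interval that covers `position`."""
    return ((position - 1) // SPACE_MAP_SPAN) * SPACE_MAP_SPAN + 1


print([p for p in range(1, 300) if is_space_map_ci(p)])
print(space_map_for(200))
```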
1.3.3 Sequential Dependent Part of an Area
The sequential dependent part of an Area contains space for storing data in a time-ordered sequence without regard to the UOW containing the root segment. The sequential dependent part is used as a circular buffer for data storage.
1.4 Data Storage in a DEDB
When data is stored in a DEDB, the data is associated with a particular UOW. Initially, the UOW's base section control intervals will be empty. Thus, the UOW will contain base section control intervals that may be used to store the data. However, as more data is associated with a particular UOW, the base section control intervals will become full.
If additional data is to be associated with a UOW that contains full base section control intervals, then the first control interval within the overflow section of the associated UOW is utilized to store the data. If the first control interval is also full, then the second control interval within the overflow section will be utilized to store the data. Additional data may be similarly associated with the UOW until all control intervals within the overflow section are full.
If additional data is to be associated with a UOW and no space can be found in a UOW's overflow section, then a space map control interval in the independent overflow part of the Area will be allocated to the UOW. This allocation provides the UOW with 119 additional control intervals for data storage. After these additional control intervals are full, another space map control interval will be allocated to the UOW. This sequence continues until no unallocated space map control intervals are available. When this occurs, an error is generated.
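The order in which space is searched (base section, then overflow section, then independent overflow) can be sketched as follows. This is a simplified model under stated assumptions, not the IMS implementation: the classes `UOW` and `Area`, the error `AreaFullError`, and the first-fit rule within each section are all hypothetical, and each independent overflow allocation is modeled simply as 119 additional data control intervals.

```python
class AreaFullError(Exception):
    """Raised when no unallocated independent overflow space remains."""


class UOW:
    def __init__(self, base_cis, overflow_cis, ci_capacity=3976):
        self.ci_capacity = ci_capacity
        self.base = [0] * base_cis          # bytes used in each base section CI
        self.overflow = [0] * overflow_cis  # bytes used in each overflow section CI
        self.independent = []               # independent overflow CIs owned by this UOW


class Area:
    CIS_PER_ALLOCATION = 119  # data CIs gained per space map allocation

    def __init__(self, free_allocations):
        self.free_allocations = free_allocations  # unallocated space map blocks

    def store(self, uow, size):
        """Place `size` bytes and return (section name, CI index)."""
        if size > uow.ci_capacity:
            raise ValueError("data does not fit in a single control interval")
        # Search base first, then overflow, then independent overflow already owned.
        for name in ("base", "overflow", "independent"):
            section = getattr(uow, name)
            for index, used in enumerate(section):
                if used + size <= uow.ci_capacity:
                    section[index] += size
                    return name, index
        # No space found: extend the UOW from the independent overflow part.
        if self.free_allocations == 0:
            raise AreaFullError("no unallocated independent overflow space")
        self.free_allocations -= 1
        start = len(uow.independent)
        uow.independent.extend([0] * self.CIS_PER_ALLOCATION)
        uow.independent[start] = size
        return "independent", start
```

For example, with one base control interval, one overflow control interval, and one available allocation, three successive full-interval requests land in the base section, the overflow section, and the independent overflow part, in that order.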
1.5 Reorganization of a DEDB
As data is added, updated, and deleted, a DEDB becomes physically disorganized, decreasing operating efficiency. More I/O operations are needed to retrieve data stored in the DEDB. When this occurs, DEDB response time slows. Such a physically disorganized DEDB is known as a fragmented DEDB.
However, by grouping the data associated with each UOW, the data can be accessed more quickly. Thus, the performance of the DEDB is increased. In addition, because related data is grouped together, it is possible to reclaim formerly unusable space on a disk drive.
1.6 Conventional Methods of Reorganizing a DEDB
Conventional methods of reorganizing a DEDB reorganize the root addressable and the independent overflow parts of an Area. The sequential dependent part of an Area is not affected. Conventional reorganization of a DEDB reorganizes one UOW at a time.
1.6.1 Conventional On-line-UOW Reorganization Method
One conventional UOW reorganization method progressively copies control intervals that are associated with a particular UOW to a "reorganization UOW." The control intervals typically include base section control intervals, overflow section control intervals, and independent overflow control intervals. After all control intervals that are associated with a UOW are copied into the reorganization UOW, the reorganization UOW is copied over the original UOW. Then, independent overflow control intervals that are no longer needed by the original UOW are released. Thus, the released control intervals may be allocated to other UOWs. This method of reorganizing a UOW is known as an on-line-UOW reorganization method.
The above described method may be repeated for other UOWs. An example of such a conventional DEDB reorganization method is discussed in Guide to IMS/VS V1 R3 Data Entry Data Base (DEDB) Facility, IBM International Systems Center, p. 48, (May 14, 1984) (IBM Document Number GG24-1633-0).
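The copy-and-release cycle of the on-line-UOW method in section 1.6.1 can be sketched as follows. This is an illustrative model only and is not taken from the cited IBM guide: the function `reorganize_uow`, the representation of a control interval as a list of segment sizes, and the sequential packing rule are all assumptions.

```python
def reorganize_uow(uow_cis, own_ci_count, ci_capacity):
    """Repack one UOW and report how many independent overflow CIs are released.

    uow_cis      -- list of control intervals, each a list of segment sizes in bytes;
                    the first `own_ci_count` entries are the UOW's base and overflow
                    section CIs, the rest are independent overflow CIs it owns
    own_ci_count -- number of base plus overflow section CIs in the UOW
    ci_capacity  -- usable bytes per control interval

    Assumes every segment fits within one control interval.
    """
    # Step 1: copy every segment, in order, into a "reorganization UOW".
    reorg = [[]]
    used = 0
    for ci in uow_cis:
        for segment in ci:
            if used + segment > ci_capacity:
                reorg.append([])  # current CI is full; start the next one
                used = 0
            reorg[-1].append(segment)
            used += segment

    # Step 2: the reorganization UOW replaces the original; the UOW's own
    # base and overflow CIs always remain, even if they end up empty.
    while len(reorg) < own_ci_count:
        reorg.append([])

    # Step 3: independent overflow CIs that are no longer needed are released.
    released = max(0, len(uow_cis) - len(reorg))
    return reorg, released


fragmented = [[60], [30], [50], [20], [40]]  # hypothetical segment sizes
packed, released = reorganize_uow(fragmented, own_ci_count=2, ci_capacity=100)
print(packed, released)
```

In this hypothetical case, five partly filled control intervals are repacked into three, so two independent overflow control intervals become available to other UOWs.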
1.6.2 Conventional Off-line-UOW Reorganization Method
One conventional off-line-UOW reorganization method progressively copies control intervals that are associated with UOWs to a sequential file, such as a tape. This procedure is known as unloading a UOW. Next, data contained in the sequential file is loaded back onto a randomly accessible disk drive. Such a method requires very high I/O activity and is very time consuming. Typically, all UOWs in a DEDB are unloaded and then loaded.
1.7 Deficiencies in the Prior Art
As the size and complexity of a DEDB increase, reorganization processing time increases. Typically, however, reorganization of a DEDB is performed during off-peak hours by executing a batch job. Because the time window for running such batch jobs is shrinking due to the need to provide near continuous DEDB access, there is a need to perform DEDB reorganization as quickly as possible. Conventional DEDB reorganization methods are neither rapid nor efficient. Thus, there is a need for a method that rapidly and efficiently reorganizes a DEDB.