1. Field of the Invention
The present invention relates to a method and apparatus for efficiently updating data stored on fixed block media, and more particularly to creating data structures to describe the format of emulated CKD data tracks, which are loaded into electronic memory to enable fast writing of data and locating of data without loading an entire track of data into memory.
2. Description of the Related Art
Disk storage systems such as those used with mainframe computer systems, also called host systems, such as the IBM 390, often utilize many disk drives. The present invention is useful for updating data stored on such disk drives, such as the IBM 3380 and 3390 disk drives. These types of disk devices, commonly known as Direct Access Storage Devices (DASD), must communicate with host programs that are based on the IBM 360/370 architecture, which has been in place since the 1960's and early 1970's. This means that the older data formats for DASD must be emulated in order for the host to recognize the storage devices. Typically, this means that newer, more efficient, higher-capacity drives, such as those using the Small Computer System Interface (SCSI) and working with fixed block architecture (FBA), must be provided with emulation software in a device controller in order to communicate with a host, such as an IBM 390, without inducing error states.
DASD requires certain Input/Output functions in order to serve its function as permanent data storage for the host. An I/O channel typically connects the DASD and the host processor. The host processor operating system initiates data transfer with a command to the I/O channel. This is done by a series of Channel Command Words (CCW's) which are forwarded to a DASD controller. The controller interprets the CCW's and commands the DASD to execute the commands. For example, a "SEEK" command positions a DASD access mechanism, "SEARCH" commands cause comparison between data sought by the host and data physically stored on the device, a "WRITE" command transfers data from the host to the DASD, and a "READ" command copies data from DASD to the host, where it is checked for validity.
DASD devices typically store data on a track, which is a circular path on the surface of a disk on which information is recorded and from which recorded information is read. Typically these disk drives implement a COUNT, KEY, and DATA (CKD) format on the disk drives. For a detailed explanation of CKD architecture, see for example, Marilyn Boyl, Introduction to IBM Direct Access Storage Devices, Science Research Associates Inc., 1981. The format contains a definition of how data is structured in the records contained on the track. A record is a set of one or more related data items grouped together for processing, such that the group may be treated as a unit. Disk drives utilizing the CKD format have a special "address mark" on each track that signifies the beginning of a record on the track. After the address mark is a three-part record beginning with the COUNT field which serves as the record ID and also indicates the lengths of the optional KEY field and the DATA field, both of which follow. Also on the track, there is normally one Home Address (HA) that defines the physical location of the track and the condition of the track. The HA typically contains the physical track address, a track condition flag, a cylinder number (CC) and a head number (HH). The combination of the cylinder number and the head number indicates the track address, commonly expressed in the form CCHH. The HA contains the "physical track address" which is distinguished from a "logical track address". Some operating systems, such as the IBM Virtual Machine (VM) operating system, employ a concept of "virtual disks" referred to as user mini-disks, and thus it is necessary to employ logical addresses for the cylinders rather than physical addresses. The first record following the HA is commonly a track descriptor record, sometimes referred to as Record 0, or R0. One or more user records follow R0 on the track. The R0 record contains no key field, but may contain either system or user data. The first part of each user record is an "address marker" that enables the controller to locate the beginning of the record when reading data.
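By way of illustration only, the COUNT field described above can be sketched as a small fixed-layout structure. The eight-byte layout assumed here (two-byte CC, two-byte HH, one-byte record number, one-byte key length, two-byte data length) follows the conventional CKD count area; the class name, field names, and format string are hypothetical and are not taken from any cited reference.

```python
import struct
from dataclasses import dataclass

# Assumed big-endian layout: CC (2), HH (2), R (1), key length (1), data length (2).
COUNT_FORMAT = ">HHBBH"


@dataclass(frozen=True)
class CountField:
    """Hypothetical in-memory view of one CKD COUNT field."""
    cc: int        # cylinder number
    hh: int        # head number
    r: int         # record number on the track
    key_len: int   # length of the optional KEY field (0 means no key)
    data_len: int  # length of the DATA field

    def pack(self) -> bytes:
        """Serialize the field into its assumed eight-byte form."""
        return struct.pack(COUNT_FORMAT, self.cc, self.hh, self.r,
                           self.key_len, self.data_len)

    @classmethod
    def unpack(cls, raw: bytes) -> "CountField":
        """Rebuild a CountField from its eight-byte form."""
        return cls(*struct.unpack(COUNT_FORMAT, raw))
```

Because the COUNT field carries both lengths, a reader holding only this field can tell where the KEY and DATA fields of the record begin and end without inspecting them.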
Typically, the I/O channel is used by the host to communicate to the controller the device and track of interest. The channel may also specify the rotational position on the track from which to begin searching for the record having the data field to be read or written. The I/O channel is typically placed in a wait state until the mechanical action of locating the specified track rotational position is complete. The searching may be accomplished in the IBM mainframe environment by specifying a SEARCH parameter. The parameter is typically a five-byte field containing two bytes designating a cylinder number (CC), two bytes designating a head number (HH), and one byte designating the record number (R). Using such a DASD device, a physical search of each record on the track designated by the CCHH address is required to locate the record of interest. The SEARCH command is repeatedly issued until the record is located, thus tying up the I/O channel. One skilled in the art can readily appreciate the inherent disadvantage of having the I/O channel unavailable for other tasks while the search and locate process takes place. Thus, the prior art is replete with techniques for reducing such wait states. For example, U.S. Pat. No. 4,603,380 to Easton et al. discloses a method for reducing the volume of database transfers between DASD and cache.
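The repeated SEARCH described above can be pictured with the following minimal sketch, which models a track as a list of five-byte record identifiers and counts the comparisons needed to find one. The function names and the list-based track model are assumptions made purely for illustration; no actual channel program is shown.

```python
import struct


def search_argument(cc: int, hh: int, r: int) -> bytes:
    """Build the five-byte CCHHR search parameter (2 + 2 + 1 bytes)."""
    return struct.pack(">HHB", cc, hh, r)


def search_track(record_ids, argument):
    """Compare each record ID on the track, in order, against the argument.

    Returns (position, comparisons); position is None if no record matches.
    Each comparison stands in for one SEARCH issued while the channel waits.
    """
    comparisons = 0
    for position, record_id in enumerate(record_ids):
        comparisons += 1
        if record_id == argument:
            return position, comparisons
    return None, comparisons
```

In this model the cost of locating a record grows with its position on the track, which is the wait-state problem the paragraph above identifies.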
Some DASD controllers known in the art, such as the IBM 3990 DASD controller, have some amount of fast electronic cache memory for storing records that have been written by the host system but not yet written to the DASD medium by the DASD controller. Such cache is typically of the Non-Volatile Store type, making it less susceptible to data loss due to an interruption of power. Such DASD controllers are capable of performing a "fast write" operation. A "fast write" operation allows the host to write data to the cache and disconnect from the controller before the data is written to disk. In this way the I/O channel is free for other host activities with other devices, including other DASD. In U.S. Pat. No. 4,875,155, to Iskiyan et al., a cache and a DASD buffer store for use with CKD records staged into electronic memory are disclosed to enable a "fast write" capability. Unfortunately, this patent teaches staging the entire data record from DASD into cache, which takes up considerable space in expensive high speed memory.
Since CKD records may have variable data lengths, the format of a particular track must be known before a "fast write" may occur. In particular, the key and data length information is contained in the COUNT field of the CKD format. Additionally, the number of records on a particular track must also be known before a "fast write" occurs. This is because a track may contain only a maximum number of records. An attempt to "fast write" a record to a track which has no space available must be flagged as an error. Further, an additional function of the format is to determine the validity of each write operation. The control unit uses the format information to signal error conditions such as "no record found" and "invalid track format". In order to accomplish this result, prior art control units required the entire track of data to be resident in cache. This meant the entire data track had to be loaded or staged in cache memory. Only in this way was it possible to determine the format of the track. Unfortunately, the drawback of this method, as in the '155 patent, is that valuable cache memory space is taken up by the entire data track, and the associated mechanical lag time of rotating the disk and moving the read/write heads forces the I/O channel into a wait state.
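The kind of validity checking described above may be sketched as follows. This is a hypothetical illustration only: the track format is modeled as a dictionary from record number to (key length, data length), the function names are invented for the example, and the "no space available" label is an assumed stand-in for whatever error a control unit would actually raise when a track is full.

```python
def check_update_write(track_format, record_number):
    """Validate a write that updates an existing record.

    track_format maps record number -> (key_len, data_len).
    """
    if record_number not in track_format:
        return "no record found"
    return "ok"


def check_format_write(track_format, max_records):
    """Validate a write that would add a new record to the track.

    max_records is the caller-supplied limit on records per track.
    """
    if len(track_format) >= max_records:
        return "no space available"  # assumed label, see lead-in
    return "ok"
```

Both checks consult only the format description, not the record contents, which is why knowing the format is sufficient to accept or reject a write.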
Fixed block architecture (FBA) devices, while having similar physical characteristics to CKD formatted devices, store and address the data differently. In fixed block devices, the data is typically stored in blocks having equal lengths. Regarding terminology, these blocks are referred to as sectors; however, in the DASD convention, the term "sector" typically refers to a pie-shaped section of the physical disk medium. Thus, the definition of sector will depend on the context of its use. In an FBA device, the data is addressed according to the address of the physical sector in which it is stored. In this way, the data may be addressed without the need for a physical search of the device to locate the record. Because of the fixed length of data sectors and the addressing scheme, FBA devices have much higher capacity and better performance than CKD formatted devices. Thus, such FBA devices have gained significant popularity in the computer industry. Unfortunately, as mentioned in the beginning of this background description, the host operating systems and their application programs that are descendants of IBM 360/370 architecture are programmed to expect to locate a data record for reading and writing by physically searching a disk medium according to CKD format. Accordingly, techniques for emulating CKD format on FBA devices have been developed and are well known in the art.
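To illustrate why fixed block addressing needs no physical search, the following sketch converts a byte offset within an emulated track into a block address by simple arithmetic. The 512-byte block size, the notion of a track occupying a contiguous run of blocks starting at `first_block`, and the function name are all assumptions made for this example and do not describe any particular emulation technique.

```python
BLOCK_SIZE = 512  # assumed fixed block length in bytes


def block_address(first_block, byte_offset, block_size=BLOCK_SIZE):
    """Return (block number, offset within that block) for a byte offset.

    first_block is the assumed address of the first block holding the track.
    """
    blocks_in, offset_in_block = divmod(byte_offset, block_size)
    return first_block + blocks_in, offset_in_block
```

For instance, under these assumptions a byte offset of 1300 on a track starting at block 1000 falls 276 bytes into block 1002, and the device can be asked for that block directly.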
U.S. Pat. No. 5,206,939 to Yanai et al. discloses a technique for converting CKD formatted records for use with FBA disk drives. This patent discloses a compression technique for representing the COUNT field of every record of CKD formatted data. The disclosed technique requires storing the COUNT field of the first record of a track, and then, depending on the relationship of the next COUNT field to the first, storing one or more codes to represent that which has changed from each record to the next succeeding record. The '939 patent discloses allocating 128 bytes for each track to accomplish the index locating of stored information. Unfortunately, this can amount to a rather significant amount of storage in high speed electronic memory, when the number of tracks is accounted for and the number of devices for which this track related information is stored is also accounted for. For example, in an IBM 3390 model 3 (3390-3), which has multiple disk platters arranged in a stacked fashion for housing a plurality of disk media, each of which may have two surfaces accessed by a plurality of heads mounted on respective actuator arms, there are approximately 50,000 tracks. A typical environment of multiple IBM 3390 disk drives attached to one DASD controller, such as an IBM 3990 controller, has a maximum of 64 devices. Thus, this yields 3.2 million tracks which must be accounted for, and at 128 bytes per track this comes to 409.6 million bytes (409.6 megabytes) of information which must be stored in expensive high speed electronic cache memory. Additionally, the compression technique of the '939 patent only works if the track related information does not exceed the maximum storage allotment of about 128 bytes. If this condition is not met, then the entire track must be loaded from disk into cache. Thus, if the disk had been formatted, for example, by VM with minidisks, then the changes in each COUNT field due to the use of logical addressing schemes would easily exceed the maximum allotment. In that case, the entire track would have to be loaded from disk before a "fast write" operation could occur.
Another patent that discloses methods useful for performing "fast writes" to an FBA device storing CKD formatted data is U.S. Pat. No. 5,283,884 to Menon et al., assigned to the assignee of the present invention. This patent discloses a technique that is useful when the above described pattern (key length equal to zero, equal data lengths, a record number starting at R0 and incrementing by one) is present, and that additionally is capable of discerning other patterns. Unfortunately, this patent discloses storing a table in non-volatile memory which has an entry for each record. Each entry includes a compressed count field, including two bytes for the data length, two bytes for the CC number, and two bytes for the HH number. The patent discloses a technique for reducing the storage requirement by up to three bytes by storing a compacted version of the CCHH number that takes into account the number of tracks on a cylinder; however, this still requires a three-byte entry per record on the track. In the worst case for an IBM 3390, there may be 86 records per track; thus the table for such a track would require 258 bytes of memory. Applying the same calculation from above for the IBM 3390, this would amount to (50,000 tracks per device) × (64 devices) × (258 bytes of memory per track), or approximately 825.6 million bytes of memory to keep up with all the devices. Clearly, there is a long felt need for a method and/or apparatus that provides a less costly way to keep up with CKD data records stored on FBA devices and that does not always require the also costly procedure of loading the respective records from disk to cache memory.
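The storage estimates quoted in this background can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (approximately 50,000 tracks per device, 64 devices per controller, 128 bytes per track for the '939 approach, and 86 records of three bytes each for the '884 worst case); the function and constant names are hypothetical.

```python
TRACKS_PER_DEVICE = 50_000      # approximate figure quoted for an IBM 3390-3
DEVICES_PER_CONTROLLER = 64     # maximum quoted for one controller


def controller_bytes(bytes_per_track,
                     tracks=TRACKS_PER_DEVICE,
                     devices=DEVICES_PER_CONTROLLER):
    """Total bytes of track format information held for all attached devices."""
    return tracks * devices * bytes_per_track


BYTES_PER_TRACK_939 = 128       # fixed per-track allotment
BYTES_PER_TRACK_884 = 86 * 3    # worst case: 86 records, 3 bytes each = 258

print(controller_bytes(BYTES_PER_TRACK_939))  # 409600000
print(controller_bytes(BYTES_PER_TRACK_884))  # 825600000
```

The two results correspond to the 409.6 million and 825.6 million byte figures stated above.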
The performance of cache memory is characterized by "hit/miss" ratios. The terms "hit" and "miss" have different meanings according to their use in the context of a read or write operation. A "hit" means that a read reference to the cache generated by a requesting CPU executable process locates the data item desired in cache, rather than in lower speed disk memory. A read operation "miss" is registered if the data is unavailable in cache memory. A "hit" with respect to a write operation is made when the CPU executable process, through the cache manager, finds a counterpart location in a partially full buffer to overwrite. A write operation "miss" occurs when an item must be destaged to make room for a write reference or when a track has to be staged to perform the write operation. Thus, it can be seen that the hit/miss ratio for write operations can be improved by providing track format information without having to load an entire data track into memory. Additionally, ratios for read operations may be improved if the track format information also allows for a record to be located without the need to physically search the disk.
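The effect on write hits can be illustrated with a deliberately simplified count. In the sketch below, a write is treated as a hit if its track is already in cache or if format information for the track is available, and as a miss otherwise; misses caused by destaging to make room are ignored. The function name and the set-based model are assumptions for illustration only.

```python
def classify_writes(write_tracks, cached_tracks, tracks_with_format):
    """Count write hits and misses under the simplified model in the lead-in.

    write_tracks       -- sequence of track identifiers being written
    cached_tracks      -- set of tracks fully resident in cache
    tracks_with_format -- set of tracks whose format information is in memory
    """
    hits = misses = 0
    for track in write_tracks:
        if track in cached_tracks or track in tracks_with_format:
            hits += 1      # write can proceed without staging the track
        else:
            misses += 1    # track would have to be staged first
    return hits, misses
```

Under this model, four writes against a cache holding one of the four tracks give one hit and three misses; making format information available for two more of the tracks raises this to three hits and one miss.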
International Business Machines Corporation (IBM), in its publication GC26-4519-0 available from IBM Publications as of January, 1990, describes extended COUNT KEY DATA (ECKD) commands. Such data-access commands are usable with the preferred embodiment of this invention, such as the command LOCATE RECORD. This IBM publication is cited as background information to fill out the discussion of the related art and to enable the practice of the invention. This IBM publication does not describe the machine-executed operation of the present invention which solves the above-mentioned problems related to writing or updating tracks of CKD data stored on fixed block architecture devices, without the requirement of having to load the entire data track into cache memory.