Typical direct access storage subsystems (DASS) include a controller connected to one or more disk files. The disk files are also called head-disk-assemblies (HDAs). The controller may be a physically separate device or may be integrated with the HDAs in a single device. The HDAs contain the actual data storage hardware. The controller provides the external computer interface for the subsystem. DASS is used herein to refer to the combination of one or more HDAs with a controller whether they are in separate devices or integrated in a single device.
Each HDA contains one or more platters or disks on which data is recorded. The data is written on the disks in concentric circles called tracks. The data on the tracks must be organized according to a set of rules which are typically fixed in the design of the disk system. For example, the design of the disk system may require that the data be written in fixed length records or the design may allow variable length records to be written. Fixed record length designs, often referred to as fixed block architectures (FBA), typically subdivide tracks into sectors. One known technique for writing and reading variable length records is to use the count-key-data (CKD) format. As used hereinafter, "tracks" means tracks or sectors unless otherwise noted. The data on the tracks typically includes user data and system control data.
Because defects may occur in the disk surfaces it is conventional to reserve spare or alternate space on the disks which can be used to logically replace areas with defects. In a device that uses sectors, the additional space will be alternate sectors. Variable record length devices typically use entire tracks for alternates. To distinguish them from alternate tracks, the original tracks are called primary tracks. The design of the disk system must provide a way to establish a linkage between a primary track and an alternate track so that some types of read and write commands which reference the primary track will be executed upon the alternate track. One method of achieving this linkage is to reserve a portion of each track for control information which determines whether an alternate track has been established for that track and, if so, gives the address of the alternate track. There may also be separate control data kept on the disk file which identifies the tracks deemed to be defective. The design typically allows a subset of the available commands to ignore the linkage so that, for example, read and write tests may be performed on the primary track even after the linkage has been established.
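The linkage between a primary track and an alternate track can be illustrated with a minimal Python sketch. It is not the design of any particular disk system: the names `TrackControl`, `Disk`, and `follow_linkage` are hypothetical, and the per-track control area is modeled as an in-memory record rather than a reserved portion of the track.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TrackControl:
    """Hypothetical per-track control information."""
    alternate_assigned: bool = False
    alternate_address: Optional[int] = None


class Disk:
    """Toy model: tracks are addressed by integer, contents are bytes."""

    def __init__(self) -> None:
        self.control: Dict[int, TrackControl] = {}
        self.data: Dict[int, bytes] = {}

    def assign_alternate(self, primary: int, alternate: int) -> None:
        # Record the linkage in the primary track's control information.
        self.control[primary] = TrackControl(True, alternate)

    def resolve(self, track: int, follow_linkage: bool = True) -> int:
        # Ordinary commands follow the linkage; diagnostic commands
        # pass follow_linkage=False to operate on the primary track itself.
        ctl = self.control.get(track)
        if follow_linkage and ctl is not None and ctl.alternate_assigned:
            return ctl.alternate_address
        return track

    def write(self, track: int, payload: bytes, follow_linkage: bool = True) -> None:
        self.data[self.resolve(track, follow_linkage)] = payload

    def read(self, track: int, follow_linkage: bool = True) -> bytes:
        return self.data.get(self.resolve(track, follow_linkage), b"")
```

In this sketch a write addressed to primary track 10 lands on alternate track 900 once the linkage exists, while a test write with `follow_linkage=False` still reaches the primary track.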
Because the use of alternate tracks may have undesirable effects, techniques have been developed for adjusting for defects in a track without using an alternate track. In one scheme, control information is written on the track ahead of the defect, which allows the system to ignore or skip over the defect. This control information may be called skip-displacement information. Since skip-displacements cannot correct for an unlimited number of defects, it is customary to provide alternate track capability in addition to skip-displacement capability.
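One way to picture skip-displacement is as a mapping from a logical position on the track to a physical position that steps over each recorded defect. The sketch below is a simplified illustration under stated assumptions: positions are plain byte offsets, each skip is a `(physical_start, length)` pair, and the limit `MAX_SKIPS` is a hypothetical value, not one taken from any device.

```python
from typing import List, Tuple

# Hypothetical limit on skip-displacements per track (assumption for illustration).
MAX_SKIPS = 7


def logical_to_physical(logical_offset: int, skips: List[Tuple[int, int]]) -> int:
    """Map a logical offset to a physical offset, stepping over defects.

    `skips` must be sorted by physical start and must not overlap.
    """
    physical = logical_offset
    for start, length in skips:
        if start <= physical:
            # The defect lies at or before this position: shift past it.
            physical += length
        else:
            break
    return physical


def needs_alternate(skips: List[Tuple[int, int]]) -> bool:
    """True when the track has more defects than skips can absorb."""
    return len(skips) > MAX_SKIPS
```

With a single 8-byte defect starting at physical offset 100, logical offset 99 is unchanged and logical offset 100 maps to physical offset 108. When `needs_alternate` returns true, the alternate track capability has to be used instead.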
In a system which allows skip-displacement information or its equivalent to be used to adjust for defects, it is possible to perform tests on a suspect track to determine exactly where the defects are, then write skip-displacement codes on the disk to correct the problem. Testing the track for errors requires that data be written on the track, which destroys any user data that may be on the track. Therefore, prior to testing the suspect track, the user data must be copied to a backup track. If the testing and writing of skip-displacement information successfully adjusts for all of the defects on the track, then the user data can be copied from the backup track back to the original track. If the defects cannot be corrected, then the system must use its alternate track technique to replace the bad track. The process of testing the track for defects and writing skip-displacement type information to correct for defects is known as media maintenance (MM). Media maintenance for fixed block architecture (FBA) devices is accomplished by marking blocks defective and either assigning an alternate block or "slipping" the contents of blocks down a cylinder to use a spare block whose normal location is at the end of a cylinder. Since the proper testing of the suspect track requires that a very large number of reads and writes be performed, media maintenance may require several minutes for one track.
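The media maintenance sequence described above (back up, test destructively, then either repair in place or fall back to an alternate) can be sketched as follows. This is an illustrative outline only: tracks are modeled as dictionaries, and `media_maintenance`, `find_defects`, `max_skips`, and the `Outcome` values are hypothetical names. Copying the user data to the alternate track in the fallback case is also an assumption of the sketch.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Skip = Tuple[int, int]  # (physical_start, length), hypothetical representation


class Outcome(Enum):
    REPAIRED = "repaired in place with skip-displacements"
    ALTERNATE_ASSIGNED = "replaced by an alternate track"


def media_maintenance(
    track: Dict,
    backup: Dict,
    alternate: Dict,
    find_defects: Callable[[Dict], List[Skip]],
    max_skips: int,
) -> Outcome:
    # 1. Preserve the user data before the destructive test.
    backup["data"] = track["data"]

    # 2. Test the suspect track; the test overwrites track["data"].
    defects = find_defects(track)

    if len(defects) <= max_skips:
        # 3a. Record skip-displacements and restore the user data.
        track["skips"] = sorted(defects)
        track["data"] = backup["data"]
        return Outcome.REPAIRED

    # 3b. Too many defects: place the user data on the alternate track
    #     and mark the suspect track as replaced.
    alternate["data"] = backup["data"]
    track["alternate"] = alternate
    return Outcome.ALTERNATE_ASSIGNED
```

Note that between steps 1 and 3 the only good copy of the user data is on the backup track, so an interruption in that window leaves the suspect track holding test patterns rather than user data.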
The conventional method of using an alternate location in count-key-data (CKD) DASS systems requires the ability to read and write control information at the beginning of the original track as well as on the alternate location. The beginning of the track is reserved for the "track header" which consists of Home Address (HA) and Record Zero (R0) count fields. The HA contains a flag which indicates whether an alternate track has been assigned and the R0 count field is used to record the pointer to an alternate track. For those cases where the part of the track affected by a media defect is in the header, customer data access is lost along with the ability to assign an alternate location for customer usage while media maintenance is being performed.
The MM process may be interrupted by power failures or other system faults which prevent completion of the MM operations. This may result in leaving the device in an unknown state with respect to the MM operations which were in process.
O. Akiba in a published Japanese patent application (JP 02-236747) has purportedly described a method for restarting a duplexed DASD after a power failure in such a way as to ensure that the integrity of duplexing is maintained. When a power failure monitoring device detects a power failure, a CPU retrieves the control table of the duplexed DASD, extracts information such as the block number of the update processing under execution, and saves a copy of it in nonvolatile memory. When power is restored, the processing is restarted to complete the duplexing.
Y. Katsuki in a published Japanese patent application (JP 02-42519) has purportedly described a method for restarting a computer application after a power failure by storing the processing state on a disk. The IPL program reads the state data from disk and restarts the application.
U.S. Pat. No. 4,648,031 to E. H. Jenner describes a method for restarting a computer system after an interruption using two types of data structures. The first data structure contains the locations of interest in a recovery log for a "work unit". A second structure is maintained for each of the resource managers containing its operational state, and the relative addresses in the recovery log of the beginning and ending of its interest scope. Resource managers may be restarted by reestablishing the state of the managed collections during a current status phase, starting with the most recent check-pointed state advanced by subsequent records of changes in the log to the point of interruption.
In a published European patent application (EP-295424) D. J. Haderle, et al. describe a method for use in database systems which establishes a prior point of consistency including partial transaction rollback in a transaction-oriented system using write-ahead logging. In response to a failure the system determines the point in the log at which REDO processing is to start. Data are gathered, in an analyzer phase, by scanning the log from the last complete checkpoint to the log end to find data on dirty pages and to identify those transactions which were executing at the instant of failure. The activities are repeated for all transactions up to the failure point and log modifications are input on the pages of the log. In the UNDO phase all currently-executing transactions are rolled back.
Thus, there is a need, not met by the prior art, for MM techniques which guarantee completion despite system faults.