Typical disk data storage and retrieval systems (disk systems) include a controller connected to one or more disk files. The controller may be a physically separate device or it may be integrated with the disk file in a single device. The disk files contain the actual data storage hardware. The controller provides the external interface for the disk system. The term disk system is used herein to refer to the combination of one or more disk files with a controller whether they are in separate devices or integrated in a single device. In normal usage the controller is connected to one or more computers. The invention described herein can be used where the controller is connected to multiple computers, but except where otherwise noted the following material assumes that the controller is connected to a single computer. A computer may be connected to many controllers. Typically the computer is operating under the control of a multiprogramming operating system which allows multiple programs (or tasks) to execute in an interleaved manner so that the disk system is, in effect, shared by multiple programs.
To operate the disk system a program running on the computer sends commands to the controller for execution. The commands may reference or include data, as when a write command is sent which directs the controller to write specified data onto a disk file. The connection between the controller and the computer allows for two-way communication so that the controller can also send status information and data to the computer. The status information returned by the controller includes success or failure information for the commands. An example of a command that might fail is a read command. Typical disk systems have means for detecting errors in read and write actions, as well as in various other functions. It is also possible for the controller's resources to be committed so that it cannot execute a command from the computer. In this case the controller might send status information to the computer indicating that it is busy. Thus, even though the term `command` is used in the art, the controller is an intelligent device which can reject or refuse to execute the commands as circumstances dictate. A typical controller includes a microprocessor and some form of memory means into which a software program, typically called microcode, is loaded. The actions taken by the controller in response to commands usually involve the microprocessor running or executing some part of the microcode.
The computer to which the controller is adapted to be connected typically includes a component or subsystem called a channel subsystem which interfaces to the controller. In a channel environment the commands sent to the controller may be called channel commands or channel command words. Channel commands are typically communicated to the controller in a list of connected commands which are called a channel command program or command chain. The channel commands in the channel program are said to be chained together. Chaining commands together results in substantially increased efficiency when compared with execution of individual commands. The usual channel architecture requires that the controller execute the commands in the order in which the commands are placed in the chain and that the controller stop executing the chain if a command fails. Thus, if a seek command is chained to a subsequent write command, failure of the seek command will cause the controller to stop execution of the chain and the write command will not be executed. This type of channel architecture, therefore, implements conditional command execution for all but the first command in a channel command chain. The second and following commands are executed only if all of the previous commands in the chain completed successfully. Typically a command chain has exclusive access to the disk system only while the chain is executing. A computer program seeking to access the disk system may have to send several command chains to achieve a desired result. The command chains from one program may be interleaved with chains from other programs by the computer's operating system.
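The conditional execution rule described above can be illustrated with a minimal sketch. The names `ChannelCommand` and `execute_chain` are hypothetical and chosen only for illustration; they do not correspond to any actual channel architecture interface. Each command is modeled as a callable that reports success or failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChannelCommand:
    """Hypothetical model of one command in a chain."""
    name: str
    action: Callable[[], bool]  # returns True on success, False on failure


def execute_chain(chain: List[ChannelCommand]) -> Tuple[bool, List[str]]:
    """Execute commands in chain order and stop at the first failure.

    Returns the overall status and the names of the commands attempted.
    """
    attempted: List[str] = []
    for command in chain:
        attempted.append(command.name)
        if not command.action():
            # A failed command ends the chain; later commands never run.
            return False, attempted
    return True, attempted


# A seek chained to a write: the simulated seek failure prevents the write.
chain = [
    ChannelCommand("seek", lambda: False),
    ChannelCommand("write", lambda: True),
]
ok, attempted = execute_chain(chain)
print(ok, attempted)
```

In this sketch the write command is never attempted because the seek that precedes it in the chain fails, which mirrors the seek and write example given above.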
Each disk file contains one or more platters or disks on which data is recorded. The data is written on the disks in concentric circles which are called tracks. The data on the tracks must be organized according to a set of rules which are typically fixed in the design of the disk system. For example, the design of the disk system may require that the data be written in fixed length records or the design may allow variable length records to be written. Fixed record length designs, often referred to as fixed block architectures (FBA), typically subdivide tracks into sectors. One known technique for writing and reading variable length records is to use the count-key-data (CKD) format. As used hereinafter, `tracks` means tracks or sectors unless otherwise noted. The data on the tracks typically includes user data and system control data. A related collection of user data written on one or more tracks is called a data set.
In order to provide a tool for management of the resources of the disk system in the multiprogramming environment some disk systems are designed to accept commands which establish an authorization code for the sending program to execute the command sequence which follows. This authorization code is typically a string of bits which are encoded to define which commands are being authorized. In the art this authorization code is sometimes called the file mask. The file mask can be set in more than one way, but one way is to send an express command to set the file mask followed by the file mask value. There can also be a default or minimum file mask value that is assumed without the need to expressly set a file mask. The controller checks each command against the current file mask to determine whether to execute the command. If the file mask does not authorize the command, the controller will refuse to execute it. For example, if a write command requires file mask authorization in the particular disk system implementation and if the file mask does not authorize the write command, the command will be refused execution. In a system which uses channel command programs a file mask may be set once in each program by chaining a set file mask command to the other commands. The meaning of each value of the file mask and the possible values are determined by the protocol defined in the disk system implementation. Typically the implementation defines values of the file mask as authorizing system control, system supervision, and/or diagnostic commands. Diagnostic commands are a subset of commands which are used to diagnose and correct disk system problems.
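The file mask check described above can be sketched as a bitwise test. The bit assignments, the command names, and the `Controller` class below are assumptions made only for illustration; as noted above, the actual meaning of each file mask value is fixed by the particular disk system implementation.

```python
from enum import IntFlag


class FileMask(IntFlag):
    """Hypothetical bit layout; real encodings are implementation defined."""
    NONE = 0
    WRITE = 1
    SUPERVISOR = 2
    DIAGNOSTIC = 4


# Hypothetical table of the authorization each command requires.
REQUIRED_AUTHORIZATION = {
    "read": FileMask.NONE,
    "write": FileMask.WRITE,
    "diagnostic_write": FileMask.DIAGNOSTIC,
}


class Controller:
    def __init__(self, default_mask: FileMask = FileMask.NONE) -> None:
        # The default mask applies until a set file mask command arrives.
        self.file_mask = default_mask

    def set_file_mask(self, mask: FileMask) -> None:
        self.file_mask = mask

    def execute(self, command: str) -> str:
        required = REQUIRED_AUTHORIZATION[command]
        # Refuse the command unless every required bit is set in the mask.
        if (self.file_mask & required) != required:
            return "refused"
        return "executed"


controller = Controller()
print(controller.execute("write"))   # default mask does not authorize writes
controller.set_file_mask(FileMask.WRITE)
print(controller.execute("write"))   # now authorized
```

In a channel environment the call to `set_file_mask` would correspond to a set file mask command chained ahead of the commands it authorizes.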
Because defects may occur in the disk surfaces it is conventional to reserve spare or alternate space on the disks which can be used to logically replace areas with defects. In a device that uses sectors, the additional space will be alternate sectors. Variable record length devices typically use entire tracks for alternates. To distinguish them from alternate tracks, the original tracks are called primary tracks. The design of the disk system must provide a way to establish a linkage between a primary track and an alternate track so that some types of read and write commands which reference the primary track will be executed upon the alternate track. One method of achieving this linkage is to reserve a portion of each track for control information which determines whether an alternate track has been established for that track and, if so, gives the address of the alternate track. There may also be separate control data kept on the disk file which identifies the tracks deemed to be defective. The design typically allows a subset of the available commands to ignore the linkage so that, for example, read and write tests may be performed on the primary track even after the linkage has been established.
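One way to picture the linkage between a primary track and an alternate track is the following sketch. The `Track` and `DiskFile` classes and the `follow_linkage` parameter are hypothetical; the sketch assumes the linkage is stored as an alternate track address in each track's control information, which is only one of the methods mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Track:
    data: bytes = b""
    # Control information: address of the alternate track, if one is linked.
    alternate: Optional[int] = None


class DiskFile:
    def __init__(self, tracks: Dict[int, Track]) -> None:
        self.tracks = tracks

    def resolve(self, address: int, follow_linkage: bool = True) -> int:
        """Return the track address on which a command actually operates."""
        track = self.tracks[address]
        if follow_linkage and track.alternate is not None:
            return track.alternate
        return address

    def read(self, address: int, follow_linkage: bool = True) -> bytes:
        return self.tracks[self.resolve(address, follow_linkage)].data


disk = DiskFile({
    5: Track(data=b"test pattern", alternate=900),  # defective primary track
    900: Track(data=b"user data"),                  # its alternate track
})
print(disk.read(5))                        # ordinary read follows the linkage
print(disk.read(5, follow_linkage=False))  # test read stays on the primary
```

The `follow_linkage=False` case stands in for the subset of commands that ignore the linkage so that tests can still be run on the primary track.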
Because the use of alternate tracks may have undesirable effects, techniques have been developed for adjusting for defects in a track without using an alternate track. In one scheme control information is written on the track ahead of the defect which allows the system to ignore or skip over the defect. This control information may be called skip-displacement information. Since skip-displacements cannot correct for an unlimited number of defects, it is customary to provide alternate track capability in addition to skip-displacement capability.
Typical computer operating systems provide for logging of disk errors. By examining the error log, tracks that have potential defects can be identified. In a system which allows skip-displacement information or its equivalent to be used to adjust for defects, it is possible to perform tests on the suspect track to determine exactly where the defects are, then use skip-displacements to correct the problem. Testing the track for errors requires that data be written on the track, which destroys the user data that may be on the track. Therefore, prior to testing the suspect track, the user data must be copied to a backup track. If the testing and writing of skip-displacement information successfully adjusts for all of the defects on the track, then the user data can be copied from the backup track back to the original track. If the defects cannot be corrected then the system must use its alternate track technique to replace the bad track. The process of testing the track for defects and writing skip-displacement information to correct for defects is known as media maintenance. Since the proper testing of the suspect track requires that a very large number of reads and writes be performed, media maintenance may require several minutes for one track. Existing methods of performing media maintenance copy the user data on the suspect track to a special track on the disk while the suspect track is being tested. This special track is reserved for use by system programs and is not accessible by general user programs. This causes the user data on the suspect track to be unavailable for general use during the entire time that tests are being performed on the suspect track.
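The media maintenance sequence described above can be summarized in a short sketch. The function name, the dictionary used to model tracks, the `find_defects` callback, and the `max_skips` limit are all assumptions introduced for illustration; the only point taken from the description is the order of the steps and the fallback to an alternate track when skip-displacements cannot cover every defect.

```python
from typing import Callable, Dict, List


def media_maintenance(
    tracks: Dict[int, bytes],
    suspect: int,
    backup: int,
    alternate: int,
    find_defects: Callable[[int], List[int]],
    max_skips: int,
) -> dict:
    """Sketch of the backup, test, and repair-or-replace sequence."""
    # 1. Copy the user data to the backup track before testing.
    tracks[backup] = tracks[suspect]

    # 2. Test the suspect track; the test writes destroy the user data.
    defects = find_defects(suspect)
    tracks[suspect] = b""

    # 3a. Few enough defects: skip-displacements adjust for them and the
    #     user data is copied back to the original track.
    if len(defects) <= max_skips:
        tracks[suspect] = tracks[backup]
        return {"outcome": "repaired", "skips": defects, "data_track": suspect}

    # 3b. Otherwise the track is replaced by an alternate track.
    tracks[alternate] = tracks[backup]
    return {"outcome": "alternate", "skips": [], "data_track": alternate}


tracks = {10: b"payroll records", 9000: b"", 9500: b""}
result = media_maintenance(tracks, 10, 9000, 9500, lambda addr: [120], 2)
print(result["outcome"], tracks[result["data_track"]])
```

Between steps 1 and 3 the user data exists only on the backup track, which is why the data is unavailable to general user programs for the duration of the tests in the existing methods described above.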
The operation of a disk system in a typical multiprogramming environment frequently involves the execution of a sequence of disk access commands which are critical in nature. A critical sequence is one which cannot be safely interleaved with unrestricted disk access commands that are unrelated. The media maintenance process is one example of a process which involves the execution of a critical sequence of commands. When a program must execute a critical sequence of commands, actions must be taken to limit access by other programs to the disk data that is being modified until the critical commands have completed successfully. In the case where a controller is connected to multiple computers a special command is used which causes the controller to reserve the designated disk file for the exclusive use of the requesting computer. This prevents other computers which are sharing the disk file from accessing the disk file until the release command is issued, but does not limit access by other programs running on the same computer. In order to limit disk access by other programs on the same computer additional access limitation schemes must be used. Software locks and enqueueing are well known in the art for this purpose, but have the inherent weakness that programs may bypass them and defeat the limitation. A new software protection scheme cannot be used reliably where there is a requirement that previously existing application programs be allowed to run without modification.
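The limitation of the reserve and release commands can be seen in a small sketch. The `ReservingController` class and its method names are hypothetical; the sketch assumes only what is stated above, namely that a reservation is held per computer and that the controller does not distinguish between programs on the reserving computer.

```python
from typing import Dict


class ReservingController:
    def __init__(self) -> None:
        # Maps a disk file identifier to the computer holding the reservation.
        self.reserved_by: Dict[str, str] = {}

    def reserve(self, disk_file: str, computer: str) -> bool:
        holder = self.reserved_by.get(disk_file)
        if holder is not None and holder != computer:
            return False  # already reserved by another computer
        self.reserved_by[disk_file] = computer
        return True

    def release(self, disk_file: str, computer: str) -> None:
        if self.reserved_by.get(disk_file) == computer:
            del self.reserved_by[disk_file]

    def may_access(self, disk_file: str, computer: str, program: str) -> bool:
        # The program identity plays no part in the decision.
        holder = self.reserved_by.get(disk_file)
        return holder is None or holder == computer


controller = ReservingController()
controller.reserve("disk0", "computer_a")
print(controller.may_access("disk0", "computer_b", "payroll"))  # blocked
print(controller.may_access("disk0", "computer_a", "payroll"))  # not blocked
```

An unrelated program on the reserving computer is admitted just as the reserving program is, which is why additional access limitation schemes are needed on the same computer.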
Typically the media maintenance program (MMP) is only one of many programs executing concurrently on the computer. The MMP must take into account that other programs may try to access the disk file containing the suspect track so that disk access commands sent by the MMP may be interleaved with disk access commands from other programs. In the prior art the MMP is required to use whatever software tools are provided by the operating system to limit access to the data on the disk file for the entire time that the media maintenance process is executing. This is true because the process involves various critical command sequences which can potentially leave the data on the suspect track or the alternate track in a corrupted or unusable state if one or more of the commands in the critical sequence fails to execute properly. For example, writing the alternate track linkage information on the suspect track is a critical step, since failure to write the correct information could cause the original data on the track to become inaccessible without correctly setting the alternate track address. If this step failed and another program attempted to read the suspect track immediately thereafter, the data read could be incorrect or the read command might fail. In a typical MMP in the prior art, the MMP attempts to limit access during the media maintenance process by reserving the disk file for the exclusive use of the computer on which the MMP is running, allowing no new data set allocations and no access to the data set using the suspect track. Since the MMP can only use the tools provided by the operating system and the disk systems, the MMP has no way of guaranteeing that these limitations will be enforced. The fact that there are significant limitations on the use of the disk file during media maintenance and that media maintenance requires a relatively long time to perform are significant impediments to performing media maintenance using the prior art.
There are many other disk system processes which also contain critical command sequences. For example, when new models of disk systems are produced it is often desirable to provide an additional mode of operation that is compatible with an older disk system so that a user will have the option of using the new disk system in the same way as the older system. The new disk system is said to emulate the older system. Making the switch into or out of the emulation mode of operation typically requires the execution of a critical sequence of commands which alter the control data on the tracks on a disk file. If this sequence fails, all or part of the tracks on the disk file may be unusable by a concurrently executing program which tries to send commands to access that disk file immediately thereafter. Another example of a disk system process that involves critical command sequences is the reformatting of tracks. Track formats fix the total number of bytes which can be written on the track. Changing from a format which allows a certain number of bytes maximum to one which allows a different number of bytes will typically involve critical command sequences.
The prior art of disk system design and operation does not provide any completely satisfactory way to protect the disk system from access during critical command sequences, such as those found in a media maintenance program.