This invention relates to a method and apparatus for the path independent reservation and reconnection of devices by CPUs operating in a multiple-CPU, dynamic path allocation, and shared device access system environment. More particularly, this invention relates to a system for reconnecting a device to a CPU after the device has been disconnected from the CPU and has completed an operation.
A disk data storage subsystem attached to a large mainframe central processing unit (CPU) of the type manufactured by the IBM Corporation, and other manufacturers, consists of one or more directors, one or more control modules, and one or more disk storage devices. The CPU is attached to one or more of the directors through a system of electronics and cables called a channel. The directors are attached to control modules through a control module interface and the control modules are attached to the devices through a device interface. For the CPU to store or retrieve data on a disk data storage device, it first sends commands to the director and the director sends commands to the control module which actually controls the disk data storage device. Data being written passes from the CPU, over the channel, through the director, through the control module and is written onto the device. Data being read comes from the device, through the control module, through the director, through the channel and into the CPU. The combination of a particular channel, director and control module is called a path.
To improve performance of a disk subsystem, a function known as dynamic pathing, described in Luiz et al., U.S. Pat. No. 4,207,609, is used to allow a CPU to initiate an operation on a given device through one set of a channel, director and control module (a path), and complete the operation at a later time through a second set of a channel, director and control module (a second path). As pointed out by Luiz et al., mechanical operations on a disk data storage device, such as moving the head arm from one location on the disk device to another, do not require that the device remain connected to the CPU. The CPU initiates a seek command to cause the disk head arm to move from one cylinder to another and the CPU disconnects from the device while the mechanical arm movement is in process. After the completion of the seek, the disk must be reconnected to the CPU that started the seek in order to notify the CPU that the seek has been completed so that the CPU can perform a read or write operation at the new location on the disk.
In a dynamic pathing system, the CPU can initiate such a seek command through a first path including a particular channel, director and control module combination, and when the device has completed the seek, it can be reconnected to the CPU through a second path including a different combination of channel, director and control module. In systems prior to dynamic pathing, the disk had to be reconnected to the CPU through the exact same path; therefore, if that path was busy at the time the device finished its seek operation, the device had to wait until the path was free, thus wasting time. With dynamic pathing, should the path be busy, the device can reconnect through a second path, thus improving the overall performance of the system.
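The difference between the two reconnection rules can be illustrated with a minimal Python sketch. It is not taken from Luiz et al. or from any director implementation: the `Path` record, the `pick_reconnect_path` function, and the choice to try the originating path first before any alternate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class Path:
    """Hypothetical record: one channel, director and control module combination."""
    channel: str
    director: str
    control_module: str


def pick_reconnect_path(origin: Path,
                        paths: Sequence[Path],
                        busy: Set[Path],
                        dynamic_pathing: bool) -> Optional[Path]:
    """Return the path to reconnect through, or None if the device must wait."""
    if not dynamic_pathing:
        # Without dynamic pathing only the originating path may be used.
        return None if origin in busy else origin
    # Assumed preference order: the originating path first, then any other path.
    candidates = [origin] + [p for p in paths if p != origin]
    for p in candidates:
        if p not in busy:
            return p
    return None  # every path is busy, so the device waits
```

With the originating path busy and one alternate path free, the function returns `None` when `dynamic_pathing` is false and returns the alternate path when it is true, which is the performance gain described above.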
If a device is unable to reconnect to the CPU to complete a seek command, a performance degradation occurs until the reconnection can be made. However, once the reconnection is made, no further degradation occurs. Other commands have a time window within which to reconnect, and if the time window is missed, additional delay occurs. One such command is rotational position sense (RPS). The RPS command tells the disk subsystem to disconnect from the CPU and search for a particular location on a data track which is rotating under the read/write head of a disk device. When the disk device rotates to the proper location, it reconnects to the CPU so that data can be read from or written to the location. If a reconnection cannot be performed within a certain time window, the read/write location moves past the read/write head and is lost until the disk makes another revolution. The time required for an extra revolution is a very significant amount of time in a large mainframe computer system, so any improvement in the ability to reconnect is very important.
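The cost of a missed RPS window can be illustrated with a simplified Python model. The model is an assumption made for illustration: the target location first arrives under the head at time zero and again once per revolution, and a reconnection succeeds only if a path becomes available no later than `window_ms` after an arrival. The function names and the numeric values in the usage note are hypothetical and are not taken from any particular disk device.

```python
import math


def revolution_time_ms(rpm: float) -> float:
    """Time for one full disk revolution, in milliseconds."""
    return 60_000.0 / rpm


def missed_window_delay_ms(path_free_at_ms: float,
                           window_ms: float,
                           rpm: float) -> float:
    """Extra delay caused by missed reconnection windows (simplified model).

    path_free_at_ms: time, measured from the first arrival of the target
    location, at which a path to the CPU becomes available (assumed input).
    """
    rev = revolution_time_ms(rpm)
    # Smallest number of extra revolutions k such that the path is free
    # within the window that follows the k-th re-arrival of the location.
    extra_revs = max(0, math.ceil((path_free_at_ms - window_ms) / rev))
    return extra_revs * rev
```

For example, with an assumed spindle speed of 6000 rpm (10 ms per revolution) and an assumed 1 ms window, a path that becomes free 0.5 ms after the location arrives causes no extra delay, while a path that becomes free 3 ms after arrival costs one full extra revolution.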
In the prior art dynamic pathing systems, after the CPU has initiated an operation such as a seek or RPS on the disk device, the director enters a polling loop wherein it communicates, one at a time, with each control module attached to the director to determine if that control module has any device that has completed a disconnected operation. When one of the control modules responds to the director that one of the devices attached to the control module has completed a disconnected operation, the director then polls the control module to determine the particular device address of the device needing to be reconnected to the CPU. After obtaining the device address, the director uses an internal table to determine which of the channels attached to the director can be used to reconnect the device to the CPU. The director then requests service from the channel, and after receiving an acknowledgment of the service request, the director sends the device address to the CPU, thus completing the connection between the disk storage device and the CPU. In present systems, the time required to complete the polling sequence is a limiting factor on the total number of disk storage devices and control modules which can be connected to a particular director since, if too many devices are attached, the polling loop will be long enough to miss the time window of commands such as RPS. That is, since each control module is polled separately, as more control modules are added to the system, more time is required to complete the polling loop, and should the polling loop time exceed the time window of a given command, the command to that device would fail. In order to prevent failures due to the polling time, the total number of control modules on a given subsystem must be limited.
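The linear growth of the polling time can be seen in a minimal Python sketch of the prior art loop described above. The sketch rests on assumptions that are not stated in the source: every exchange between the director and a control module costs the same fixed time `poll_cost_us`, the follow-up poll for the device address costs one more such exchange, and the channel table lookup and service request are left out. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def poll_once(completed_by_module: List[List[int]],
              poll_cost_us: int) -> Tuple[Optional[int], int]:
    """Poll each control module in turn, one at a time.

    completed_by_module[i] lists the device addresses on control module i
    that have completed a disconnected operation.
    Returns (device address or None, elapsed time in microseconds).
    """
    elapsed = 0
    for completed in completed_by_module:
        elapsed += poll_cost_us          # poll: "has any device completed?"
        if completed:
            elapsed += poll_cost_us      # second poll: "which device address?"
            return completed[0], elapsed
    return None, elapsed


def worst_case_loop_time_us(n_modules: int, poll_cost_us: int) -> int:
    """Worst case: the completed device sits on the last module polled."""
    return (n_modules + 1) * poll_cost_us


def max_modules_within(window_us: int, poll_cost_us: int) -> int:
    """Largest module count whose worst-case loop still fits in the window."""
    return max(0, window_us // poll_cost_us - 1)
```

Under these assumptions the worst-case loop time grows by one `poll_cost_us` for every control module added, so a fixed reconnection window places a hard ceiling on the number of control modules, which is the limitation described above.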
There is a need in the art, then, for a polling system which reduces the time necessary to poll all the control modules from a given director, both to improve the percentage of reconnects made on the first attempt and to allow more devices to be attached to each director.