1. Field of the Invention
The present invention relates to a method, system, and program for managing resources of a remote processor.
2. Description of the Related Art
Computing systems often include one or more host computers (“hosts”) for processing data and running application programs, direct access storage devices (DASDs) for storing data, and a storage controller for controlling the transfer of data between the hosts and the DASD. Storage controllers, also referred to as control units or storage directors, manage access to a storage space often comprised of numerous hard disk drives connected in a loop architecture, otherwise referred to as a Direct Access Storage Device (DASD). Hosts may communicate Input/Output (I/O) requests to the storage space through the storage controller.
To maintain availability in the event of a failure, many storage controllers known in the prior art provide redundant hardware clusters. Each hardware cluster comprises a processor complex, cache, non-volatile storage (NVS), such as a battery backed-up Random Access Memory (RAM), and separate power supply to provide connection paths to the attached storage. The NVS in one cluster backs up write data from the cache in the other cluster so that if one cluster fails, the write data in the cache of the failed cluster is stored in the NVS of the surviving cluster. After one cluster fails, all Input/Output (I/O) requests would be directed toward the surviving cluster. When both clusters are available, each cluster may be assigned to handle I/O requests for specific logical storage devices configured within the physical storage devices.
In performing these and other tasks, a cluster can not only execute operations locally using the capabilities of the local cluster itself, but can also make a request to have an operation executed on a remote cluster in the storage controller system. Since the capabilities of the remote cluster are typically limited, it is often desirable that the local cluster refrain from requesting too many remote operations which could result in the capabilities of the remote cluster being exceeded.
Various techniques have been proposed for limiting or “throttling” the requesting of remote operations on a remote cluster. One such technique allows only a single remote operation to proceed on the remote cluster. Once the remote cluster responds that the remote operation is complete, the local cluster is permitted to request another remote operation. As a consequence, the remote cluster handles a single remote operation at a time.
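By way of illustration only, the single-operation throttling technique described above might be sketched as follows. The class and method names (SingleOperationThrottle, request, on_complete) and the send callable are hypothetical and do not appear in any prior system; the sketch merely assumes that the local cluster is notified when the remote operation completes.

```python
from collections import deque


class SingleOperationThrottle:
    """Permit at most one remote operation in flight (all names hypothetical)."""

    def __init__(self, send):
        self._send = send          # assumed callable that transmits a request to the remote cluster
        self._pending = deque()    # requests waiting for their turn
        self._in_flight = False    # True while the remote cluster is executing an operation

    def request(self, operation):
        """Queue a remote operation and send it if the remote cluster is idle."""
        self._pending.append(operation)
        self._try_send()

    def on_complete(self):
        """Called when the remote cluster reports that the operation is complete."""
        self._in_flight = False
        self._try_send()

    def _try_send(self):
        # Send the oldest pending request only when nothing is outstanding.
        if not self._in_flight and self._pending:
            self._in_flight = True
            self._send(self._pending.popleft())
```

In this sketch, a second call to request made before on_complete is simply held locally, so the remote cluster handles one remote operation at a time.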
FIG. 1 shows an example of a multiple cluster system 10 comprising a first cluster 12 communicating with a second cluster 14 over a bus 16. In this example, an application program 18 operating under an operating system 20 of the first or local cluster 12 instructs a mail manager 22 to send a remote operation request to the second or remote cluster 14. The mail manager 22 folds the remote operation request into a mail message and stores the mail message containing the remote operation request in a memory area 24 of the remote cluster 14. The memory area 24, referred to in FIG. 1 as an “incoming mail queue,” functions as a queue of mail messages, some of which include remote operations waiting to be executed.
Each remote operation is executed on the remote cluster 14. The remote cluster 14 has a mail manager 26 which examines the operation code of the remote operation in each mail message stored in the queue 24 in the order in which they are stored in the queue 24. Using the operation code, the mail manager 26 invokes the remote operation. Once the remote operation is initiated, the mail message entry is removed from the queue 24 and a mail message is sent back to the local cluster 12 indicating that an additional remote operation may be sent to the remote cluster 14.
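The handling of the incoming mail queue described above might be sketched, purely for illustration, as follows. The names RemoteMailManager, handlers, and notify_sender, and the dictionary layout of a mail message, are assumptions introduced for this sketch and are not taken from the system of FIG. 1.

```python
from collections import deque


class RemoteMailManager:
    """Illustrative stand-in for a mail manager on the remote cluster (names hypothetical)."""

    def __init__(self, handlers, notify_sender):
        self.incoming = deque()              # incoming mail queue, oldest entry first
        self._handlers = handlers            # assumed mapping: operation code -> callable
        self._notify_sender = notify_sender  # assumed callable that mails the local cluster

    def process_next(self):
        """Invoke the oldest queued remote operation; return False if the queue is empty."""
        if not self.incoming:
            return False
        message = self.incoming[0]                   # examine entries in stored order
        handler = self._handlers[message["op_code"]]
        handler(message["payload"])                  # invoke the remote operation
        self.incoming.popleft()                      # remove the entry once initiated
        self._notify_sender()                        # another operation may now be sent
        return True
```

A message in this sketch is a dictionary such as {"op_code": "copy", "payload": ...}; a real system would presumably use a fixed binary layout.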
The mail manager 22 of the local cluster 12 is kept apprised by the mail manager 26 of the remote cluster 14 of how many mail message entries remain in the queue 24 of the remote cluster 14. In one prior system, the mail manager 22 has a counter 37 which keeps a count of the permissible number of mail messages which may be sent to the other cluster 14 and stored as entries in the incoming mail queue 24 before mail messages are removed from the queue 24. Thus, each count of the counter 37 may be thought of as a “credit” permitting the sending of a mail message to the other cluster 14.
The maximum count or credits of the counter 37 is equal to the total capacity or total number of entries of the incoming mail queue 24. As mail messages are sent to the other cluster 14, the credits of the counter 37 are decremented by the mail manager 22. The mail messages may include remote operation requests. As these and other mail messages are processed and removed from the incoming mail queue 24, the mail manager 26 so informs the mail manager 22 and the credits of the counter 37 are incremented. Once the capacity of the queue 24 is reached as indicated by the counter 37 indicating that all available credits have been used up, the mail manager 22 of the local cluster withholds sending new mail messages to the remote cluster queue 24 until additional credits are applied to the counter 37, indicating that slots have become available in the queue 24. In the meantime, the mail manager 22 stores remote operation requests and other mail in an outgoing mail queue 28 until the remote operation requests can be sent as mail messages to the remote cluster 14.
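The credit scheme described above might be sketched, for illustration only, as follows. The names CreditedMailSender, deliver, and on_entries_removed are hypothetical; the sketch assumes that the remote mail manager reports how many entries it has removed from its incoming mail queue.

```python
from collections import deque


class CreditedMailSender:
    """Illustrative credit-based sender on the local cluster (names hypothetical)."""

    def __init__(self, remote_queue_capacity, deliver):
        self._max_credits = remote_queue_capacity  # total entries in the remote incoming queue
        self.credits = remote_queue_capacity       # one credit per free remote queue slot
        self._deliver = deliver                    # assumed callable that stores mail remotely
        self.outgoing = deque()                    # outgoing mail queue for withheld messages

    def send(self, message):
        """Queue a mail message and send as many messages as credits allow."""
        self.outgoing.append(message)
        self._drain()

    def on_entries_removed(self, count=1):
        """Called when the remote cluster reports entries removed from its queue."""
        self.credits = min(self._max_credits, self.credits + count)
        self._drain()

    def _drain(self):
        # Each message sent consumes one credit; stop when credits are used up.
        while self.credits > 0 and self.outgoing:
            self.credits -= 1
            self._deliver(self.outgoing.popleft())
```

With a capacity of two, for example, a third message stays in the outgoing mail queue until on_entries_removed restores a credit.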
The second cluster 14 similarly has one or more application programs 38 operating under an operating system 40, which instruct the mail manager 26 to send a remote operation request to the first cluster 12 in the form of a mail message. The mail manager 26 stores the mail message containing the remote operation request as an entry in an incoming mail queue 44 of the first cluster 12. Execution of the requested remote operation is invoked by the mail manager 22 of the first cluster 12. Once the incoming mail queue 44 of the first cluster 12 becomes full, as indicated by a credit counter 46, the mail manager 26 of the second cluster 14 stores the mail messages in an outgoing mail queue 48 until additional space becomes available in the queue 44.
As previously mentioned, one purpose of redundant clusters is to ensure that if one cluster fails, the storage controller or other device may continue to operate. In such redundant applications, it is often desired that at least one cluster operate at all times so that operation of the device is not interrupted. As a result, when upgrading the software or code of the device, the software is often upgraded on one cluster while the other cluster continues to run. Then, the upgraded cluster is restarted and the software on the other cluster is upgraded. As a consequence, there may be intervals when the software code running on the two clusters is not at the same level on both clusters. When the software on a cluster is upgraded and the cluster is restarted or booted, the booted cluster may inform the other cluster of the software level of the booted cluster. In one prior art system, this software level information may take the form of a version number of the loaded software.