1. Field of the Invention
The present invention relates to management of code updates in a cluster.
2. Description of Related Art
In certain computing environments, multiple host systems may communicate with a control unit, such as an IBM Enterprise Storage Server (ESS)®, which provides access to storage devices, such as interconnected hard disk drives, through one or more logical paths. (IBM and ESS are registered trademarks of IBM.) The interconnected drives may be configured as a Direct Access Storage Device (DASD), Redundant Array of Independent Disks (RAID), Just a Bunch of Disks (JBOD), etc. The control unit, also known as a cluster, may include duplicate and redundant processing nodes, also known as processing complexes, to allow for failover to a surviving processing complex in case one fails. The processing complexes may access shared resources such as input/output (I/O) adapters, storage adapters and storage devices.
A processing complex may perform various processing operations such as input/output operations or other computation, for example. A failover operation can automatically switch input/output or other processing operations from the processing complex which failed or is shut down for servicing, to another processing complex. Once the processing complex which failed or is being serviced is ready to resume operation, operations may be transferred back after a failback operation.
To update the software or other code being executed by the processing complexes of a server cluster, one processing complex may be quiesced, causing the input/output or other processing operations of that processing complex to be taken over by the other processing complex or complexes of the server cluster in a failover operation. The code of the quiesced processing complex may then be updated. Following updating of the code for a particular processing complex, that processing complex may resume performing processing operations after a failback operation, in which processing operations are transferred back from the other processing complex or complexes. This updating procedure may be repeated for the remaining processing complexes of the server cluster.
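By way of illustration only, the quiesce, failover, update and failback sequence described above may be sketched as follows. The class `ProcessingComplex`, the functions `failover`, `failback` and `rolling_update`, and the round-robin distribution of operations are hypothetical names and choices introduced for this sketch; they are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingComplex:
    """Hypothetical model of one processing complex of a server cluster."""
    name: str
    code_level: str
    active: bool = True
    workload: list = field(default_factory=list)


def failover(source, survivors):
    """Quiesce `source` and hand its operations to the surviving complexes."""
    source.active = False
    moved = list(source.workload)
    source.workload.clear()
    # Assumption: operations are spread round-robin over the survivors.
    for i, op in enumerate(moved):
        survivors[i % len(survivors)].workload.append(op)
    return moved


def failback(target, moved, survivors):
    """Return the operations in `moved` from the survivors to `target`."""
    for op in moved:
        for s in survivors:
            if op in s.workload:
                s.workload.remove(op)
                break
        target.workload.append(op)
    target.active = True


def rolling_update(cluster, new_level):
    """Update each processing complex in turn while the others carry its load."""
    for pc in cluster:
        survivors = [p for p in cluster if p is not pc and p.active]
        if not survivors:
            raise RuntimeError("no surviving processing complex for failover")
        moved = failover(pc, survivors)
        pc.code_level = new_level  # stands in for the actual code update
        failback(pc, moved, survivors)


cluster = [
    ProcessingComplex("complex-a", "v1", workload=["io-1", "io-2"]),
    ProcessingComplex("complex-b", "v1", workload=["io-3"]),
]
rolling_update(cluster, "v2")
```

In this sketch only one processing complex is quiesced at a time, so the cluster as a whole continues to perform its operations throughout the update.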
The resources of each processing complex may be divided into a number of logical partitions (LPAR), in which a computer's processors, memory, and hardware resources are divided into multiple environments. Each environment can be operated independently, with its own operating system and applications. Logical partitioning of a processing complex adds flexibility in workload management on a single server, with the ability to partition the single machine into many logical servers with their own sets of system resources. Resources may be assigned to each partition in various amounts and combinations. The number of logical hardware partitions that can be created depends on the hardware system.
Dynamic Logical Partitioning (DLPAR) extends the capability of LPAR by providing the ability to logically attach and detach the resources of a processing complex to and from the operating system of a logical partition without rebooting. This resource allocation can occur not only when activating a logical partition, but also while the partitions are running. Processor, memory, I/O adapter and other partition resources can be released into a “free pool,” acquired from that free pool, or moved directly from one partition to another within a processing complex, in various amounts or combinations. However, each partition generally has at least one processor, memory, an I/O adapter associated with a boot device, and a network adapter.
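By way of illustration only, the release, acquire and move operations on partition resources described above may be modeled as follows. The names `Partition` and `ProcessingComplexResources`, the resource kinds, and the minimum amounts in `MINIMUM` are assumptions made for this sketch; in particular, the move is modeled here as a release followed by an acquire, which is a simplification.

```python
class Partition:
    """Hypothetical logical partition holding countable resources."""

    def __init__(self, name, processors, memory_gb):
        self.name = name
        self.resources = {"processor": processors, "memory_gb": memory_gb}


class ProcessingComplexResources:
    """Hypothetical bookkeeping for the free pool of one processing complex."""

    # Assumption: each partition keeps at least one processor and some memory.
    MINIMUM = {"processor": 1, "memory_gb": 1}

    def __init__(self):
        self.free_pool = {"processor": 0, "memory_gb": 0}

    def release(self, partition, kind, amount):
        """Detach `amount` of `kind` from a running partition into the free pool."""
        remaining = partition.resources[kind] - amount
        if remaining < self.MINIMUM[kind]:
            raise ValueError(f"{partition.name} would drop below minimum {kind}")
        partition.resources[kind] = remaining
        self.free_pool[kind] += amount

    def acquire(self, partition, kind, amount):
        """Attach `amount` of `kind` from the free pool to a running partition."""
        if self.free_pool[kind] < amount:
            raise ValueError(f"free pool has insufficient {kind}")
        self.free_pool[kind] -= amount
        partition.resources[kind] += amount

    def move(self, source, dest, kind, amount):
        """Move resources between partitions (modeled as release then acquire)."""
        self.release(source, kind, amount)
        self.acquire(dest, kind, amount)


complex_resources = ProcessingComplexResources()
lpar_a = Partition("lpar-a", processors=4, memory_gb=16)
lpar_b = Partition("lpar-b", processors=1, memory_gb=4)

complex_resources.release(lpar_a, "processor", 2)    # into the free pool
complex_resources.acquire(lpar_b, "processor", 1)    # out of the free pool
complex_resources.move(lpar_a, lpar_b, "memory_gb", 8)
```

None of these operations reboots a partition in the model; the partition objects simply remain in existence while their resource counts change.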
The movement of an LPAR resource from one hardware partition to another within a processing complex may be managed by a supervisor module. To transfer a partition resource, the supervisor module can send a network request to the logical partition which “owns” the partition resource, asking that logical partition to release the particular partition resource and put it into a quiesced state. In this manner, the partition resource is stopped, and placed under control of a hypervisor module. The supervisor module can send a command to the hypervisor, instructing it to reallocate the partition resource from the prior logical partition to another logical partition. In addition, the supervisor module can send a network request to the other logical partition, instructing it to acquire the partition resource from the hypervisor module and configure it for use by that other logical partition.
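By way of illustration only, the three-step transfer described above (release request to the owning partition, reallocation command to the hypervisor, acquire request to the receiving partition) may be sketched as follows. All class and method names are hypothetical, and the network requests are modeled as direct method calls for brevity.

```python
class Hypervisor:
    """Hypothetical hypervisor tracking resource ownership."""

    def __init__(self):
        self.owner = {}    # resource -> name of the owning logical partition
        self.held = set()  # quiesced resources under hypervisor control

    def reallocate(self, resource, old_owner, new_owner):
        """Reassign a quiesced resource from one logical partition to another."""
        if resource not in self.held:
            raise RuntimeError(f"{resource} has not been quiesced")
        if self.owner.get(resource) != old_owner:
            raise RuntimeError(f"{resource} is not owned by {old_owner}")
        self.owner[resource] = new_owner


class LogicalPartition:
    """Hypothetical logical partition that answers supervisor requests."""

    def __init__(self, name, hypervisor, resources=()):
        self.name = name
        self.hypervisor = hypervisor
        self.configured = set(resources)
        for resource in resources:
            hypervisor.owner[resource] = name

    def handle_release_request(self, resource):
        """Stop using the resource and leave it quiesced with the hypervisor."""
        self.configured.remove(resource)
        self.hypervisor.held.add(resource)

    def handle_acquire_request(self, resource):
        """Take the resource from the hypervisor and configure it for use."""
        if self.hypervisor.owner.get(resource) != self.name:
            raise RuntimeError(f"{resource} is not allocated to {self.name}")
        self.hypervisor.held.discard(resource)
        self.configured.add(resource)


class Supervisor:
    """Hypothetical supervisor module coordinating the transfer."""

    def __init__(self, hypervisor):
        self.hypervisor = hypervisor

    def transfer(self, resource, source, dest):
        source.handle_release_request(resource)                     # step 1
        self.hypervisor.reallocate(resource, source.name, dest.name)  # step 2
        dest.handle_acquire_request(resource)                       # step 3


hypervisor = Hypervisor()
lpar_1 = LogicalPartition("lpar-1", hypervisor, ["io-adapter-1"])
lpar_2 = LogicalPartition("lpar-2", hypervisor)
Supervisor(hypervisor).transfer("io-adapter-1", lpar_1, lpar_2)
```

The ordering matters in this sketch: the hypervisor refuses to reallocate a resource that the owning partition has not yet released and quiesced.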