Servers, such as data storage servers, have become complex and involve various hardware such as data storage media, storage controllers, memories, and the accompanying power systems, cooling systems, etc.
Storage controllers control access to data storage media and memories in response to read and write requests. The storage controllers may direct the data in accordance with data storage arrangements such as RAID (redundant array of independent disks), JBOD (just a bunch of disks), and other redundancy and security levels.
As an example, an IBM® ESS (Enterprise Storage Server) such as a DS8000 has redundant clusters of computer entities, cache, non-volatile storage, etc., called “central electronics complexes” or “CECs”. The CECs may be partitioned into logical partitions or field images running within the system, where each partition is also redundant, including partitions within each of the CECs.
The resources within the system are shared by the field images and are controlled by means of a rack power control module (RPC), which may configure the system, for example by controlling the power supply and cooling sequencing and operation.
The rack power control module is also redundant, and each rack power control module is capable of controlling the same hardware.
Because the field images operate independently of each other and do not communicate with one another, a Master lock is used to select one of the field images to manage the shared hardware resources through the rack power control modules, in order to avoid having multiple field images (or computer entities) controlling the same resources. As a result, a race situation exists in which each field image needs to communicate with every shared hardware resource that it can reach and separately race for the Master lock. When several field images attempt to obtain a Master lock that is shared between two RPCs, various problems can arise, such as communication failures between the field images and the RPCs, and contention when the field images attempt to obtain the Master lock at the same time. Multiple locks may be obtained, such that the system cannot tell which field image is the master, which may confuse the desired recovery actions.
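The ambiguity described above can be illustrated with a minimal simulation. It is a sketch under stated assumptions, not the actual DS8000 or RPC firmware interface: the names `RackPowerControl`, `try_lock`, `race_for_master`, and `masters`, the image labels "A" and "B", and the `reachable` set used to model a communication failure are all hypothetical. Each RPC is assumed to hold its own copy of the Master lock and to grant it to the first field image that asks.

```python
class RackPowerControl:
    """Hypothetical model of one RPC holding its own copy of the Master lock."""

    def __init__(self, name):
        self.name = name
        self.lock_holder = None  # no field image holds the lock yet

    def try_lock(self, image):
        # First requester wins; later requesters are refused.
        if self.lock_holder is None:
            self.lock_holder = image
        return self.lock_holder == image


def race_for_master(images, rpc_names, reachable):
    """Let each image, in order, try to lock every RPC it can reach.

    `reachable` is a set of (image, rpc_name) pairs; a missing pair
    models a communication failure between that image and that RPC.
    Returns a dict mapping each RPC name to the image holding its lock.
    """
    rpcs = [RackPowerControl(name) for name in rpc_names]
    for image in images:
        for rpc in rpcs:
            if (image, rpc.name) in reachable:
                rpc.try_lock(image)
    return {rpc.name: rpc.lock_holder for rpc in rpcs}


def masters(holders):
    """Distinct images that hold at least one copy of the Master lock."""
    return sorted({h for h in holders.values() if h is not None})


images = ["A", "B"]
rpc_names = ["RPC-0", "RPC-1"]

# Healthy case: both images reach both RPCs, so the first image wins both locks.
healthy = {(i, r) for i in images for r in rpc_names}
print(masters(race_for_master(images, rpc_names, healthy)))   # ['A']

# Degraded case: image A cannot reach RPC-1, so image B obtains that lock.
degraded = healthy - {("A", "RPC-1")}
print(masters(race_for_master(images, rpc_names, degraded)))  # ['A', 'B']
```

In the degraded case each RPC reports a different lock holder, so neither field image can be identified as the single master. This is the situation in which the choice of recovery actions becomes unclear.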