1. Field of the Invention
The present invention relates to managing multiprocessor operations.
2. Description of the Related Art
In certain computing environments, multiple host systems may communicate with a control unit, such as an IBM Enterprise Storage Server (ESS)®, to request data in a storage device managed by the control unit (IBM and ESS are registered trademarks of IBM). The control unit receiving a request provides access to storage devices, such as interconnected hard disk drives, through one or more logical paths. The interconnected drives may be configured as a Direct Access Storage Device (DASD), Redundant Array of Independent Disks (RAID), Just a Bunch of Disks (JBOD), etc. The control unit may be a multiprocessor type system. For example, the control unit may include duplicate and redundant processing complexes, also known as clusters, to allow for failover to a surviving cluster in case one fails.
There are various types of multiprocessor systems. In one type, processors may each have their own memory and cache. The processors may run in parallel and share disks. In such a system, each processor may run a copy of the operating system and the processors may be loosely coupled through a Local Area Network (LAN), for example. Communication between processors may be accomplished through message-passing.
In another type of multiprocessor system, the processors may be more tightly coupled, such as connected through a switch or bridge. Communication between the processors may be accomplished through a shared memory, for example.
In yet another type of multiprocessor system, only one copy of the operating system may run across all of the processors. These types of multiprocessor systems tend to be tightly coupled inside the same chassis with a high-speed bus or a switch. Moreover, the processors may share the same global memory, disks, and Input/Output (I/O) devices.
Should a shared resource such as a shared disk fail, the processors of the multiprocessor system may simply cease using the failed disk. In a multiprocessor system in which one copy of the operating system runs across the processors of the system, recovery operations may be readily coordinated. For example, if a shared resource such as a bridge can be restored by one of the processors performing a recovery operation such as resetting the bridge, that recovery operation may be coordinated amongst the various processors by the common operating system of the processors.
Each processor in a multiprocessor system may also have a cache in which one or more lines of a shared memory may be cached. Thus, two or more caches may have copies of the same line of shared memory. If one processor changes the data in a line of shared memory that is also cached in the caches of other processors, those other caches may hold different, now incorrect versions of the line. As a result, the cached data may no longer be “coherent” with respect to other caches or the shared memory.
Various cache coherency protocols may be employed to synchronize data amongst several caches. One cache coherency protocol marks each cache line with one of four states: Modified, Exclusive, Shared, or Invalid (MESI). A cache line marked as being in the Modified state indicates that the cache line was modified and therefore the underlying data in the line of shared memory is no longer valid. A cache line marked as being in the Exclusive state indicates that the cache line is only stored in that particular cache and has not yet been changed. A cache line marked as being in the Shared state indicates that the particular cache line may be stored in other caches of the other processors. A cache line marked as being in the Invalid state indicates that the copy held in that cache is not valid and should not be used.
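By way of illustration only, the four MESI states described above may be sketched as a small state machine for a single cache line. The names `State` and `next_state`, the event names, and the `others_have_copy` flag are hypothetical and are introduced solely for this sketch. The transitions shown follow the commonly described textbook form of the protocol and are not taken from any particular processor implementation.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # line changed locally; shared memory copy is stale
    EXCLUSIVE = "E"  # line held only by this cache and unchanged
    SHARED = "S"     # line may also be held by other caches
    INVALID = "I"    # cached copy must not be used


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line (illustrative only).

    event is one of: 'local_read', 'local_write',
    'remote_read', 'remote_write' (hypothetical event names).
    """
    if event == "local_write":
        # Any local write leaves this cache holding the only valid copy.
        return State.MODIFIED
    if event == "local_read":
        if state is State.INVALID:
            # A read miss fetches the line; the resulting state depends
            # on whether another cache already holds it.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    if event == "remote_read":
        if state is State.INVALID:
            return State.INVALID
        # A Modified line would be written back first (not modeled here).
        return State.SHARED
    if event == "remote_write":
        # Another processor changed the line, so this copy is invalid.
        return State.INVALID
    raise ValueError(f"unknown event: {event}")
```

For example, under these assumptions a line that starts Invalid, is read with no other holder, and is then written locally passes through the Exclusive state to the Modified state, and a subsequent write by another processor returns it to the Invalid state.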
Snooping logic is typically employed utilizing a particular coherency protocol to provide for cache coherency. Snooping logic in the processor may broadcast a message over a common bus line shared by the other processors, informing the other processors each time a processor modifies data in its cache. The snooping logic may also snoop on the bus looking for such messages from other processors.
When a processor detects that another processor has changed a value at an address existing in its own cache, the snooping logic invalidates that entry in its cache in accordance with various protocols including the MESI protocol. The invalid state marking of the cache line can inform the processor that the value in the cache is not valid. As a result, the processor can look for the correct value in the shared memory or in another cache.
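The snooping behavior described above may be sketched, in simplified form, as follows. The classes `Bus` and `SnoopingCache` and their methods are hypothetical names used only for illustration. The sketch assumes a write-through simplification in which every write immediately updates the shared memory, so that an invalidated cache can always obtain the correct value from the shared memory. It is not intended to model an actual hardware implementation.

```python
class Bus:
    """Hypothetical common bus shared by all caches."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, sender, address):
        # Inform every other cache that 'address' was modified.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_write(address)


class SnoopingCache:
    """Hypothetical per-processor cache with snooping logic."""

    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory      # shared memory, modeled as a dict
        self.lines = {}           # address -> (value, valid flag)
        bus.attach(self)

    def read(self, address):
        entry = self.lines.get(address)
        if entry is None or not entry[1]:
            # Miss or invalid entry: fetch the value from shared memory.
            value = self.memory[address]
            self.lines[address] = (value, True)
            return value
        return entry[0]

    def write(self, address, value):
        self.lines[address] = (value, True)
        self.memory[address] = value  # write-through (simplifying assumption)
        self.bus.broadcast_write(self, address)

    def snoop_write(self, address):
        # Another processor changed this address: mark our copy invalid.
        if address in self.lines:
            value, _ = self.lines[address]
            self.lines[address] = (value, False)
```

In this sketch, if two caches have both read the same address and one of them then writes a new value, the other cache's entry is marked invalid by its snooping logic, and its next read obtains the new value from the shared memory rather than returning the stale cached value.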