This application is related to our copending patent application entitled “GATE CLOSE BALKING FOR FAIR GATING IN A NONUNIFORM MEMORY ARCHITECTURE DATA PROCESSING SYSTEM”, filed of even date herewith and assigned to the assignee hereof.
The present invention generally relates to data processing systems, and more specifically to fair gating in a nonuniform memory access (NUMA) architecture.
Data processing systems invariably require that resources be shared among different processes, activities, or tasks in the case of multiprogrammed systems and among different processors in the case of multiprocessor systems. Such sharing is often not obvious within user programs. However, it is a necessity in operating systems, and is quite common in utility programs such as database and communications managers. For example, a dispatch queue is typically shared among multiple processors in a multiprocessor system. This provides a mechanism that allows each processor to select the highest priority task in the dispatch queue to execute. Numerous other operating system tables are typically shared among different processes, activities, tasks, and processors.
Serialization of access to shared resources in a multiprocessor system is controlled through mutual exclusion. This is typically implemented utilizing some sort of hardware gating or semaphores. Gating works by having a process, activity, or task “close” or “lock” a “gate” or “lock” before accessing the shared resource. Then, the “gate” or “lock” is “opened” or “unlocked” after the process, activity, or task is done accessing the shared resource. Both the gate closing and opening are typically atomic memory operations on multiprocessor systems.
There are typically two different types of gates: queued gates and spin gates. Semaphores are examples of queued gates. When a process, activity, or task attempts to “close” a queued gate that is already closed, that process, activity, or task is placed on a queue for that gate, and is dequeued and activated when the gate is subsequently opened by some other process, activity, or task. Queued gates are typically found in situations where the exclusive resource time is quite lengthy, especially in comparison with the time required to dispatch another process, activity, or task.
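The queued gate behavior described above can be illustrated with a minimal sketch. It is not taken from any particular operating system: the class name QueuedGate, its methods, and the demonstration function are hypothetical, and Python threads stand in for processes, activities, or tasks. The sketch assumes that an opened gate is handed directly to the oldest waiter, which is one possible policy among several.

```python
import threading
import time
from collections import deque


class QueuedGate:
    """Hypothetical queued gate: a task closing an already-closed gate waits in FIFO order."""

    def __init__(self):
        self._guard = threading.Lock()  # protects _closed and _waiters
        self._closed = False
        self._waiters = deque()         # one Event per suspended task, oldest first

    def close(self):
        with self._guard:
            if not self._closed:
                self._closed = True       # gate was open: close it and proceed
                return
            wakeup = threading.Event()
            self._waiters.append(wakeup)  # gate already closed: enqueue this task
        wakeup.wait()                     # suspended until open() hands the gate over

    def open(self):
        with self._guard:
            if self._waiters:
                # Dequeue and activate the oldest waiter; the gate stays closed on its behalf.
                self._waiters.popleft().set()
            else:
                self._closed = False

    def waiting(self):
        with self._guard:
            return len(self._waiters)


def run_queued_gate_demo(n_tasks=3):
    gate = QueuedGate()
    order = []

    def task(task_id):
        gate.close()
        order.append(task_id)  # exclusive access to the shared list
        gate.open()

    gate.close()               # the main thread holds the gate first
    threads = []
    for task_id in range(n_tasks):
        t = threading.Thread(target=task, args=(task_id,))
        t.start()
        while gate.waiting() <= task_id:  # wait until this task is on the queue
            time.sleep(0.001)
        threads.append(t)
    gate.open()                # activates task 0, which later activates task 1, and so on
    for t in threads:
        t.join()
    return order
```

Calling run_queued_gate_demo() returns the task identifiers in the order in which the tasks obtained the gate, which under the hand-off assumption above matches the order in which they were queued.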
The second type of gate is a “spin” gate. When a process, activity, or task attempts to “close” a spin gate that is already closed, a tight loop is entered where the processor attempting to close the spin gate keeps executing the “close” instruction until the gate is ultimately opened by another processor or the processor decides to quit trying. Note that “spin” gates assume a multiprocessor system since the processor “spinning” trying to “close” the spin gate is depending on another processor to “open” the gate. Spin gates are typically found in situations where the exclusive resource time is fairly short, especially in comparison with the time required to dispatch another process, activity, or task. They are especially prevalent in time critical situations.
As noted above, the instructions utilized to open and close gates, in particular spin gates, typically execute utilizing atomic memory operations. Such atomic memory modification instructions are found in almost every architecture supporting multiple processors, especially when the processors share memory. Some architectures utilize compare-and-swap instructions to “close” gates. The Unisys 1100/2200 series of computers utilizes Test Set and Skip (TSS) and Test Clear and Skip (TCS) to close and open spin gates.
The GCOS® 8 architecture produced by the assignee herein utilizes a Set Zero and Negative Indicators and Clear (SZNC) instruction to “close” a spin gate and a Store Instruction Counter plus 2 (STC2) instruction to subsequently “open” the spin gate. The SZNC sets the Zero and Negative indicators based on the current value of the gate being “closed”. It then clears (or zeros) the gate. The next instruction executed is typically a branch instruction that repeats executing the SZNC instruction if the gate being closed was already clear (or contained zero). Thus, the SZNC instruction will be executed repeatedly as long as the spin gate is closed, as indicated by having a zero value. The gate is opened by another processor by storing some non-zero value in the gate cell. In the GCOS 8 architecture, execution of the STC2 instruction to “open” a gate guarantees that the “opened” gate will contain a non-zero value.
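The close and open sequence described above can be modeled in software as a rough sketch. The following is a toy model only, not GCOS 8 code: the names GateWord, sznc, stc2, close_gate, and run_workers are hypothetical, a Python lock stands in for the atomicity that the hardware provides, and stc2 simply stores a caller-supplied non-zero value rather than the actual word an STC2 instruction would store. Python threads stand in for processors.

```python
import threading
import time


class GateWord:
    """Toy model of a gate word: zero means closed, non-zero means open."""

    def __init__(self):
        self._atomic = threading.Lock()  # stands in for hardware atomicity
        self.value = 1                   # non-zero: the gate starts open

    def sznc(self):
        """Return (zero, negative) indicators for the old value, then clear the word."""
        with self._atomic:
            old = self.value
            self.value = 0
        return old == 0, old < 0

    def stc2(self, nonzero=1):
        """Open the gate by storing a non-zero value (assumed stand-in for STC2)."""
        if nonzero == 0:
            raise ValueError("an opened gate must contain a non-zero value")
        with self._atomic:
            self.value = nonzero


def close_gate(gate):
    """Spin until SZNC observes a non-zero (open) gate, which this caller then owns."""
    while True:
        zero, _negative = gate.sznc()
        if not zero:
            return          # old value was non-zero: the gate is now closed by us
        time.sleep(0)       # yield to other Python threads; a processor would just branch back


def run_workers(n_threads, n_iters):
    gate = GateWord()
    counter = [0]

    def worker():
        for _ in range(n_iters):
            close_gate(gate)
            counter[0] += 1  # shared resource touched only while the gate is closed
            gate.stc2()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

In this model a first sznc() on a fresh gate reports a non-zero old value and closes the gate, and any further sznc() reports zero until stc2() is called. Nothing in the model gives one spinning thread priority over another, so it says nothing about fairness.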
Memory configurations in multiprocessor shared-memory systems have typically been uniform. Each processor has the same chance to access any given memory location, and in particular, to access any given spin gate. This results in a certain relative “fairness” in accessing the spin gate. Thus, when a spin gate is “opened”, all competing processors are on essentially equal footing in “closing” the gate.
This is not the case when a Cache Coherent NonUniform Memory Access (CC-NUMA) architecture is implemented. CC-NUMA architectures are discussed in detail in “In Search of Clusters”, Second Edition, by Gregory F. Pfister, incorporated herein by reference. “Locking” or “Gating” is discussed starting on page 179. In a CC-NUMA architecture, some processors may have preferential access to the spin gate. For example, the spin gate may reside in high-speed cache memory for one or more processors. The processors with immediate access to the cache memory can typically gain sufficient access to the spin gate to close it, at the expense of processors without such immediate access. The result of this is that in certain situations where multiple processors are competing for ownership of a shared resource, processors with the slower access to exclusive ownership of the spin gate can be locked out for extended periods of time by processors having faster access to the shared gate. A number of different symptoms have been noticed that indicate the occurrence of this situation. For example, in certain situations different timers may expire prior to the requesting processor acquiring or successfully closing the spin gate.
A cache siphon occurs when the cached copy of a block of memory is moved from one cache memory to another. When more than one processor is trying to get write access to the same word or block of memory containing a gate at the same time to close the gate, the block of memory can “ping pong” back and forth between the processors as each processor siphons the block of memory containing the gate into its own cache memory in order to try to close the gate.
This potential for unfairness is exacerbated by attempts to improve the memory access of the waiting processor by first snooping the gate word in order to avoid unnecessary cache siphons. The delay introduced by the snoop can give processors in a common locality a significant time advantage for update acquisition of the cache block containing the spin gate.
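The effect of snooping the gate word before attempting to close it can be illustrated with a deliberately simplified counting model. The names GateLine, snoop, try_close, and count_siphons are hypothetical, and the model rests on several assumptions: a write attempt by a processor that does not own the block always costs one siphon, a snoop (read) never moves ownership, the processor holding the gate does not touch the block while waiters spin, and waiters take turns in a fixed order. The model counts siphons only; it does not reproduce the timing advantage that the snoop delay gives to nearby processors.

```python
class GateLine:
    """Toy model of the cache block holding a gate word (0 means closed)."""

    def __init__(self, holder):
        self.owner = holder  # processor whose cache currently owns the block
        self.gate = 0        # the holder has already closed the gate
        self.siphons = 0     # number of times ownership moved between caches

    def snoop(self, cpu):
        """Read the gate word without taking ownership (assumed to cost no siphon)."""
        return self.gate

    def try_close(self, cpu):
        """Attempt to close the gate; write access first siphons the block if needed."""
        if self.owner != cpu:
            self.siphons += 1
            self.owner = cpu
        old, self.gate = self.gate, 0
        return old != 0      # True only if the gate was open before this attempt


def count_siphons(waiters, rounds, snoop_first):
    """Count siphons while the gate stays closed for the given number of rounds."""
    line = GateLine(holder=0)
    for _ in range(rounds):
        for cpu in waiters:
            if snoop_first and line.snoop(cpu) == 0:
                continue     # gate seen closed: skip the write attempt entirely
            line.try_close(cpu)
    return line.siphons
```

Under these assumptions, two waiting processors that spin blindly for ten rounds move the block twenty times, once per attempt, while the same processors snooping first move it not at all for as long as the gate remains closed.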
It would be useful in CC-NUMA systems to have available “fair” gate opening and closing functionality so that processors with slower access to exclusive ownership of a shared resource are not frozen out by processors with faster access to the shared resource.