1. Technical Field
The present invention relates in general to data processing systems and, in particular, to coherency management in a data processing system. Still more particularly, the present invention relates to a processor, data processing system and method supporting improved coherency management of castouts in a cache hierarchy of a data processing system.
2. Description of the Related Art
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Cache memories are commonly utilized to temporarily buffer memory blocks that might be accessed by a processor in order to speed up processing by reducing the access latency introduced by having to load needed data and instructions from system memory. In some multiprocessor (MP) systems, the cache hierarchy includes at least two levels. The level one (L1), or upper-level, cache is usually a private cache associated with a particular processor core and cannot be directly accessed by other cores in an MP system. Typically, in response to a memory access instruction such as a load or store instruction, the processor core first accesses the upper-level cache. If the requested memory block is not found in the upper-level cache or the memory access request cannot be serviced in the upper-level cache (e.g., the L1 cache is a store-through cache), the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) to service the memory access to the requested memory block. The lowest level cache (e.g., L2 or L3) is often shared among multiple processor cores.
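The lookup order described above can be illustrated with a minimal sketch. It assumes each cache level is a simple address-to-data map with no capacity limit or replacement policy; the names CacheLevel, load and store are hypothetical and chosen only for illustration, and the choice to allocate a block in a store-through level on a store is a simplification.

```python
class CacheLevel:
    """One level of a cache hierarchy, modeled as a dict from block address to data."""

    def __init__(self, name, store_through=False):
        self.name = name
        # A store-through level cannot absorb a store; it forwards it downward.
        self.store_through = store_through
        self.blocks = {}


def load(hierarchy, memory, addr):
    """Search levels from upper to lower; fall back to system memory on a full miss.

    Returns (data, name of the level that serviced the load).
    """
    for hit_index, level in enumerate(hierarchy):
        if addr in level.blocks:
            data, source = level.blocks[addr], level.name
            break
    else:
        hit_index, data, source = len(hierarchy), memory[addr], "memory"
    # Fill every level above the one that hit so that later loads hit sooner.
    for level in hierarchy[:hit_index]:
        level.blocks[addr] = data
    return data, source


def store(hierarchy, memory, addr, data):
    """Write from the upper level downward until a store-in level absorbs the store.

    Returns the name of the level (or "memory") at which the store stopped.
    """
    for level in hierarchy:
        level.blocks[addr] = data
        if not level.store_through:
            return level.name
    memory[addr] = data
    return "memory"


# Usage: a store-through L1 above a store-in L2.
hierarchy = [CacheLevel("L1", store_through=True), CacheLevel("L2")]
memory = {0x100: 42}
first = load(hierarchy, memory, 0x100)            # full miss, serviced by memory
second = load(hierarchy, memory, 0x100)           # now hits in L1
stopped_at = store(hierarchy, memory, 0x100, 7)   # forwarded by L1, absorbed by L2
```

In this sketch the second load hits in the upper-level cache because the first load filled it, and the store is serviced by the L2 because the L1 is modeled as store-through.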
A coherent view of the contents of memory is maintained in the presence of potentially multiple copies of individual memory blocks distributed throughout the computer system through the implementation of a coherency protocol. The coherency protocol, for example, the well-known Modified, Exclusive, Shared, Invalid (MESI) protocol or a variant thereof, entails maintaining state information associated with each cached copy of the memory block and communicating at least some memory access requests between processing units to make the memory access requests visible to other processing units.
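As an illustration of the per-copy state information such a protocol maintains, the following sketch encodes the textbook MESI transitions for a single cached copy. It is not drawn from any particular implementation; the function names are hypothetical, and real protocols add further states and transient conditions.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_local_write(state):
    """Return (new_state, needs_bus_request) for a store by the local core."""
    if state in (State.M, State.E):
        # Sole copy: the update needs no communication with other caches.
        return State.M, False
    # Shared or Invalid: other caches must be told to invalidate their copies.
    return State.M, True


def on_snooped_read(state):
    """Return (new_state, must_supply_data) when another unit's read is observed."""
    if state is State.I:
        return State.I, False
    # Only a Modified copy holds data newer than system memory.
    return State.S, state is State.M


def on_snooped_write(state):
    """Any copy is invalidated when another unit's write is observed."""
    return State.I
```

Note that a store to a copy held in the Modified or Exclusive state returns needs_bus_request as False, which is the property that allows such a copy to be updated without other processing units observing the update.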
In order to synchronize access to a particular granule (e.g., cache line) of memory between multiple processing units and threads of execution, load-reserve and store-conditional instruction pairs are often employed. For example, load-reserve and store-conditional instructions have been implemented in the PowerPC® instruction set architecture with operation codes (opcodes) associated with the LWARX/LDARX and STWCX/STDCX mnemonics, respectively (referred to hereafter as LARX and STCX). Execution of a LARX instruction by a processor loads a specified cache line into the cache memory of the processor and sets a reservation flag and address register signifying the processor has interest in atomically updating the cache line through execution of a subsequent STCX instruction targeting the reserved cache line. The cache then monitors the storage subsystem for operations signifying that another processor has modified the cache line, and if one is detected, resets the reservation flag to signify the cancellation of the reservation. When the processor executes a subsequent STCX targeting the cache line reserved through execution of the LARX instruction, the cache memory only performs the cache line update requested by the STCX if the reservation for the cache line is still pending. Thus, updates to shared memory can be synchronized without the use of an atomic update primitive that strictly enforces atomicity.
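The reservation mechanism described above can be sketched in software as follows. This is a behavioral model only, under the assumptions that memory is a simple map, that an address stands for a whole reservation granule, and that snooped stores are delivered by an explicit call; ReservationUnit, atomic_increment and the interfere hook are hypothetical names, not part of any instruction set.

```python
class ReservationUnit:
    """Hypothetical model of one processor's reservation flag and address register."""

    def __init__(self):
        self.flag = False
        self.addr = None

    def larx(self, memory, addr):
        """Load-reserve: return the current value and establish a reservation."""
        self.flag = True
        self.addr = addr
        return memory[addr]

    def snoop_store(self, addr):
        """Cancel the reservation when another processor's store to addr is observed."""
        if self.flag and self.addr == addr:
            self.flag = False

    def stcx(self, memory, addr, value):
        """Store-conditional: update memory only if the reservation is still pending."""
        ok = self.flag and self.addr == addr
        self.flag = False  # the reservation is consumed whether or not the store succeeds
        if ok:
            memory[addr] = value
        return ok


def atomic_increment(unit, memory, addr, interfere=None):
    """Retry a LARX/STCX pair until the STCX succeeds; return the number of attempts.

    `interfere` is a hypothetical test hook called between the LARX and the STCX
    to simulate activity by another processor.
    """
    attempts = 0
    while True:
        attempts += 1
        old = unit.larx(memory, addr)
        if interfere is not None:
            interfere(attempts)
        if unit.stcx(memory, addr, old + 1):
            return attempts


# Usage: another processor stores to the reserved granule during the first attempt.
memory = {0x40: 10}
unit = ReservationUnit()


def other_processor_store(attempt):
    if attempt == 1:
        memory[0x40] = 100
        unit.snoop_store(0x40)


attempts = atomic_increment(unit, memory, 0x40, other_processor_store)
```

In this run the first STCX fails because the intervening store canceled the reservation, and the second attempt succeeds against the updated value.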
The state of the reservation flag and the caching of the reserved cache line are independent, meaning that the reservation flag is not reset automatically if the associated cache line is removed from the cache of the reserving processor, for example, by a castout operation. If a reserved cache line that is cast out from a cache memory is subsequently modified by a processor other than the reserving processor, the reservation will be automatically canceled through conventional coherency communication if the reserved cache line is in a state other than Modified. However, because a Modified cache line can be updated “silently” (i.e., without inter-cache coherency communication), special provision must be made for such cache lines in order to ensure that a STCX that should fail does not succeed.
In one prior art implementation, the coherency protocol addressed the above operating scenario by permitting a cache line (including a reserved cache line) to be cast out from an upper level cache to a lower level cache in the same coherency state as the cache line was held in the upper level cache and by requiring the coherency state of the cache line to be downgraded, if applicable, from an exclusive ownership state (e.g., Modified (M)) to a shared ownership state (e.g., Tagged (T)) if the cache line was again obtained by an upper level cache from the lower level cache. The enforced downgrade ensures that any pending reservation for the cache line is canceled in the event that a different processor attempts to update the cache line while holding the cache line in the exclusive ownership state.
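The hazard and the prior art remedy can be contrasted in a small behavioral sketch. It assumes a single pending reservation, coherency states represented as the strings "M" and "T", and a store that communicates with other units only when the line is not held in the M state; all names here are hypothetical and the model omits everything else a real protocol would track.

```python
class Reserver:
    """Hypothetical reserving processor with a reservation already established."""

    def __init__(self, addr):
        self.addr = addr
        self.flag = True

    def snoop(self, addr):
        """Cancel the reservation when coherency communication for addr is observed."""
        if addr == self.addr:
            self.flag = False


def reload_state(lower_level_state, downgrade_on_reload):
    """State in which an upper level cache receives a line from the lower level cache."""
    if downgrade_on_reload and lower_level_state == "M":
        return "T"  # enforced downgrade from exclusive to shared ownership
    return lower_level_state


def store(state, addr, snoopers):
    """Update by a non-reserving processor; return the resulting state.

    A line held in M is updated silently; any other state requires
    communication that the snoopers observe.
    """
    if state != "M":
        for snooper in snoopers:
            snooper.snoop(addr)
    return "M"


def stale_reservation_survives(downgrade_on_reload):
    """Return True if the reservation is still set after the other processor's store."""
    reserver = Reserver(addr=0x80)
    castout_state = "M"  # the line is cast out in the state it held in the upper level
    state = reload_state(castout_state, downgrade_on_reload)
    store(state, 0x80, [reserver])
    return reserver.flag


without_downgrade = stale_reservation_survives(False)
with_downgrade = stale_reservation_survives(True)
```

Without the downgrade the reservation survives the other processor's silent update, so a later STCX could succeed when it should fail; with the downgrade the store must communicate and the reservation is canceled.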