This disclosure relates generally to data processing and, more specifically, to a multicopy atomic store operation in a data processing system.
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data, and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of addressable memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
In data processing systems that implement weak (or weak consistency) memory models, instructions may be arbitrarily re-ordered by the processor cores for execution as long as dependencies are observed and the operations are not otherwise restricted from being executed out-of-order. In addition, memory updates may be non-multicopy atomic, meaning that any given memory update may propagate to differing processors at differing times instead of becoming visible to all processors (other than the initiating processor) at the same time. In such data processing systems, out-of-order execution of memory access instructions can be restricted and multicopy atomicity of like size memory updates can be enforced through the use of barrier (or synchronization) instructions. As is known in the art, a barrier instruction prevents execution of subsequent memory access instructions (e.g., store and/or load instructions following the barrier instruction in program order) until all prior memory access instructions (e.g., any load or store instructions preceding the barrier instruction in program order) are resolved.
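By way of illustration only, the non-multicopy atomic behavior described above can be sketched with a toy timing model. The model, the function name `iriw`, the processor labels `P2` and `P3`, and the delay values are all assumptions introduced for this example and do not describe any particular hardware. Two writers store X=1 and Y=1 at time 0, and each store reaches each reader after its own per-reader delay.

```python
# Toy model (an assumption for illustration, not real hardware behavior):
# a store becomes visible to a given reader once that reader's propagation
# delay for the store has elapsed.

def iriw(delay_x, delay_y, read_time):
    """Two writers store X=1 and Y=1 at time 0.

    delay_x[r] and delay_y[r] give the time each store takes to reach
    reader r. Every reader loads X and Y at read_time.
    Returns {reader: (x, y)}.
    """
    observed = {}
    for reader in delay_x:
        x = 1 if read_time >= delay_x[reader] else 0
        y = 1 if read_time >= delay_y[reader] else 0
        observed[reader] = (x, y)
    return observed

# Non-multicopy atomic: the two stores reach the readers at differing times,
# so P2 sees X before Y while P3 sees Y before X.
non_mca = iriw(delay_x={"P2": 1, "P3": 5},
               delay_y={"P2": 5, "P3": 1},
               read_time=1)

# Multicopy atomic: each store reaches every reader at the same time,
# so the readers can never disagree about what has become visible.
uniform = {"P2": 3, "P3": 3}
mca = iriw(delay_x=uniform, delay_y=uniform, read_time=3)
```

In this sketch the two readers in `non_mca` disagree about the order in which the two stores became visible, which is the outcome that multicopy atomicity rules out.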
In a conventional data processing system that implements a snoop-based coherence protocol, the memory access ordering indicated by a barrier instruction is enforced as follows. The processor core that executes the barrier instruction initiates broadcast of a barrier operation on the system interconnect to all processing units of the data processing system. In response to snooping the barrier operation on the system interconnect, the processing units provide appropriate coherence responses to ensure that the barrier operation is not permitted to successfully complete until the relevant memory accesses preceding the barrier have resolved. Once the barrier operation successfully completes, the initiating processor core is permitted to continue execution of memory access instructions following the barrier instruction.
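The broadcast-and-respond flow described above might be sketched as follows. This is a simplified model under stated assumptions: the response names `"ack"`, `"retry"`, and `"success"`, the reissue of the barrier operation after a retry, and the rule that each snooper resolves one pending access per round are hypothetical and are not taken from any specific coherence protocol.

```python
def barrier_broadcast(pending_per_unit):
    """Model one broadcast of a barrier operation.

    Each snooping unit answers "retry" while it still has relevant prior
    accesses pending (hypothetical response names). The combined response
    is "success" only if no unit answered "retry".
    """
    responses = ["retry" if pending > 0 else "ack"
                 for pending in pending_per_unit]
    return "retry" if "retry" in responses else "success"


def issue_barrier(pending_per_unit):
    """Reissue the barrier operation until it succeeds.

    Toy assumption: between broadcasts, every unit resolves one of its
    pending accesses. Returns the number of broadcasts that were needed.
    """
    pending = list(pending_per_unit)
    broadcasts = 1
    while barrier_broadcast(pending) == "retry":
        pending = [max(0, p - 1) for p in pending]
        broadcasts += 1
    return broadcasts
```

For example, `issue_barrier([0, 2, 1])` needs three broadcasts in this model, because the unit with two pending accesses forces two retries before the barrier operation can complete.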
In the presence of like-size memory accesses (e.g., all accesses to any given memory location are made by accesses of the same size), a barrier instruction restores the appearance of multicopy atomicity to the processing units that may subsequently consume the store data written to the memory subsystem by a store access. It does so by ordering the store access preceding the barrier instruction with respect to the memory accesses following the barrier instruction and by ensuring that the store data written by the store access has propagated completely before any of the subsequent memory accesses is allowed to initiate. That is, the ordering provided by the barrier instruction provides the same effect as if the store data were made simultaneously available to all processing units (despite any actual variance in the timing of data availability due to system topology and/or the structure and operation of the cache hierarchies). However, the present disclosure recognizes that use of conventional barrier instructions cannot fully restore multicopy atomicity in data processing systems implementing a weak memory model if mixed-size conflicting accesses are permitted.
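The effect of such a barrier on like-size accesses can be illustrated with a small simulation. The class `ToySystem`, its methods, the processor labels, and the delay values are hypothetical constructs for this sketch. The barrier is modeled simply as stalling until every in-flight store has reached every processor's view, which is an assumption and not a description of an actual implementation.

```python
class ToySystem:
    """Toy model: each processor has a private view of memory, and a store
    reaches every other processor after a per-destination delay."""

    def __init__(self, cpus):
        self.now = 0
        self.views = {cpu: {} for cpu in cpus}
        self.in_flight = []  # (arrival_time, destination, address, value)

    def store(self, cpu, addr, value, delays):
        self.views[cpu][addr] = value  # the initiator sees its own store
        for dest, delay in delays.items():
            self.in_flight.append((self.now + delay, dest, addr, value))

    def tick(self):
        """Advance time by one step and deliver stores that have arrived."""
        self.now += 1
        remaining = []
        for arrival, dest, addr, value in self.in_flight:
            if arrival <= self.now:
                self.views[dest][addr] = value
            else:
                remaining.append((arrival, dest, addr, value))
        self.in_flight = remaining

    def barrier(self):
        """Stall until all prior stores have fully propagated."""
        while self.in_flight:
            self.tick()

    def load(self, cpu, addr):
        return self.views[cpu].get(addr, 0)


def message_passing(use_barrier):
    """P0 writes data and then a flag; P1 reads the flag and then the data."""
    system = ToySystem(["P0", "P1"])
    system.store("P0", "data", 42, {"P1": 5})  # slow path to P1
    if use_barrier:
        system.barrier()
    system.store("P0", "flag", 1, {"P1": 1})   # fast path to P1
    system.tick()
    return system.load("P1", "flag"), system.load("P1", "data")
```

Without the barrier, `P1` in this model can observe the flag while still holding stale data. With the barrier, the flag store is not issued until the data store is visible everywhere, so `P1` observes the same result as if the data had become available to all processors at once.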