1. Technical Field
The present invention relates generally to data processing and, in particular, to synchronization of processing in a data processing system. Still more particularly, the present invention relates to the virtualization of barrier synchronization registers in a data processing system.
2. Description of the Related Art
A conventional multiprocessor (MP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of memory in the multiprocessor computer system directly addressable by the processing units and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Cache memories are commonly utilized to temporarily buffer memory blocks that might be accessed by a processor, speeding up processing by reducing the latency of loading needed data and instructions from memory. In some MP systems, the cache hierarchy includes at least two levels. The level one (L1), or upper-level, cache is usually a private cache associated with a particular processor core and cannot be accessed by other cores in the MP system. Typically, in response to a memory access instruction such as a load or store instruction, the processor core first accesses the directory of the upper-level cache. If the requested memory block is not found in the upper-level cache, the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) for the requested memory block. The lowest level cache (e.g., L3) is often shared among several processor cores.
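The lookup order described above can be illustrated with a minimal software sketch. It is a toy model and not a description of any particular hardware: the names `CacheLevel` and `load` are hypothetical, each level is an unbounded dictionary with no eviction, and filling every level on a miss is an assumption made only to keep the example short.

```python
from typing import Dict, List, Tuple


class CacheLevel:
    """One level of a toy cache hierarchy (hypothetical model, no eviction)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, int] = {}  # address -> cached data


def load(address: int, hierarchy: List[CacheLevel],
         memory: Dict[int, int]) -> Tuple[int, str]:
    """Search the hierarchy from the upper level down, then system memory.

    Returns the data and the name of the level that supplied it.
    """
    for i, level in enumerate(hierarchy):
        if address in level.blocks:
            data = level.blocks[address]
            # Assumption: copy the block into every upper level that missed.
            for upper in hierarchy[:i]:
                upper.blocks[address] = data
            return data, level.name
    # Missed in every cache level: fall back to system memory.
    data = memory[address]
    for level in hierarchy:
        level.blocks[address] = data
    return data, "memory"


hierarchy = [CacheLevel("L1"), CacheLevel("L2"), CacheLevel("L3")]
memory = {0x100: 42}
first = load(0x100, hierarchy, memory)   # (42, "memory"): cold miss
second = load(0x100, hierarchy, memory)  # (42, "L1"): now an upper-level hit
```

The second access is served by the upper-level cache, which is the latency reduction the hierarchy is intended to provide.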
In such conventional MP systems, large workloads can be dispatched efficiently by harnessing the processing power of multiple processing units to execute several program-managed threads or processes in parallel. The multiple threads or processes can communicate data and control messages through the shared memory hierarchy.
When input values for operations to be executed by one processing unit are results (i.e., output values) of processing performed by other processing units within the shared memory multiprocessor environment, the processing of the data-dependent operations introduces additional complexity. For example, in order for a first processor to obtain results to be utilized as input values, a second processor must first store its output values to the shared memory hierarchy so that the first processor may then retrieve the results from memory. In addition, the execution of instructions by the first and second processors must be synchronized to ensure that the first processor accesses the appropriate results in the shared memory hierarchy and not prior, stale data values. Conventionally, the synchronization of processing by multiple processing units is accomplished via a single mirrored architected hardware register known as a barrier synchronization register (BSR) within each processing unit. However, as recognized herein, the availability of only a single resource such as a BSR to synchronize multiprocessing operations limits the virtualizability of workloads.
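The data dependency between the first and second processors can be illustrated with a software analogy. The sketch below uses the barrier from Python's standard `threading` module in place of a hardware BSR, so it shows only the ordering guarantee and not the register mechanism itself. The function name `run_producer_consumer` and the doubling computation are hypothetical.

```python
import threading


def run_producer_consumer(value: int) -> int:
    """Return the result the consumer observes after the barrier."""
    shared = {"result": None}        # stands in for the shared memory hierarchy
    observed = []
    barrier = threading.Barrier(2)   # software stand-in for a hardware barrier

    def producer() -> None:
        # "Second processor": store the output value, then arrive at the barrier.
        shared["result"] = value * 2
        barrier.wait()

    def consumer() -> None:
        # "First processor": wait at the barrier before reading, so the value
        # read is the producer's result and not a stale one.
        barrier.wait()
        observed.append(shared["result"])

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed[0]


result = run_producer_consumer(21)  # 42
```

Without the barrier, the consumer thread, which is started first, could read the initial `None` value, which corresponds to the stale data case described above.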