Memory accesses in a computer system typically follow a given set of rules to ensure that all agents in the system are working with the correct version of the data. The most rigid set of rules, referred to as strong ordering, requires among other things that all loads by a given agent are observable by other agents in the order in which they were issued, i.e., the program order.
While strong ordering guarantees correctness, it does so at the expense of performance. Processors can achieve high performance for loading information from memory if they are allowed to follow weaker rules. The weak ordering rule gives the processor the freedom to choose the highest performance way to load data from memory.
The weak ordering rule allows instructions to be executed out of order. This rule enhances the processor's performance because instructions can be executed as soon as resources are available, avoiding wasteful idle periods.
However, at a particular reference time, the program running on a given agent may need to guarantee that all previous loads (or reads) from memory have been observed by all other agents. In addition, the program may also want to ensure that all loads by an agent subsequent to a particular reference time will not be observable before any previous loads. In essence, an agent may want to synchronize all loads issued by itself with respect to a particular timing point.
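The need to order loads around a reference point can be illustrated with a minimal sketch in portable C++. This is an analogy under stated assumptions, not the mechanism discussed in this document: it uses the C++ standard library's std::atomic_thread_fence with acquire semantics as the reference point, and the names payload, ready, producer, consumer, and run_once are hypothetical and introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for illustration; these names are not from the text.
static int payload = 0;                 // ordinary (non-atomic) data
static std::atomic<bool> ready{false};  // flag published by the producer

// Producer: write the payload, then publish the flag with release semantics.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer: poll the flag with relaxed loads, then issue an acquire fence.
// The fence is the reference point: the load of `payload` that follows it
// may not be reordered ahead of the load of `ready` that precedes it.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Run one producer/consumer pair and return the value the consumer read.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    assert(seen == 42);  // guaranteed by the release store / acquire fence pairing
    return seen;
}
```

Calling run_once() returns 42. Without the fence, and with only relaxed loads, a weakly ordered processor or an optimizing compiler would be free to perform the load of payload before the load of ready, so the consumer could read a stale value.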
Prior art synchronization methods include the use of input/output (I/O) instructions, privileged instructions, uncacheable memory references, serializing instructions, and locked instructions. These methods implement the synchronization as part of their primary functions, and they have a number of drawbacks. First, they all require the use of at least one register, taking away valuable storage resources. Second, they are slow because of the time spent performing the primary function. Third, except for serializing and locked instructions, these methods are privileged and not available to application users.
Therefore, there is a need for an efficient method of synchronizing load operations that uses minimal hardware resources.