1. Field of the Invention
This invention relates to microprocessors and, more particularly, to process synchronization between processors in a multiprocessor system.
2. Description of the Related Art
Modern microprocessor performance has increased steadily, and at times dramatically, over the past 10 years or so. To a large degree, the performance gains may be attributed to increased operating frequency and, moreover, to a technique known as deep pipelining. Generally speaking, deep pipelining refers to using instruction pipelines with many stages, each of which performs less work, thereby allowing the pipeline as a whole to be clocked at a higher frequency. This technique has served the industry well. However, there are drawbacks to increased frequency and deep pipelining. For example, clock skew and power consumption can be significant during high-frequency operation. As such, the physical constraints imposed by system-level thermal budgets, together with the increasing difficulty of managing clock skew, suggest that the practical limits of these techniques may be near. Thus, the industry has sought to increase performance by other means. One such technique is the use of multi-core processors and, more generally, multiprocessing.
As computing systems employ multiprocessing schemes with more and more processors (e.g., processing cores), the number of requesters that may interfere with one another or contend for the same memory datum may increase to such an extent that conventional methods of process synchronization may be inadequate. For example, when few processors are contending for a resource, simple locking structures may provide adequate performance for critical sections of code, and locked arithmetic operations on memory locations may be sufficient. As the scale of multiprocessing grows, however, these primitives become less and less efficient. To that end, more advanced processors include additions to the instruction set that provide hardware synchronization primitives (e.g., CMPXCHG, CMPXCHG8B, and CMPXCHG16B) based on atomically updating a single memory location. However, systems are now reaching the point at which even these hardware primitives may not provide the performance demanded by high-performance multiprocessors with high processor counts.
Many conventional processors use synchronization techniques based on an optimistic model. That is, when operating in a multiprocessor environment, these conventional processors are designed under the assumption that they can achieve synchronization by repeatedly rerunning the synchronization code until no interference is detected, and then declaring that synchronization has been achieved. This type of synchronization may waste considerable time, particularly when many processors attempt the same synchronizing event, because no more than one processor can make forward progress at any given instant; the others must rerun their synchronization code. As such, different synchronization techniques may be desirable.
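The cost of the optimistic model can be illustrated with a simplified, single-threaded simulation. The sketch below is not the synchronization technique of this invention; it models a conventional optimistic update built on a compare-and-swap (CAS) operation with CMPXCHG-like semantics on a single memory word. It assumes a worst-case, lock-step interleaving in which every pending processor reads the shared word before any of them attempts its CAS. Under that assumption, only the first CAS in each round succeeds and every other processor detects interference and retries. The names `Word`, `compare_and_swap`, and `optimistic_increment_rounds` are hypothetical and chosen only for illustration.

```python
class Word:
    """Hypothetical model of one shared memory word."""

    def __init__(self, value=0):
        self.value = value

    def compare_and_swap(self, expected, new):
        # Models CMPXCHG-like semantics: if the word still holds `expected`,
        # store `new` and report success; otherwise leave it unchanged and
        # report failure. In this single-threaded model each call is atomic.
        if self.value == expected:
            self.value = new
            return True
        return False


def optimistic_increment_rounds(num_processors):
    """Simulate num_processors each optimistically incrementing one shared
    counter once, under the assumed worst-case lock-step interleaving.
    Returns (final counter value, total CAS attempts)."""
    word = Word(0)
    pending = list(range(num_processors))
    attempts = 0
    while pending:
        # Every pending processor reads the same snapshot of the word.
        snapshots = {p: word.value for p in pending}
        still_pending = []
        for p in pending:
            attempts += 1
            if not word.compare_and_swap(snapshots[p], snapshots[p] + 1):
                # Interference detected: this processor must rerun its code.
                still_pending.append(p)
        pending = still_pending
    return word.value, attempts


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        value, attempts = optimistic_increment_rounds(n)
        print(f"{n:2d} processors: counter={value}, CAS attempts={attempts}")
```

In this model, N processors perform N useful increments but issue N(N+1)/2 CAS attempts (e.g., 136 attempts for 16 processors), so wasted work grows quadratically with the number of contending processors. Real interleavings are generally less regular, but the model captures why at most one processor per round makes forward progress.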