1. Field of the Invention
The present invention relates generally to computer systems having one or more microprocessors that are capable of speculative or out-of-order processing of instructions, and in particular, relates to the synchronization of the processing of memory instructions by these microprocessors or between their threads.
2. Background Information
Typical computer systems use a single central processing unit (CPU), known as a microprocessor. This microprocessor executes the programs stored in main memory by fetching their instructions, examining them, and then executing them one after another.
In some applications, multiple processors are utilized. A single complex task can be broken into sub-tasks, and each sub-task is processed individually by a separate processor. For example, in a multiprocessor computer system, word processing can be performed by one processor that handles the background task of printing a document, while a different processor handles the foreground task of interfacing with a user typing in another document. This use of multiple processors allows various tasks or functions, and even multiple applications, to be handled by more than a single CPU, thereby enhancing system efficiency and speed.
Utilizing multiple processors has the added feature that two or more processors may share the same data stored within the system. However, care must be taken to maintain processor ordering. That is, a sequence of "writes" (sometimes referred to as "stores") generated by any processor in the system should be observed in the same order by all other processors. For example, a processor P1 can perform a write operation W1 to a location 1, followed by a write operation W2 to a location 2. The location 2 contains a flag that signals that the data in the location 1 is valid. A processor P2 can continuously perform a "read" (sometimes referred to as a "load") operation R2 on the location 2 until the flag becomes valid. After the flag is observed valid, the processor P2 performs a read operation R1 on the location 1 to read the data. Thereafter, the processor P2 can perform a "modify" operation to change the data. In order for this algorithm to successfully execute in a multiprocessor system, the order in which the read operations R1 and R2 are performed by the processor P2 should be consistent with the order of the write operations W1 and W2 performed by the processor P1.
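The flag-and-data exchange described above can be illustrated in software. The following is a minimal sketch only, assuming C++11 threads stand in for the processors P1 and P2; the names run_handoff, payload, and ready are hypothetical and do not appear in the original text. Release and acquire orderings are used here as one way to keep W1 visible before W2 and to keep R1 after R2.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration: `payload` plays the role of location 1 and
// `ready` plays the role of the flag stored in location 2.
int run_handoff(int value) {
    int payload = 0;
    std::atomic<bool> ready{false};
    int observed = -1;

    // P1: write the data (W1), then set the flag (W2).
    std::thread p1([&] {
        payload = value;                                  // W1
        ready.store(true, std::memory_order_release);     // W2
    });

    // P2: poll the flag (R2), then read the data (R1).
    std::thread p2([&] {
        while (!ready.load(std::memory_order_acquire)) {  // R2
            std::this_thread::yield();
        }
        observed = payload;                               // R1
    });

    p1.join();
    p2.join();
    return observed;
}
```

If the flag store and flag load were instead allowed to be reordered with the accesses to the data, P2 could observe the flag as valid while still reading stale data from location 1.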
Further, because the data is shared between the two processors, if both processors P1 and P2 are capable of performing read, modify, and write operations, then for data consistency purposes the two processors should not be allowed to perform operations on the data simultaneously. That is, while the processor P1 is in the process of reading, modifying, or storing the data, the processor P2 should not be allowed to concurrently read, modify, or store that data. If the processor P2 is not constrained in this manner, incorrect data or results may be generated.
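As a software analogy for this constraint, the sketch below has two threads each perform a read, modify, and write sequence on shared data while a mutex keeps the sequences from overlapping. It is an illustration under stated assumptions, not the mechanism of the invention: the function name run_counter and the use of std::mutex are hypothetical choices made for the example.

```cpp
#include <mutex>
#include <thread>

// Hypothetical illustration: two workers stand in for processors P1 and P2.
long run_counter(int iterations) {
    long shared = 0;
    std::mutex guard;

    auto worker = [&] {
        for (int i = 0; i < iterations; ++i) {
            // Only one worker at a time may execute the sequence below.
            std::lock_guard<std::mutex> lock(guard);
            long tmp = shared;  // read
            tmp += 1;           // modify
            shared = tmp;       // write
        }
    };

    std::thread p1(worker);
    std::thread p2(worker);
    p1.join();
    p2.join();
    return shared;
}
```

Without the lock, both workers could read the same old value and each write back old value plus one, so one of the two updates would be lost.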
Further complicating the use of multiprocessors is the fact that processors often contain a small amount of dedicated fast memory, known as a "cache," to increase the speed of operation. As information is called from main memory and used by a processor, the information and its address are stored in a small portion of cache, which is usually static random access memory (SRAM). Because these caches are typically localized to a specific processor, these multiple caches in a multiprocessor computer system can (and usually do) contain multiple copies of a given data item. Any processor or other agent accessing a copy of this data should receive a valid or updated data value, and data being written from the cache back into memory must be the current data. In other words, cache coherency must be maintained by monitoring and synchronizing data written from the cache to memory, or data read from memory and stored in the cache.
Processor ordering and cache coherency based on correct data are important for high-performance processors that utilize out-of-order processing or speculative processing, especially for multiprocessor systems having these types of processors. In out-of-order processing, a software program is not necessarily executed in the same sequence as its source code was written. In speculative processing, branch prediction is performed pending resolution of a branch condition. Once the individual microinstructions have been executed, their results are stored in a temporary state. Finally, macroinstructions are "retired" once all branch conditions are satisfied or once out-of-order results are determined to be correct. The success of these two types of processing methods depends in part on the accuracy, consistency, and synchronization of the data that they read, modify, and write.
Accordingly, given the use of multiple processors, caches, and out-of-order or speculative processing, there is a need to improve ordering of memory instructions and transactions by microprocessors.
According to one aspect of the invention, a method is provided which includes acquiring ownership of a memory location that stores data, performing an atomic operation directed towards the data, preventing other operations directed towards the data while the atomic operation is performed, and releasing ownership of the memory location after performing the atomic operation.
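The four steps of this method (acquire ownership, perform the atomic operation, exclude other operations meanwhile, release ownership) can be sketched in software as follows. This is only an analogy under stated assumptions: the text does not specify how ownership is implemented, and the class OwnedLocation, its members, and the function run_owned_updates are hypothetical names introduced for illustration. A spin flag stands in for ownership of the memory location.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration of the acquire / operate / release sequence.
class OwnedLocation {
public:
    // Applies `op` to the stored data as one indivisible step.
    template <typename Op>
    void atomic_update(Op op) {
        // Acquire ownership; other callers spin here, which keeps their
        // operations away from the data while ours is in progress.
        while (owned_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        data_ = op(data_);                         // the atomic operation
        owned_.clear(std::memory_order_release);   // release ownership
    }

    // Reads the data under the same ownership rule.
    int read() {
        while (owned_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        int value = data_;
        owned_.clear(std::memory_order_release);
        return value;
    }

private:
    std::atomic_flag owned_ = ATOMIC_FLAG_INIT;
    int data_ = 0;
};

// Two threads each add 1 to the location `per_thread` times.
int run_owned_updates(int per_thread) {
    OwnedLocation loc;
    auto worker = [&] {
        for (int i = 0; i < per_thread; ++i) {
            loc.atomic_update([](int v) { return v + 1; });
        }
    };
    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return loc.read();
}
```

Because every update holds ownership for the full read, modify, and write sequence, no increment is lost regardless of how the two threads interleave.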