1. Field
The present disclosure pertains to the field of processing systems. More particularly, the present disclosure pertains to a memory ordering technique for a multiprocessing system.
2. Description of Related Art
Improving the performance of computer or other processing systems generally improves overall throughput and/or provides a better user experience. One technique of improving the overall quantity of instructions processed in a system is to increase the number of processors in the system. Implementing multiprocessing (MP) systems, however, typically requires more than merely interconnecting processors in parallel. For example, tasks or programs may need to be divided so they can execute across parallel processing resources.
Another major challenge in an MP system is maintaining memory consistency (also known as coherency). Memory consistency is the general requirement that memory remain sufficiently updated to supply a current copy of memory contents to a requesting processor or other device. Maintaining memory consistency is complicated by the use of internal caches and other data structures that store data for more efficient access than is typically available from other (e.g., external) memory circuits.
A system may maintain memory consistency using hardware or using a combination of hardware and software techniques. The hardware provides a particular memory ordering guarantee, a guarantee that the hardware will maintain the sequential nature of program memory accesses (to at least some selected degree) at some selected point in the system hierarchy. Software may be used in some systems to supplement hardware-provided memory ordering by forcing additional ordering restrictions at desired times. The memory ordering scheme implemented is a design choice involving a tradeoff between hardware complexity, software complexity, and the desired ability to cache and buffer data.
One prior art technique that represents a compromise between weakly ordered memory consistency models and very restrictive consistency models is "processor consistency". The processor consistency model is a known prior art model that allows limited reordering. One implementation is used in some prior processors (see, e.g., U.S. Pat. No. 5,420,991). Memory ordering constraints for one embodiment of a prior art processor consistency memory model system are shown in FIG. 1a.
According to block 100 of FIG. 1a, the prior art system ensures that stores from each individual processor in the system are observed in order by all other processors. In other words, individual stores from a particular processor are not re-ordered with respect to each other. As indicated in block 102, the system ensures that loads from each processor appear to execute in order. In some systems, optimizations may be done; however, load data appears to be returned to the computation-performing unit in order, so as to avoid altering the ordering relationships between the system loads and stores. On the other hand, if the load data being returned has not been altered by non-globally-observed stores, the order in which the load data is returned may be varied, and the data still appears to be returned in order.
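The effect of the constraints of blocks 100 and 102 can be illustrated with a small model. The following is a minimal sketch, not an implementation from the disclosure: it assumes a single shared memory in which each access is atomic, and the names P0, P1, data, flag, interleavings, and run are hypothetical. One processor stores to data and then to flag; another loads flag and then data. The sketch enumerates every interleaving that preserves each processor's program order and collects the values the loading processor can observe.

```python
# Hypothetical two-processor model: P0 issues two stores, P1 issues two loads.
# Assumption: every access is atomic against one shared memory, and each
# processor's accesses take effect in program order (blocks 100 and 102).
P0 = [("store", "data", 1), ("store", "flag", 1)]
P1 = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving and return (r_flag, r_data) as seen by P1."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "store":
            memory[addr] = arg        # arg is the value to store
        else:
            regs[arg] = memory[addr]  # arg is the destination register name
    return regs["r_flag"], regs["r_data"]


outcomes = {run(s) for s in interleavings(P0, P1)}
print(sorted(outcomes))  # [(0, 0), (0, 1), (1, 1)]
```

In this model the outcome (1, 0), in which the later store to flag is observed but the earlier store to data is not, never occurs. That is the kind of result the in-order store and in-order load constraints are intended to exclude.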
Additionally, as indicated in block 104, the system ensures that loads and stores to the same address are globally ordered. Thus, all agents in the system observe loads and stores to the same address in the same order. The consequences of the constraints of blocks 100-104 are discussed in greater detail (see FIGS. 4a-b) as some embodiments of the present invention include these constraints as well.
Finally, as indicated in block 105, stores to different addresses by different processors are globally ordered except that each processor can observe its own stores prior to observing stores from other processors. This prior art constraint is further contrasted with the present invention below (see FIGS. 4c-4e for implications of this prior art constraint). Some systems (e.g., systems based on the Profusion Chipset from Intel Corporation of Santa Clara) may require substantial hardware to ensure reasonably efficient ordered global observation of different stores to different memory locations by different processors.
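The exception in block 105, under which a processor can observe its own stores before stores from other processors, can be sketched as follows. This is an illustrative model under stated assumptions, not the mechanism of any particular prior art system: it assumes each processor holds pending stores in a private first-in, first-out buffer and satisfies its own loads from that buffer before consulting shared memory. The class Processor, its methods, and the function forwarding_litmus are hypothetical names.

```python
from collections import deque


class Processor:
    """Hypothetical processor with a private FIFO store buffer (an assumption)."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()  # pending (addr, value) pairs, oldest first

    def store(self, addr, value):
        # The store is visible to this processor at once, but not yet globally.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward from the newest matching buffered store, if there is one.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory[addr]

    def drain_one(self):
        # Make the oldest buffered store globally visible (FIFO keeps store order).
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


def forwarding_litmus():
    memory = {"x": 0, "y": 0}
    p0, p1 = Processor(memory), Processor(memory)
    p0.store("x", 1)
    p1.store("y", 1)
    r0 = p0.load("x")  # own store, forwarded from p0's buffer
    r1 = p0.load("y")  # p1's store has not drained yet
    r2 = p1.load("y")  # own store, forwarded from p1's buffer
    r3 = p1.load("x")  # p0's store has not drained yet
    p0.drain_one()
    p1.drain_one()
    return (r0, r1, r2, r3), dict(memory)


result, final_memory = forwarding_litmus()
print(result)        # (1, 0, 1, 0)
print(final_memory)  # {'x': 1, 'y': 1}
```

In this schedule each processor sees its own store before the other processor's store, so the two processors disagree about the order of the stores to x and y. Once both buffers drain, memory holds both values, and any other observer reading memory directly would see the two stores in a single order.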
Moreover, memory ordering overhead continues to grow dramatically as systems which implement traditional memory ordering models are scaled up to meet additional processing challenges. Consequently, there is a continuing need for memory ordering techniques that allow improved efficiency while maintaining a predetermined memory ordering protocol such as processor consistency.