1. Field of the Invention
Embodiments of the present invention relate to computer system memory, in particular the management of cache memory.
2. Related Art
With direct memory access (DMA), an input/output (I/O) system can issue reads and writes directly to main memory without passing through the central processing unit (CPU). However, if the I/O system uses DMA to write to main memory and thereby changes data previously cached by the CPU, the CPU will not see the new data unless it fetches the data from main memory again. Likewise, for DMA reads, the CPU cache may contain more recent data than main memory, so the I/O system will not receive the most recent data unless it reads from the cache instead of main memory. Multiprocessor systems, particularly systems referred to as shared-memory symmetric multiprocessor (SMP) architectures, have to deal with similar scenarios. The MESI (Modified, Exclusive, Shared, Invalid) protocol is a popular cache consistency (coherency) protocol that addresses these issues. A modification of the MESI protocol is the MOESI (Modified, Owned, Exclusive, Shared, Invalid) protocol. These protocols are known in the art.
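By way of illustration only, the behavior of a single cache line under a MESI-style protocol can be sketched as a small state machine. The event names (`local_read`, `local_write`, `remote_read`, `remote_write`), the `next_state` function and the `others_have_copy` flag are hypothetical names chosen for this sketch; a remote event stands for an access by another processor or by a DMA-capable I/O system. The transitions are a simplified textbook rendering and may differ in detail from any particular implementation.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return (new_state, writeback_needed) for one cache line.

    Simplified MESI sketch; event names are assumptions of this example.
    """
    if event == "local_read":
        if state is State.INVALID:
            # A read miss loads the line as Shared if another cache
            # holds it, otherwise as Exclusive.
            new = State.SHARED if others_have_copy else State.EXCLUSIVE
            return new, False
        return state, False  # read hit: state unchanged
    if event == "local_write":
        # The writer ends up with the only valid, dirty copy.
        return State.MODIFIED, False
    if event == "remote_read":
        if state is State.MODIFIED:
            # Dirty data must be made visible before it is shared.
            return State.SHARED, True
        if state is State.EXCLUSIVE:
            return State.SHARED, False
        return state, False
    if event == "remote_write":
        # Another agent (for example a DMA write) changes the data:
        # the local copy becomes stale and is invalidated.
        return State.INVALID, state is State.MODIFIED
    raise ValueError(f"unknown event: {event}")
```

In this sketch, a DMA write to a line that the CPU holds in the Exclusive state corresponds to `next_state(State.EXCLUSIVE, "remote_write")`, which invalidates the cached copy so that the next CPU read is a miss and fetches the new data.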
Constraints are also applied to memory operations to prevent processors and DMA systems from reordering memory operations at will. If, for example, each processor could freely reorder memory operations for optimization, code sequences that work on a single processor might not work on multiprocessor systems. One type of constraint can be referred to as sequential consistency. With sequential consistency, the legal orders of memory operations are those that are indistinguishable from a strict interleaving of the operations from each thread of control. For example, for two threads, the operations of one thread can be interleaved with those of the other thread, but the order of operations within each thread is preserved.
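The interleaving rule described above can be illustrated with a short sketch that enumerates every order permitted for two threads and runs each one against a simple memory model. The operation tuples, the `interleavings` and `run` helpers, and the two example threads are assumptions made for this illustration only.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(order):
    """Execute one interleaving; return the values loaded into (r1, r2)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, location, arg in order:
        if kind == "store":
            memory[location] = arg  # arg is the value to store
        else:
            registers[arg] = memory[location]  # arg is the register name
    return registers["r1"], registers["r2"]


# Thread 1 stores to x and then loads y; thread 2 stores to y and then loads x.
thread_1 = [("store", "x", 1), ("load", "y", "r1")]
thread_2 = [("store", "y", 1), ("load", "x", "r2")]

orders = list(interleavings(thread_1, thread_2))
outcomes = {run(order) for order in orders}
```

Under these assumptions there are six legal orders, and no legal order leaves both `r1` and `r2` equal to 0, because at least one store always precedes both loads. A system that allowed each thread's load to be moved ahead of its own store could produce that result, which is the kind of reordering that sequential consistency rules out.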
There are different classes of high-level processor architectures with regard to operation reordering, optimization and speculation. One such class can be referred to as a lumped in-order architecture, and another as a lumped out-of-order architecture. With lumped in-order architectures, instructions are lumped into instruction groups that can be committed and rolled back atomically. Different instruction groups are committed sequentially and in order, but within an instruction group, arbitrary reordering and optimization can occur. Full speculation is possible within an instruction group, but speculation, reordering and optimization across instruction groups are limited. With lumped out-of-order architectures, instructions are likewise lumped into instruction groups that can be committed and rolled back atomically. Different groups can execute out of order but are committed in order.
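The atomic commit and rollback of an instruction group can be modeled, purely as a software analogy, by executing the group against a private copy of the architectural state and publishing that copy only if every instruction in the group completes. The names `run_group`, `set_reg`, `fault` and `SpeculationFault` are hypothetical and are introduced only for this sketch; actual hardware would use different mechanisms.

```python
class SpeculationFault(Exception):
    """Raised by an instruction to signal that its group must roll back."""


def run_group(committed_state, group):
    """Run one instruction group atomically.

    Returns (state, committed). On a fault the original state is
    returned unchanged, so no partial effects of the group are visible.
    """
    speculative = dict(committed_state)  # private working copy
    try:
        for instruction in group:
            instruction(speculative)
    except SpeculationFault:
        return committed_state, False  # roll back: discard the copy
    return speculative, True  # commit: publish the copy


def set_reg(name, value):
    """Build an instruction that writes value into register name."""
    def op(state):
        state[name] = value
    return op


def fault(state):
    """An instruction that always fails, forcing a rollback."""
    raise SpeculationFault("speculation failed")


state = {"r0": 0}
# The first group completes and is committed as a whole.
state, first_ok = run_group(state, [set_reg("r0", 1), set_reg("r1", 2)])
# The second group faults after writing r0; the write is discarded.
state, second_ok = run_group(state, [set_reg("r0", 99), fault])
```

Calling `run_group` once per group, in program order, corresponds to the in-order commit described above; the order in which instructions inside a group run is not constrained by the model, provided the group's combined effect is the same.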