1. Field of the Invention
The present invention generally relates to multiprocessor systems and, more particularly, to a novel technique for examining coherence request processing in a multiprocessor system.
2. Description of the Prior Art
To achieve high performance computing, multiple individual processors have been interconnected to form multiprocessor computer systems capable of parallel processing. Multiple processors can be placed on a single chip or on several chips, each containing one or more processors. These chips form so-called “compute nodes”, which are interconnected into a multiprocessor computer system.
Processors in a multiprocessor computer system use private cache memories because of their short access time (a cache is local to a processor and provides fast access to data) and to reduce the number of memory requests to the main memory. However, managing caches in a multiprocessor system is complex. Multiple private caches introduce the multi-cache coherency problem (or stale data problem) due to multiple copies of main memory data that can concurrently exist in the caches of the multiprocessor system.
The protocols that maintain the coherence between multiple processors are called cache coherence protocols. Cache coherence protocols track any sharing of data blocks (e.g., lines, blocks, and words) between the processors. For example, MESI is a common coherence protocol where every hardware cache line can be in one of four states: modified (M), exclusive (E), shared (S), or invalid (I). Line states are changed by memory references issued by the processors.
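The four MESI states and the way memory references change them can be illustrated with a minimal sketch. It follows a common textbook simplification of MESI, not any particular hardware implementation. The event names (`local_read`, `local_write`, `remote_read`, `remote_write`) and the `others_have_copy` flag are hypothetical labels chosen for illustration, and data transfer and write-back are omitted.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line after one event.

    Event names are illustrative assumptions, not part of any standard:
      local_read / local_write   - reference issued by the owning processor
      remote_read / remote_write - reference by another processor, observed
                                   by this cache
    """
    if event == "local_read":
        if state is State.INVALID:
            # A read miss loads the line as shared if any other cache
            # holds it, otherwise as exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # read hit: state unchanged
    if event == "local_write":
        # A write always leaves the line modified; other copies are
        # assumed to be invalidated by the protocol.
        return State.MODIFIED
    if event == "remote_read":
        if state in (State.MODIFIED, State.EXCLUSIVE):
            return State.SHARED  # another cache now holds a copy
        return state
    if event == "remote_write":
        return State.INVALID  # the local copy is now stale
    raise ValueError(f"unknown event: {event}")
```

For instance, a line that is invalid and is read while no other cache holds it becomes exclusive, and a modified line that another processor reads becomes shared.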
In a coherent multiprocessor system, a memory reference issued by one processor can affect the caches of other processors. For example, when a processor stores to a line, the coherence mechanism must ensure that eventually all caches either have the new data or have no data for that line at all. This generally involves inter-processor communication for testing the state of the line in the various caches and changing the state, if necessary. Commonly, such inter-processor communication is conducted by passing packets containing coherence protocol actions and responses between processors.
One group of cache coherence protocols is referred to as snooping. In a snooping cache coherence approach, no centralized system coherence state is kept, but rather each cache keeps the sharing status of data blocks locally. The caches are usually on a shared memory bus, and all cache controllers snoop (monitor) the bus to determine whether they have a copy of the data block requested. A commonly used snooping method is the “write-invalidate” protocol. In this protocol, a processor ensures that it has exclusive access to data before it writes that data. On each write, all processors snoop on the bus and check their caches to see if the address written to is also located in their caches. If so, the data corresponding to this address are invalidated. If two or more processors attempt to write the same data simultaneously, only one of them wins the race, causing the other processors' copies to be invalidated.
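The write-invalidate behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions: the `Cache` and `Bus` classes and their methods are hypothetical names, the bus is modeled as a plain loop over the attached caches, writes go straight through to memory, and arbitration between simultaneous writers is not modeled.

```python
class Cache:
    """A private cache: address -> value; an absent address is invalid."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_write(self, address):
        # Another cache wrote this address: drop the local copy if present.
        self.lines.pop(address, None)


class Bus:
    """Shared bus that every attached cache snoops (illustrative model)."""

    def __init__(self):
        self.caches = []
        self.memory = {}

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, cache, address):
        # On a miss, fill the line from memory (default value 0).
        if address not in cache.lines:
            cache.lines[address] = self.memory.get(address, 0)
        return cache.lines[address]

    def write(self, writer, address, value):
        # Broadcast the write so that all other caches invalidate
        # their copies, giving the writer exclusive access.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)
        writer.lines[address] = value
        # Assumption: write-through to memory, to keep the sketch short.
        self.memory[address] = value
```

With two caches attached to the same bus, a write by one cache removes the address from the other cache, and the other cache's next read of that address misses and fetches the new value.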
When a cache coherence request is not properly handled, which may occur for several reasons, an error is introduced in the system. This error may manifest itself much later in the processing, or not at all. Achieving proper handling of coherence requests in a multiprocessor system is one of the biggest challenges in a multiprocessor design. Designers and programmers employ various techniques called debugging to determine the source or sources of any errors.
Sometimes, in debugging a multiprocessor system, it is advantageous to control coherence traffic, that is, to control the coherence events being transferred between processors, so that the multiprocessor coherence mechanism is easier to debug. In a uniprocessor environment, single-stepping is a widely used debugging approach for understanding system behavior and detecting errors. For example, U.S. Pat. No. 6,986,026 issued to Roth et al. describes a technique for causing a single processor to process one instruction at a time. Uniprocessor single-stepping is executed by taking an exception after each instruction or by invoking an emulator. Roth's disclosure does not describe techniques for debugging a multiprocessor system, and does not describe how to debug coherence requests.
It is desirable to be able to single-step coherence events transferred between processors in a multiprocessor system. Thus, coherence events that are active at a certain processor cycle in a compute node could be processed in a single step, allowing designers and programmers to troubleshoot multiprocessor systems more easily.
Having set forth the limitations of the prior art, it is clear that what is required is a technique for monitoring coherence event processing in a multiprocessor system.