1. Field of the Invention
The present invention generally relates to multiprocessor computer systems.
2. Description of the Prior Art
To achieve high-performance computing, multiple individual processors have been interconnected to form multiprocessor computer systems capable of parallel processing. Multiple processors can be placed on a single chip, or several chips, each containing one or several processors, can be interconnected into a multiprocessor computer system.
Processors in a multiprocessor computer system use private cache memories both for their short access time (a cache is local to a processor and provides fast access to data) and to reduce the number of requests to the main memory. However, managing caches in a multiprocessor system is complex. Multiple private caches introduce the multi-cache coherency problem (or stale data problem), because multiple copies of the same main memory data can exist concurrently in the caches of the multiprocessor system.
The protocols that maintain the coherence between multiple processors are called cache coherence protocols. Cache coherence protocols track any sharing of data blocks between the processors. For example, MESI is a common coherence protocol where every hardware cache line can be in one of four states: modified (M), exclusive (E), shared (S), or invalid (I). Line states are changed by memory references issued by the processors.
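By way of illustration only, the four MESI line states and their transitions may be sketched as a small state function. This is a simplified sketch and not a description of any particular hardware: the event names (`local_read`, `local_write`, `remote_read`, `remote_write`) and the `others_have_copy` flag are hypothetical names chosen for this example, and bus transactions, write-backs, and data transfer are omitted.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line.

    `event` is a hypothetical label for a memory reference: "local_*" events
    come from the processor that owns the cache, "remote_*" events are
    references by another processor observed by this cache.
    """
    if event == "local_read":
        if state is State.I:
            # A read miss loads the line: shared if another cache holds it,
            # exclusive otherwise.
            return State.S if others_have_copy else State.E
        return state  # A read hit leaves the state unchanged.
    if event == "local_write":
        # Writing requires ownership; the line ends up modified.
        return State.M
    if event == "remote_read":
        # Another processor reads the line: an owned copy becomes shared.
        return State.S if state in (State.M, State.E) else state
    if event == "remote_write":
        # Another processor writes the line: the local copy becomes invalid.
        return State.I
    raise ValueError(f"unknown event: {event}")


# Example: follow one line through a sequence of references.
state = State.I
for event in ["local_read", "local_write", "remote_read", "remote_write"]:
    state = next_state(state, event)
    print(event, "->", state.name)
```

In this sketch a line that starts invalid becomes exclusive on a local read with no other sharers, modified on a local write, shared when another processor reads it, and invalid when another processor writes it.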
In a coherent multiprocessor system, a memory reference issued by one processor can affect the caches of other processors. For example, when a processor stores to a line, the coherence mechanism must ensure that eventually all caches either have the new data or have no data for that line at all. This generally involves a good deal of interprocessor communication for testing the state of the line in the various caches and changing the state, if necessary. Commonly, such interprocessor communication is conducted by passing packets between processors; these packets, which contain coherence protocol actions and responses, are herein referred to as coherence events.
One group of cache coherence protocols is referred to as snooping. In a snooping cache coherence approach, no centralized system coherence state is kept, but rather each cache keeps the sharing status of data blocks locally. The caches are usually on a shared memory bus, and all cache controllers snoop (monitor) the bus to determine whether they have a copy of the data block requested. A commonly used snooping method is the “write-invalidate” protocol. In this protocol, a processor ensures that it has exclusive access to data before it writes that data. On each write, all processors snoop on the bus and check their caches to see if the address written to is also located in their caches. If so, the data corresponding to this address are invalidated. If two or more processors attempt to write the same data simultaneously, only one of them wins the race, causing the other processors' copies to be invalidated.
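The write-invalidate behavior described above may be sketched, under simplifying assumptions, as a small simulation. The `Cache` and `Bus` classes and their method names are hypothetical and exist only for this example; memory is assumed to be write-through so that a later read miss obtains the new value, and arbitration between simultaneous writers is not modeled.

```python
class Cache:
    """A private cache: maps address -> value; a missing address is invalid."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_write(self, address):
        # Called for every write seen on the bus. If this cache holds the
        # address, invalidate the local copy and report that it did so.
        if address in self.lines:
            del self.lines[address]
            return True
        return False


class Bus:
    """A shared bus that every attached cache snoops (hypothetical model)."""

    def __init__(self):
        self.caches = []
        self.memory = {}

    def attach(self, cache):
        self.caches.append(cache)
        return cache

    def read(self, requester, address):
        # On a miss, fill the line from main memory (default value 0).
        if address not in requester.lines:
            requester.lines[address] = self.memory.get(address, 0)
        return requester.lines[address]

    def write(self, requester, address, value):
        # Every other cache snoops the write and invalidates its copy.
        invalidated = [
            cache.name
            for cache in self.caches
            if cache is not requester and cache.snoop_write(address)
        ]
        requester.lines[address] = value
        self.memory[address] = value  # Assumption: write-through memory.
        return invalidated


# Example: two caches read a line, then a third writes it.
bus = Bus()
p0, p1, p2 = (bus.attach(Cache(f"P{i}")) for i in range(3))
bus.read(p0, 0x40)
bus.read(p1, 0x40)
invalidated = bus.write(p2, 0x40, 7)
print(invalidated)          # names of the caches whose copies were invalidated
print(bus.read(p0, 0x40))   # P0 misses and refetches the new value
```

After the write, only the writing cache holds a valid copy; the other caches must miss and refetch the line before they can read it again.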
When a cache coherence event is not properly handled, which may occur for several reasons, an error is introduced in the system. This error may manifest itself much later in the processing, or not at all. Achieving proper handling of coherence events is one of the biggest challenges in multiprocessor design. Designers and programmers employ various techniques, collectively called debugging, to determine the source or sources of any errors.
In debugging a multiprocessor system, it is sometimes advantageous to control coherence traffic. Control over the coherence requests presented to a processor makes it easier to debug the multiprocessor coherence mechanism. It is likewise desirable to be able to insert specific coherence events, whose behavior can be observed by examining the states of various memory elements after the events have been processed.
U.S. Pat. No. 6,986,026 describes a technique for causing a single processor to process one instruction at a time. Processor single stepping is executed by taking an exception after each instruction or by invoking an emulator. That patent does not describe a technique for debugging a multiprocessor system, and does not describe how to debug coherence events.
Having set forth the limitations of the prior art, it is clear that what is required is a technique for debugging coherence event processing in a multiprocessor computer system.