1. Technical Field
The present invention is directed to a method and apparatus for verification of coherence for shared cache components in a system verification environment.
2. Description of Related Art
Every computer architecture defines consistency and coherence rules for storage accesses to memory locations. Consistency refers to the ordering of all storage access events within a processor. Coherence refers to the ordering of competing storage access events from different processors. The most restrictive form of consistency and coherence rules is sequential consistency rules. Sequential consistency rules limit the performance of programs by requiring all storage accesses to be strictly ordered based on the order of instructions in the processor and across all processors. Several new techniques have relaxed this requirement, under certain conditions, and allow storage accesses within a processor and across different processors to be performed out-of-order. Any required consistency is enforced by the use of synchronization primitives which are an integral part of these architectures.
For example, the PowerPC™ architecture permits the hardware to be aggressive by using a weak consistency scheme which, under certain conditions, allows storage accesses to be performed out-of-order. The following are examples of weak consistency rules used with the PowerPC™ architecture:
    Rule 1: Dependent loads and stores from the same processor must perform in order and all non-dependent accesses may perform out-of-order, unless a synchronization operation, such as an Acquire or Release operation, is present to explicitly order these loads and stores. By dependent, what is meant is that these loads and stores are to overlapping addresses or there is some explicit register-dependency among them.
    Rule 2: Competing loads and stores from different processors can perform in any order. As a result, these loads and stores must be made non-competing by enclosing them within critical sections using lock and unlock routines. By competing, what is meant is that these loads and stores are to overlapping bytes and at least one of them is a store.
The PowerPC™ architecture defines memory coherence rules as follows:
    Rule 3: All accesses to a particular location are coherent if all stores to the same location are serialized in some order and no processor can observe any subset of those stores in a conflicting order.
    Rule 4: All values loaded by a processor accessing a location in a specified interval should be a subsequence of the sequence of values held by the location in that interval. That is, a processor can never load a “new” value first and later load an “older” value.
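By way of illustration only, Rule 4 can be sketched as a small check in Python. The function name `loads_respect_store_order` and its arguments are hypothetical and do not appear in the architecture definition. The sketch assumes that the values held by the location are distinct, and it treats repeated loads of the same value as permitted, which is an interpretation of the rule rather than a statement of it.

```python
def loads_respect_store_order(loaded, held):
    """Return True if no load observes an older value after a newer one.

    loaded: values one processor loaded from a location, in program order.
    held:   values the location held in the interval, in serialized order
            (assumed distinct so each value has a single position).
    """
    position = {value: index for index, value in enumerate(held)}
    last_seen = -1
    for value in loaded:
        if value not in position:
            return False  # the location never held this value
        if position[value] < last_seen:
            return False  # "new" value loaded first, "older" value later
        last_seen = position[value]
    return True
```

For instance, with the location holding 1, 2, 3, 4 in that order, loading 2 and then 4 passes the check, while loading 2 and then 1 fails it.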
The coherence rules described above are better explained with reference to the following example. Consider storage accesses to location A in a two-way PowerPC™ symmetric multiprocessor (SMP) system:
    Processor 0      Processor 1
    LD, R1, A        ST, 1, A
    LD, R2, A        ST, 2, A
    ST, 3, A         ST, 4, A
    LD, R3, A
Under the coherence rules stated above, the load into R1 on processor 0 can contain the values 1, 2 or 4 but not 3. If processor 0 loads the value 2 into R1, then it can load 2 or 4, but not load 1 or 3, into R2. In addition, if processor 0 loads 2 into R2, it can load 3 or 4, but not 1 or 2, into R3.
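The outcomes stated for this example can be checked by brute force. The following Python sketch is illustrative only. It assumes that accesses to a single location behave as if they were interleaved in one global order that respects each processor's program order, and it assumes an initial value of 0 for location A, which the example does not specify. All names in the sketch are hypothetical.

```python
from itertools import combinations

INITIAL = 0  # assumed initial value of location A; not given in the example

# Program order of accesses to location A on each processor.
P0 = [("LD", "R1"), ("LD", "R2"), ("ST", 3), ("LD", "R3")]
P1 = [("ST", 1), ("ST", 2), ("ST", 4)]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slot_set = set(slots)
        next_a, next_b = iter(a), iter(b)
        yield [next(next_a) if i in slot_set else next(next_b)
               for i in range(total)]


def outcomes():
    """Collect every reachable (R1, R2, R3) triple."""
    results = set()
    for order in interleavings(P0, P1):
        value, regs = INITIAL, {}
        for op, arg in order:
            if op == "ST":
                value = arg        # store updates the single copy of A
            else:
                regs[arg] = value  # load observes the current value of A
        results.add((regs["R1"], regs["R2"], regs["R3"]))
    return results


ALLOWED = outcomes()
r1_values = {r1 for r1, _, _ in ALLOWED}
r2_given_r1_is_2 = {r2 for r1, r2, _ in ALLOWED if r1 == 2}
r3_given_r2_is_2 = {r3 for _, r2, r3 in ALLOWED if r2 == 2}
```

Apart from the assumed initial value, the three sets computed at the end match the values listed above: R1 may be 1, 2 or 4; R2 may be 2 or 4 when R1 is 2; and R3 may be 3 or 4 when R2 is 2.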
In order to preserve the memory coherence requirement described above, most PowerPC™ multiprocessor implementations use the write-invalidate protocol. This protocol allows multiple readers and at most one writer for each memory location. Stores are ordered sequentially by each processor requesting write access on the system bus. When a processor obtains write access, it broadcasts an invalidation message to all other processors on the system bus. Each processor that receives this message invalidates its copy of the data. A processor that has been granted write access proceeds to write to its cache copy. When necessary, the processor uses synchronization operations to ensure that this copy is visible to all other processors.
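A minimal Python sketch of the write-invalidate idea follows. It is not a model of any particular PowerPC™ implementation. The class name `WriteInvalidateBus` and its methods are hypothetical, and for simplicity the sketch assumes a write-through policy, so that memory always holds the latest value, in place of the synchronization operations mentioned above.

```python
class WriteInvalidateBus:
    """Toy system bus: many readers, at most one writer per location."""

    def __init__(self, num_processors, memory=None):
        self.memory = dict(memory or {})
        # One private cache per processor, mapping address -> value.
        self.caches = [dict() for _ in range(num_processors)]

    def load(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            # Miss (or previously invalidated): fetch the current value.
            cache[addr] = self.memory[addr]
        return cache[addr]

    def store(self, cpu, addr, value):
        # Broadcast an invalidation: every other processor drops its copy.
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)
        # The writer updates its own cached copy.
        self.caches[cpu][addr] = value
        # Assumption: write-through, so memory is updated immediately.
        self.memory[addr] = value
```

In this sketch, a processor whose copy has been invalidated misses on its next load and therefore observes the most recent store.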
Most system designs are represented by a model written in a hardware description language (HDL) that can later be transformed into an integrated circuit chip. The model is extensively verified through simulation before it is sent for fabrication, which is referred to as a tape-out. Since the fabrication process is highly expensive, it is necessary to keep the number of tape-outs small. In order to minimize the number of tape-outs, a good simulation plan containing a wide range of tests that cover various aspects of the system is necessary. In addition, robust verification tools, such as fast simulators, deterministic and random test generators, and a checker that checks for both consistency and coherence violations in the design, are necessary.
Verification of storage access rules grows in complexity, especially in a weakly ordered system where some sequences of a program may perform out-of-order and some must perform in order. Further, the complexity significantly increases when verifying these ordering rules in a multiprocessor system. Described hereafter are two commonly used checking schemes which represent two extremes of the spectrum of known checkers. The first is a static checker and the second is a classic checker.
The static checker, depicted in FIG. 1, is very easily portable between systems with few or no changes. As shown in FIG. 1, the static checker 120 consists of a simple functional simulator 130 and a comparator 140. A test case (test) is input and the static checker 120 computes expected values of all locations in the system. The test case is also input to the model under test 110 and actual values for all register, cache and memory locations in the system are obtained from the simulated model. The actual values are then compared to the expected values using the comparator 140. If a mismatch occurs, it may be due to a coherence violation. Since the functional simulator 130 is only a simple reference model of the system, the functional simulator 130 can only compute deterministic values. As a result, test cases that may cause race conditions due to competing accesses are not permitted. The static checker 120 requires that multiprocessor test cases perform storage accesses only to non-overlapping bytes such that the expected results are deterministic.
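The structure of such a static checker can be sketched in Python as follows. The sketch is illustrative only and is not taken from FIG. 1. The test case format (a list of `(cpu, addr, value)` stores), the function names, and the use of whole addresses in place of individual bytes are all assumptions made for brevity.

```python
def functional_simulate(test_case, initial=None):
    """Simple reference model: apply the stores in test order."""
    memory = dict(initial or {})
    for _cpu, addr, value in test_case:
        memory[addr] = value
    return memory


def has_competing_stores(test_case):
    """True if two different processors store to the same address."""
    first_writer = {}
    for cpu, addr, _value in test_case:
        if first_writer.setdefault(addr, cpu) != cpu:
            return True
    return False


def static_check(test_case, actual_values):
    """Compare end-of-test values; return {addr: (expected, actual)}."""
    if has_competing_stores(test_case):
        # Expected results would not be deterministic, so the test
        # case is rejected rather than checked.
        raise ValueError("competing stores are not permitted")
    expected = functional_simulate(test_case)
    return {
        addr: (value, actual_values.get(addr))
        for addr, value in expected.items()
        if actual_values.get(addr) != value
    }
```

An empty result means that the end-of-test values agree. Because only final values are compared, an ordering violation that leaves the final values unchanged would pass this check.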
There are several limitations to the use of a static checker. First, multiprocessor test cases are restricted to performing stores only to non-overlapping bytes, as stated previously. Second, checks are limited to data values available at the end of the test case run. Third, there is no provision to verify the ordering of storage access events. In the static checker 120, synchronization accesses, such as Acquire and Release, can be competing and require special test cases to be developed to ensure their correct operation. With such limitations, it is possible for the model under test 110 to correctly complete the test case but still contain ordering and coherency violations that escape detection. In order to detect these violations, several billion cycles and sequences may need to be run such that these violations propagate to the end of the test case.
FIG. 2 illustrates the other end of the spectrum of checkers, i.e. the classic checker. The classic checker is written with an intimate knowledge of the system being verified, with this knowledge being used to shadow various interfaces and states of the system. As a result, the classic checker is not as portable as the static checker and many modifications may be necessary in order to use the classic checker with a new system.
The classic checker is designed to be functionally equivalent to the actual system and thus, provide comprehensive coverage of the system under test. As shown in FIG. 2, the model under test 260 includes a plurality of state machines 270–290. The classic checker 210 is designed to be functionally equivalent to the model under test by including a plurality of shadow state machines 220–240 and a comparator 250. The equivalency between the classic checker 210 and the model under test 260 is accomplished by matching the state of the classic checker 210 with the model under test 260 at all times. The comparator 250 compares the states of the state machines 270–290 and shadow state machines 220–240 to determine if there is any mismatch. If there is a mismatch, then the mismatch may be due to a coherency violation.
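The lock-step comparison performed by a classic checker can be sketched in Python as follows. The sketch is illustrative only. The state names, the event names, the transition table, and the function `classic_check` are hypothetical and are not taken from FIG. 2; a real classic checker would shadow the actual state machines of the design under test.

```python
# Hypothetical cache-line states and events, chosen only for illustration.
TRANSITIONS = {
    ("INVALID", "load"): "SHARED",
    ("INVALID", "store"): "MODIFIED",
    ("SHARED", "store"): "MODIFIED",
    ("SHARED", "invalidate"): "INVALID",
    ("MODIFIED", "invalidate"): "INVALID",
}


class ShadowStateMachine:
    """Checker-side copy of one state machine in the model under test."""

    def __init__(self, state="INVALID"):
        self.state = state

    def step(self, event):
        # Events with no listed transition leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


def classic_check(events, observed_states):
    """Return (cycle, expected, observed) for every state mismatch."""
    shadow = ShadowStateMachine()
    mismatches = []
    for cycle, (event, observed) in enumerate(zip(events, observed_states)):
        expected = shadow.step(event)
        if expected != observed:
            mismatches.append((cycle, expected, observed))
    return mismatches
```

The sketch also suggests why portability suffers: the transition table must mirror the design, so any change to the design's state machines requires a matching change to the checker.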
The classic checker 210 may execute in real time or run as a post-processor. In either scenario, the classic checker requires detailed access to the internal functions of the model under test 260 to extract state information and compare it with its own shadow state information. Some of the features of the classic checker include that it permits unrestricted storage accesses to dependent and competing locations, it verifies ordering and coherency for all operations in the system, it verifies the state of the caches and memory at all times, and it verifies that all requests and responses from various system components are correct.
Despite its capabilities, however, the classic checker fails to consistently deliver on its objectives due to constant design changes and instances (transient states) when the checker is unable to determine the exact state of the actual design. As a result, the classic checker has a much higher cost and is not portable to other systems.
Current markets demand faster development cycles. Neither static checkers nor classic checkers meet the time and accuracy requirements for achieving these faster development cycles without increasing the cost of verification. Static checkers often miss ordering and coherency violations and thus are not as accurate as classic checkers. Classic checkers, however, require extensive development times and are not portable to new circuit designs or to changes in the design for which the classic checker was developed. As a result, some ordering and coherency violations escape detection by checkers and are only identified after fabrication of the integrated circuit chip. This leads to more tape-outs and increased cost.
Thus, there is a need for an improved coherency checker such that portability of the checker is preserved without loss in the coherency checking ability of the checker.