Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of verifying the design of a computer system having a resource such as a cache memory which is shared among multiple devices such as processors and accelerators.
Description of the Related Art
When a new computer system (or subsystem) is designed, it is important to ensure that the design will work properly before proceeding with fabrication preparation for the integrated circuit devices making up the system, and their assembly into the finished product. A variety of tests can be performed to evaluate the design, but simulation remains the dominant strategy for functionally verifying high-end computer systems. A design-under-test is driven by vectors of inputs, and the states encountered while walking through the sequence are checked for properties of correctness. This process is often performed by software simulation tools using different programming languages created for electronic design automation, including Verilog, VHDL and TDML.
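By way of illustration only, the vector-driven checking described above can be sketched in a few lines of Python. The design-under-test here is a hypothetical 2-bit counter, and the names `step`, `simulate` and `in_range` are assumptions made for this sketch; an actual verification environment would use a simulator for a hardware description language.

```python
from typing import Callable, Iterable, List, Tuple

State = int
Vector = Tuple[int, int]  # (enable, reset): hypothetical input pins


def step(state: State, vector: Vector) -> State:
    """Toy design-under-test: a 2-bit counter with enable and synchronous reset."""
    enable, reset = vector
    if reset:
        return 0
    if enable:
        return (state + 1) % 4
    return state


def simulate(vectors: Iterable[Vector],
             check: Callable[[State], bool],
             initial: State = 0) -> List[State]:
    """Drive the design with input vectors and check a property in every state."""
    state = initial
    trace = [state]
    for cycle, vector in enumerate(vectors):
        state = step(state, vector)
        if not check(state):
            raise AssertionError(
                f"property violated at cycle {cycle}: state={state}")
        trace.append(state)
    return trace


def in_range(state: State) -> bool:
    """Correctness property: the counter never leaves the range 0..3."""
    return 0 <= state <= 3
```

Calling `simulate` with a list of `(enable, reset)` pairs and the `in_range` property returns the sequence of visited states, or raises an error at the first cycle in which the property fails.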
The verification process should include simulation of any shared resources in the design. One typical shared resource in a computer system is a cache memory. In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical, that is, they all use a common set or subset of instructions and protocols to operate and generally have the same architecture. A processing unit includes a processor core having registers and execution units which carry out program instructions in order to operate the computer, and the processing unit can also have one or more caches such as an instruction cache and a data cache implemented using high-speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from system memory. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated circuit chip. Each cache is associated with a cache controller that manages the transfer of data between the processor core and the cache memory. A processing unit can include additional caches such as a level 2 (L2) cache which supports the on-board (level 1) data and instruction caches. Higher-level caches can store a much larger amount of information (program instructions and operand data) than the on-board caches can, but at the cost of a longer access latency. Multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches. The higher-level caches are typically shared by more than one processor core.
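The trade-off between capacity and access time in a serially connected hierarchy can be illustrated with the following minimal sketch. The class `CacheLevel`, the function `load`, and all latency figures are assumptions chosen for illustration; capacity limits and eviction are deliberately omitted.

```python
from typing import Dict, List, Tuple


class CacheLevel:
    """One level of a serially connected hierarchy (no capacity limit modeled)."""

    def __init__(self, name: str, latency: int) -> None:
        self.name = name
        self.latency = latency            # hypothetical access cost in cycles
        self.lines: Dict[int, int] = {}   # address -> cached value


def load(address: int,
         levels: List[CacheLevel],
         memory: Dict[int, int],
         memory_latency: int = 100) -> Tuple[int, int, str]:
    """Search each level in order; return (value, total cycles, where it hit)."""
    cycles = 0
    for index, level in enumerate(levels):
        cycles += level.latency
        if address in level.lines:
            value = level.lines[address]
            # Fill the faster levels that missed so the next access is quicker.
            for upper in levels[:index]:
                upper.lines[address] = value
            return value, cycles, level.name
    # Missed in every cache: pay the longer cost of going to system memory.
    cycles += memory_latency
    value = memory[address]
    for level in levels:
        level.lines[address] = value
    return value, cycles, "memory"
```

In this sketch a first access to an address pays the latency of every level plus system memory, while a repeated access is satisfied by the first level alone.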
In systems which share resources such as cache memories, it is important to ensure that the resource is accessed in a consistent manner among all of the devices that use the resource, e.g., that one processor is not given a value for a cache entry that is inconsistent with a value for the same entry that was given to another requesting processor. A system that implements this consistency is said to be coherent. Different coherency protocols have been devised to control the movement of and write permissions for data, generally on a cache block basis. At the heart of all these mechanisms for maintaining coherency is the requirement that the protocols allow only one requesting device to have a “permission” that allows a write operation to a given memory location (cache block) at any given point in time. To implement cache coherency in a computer system, processors typically communicate over a common generalized interconnect or bus, passing messages indicating their need to read or write memory locations. When an operation is placed on the interconnect, all of the other processors can monitor (snoop) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. One common cache coherency protocol is the MESI protocol, in which a cache block can be in one of four states: M (Modified), E (Exclusive), S (Shared) or I (Invalid). Many more complicated protocols expand upon the MESI protocol. Other features may be implemented with the coherency protocol, such as cache intervention, which allows a cache having control over a requested memory block to provide the data for that block directly to another cache requesting the value (for a read-type operation), bypassing the need to write the data to system memory.
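The single-writer requirement can be made concrete with a simplified model of the MESI states for one cache block. This is a sketch under stated assumptions, not a complete protocol: the names `Mesi`, `Block`, `read`, `write` and `check_invariant` are hypothetical, write-back and intervention data transfers are not modeled, and only the state held by each cache is tracked.

```python
from enum import Enum
from typing import List


class Mesi(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class Block:
    """MESI state of a single cache block as seen by each snooping cache."""

    def __init__(self, n_caches: int) -> None:
        self.states: List[Mesi] = [Mesi.I] * n_caches

    def read(self, requester: int) -> None:
        if self.states[requester] is not Mesi.I:
            return  # read hit: no state change
        holders = [i for i, s in enumerate(self.states)
                   if i != requester and s is not Mesi.I]
        for i in holders:
            self.states[i] = Mesi.S  # M/E/S holders all end up Shared
        self.states[requester] = Mesi.S if holders else Mesi.E

    def write(self, requester: int) -> None:
        # Snooping caches invalidate their copies; the writer gains permission.
        for i in range(len(self.states)):
            if i != requester:
                self.states[i] = Mesi.I
        self.states[requester] = Mesi.M

    def check_invariant(self) -> bool:
        """At most one cache is in M or E, and then no other copy is valid."""
        owners = sum(s in (Mesi.M, Mesi.E) for s in self.states)
        valid = sum(s is not Mesi.I for s in self.states)
        return owners == 0 or (owners == 1 and valid == 1)
```

A verification test could apply a sequence of reads and writes from different caches and call `check_invariant` after every operation, flagging any state in which two caches could write the same block.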
Cache coherency may extend beyond processor cores. As computational demands have increased, computer systems have adapted by relying more on other hardware components for specific tasks such as graphics (video) management, data compression, or cryptography. These components are generally referred to as hardware accelerators, and have their own specialized processing units according to their particular functions. The control software for an accelerator typically creates an accelerator-specific control block in system memory. The accelerator reads all the needed inputs from the accelerator-specific control block and performs the requested operations. Some of the fields in the accelerator control block might include, for example, the requested operation, source address, destination address and operation-specific inputs such as key value and key size for encryption/decryption. The usage of system memory by an accelerator leads to the necessity of ensuring coherency with the accelerator in addition to the processors. One low-cost approach to such an accelerator coherency system is described in U.S. Pat. No. 7,814,279.
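An accelerator-specific control block of the kind described above might be modeled as follows. The field widths, byte order, operation codes and the names `Operation` and `AcceleratorControlBlock` are all assumptions made for this sketch; no actual accelerator defines this layout.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Operation(IntEnum):
    """Hypothetical operation codes for a cryptographic accelerator."""
    ENCRYPT = 1
    DECRYPT = 2


@dataclass(frozen=True)
class AcceleratorControlBlock:
    operation: Operation
    source_address: int
    destination_address: int
    key: bytes  # operation-specific input

    @property
    def key_size(self) -> int:
        return len(self.key)

    def pack(self) -> bytes:
        """Serialize to an assumed little-endian layout for system memory:
        1-byte operation, two 8-byte addresses, 2-byte key size, then the key."""
        header = struct.pack("<BQQH", self.operation, self.source_address,
                             self.destination_address, self.key_size)
        return header + self.key
```

In this sketch the control software would write the bytes returned by `pack` into system memory, and the accelerator would read its inputs back from the same locations, which is why those locations must be kept coherent with the processor caches.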