The present invention relates to a cache coherency mechanism.
As is well known in the art, cache memories are used in computer systems to decrease the access latency to certain data and code and to decrease the memory bandwidth used for that data and code. A cache memory can delay, aggregate and reorder memory accesses.
A cache memory operates between a processor and a main memory of a computer. Data and/or instructions which are required by the process running on the processor can be held in the cache while that process runs. An access to the cache is normally much quicker than an access to main memory. If the processor does not locate a required data item or instruction item in the cache memory, it directly accesses main memory to retrieve it, and the requested data or instruction item is loaded into the cache. There are various known systems for using and refilling cache memories.
In order to rely on a cache in a real time system, the behaviour of the cache needs to be predictable. That is, there needs to be a reasonable degree of certainty that particular data items or instructions which are expected to be found in the cache will in fact be found there. Most existing refill mechanisms will normally attempt to place a requested data item or instruction in the cache. In order to do this, they must delete other data items or instructions from the cache. This can result in items being deleted which were expected to be there for later use. This is particularly the case for a multi-tasking processor, or for a processor which has to handle interrupt processes or other unpredictable processes.
A computer system may have more than one processor, and each processor may have its own cache. Alternatively, a processor may have a plurality of CPUs, each with its own cache. However, these caches will commonly access a single main memory resource.
FIG. 7 illustrates a case where there are two processors CPU1, CPU2, each with its own cache CACHE1, CACHE2. The caches share a single memory resource MEM. FIG. 8 shows what can happen in such a situation. Consider an address 1010 in main memory. This maps onto cache location 10 in both CACHE1 and CACHE2. The value V3 stored at address 1010 had an initial value of X, and the value V3=X was initially stored at cache location 10 in both of the caches. At that stage, the data item V3 was "visible", that is, either processor accessing address 1010 would retrieve from its cache the value V3=X. However, CPU1 has executed a process, modified the value to V3=Y and returned this to location 10 in CACHE1. Now, the value V3=X in main memory is out of date: it no longer reflects the current value of V3, and the modified item in CACHE1 is "dirty". Moreover, the value V3=X in CACHE2 is "stale": it differs from the true value. Clearly, this situation needs to be rectified before CPU2 attempts to retrieve V3, because otherwise it will wrongly retrieve V3=X.
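The situation of FIG. 8 can be reproduced with a small software model. The following is a minimal sketch only: the cache size of 100 locations, the modulo mapping (chosen so that address 1010 maps to location 10) and the names `index_of`, `load` and `store` are assumptions made for illustration and are not taken from the figures.

```python
# Hypothetical model: 100 cache locations, index = address % 100,
# so that address 1010 maps to location 10 as in the text.
NUM_LOCATIONS = 100


def index_of(address):
    """Return the cache location an address maps onto (assumed mapping)."""
    return address % NUM_LOCATIONS


# Main memory MEM holds V3 = X at address 1010.
memory = {1010: "X"}

# Each cache location holds an (address, value) pair, or None if empty.
cache1 = [None] * NUM_LOCATIONS
cache2 = [None] * NUM_LOCATIONS


def load(cache, address):
    """Return the value a processor sees, refilling from memory on a miss."""
    slot = index_of(address)
    entry = cache[slot]
    if entry is None or entry[0] != address:
        entry = (address, memory[address])
        cache[slot] = entry
    return entry[1]


def store(cache, address, value):
    """Update the processor's own cache only; main memory is not written."""
    cache[index_of(address)] = (address, value)


# Both processors read V3, so both caches hold V3 = X at location 10.
load(cache1, 1010)
load(cache2, 1010)

# CPU1 modifies V3 to Y in its own cache.
store(cache1, 1010, "Y")

cpu1_view = load(cache1, 1010)   # the item in CACHE1 is now dirty
cpu2_view = load(cache2, 1010)   # CACHE2 still returns the stale value
memory_view = memory[1010]       # main memory still holds the old value
```

With no coherency control in this model, CPU2 continues to read the old value from CACHE2 after CPU1 has written the new one.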
Thus cache coherency control is required to ensure that several processors and devices can correctly share memory. This can be achieved by:
1. Automatic coherency. Additional hardware guarantees that loads can retrieve the most recently written value regardless of which processor or device wrote it. Note that a functional, but low performance, implementation of automatic coherency is to disable the cache. Such additional hardware is referenced COHERE in FIG. 7.
2. Software coherency. Special code sequences are used in the program to control the transfer of data between cache and memory. They allow precise control of coherency and efficient use of the cache.
The visibility of data depends on whether the cache is automatically coherent or not. If the cache is not automatically coherent then only the contents of memory and its own cache are visible to a processor. Software has to cooperate to ensure that data is written to memory when appropriate. If the cache is automatically coherent then the most recently written value by any processor will be visible to all other processors.
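Software coherency of the kind described above can be illustrated as a sequence in which the writing processor writes its modified item back to memory and the reading processor discards its own copy before reloading. The sketch below is illustrative only: the `Cache` class, its method names, the 100-location size and the modulo mapping are assumptions, and the model does not represent any particular hardware.

```python
NUM_LOCATIONS = 100  # assumed cache size; index = address % NUM_LOCATIONS


class Cache:
    """Hypothetical direct-mapped, write-back cache in front of a shared memory."""

    def __init__(self, memory):
        self.memory = memory
        # Each location holds [address, value, dirty] or None if empty.
        self.lines = [None] * NUM_LOCATIONS

    def _slot(self, address):
        return address % NUM_LOCATIONS

    def load(self, address):
        slot = self._slot(address)
        line = self.lines[slot]
        if line is None or line[0] != address:
            self.flush(address)  # write back a dirty victim before refilling
            line = [address, self.memory[address], False]
            self.lines[slot] = line
        return line[1]

    def store(self, address, value):
        slot = self._slot(address)
        line = self.lines[slot]
        if line is not None and line[0] != address:
            self.flush(address)  # write back a dirty victim before replacing it
        self.lines[slot] = [address, value, True]

    def flush(self, address):
        """Write the item at the mapped location back to memory if it is dirty."""
        line = self.lines[self._slot(address)]
        if line is not None and line[2]:
            self.memory[line[0]] = line[1]
            line[2] = False

    def purge(self, address):
        """Discard the contents of the mapped location."""
        self.lines[self._slot(address)] = None


memory = {1010: "X"}
cache1, cache2 = Cache(memory), Cache(memory)
cache1.load(1010)
cache2.load(1010)

cache1.store(1010, "Y")       # CPU1 modifies V3 in its own cache
stale = cache2.load(1010)     # CPU2 still sees the old value

cache1.flush(1010)            # the writer makes the new value visible in memory
cache2.purge(1010)            # the reader discards its stale copy
fresh = cache2.load(1010)     # the reload now fetches the new value from memory
```

The two explicit steps, one on each processor, are what the cooperating software has to supply when the cache is not automatically coherent.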
Visibility definitions:
Visible: A data item is visible to a processor if a load from the data item's address will return that item.
Stale: A data item is stale if the value in the cache is different from the last value written.
Dirty: A data item is dirty if it has been modified in the cache with respect to main memory.
In a situation where a process wishes to clear a location in the cache, but the process does not have access to the address stored at that cache location, existing software coherency techniques require usage of a special, privileged mode of processor operation termed kernel mode. In a normal user mode it is not possible in such a circumstance to render the cache coherent using software coherency techniques other than by transfer into kernel mode.
According to one aspect of the present invention there is provided a cache coherency mechanism in a computer system comprising a processor, a cache and main memory wherein a plurality of addresses in main memory have access to each location in the cache, wherein a process being run by the processor includes a cache coherency instruction which specifies (i) an operation to be executed on the contents of a location in the cache and (ii) an address in main memory, the operation being executed for the contents of the location in the cache which would be filled by an access to said address in main memory, if the running process normally has access to said address in main memory, regardless of whether or not the contents of the specified address in main memory are held at that location in the cache.
The contents of each cache location can comprise an address in memory and an item stored at that address in main memory. The whole or part of the address in main memory can be held. The item may be a data item or an instruction.
The cache coherency mechanism defined above has the advantage that it is not necessary to request the cache coherency operation to be executed on a particular address stored in the cache. The instruction can specify any address which would map onto that cache location, and the processor can execute the instruction if it would normally have access to that address. Thus, any protection modes are automatically taken into account because, if the executing process does not have access to the specified address in main memory of the cache coherency instruction, the cache coherency operation will not be executed.
One type of cache coherency instruction is a flush instruction, which writes the item held at that cache location back to the address in main memory held at that cache location.
Another type of cache coherency instruction is a purge instruction which clears the contents of that cache location.
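The behaviour of the cache coherency instruction, including the flush and purge operations, can be modelled as follows. This is a sketch under stated assumptions rather than a description of the embodiment: the function name `cache_op`, the representation of access rights as a set of addresses, the 100-location cache and the modulo mapping are all hypothetical. The point illustrated is that the operation acts on whatever is held at the location the specified address maps onto, and is executed only if the running process has access to the specified address.

```python
NUM_LOCATIONS = 100  # assumed cache size; index = address % NUM_LOCATIONS


def cache_op(op, address, cache, memory, accessible):
    """Apply `op` to the cache location that `address` maps onto.

    The operation is executed only if the running process has access to
    `address` (modelled here as membership of the set `accessible`). It
    acts on the current contents of the location, whichever address is
    actually held there. Returns True if the operation was executed.
    """
    if op not in ("flush", "purge"):
        raise ValueError("unknown cache coherency operation: " + op)
    if address not in accessible:
        return False  # no access to the specified address: nothing is done
    slot = address % NUM_LOCATIONS
    entry = cache[slot]
    if entry is not None:
        held_address, value = entry
        if op == "flush":
            memory[held_address] = value  # write back to the address held
        else:
            cache[slot] = None            # clear the location
    return True


# Location 10 holds an item for address 2010, to which this process has no access.
memory = {1010: "A", 2010: "old"}
cache = [None] * NUM_LOCATIONS
cache[10] = (2010, "Q")
accessible = {1010}  # the process may access 1010, which also maps to location 10

flushed = cache_op("flush", 1010, cache, memory, accessible)
denied = cache_op("purge", 3010, cache, memory, accessible)
after_denied = cache[10]
purged = cache_op("purge", 1010, cache, memory, accessible)
```

In this model the process never names address 2010, yet the item held for it is written back and then cleared, because the process specified an accessible address that maps onto the same location.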
The cache coherency instruction can specify a sequence of addresses in main memory and operate for the contents of a set of locations in the cache which would normally be filled by accesses to the addresses in the sequence. Alternatively, a sequence of cache coherency instructions can be executed, each specifying one address in main memory.
The cache can be partitioned into a plurality of cache partitions, wherein the cache partition containing the relevant location in the cache is determined in dependence on the specified address in main memory. More details of a particular cache partitioning implementation may be obtained from our earlier Application No. 09/014,194, now U.S. Pat. No. 6,295,580.
The main memory can be organised in pages, each page comprising a sequence of addresses. In that case, the cache coherency instruction can specify a page in main memory for which the operation is to be executed, the operation being executed for each of the sequence of addresses in the specified page.
In that case, if the number of addresses in each page is always greater than the number of locations in one of the cache partitions, it can be determined that a cache partition can always be fully cleared by specifying a page.
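The relationship between page size and partition size can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that a partition of P locations is indexed by the address modulo P and that each address corresponds to one cache location; the function names are hypothetical.

```python
def locations_touched(page_base, page_size, partition_size):
    """Return the set of partition locations addressed by one page.

    A page is modelled as `page_size` consecutive addresses starting at
    `page_base`; the location index is address % partition_size (assumed).
    """
    return {
        address % partition_size
        for address in range(page_base, page_base + page_size)
    }


def page_clears_partition(page_base, page_size, partition_size):
    """True if an operation applied to every address of the page reaches every location."""
    touched = locations_touched(page_base, page_size, partition_size)
    return len(touched) == partition_size


covers_large_page = page_clears_partition(4096, 256, 128)
covers_small_page = page_clears_partition(4096, 64, 128)
covers_unaligned = page_clears_partition(1010, 128, 128)
```

Under this assumed mapping, any run of consecutive addresses at least as long as the partition reaches every location of the partition, whatever the base address of the page, whereas a shorter page leaves some locations untouched.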
The cache, or each cache partition, can be direct mapped. However, other associativities are possible.
The invention also provides a computer system comprising:
a processor for running a process by executing a sequence of instructions;
a main memory which holds said instructions and data for said instructions; and
a cache connected in a memory access path between the processor and the main memory and having a plurality of storage locations, wherein a plurality of addresses in the main memory have access to each storage location,
wherein the sequence of instructions for execution by the processor includes a cache coherency instruction which specifies (i) an operation to be executed on the contents of a storage location in the cache and (ii) an address in the main memory, wherein the specified operation is executed for the contents of the storage location in the cache which could be filled by an access to said specified address in main memory, if the running process normally has access to said address in main memory, regardless of whether or not the contents of the specified address in main memory are held at that location in the cache.
The invention further provides a method of modifying the coherency status of the contents of a cache with respect to items held in a main memory, wherein a plurality of addresses in main memory have access to each location of the cache, the method comprising:
executing a cache coherency instruction which specifies (i) an operation to be executed on the contents of a location in the cache and (ii) an address in main memory;
responsive to said cache coherency instruction, executing the specified operation for the contents of the location in the cache which could be filled by an access to said address in main memory, if the running process normally has access to said address in main memory, regardless of whether or not the contents of the specified address in main memory are held at that location in the cache.
The invention further provides an instruction set for a computer system which includes a cache coherency instruction which specifies (i) an operation to be executed on the contents of a location in a cache and (ii) an address in main memory, the cache coherency instruction causing the specified operation to be executed for the contents of the location in the cache which could be filled by an access to said address in main memory, only if the running process normally has access to said address in main memory, regardless of whether or not the contents of the specified address in main memory are held at that location in the cache.
In the preferred embodiment, the processor has a user mode of operation and a privileged (kernel) mode of operation. Cache coherency instructions are executable in the user mode.