This invention relates to processor systems which include a cache memory and a main memory and in particular to the aliasing of entries in the main memory to avoid the reading of incorrect cache entries by a processor.
In the current state of technological development, memories such as are employed, for example, for the storage of packet data in network switches and similar contexts in general operate at a lesser rate than microprocessors. More particularly, microprocessors can request memory accesses at a rate approximately an order of magnitude faster than the rate at which random accesses can be performed in a typical memory. Accordingly, a microprocessor is commonly provided with a cache memory, that is to say a small, very fast memory that can, for example, retain copies of recently used memory values or copies of data such as data packets written to memory. The cache memory operates transparently to the programmer, automatically deciding which values to keep and which to overwrite. It may, though it need not, be implemented on the same chip as the microprocessor. Caches are beneficial particularly where programs display a property known as ‘locality’, which means that at any particular time they tend to execute the same instructions many times on the same areas of data.
Since a processor can operate at a high clock rate only when the memory items it requires are held in the cache, the overall system performance depends strongly on the proportion of memory accesses which cannot be satisfied by the cache. An access to an item which is in the cache is called a ‘hit’. An access to an item which is not in the cache is a ‘miss’. Systems are designed so that a fetch instruction from the microprocessor is initially directed to the cache and only if the cache returns a ‘miss’ will the microprocessor need to have recourse to the main and slower memory.
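The dependence of performance on the proportion of misses can be illustrated with a simple model. The following is a minimal sketch only: the function name and the timings (one time unit for the cache, ten for main memory, reflecting the order-of-magnitude difference mentioned above) are assumptions chosen for illustration and are not taken from any particular system.

```python
def average_access_time(miss_ratio, cache_time=1.0, main_time=10.0):
    """Estimate the mean time per access under a simple hit/miss model.

    Assumed model: every access is first directed to the cache and costs
    cache_time; a miss additionally costs main_time for the slower memory.
    """
    return cache_time + miss_ratio * main_time
```

Under these assumed timings, a miss proportion of five per cent raises the mean access time from 1.0 to about 1.5 units, whereas a cache that never hits would yield about 11 units.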
Caches may be organized so that a single cache may store both copies of instructions and copies of ‘data’. For the sake of convenience, and to avoid ambiguity in the use of the term ‘data’, it will be presumed that the ‘data’ other than instruction data will be packet data, and particularly header data (such as addresses, or status words) since in a network switch it is this type of data which needs to be examined or processed, particularly for the purpose of determining the destination or destinations of a packet.
The simplest form of cache is a direct mapped cache. In such a cache a line of data is stored along with an address tag in a memory which is addressed by some portion of the memory address; this portion is known as the index. To check whether or not a particular address location is stored in a cache, the index address bits are used to access the cache entry. The top address bits are then compared with the stored tag. If they are equal, the item is in the cache. The ‘lowest’ address bits can be used to access the desired item within the line.
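The lookup just described can be sketched as follows. This is an illustrative model only: the geometry (16-byte lines, 256 lines) and the names `split_address` and `DirectMappedCache` are assumptions introduced for the example, and only the tags are modelled, not the line data.

```python
# Hypothetical geometry, chosen only for illustration.
LINE_BYTES = 16     # bytes per cache line
NUM_LINES = 256     # number of lines in the cache
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4 lowest bits select a byte in the line
INDEX_BITS = NUM_LINES.bit_length() - 1     # next 8 bits form the index


def split_address(addr):
    """Split a memory address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tag store of a direct mapped cache; access() reports hit or miss."""

    def __init__(self):
        self.tags = [None] * NUM_LINES   # None marks an empty line

    def access(self, addr):
        tag, index, _ = split_address(addr)
        hit = self.tags[index] == tag    # compare top bits with the stored tag
        if not hit:
            self.tags[index] = tag       # on a miss the line is (re)filled
        return hit
```

In this sketch two addresses that share an index but differ in their top bits displace one another, since each index has only one place in which a line may be stored.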
It is often preferable to employ what is known as a set-associative cache, because the simple direct-mapped cache is subject to ‘contention’. An N-way set-associative cache (N being an integer greater than one) is constituted by N direct mapped caches operating in parallel. An address presented to the cache may find its data in any of the N direct mapped caches or ‘sets’, so each memory address may be stored in one of several places. The access time for a multiple-way set-associative cache is slightly longer than that of the simple direct-mapped cache, the increase being due to the need to multiplex the data from the sets.
A set-associative cache having multiple associativity may typically comprise up to four direct-mapped caches operating in parallel, the associativity being then four. Associativities greater than four are feasible but are in general not preferred because the benefits of going beyond four-way associativity are small and in general do not warrant the extra complexity incurred.
When a new data item is to be placed in a multiple-way set-associative cache, a decision must be taken as to which of the individual caches the data item is to be placed in. It is known to employ random allocation, a least recently used (LRU) algorithm and a round-robin algorithm. As will be seen, if a multiple-way set-associative cache is employed in the present invention it is necessary to employ a round-robin allocation scheme. Such a scheme maintains a record of which location was last allocated and allocates new data to the next individual cache in a cyclic sequence.
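A round-robin allocation scheme of the kind described may be sketched as follows. The class name, the use of line addresses rather than byte addresses, and the choice of keeping one allocation record per index (rather than a single record for the whole cache) are assumptions made for this illustration; only tags are modelled.

```python
class SetAssociativeCache:
    """N direct mapped tag stores in parallel with round-robin allocation."""

    def __init__(self, ways, num_sets):
        self.ways = ways
        self.num_sets = num_sets
        # tags[index][way] holds the tag stored at that index in that way.
        self.tags = [[None] * ways for _ in range(num_sets)]
        # Assumed: one round-robin record per index, naming the next way to fill.
        self.next_way = [0] * num_sets

    def access(self, line_addr):
        """Return True on a hit; on a miss allocate the next way in the cycle."""
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        if tag in self.tags[index]:
            return True
        way = self.next_way[index]
        self.tags[index][way] = tag
        self.next_way[index] = (way + 1) % self.ways
        return False
```

With two ways, three line addresses sharing an index are allocated to way 0, way 1 and way 0 again, so that the third allocation overwrites the first.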
In practical systems, a DMA (Direct Memory Access) device is often used to copy data items into main data memory on the system bus. In such systems there is no coupling between the DMA device and cache, so the cache will receive no signal indicating that the contents of main data memory have changed. The contents of the cache will thus be inconsistent with the contents of the main memory. It is therefore important that the cache is disabled for areas of memory where the DMA device copies in new data. The disabling of the cache results in a reduction in performance of processor access to this input or output data.
Some processors use input and output data intensively; an example is a network processor constituted by a RISC (Reduced Instruction Set Computer). It is desirable to enable or facilitate caching of input and output data without the loss of performance.
When a packet of data arrives at an input port, the input port will assert a signal (rx_packet_ready). This signal is sent to the DMA device. On receiving the assertion of rx_packet_ready the DMA device will copy the packet from the input port to a packet buffer in the main buffer memory on the system bus. Once the DMA device has completed copying the packet it will inform the processor via an interrupt signal. The processor will respond to the interrupt by reading a (known) register in the DMA device which will return the memory address pointer of the packet buffer. The address pointer is used by the processor to access the start of the packet data.
The first time that the processor accesses packet data, the cache will recognize that it has not read data from this memory location and will make a copy of the data and store it in the cache. The cache data is therefore initially consistent with the data in the buffer memory.
The processor will complete processing the relevant packet data, with recourse to the copy of the data in the cache memory. After the processing of the packet data the processor may direct the DMA device to copy the packet to a port.
Some time later, when a new packet arrives at a port, the specific packet buffer will be reused. The DMA device will copy the new packet to that packet buffer, will notify the processor and will pass to the processor the same address pointer to the packet buffer. However, the cache, which has received no indication that the buffer contents have changed, will assume that it already holds the data for that address and will therefore return the data of the earlier packet, that is to say the wrong data, to the processor.
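The failure may be reproduced in a small model. This is a sketch under stated assumptions: `SimpleCache` and `dma_write` are hypothetical names, main memory is represented by a dictionary, and the cache is taken to have no coupling to the DMA device, as described above.

```python
class SimpleCache:
    """Cache with no coupling to the DMA device: it never sees DMA writes."""

    def __init__(self, memory):
        self.memory = memory   # main memory, modelled as a dict of address -> value
        self.lines = {}        # cached copies, keyed by address only

    def read(self, addr):
        if addr not in self.lines:                # miss: copy from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]                   # hit: return the cached copy


def dma_write(memory, addr, value):
    """The DMA device writes straight to main memory, bypassing the cache."""
    memory[addr] = value
```

If a first packet is written to a buffer address and read through the cache, and a second packet is then written to the same address, a further read through the cache still returns the first packet although main memory holds the second.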
A software solution to the problem is possible. More particularly, the processor could flush the cache before it reads data for a new packet. However, the present invention is directed to providing a hardware solution which does not require flushing of the cache in these circumstances and is independent of the software for the microprocessor.
The present invention relies on the aliasing of an entry in the main memory. In particular, where the associativity of the cache is N, the multiplicity of aliasing in memory should be at least N+1.
In such a scheme, each time a DMA controller loads a new packet into the packet buffer it will increment the address pointer to the next alias of the packet buffer in memory. The cache identifies whether it stores data simply from the memory address. Since the address pointer is for a different address, the cache will recognize it as a new address and will load the data afresh into the cache. The data will be loaded into the cache in the same index line but in a different associativity set. Since there are a limited number of aliases the pointer will eventually wrap around to the initial value. The original cache data is guaranteed to be overwritten provided that the associativity of the cache is less than the multiplicity of aliasing in the main memory and either the cache is a direct-mapped cache (i.e. has single-way set associativity) or the cache has multiple-way set-associativity and the cache replacement algorithm is ‘round-robin’.
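The effect of the scheme may be checked with a small simulation. The following is a sketch under assumptions and not a description of any actual embodiment: aliasing is modelled by a memory that ignores the address bits above its size, the alias spacing is taken to be a multiple of the number of cache indices so that all aliases share one index line, and the names `AliasedMemory`, `RoundRobinCache` and `count_stale_reads` are hypothetical.

```python
class AliasedMemory:
    """Memory in which addresses differing by a multiple of size are aliases."""

    def __init__(self, size):
        self.size = size
        self.cells = [None] * size

    def read(self, addr):
        return self.cells[addr % self.size]

    def write(self, addr, value):
        self.cells[addr % self.size] = value


class RoundRobinCache:
    """N-way cache keyed on the full (aliased) address, round-robin allocation."""

    def __init__(self, memory, ways, num_sets):
        self.memory = memory
        self.ways = ways
        self.num_sets = num_sets
        self.lines = [[None] * ways for _ in range(num_sets)]  # (tag, data) or None
        self.next_way = [0] * num_sets

    def read(self, addr):
        index = addr % self.num_sets
        tag = addr // self.num_sets
        for entry in self.lines[index]:
            if entry is not None and entry[0] == tag:
                return entry[1]                    # hit: return the cached copy
        data = self.memory.read(addr)              # miss: load afresh from memory
        way = self.next_way[index]
        self.lines[index][way] = (tag, data)
        self.next_way[index] = (way + 1) % self.ways
        return data


def count_stale_reads(ways, aliases, packets, num_sets=4, mem_size=64, buf=8):
    """Reuse one packet buffer repeatedly and count the wrong reads."""
    memory = AliasedMemory(mem_size)
    cache = RoundRobinCache(memory, ways, num_sets)
    stale = 0
    for k in range(packets):
        ptr = buf + (k % aliases) * mem_size   # DMA advances to the next alias
        memory.write(ptr, k)                   # DMA write bypasses the cache
        if cache.read(ptr) != k:               # processor reads via the cache
            stale += 1
    return stale
```

In this model a two-way cache with three aliases, or a direct-mapped cache with two aliases, produces no stale reads, whereas a two-way cache with only two aliases returns stale data as soon as the pointer wraps around, which is consistent with the requirement that the multiplicity of aliasing exceed the associativity.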
Further features of the invention will be apparent from the following detailed description with reference to the accompanying drawings.