This application is a national stage application of international application PCT/GB98/03377, filed Nov. 11, 1998, which claims priority from Great Britain Patent application No. 9724031.1, filed Nov. 13, 1997.
The present invention relates to cache memories.
In most current computers, the general-purpose or main memory is generally based on DRAM or similar memory chips which will maintain their contents for relatively long periods (though in fact they may require periodic refreshing to prevent loss of their contents).
The speed of such main memory (DRAMs and the associated circuitry) is substantially slower than the speed of typical modern processors. The use of cache memories has therefore become common. A cache memory is essentially a small associative memory which operates in parallel with the main memory. The principle underlying the cache technique is that if a word is accessed by a program, there is a strong chance that the program will soon want to access the same word again. The cache memory retains words which have been accessed by the program so that they are more quickly available for subsequent accesses.
For a memory read, the address is sent to the cache to see whether the desired data is in cache. If it is, it is read from the cache. If it is not (a "read-miss"), the address is sent to the main memory to obtain the word; once the word has been read, it is normally written into the cache so that it will be available from the cache (although it would be possible for data to be flagged as "once-only" data, which a suitably arranged cache controller would not store in cache memory). In practice, the address is usually sent to the main memory in parallel with its being sent to the cache, with the main memory read being aborted if it is found that the word is in cache. For a write, the word and its address are sent to the cache and the main memory in parallel, to ensure that the cache contents are always consistent with the main memory contents. (Optionally, writes may be buffered into the cache memory, the main memory, or both.)
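By way of illustration only, the conventional read and write handling described above may be sketched as follows. The class name ConventionalCache, the use of dictionaries for both memories, and the example addresses are assumptions made for this sketch; it ignores capacity limits, the parallel issue and abort of the main memory read, write buffering, and "once-only" data.

```python
class ConventionalCache:
    """Toy model of a conventional write-through cache (illustrative only)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> word (assumed model)
        self.lines = {}                 # cached copies: address -> word

    def read(self, address):
        """Return (word, hit) for a read of the given address."""
        if address in self.lines:
            return self.lines[address], True   # word supplied by the cache
        word = self.main_memory[address]       # read-miss: go to main memory
        self.lines[address] = word             # keep a copy for later reads
        return word, False

    def write(self, address, word):
        """Write to cache and main memory together so they stay consistent."""
        self.lines[address] = word
        self.main_memory[address] = word


memory = {0x10: 111, 0x11: 222}   # hypothetical contents
cache = ConventionalCache(memory)
first = cache.read(0x10)          # read-miss, served from main memory
second = cache.read(0x10)         # same word again, served from the cache
cache.write(0x11, 333)            # updates both the cache and main memory
```

In this sketch the second read of the same address is a hit, which is the behaviour the cache technique relies upon.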
Cache controllers which can detect whether data is in the cache memory and organize updating of the cache are well known. It is also known that processors tend to access blocks of adjacent data, so many cache controllers are arranged to update in bursts; for example, when data is not in the cache, the controller will typically read a block of data surrounding the requested data, the boundaries of the block typically being chosen based on the physical architecture of the memory, to optimize performance.
The fact that the words accessed by a program have a strong tendency to occupy addresses close to each other allows an improvement in the way that a cache memory is operated, in a conventional burst mode cache controller. On a cache miss, when an address is being accessed and is found not to be in the cache, not only that address but adjacent addresses as well can be copied from the main memory into the cache. That is, a block of words is copied as a single burst.
The inventors have found that, even with a burst mode cache controller, performance may be degraded when the processor attempts to read a block of data that is not found in the cache. Following the first read miss when the block of data is requested, a conventional burst cache controller may initiate an update burst, in which a block of data is read into the cache. This will result in the processor waiting until surrounding data, including some which may not be required, is read into the cache. If a very small block size is chosen, then there is a fair chance that the program will want to access groups of words which extend beyond the size of the burst. That will result in repeated bursts copying adjacent blocks of words. Since each block copying involves main memory access overheads, that is less efficient than using a few large bursts. However, if a very large block size is chosen, there is a good chance that the program will only want to access a relatively small part of the block. That will also result in inefficiency, since part of the burst time will be used in copying undesired parts of the block. The block size must be chosen as a compromise in the light of these considerations. Afterwards, assuming the block read coincides with data requested, the processor will be supplied with data from the cache, but the initial delay may be significant.
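The block size compromise may be illustrated with a simple cost model. The model is an assumption made for this sketch and is not taken from any particular controller: each burst is taken to cost a fixed number of set-up cycles plus one cycle per word copied, the whole block is copied whether or not it is wanted, and block alignment is ignored. The function name burst_fill_cycles and the figure of four set-up cycles are hypothetical.

```python
import math


def burst_fill_cycles(words_wanted, block_size, setup_cycles):
    """Cycles to fetch a sequential run of words using fixed-size bursts.

    Assumed model: every burst pays setup_cycles once, then one cycle per
    word in the block, and always copies the full block.
    """
    bursts = math.ceil(words_wanted / block_size)
    return bursts * (setup_cycles + block_size)


SETUP = 4  # hypothetical set-up cost in clock cycles

# Very small block, long run: the set-up overhead is paid sixteen times.
small_block_long_run = burst_fill_cycles(64, 4, SETUP)    # 16 * (4 + 4)

# Very large block, short run: most of the burst copies unwanted words.
large_block_short_run = burst_fill_cycles(6, 64, SETUP)   # 1 * (4 + 64)

# Lower bound for the short run if only the six wanted words were copied.
short_run_ideal = SETUP + 6
```

Under these assumptions the small block costs 128 cycles for a 64-word run, against 68 cycles for a single 64-word burst, while the large block costs 68 cycles for a 6-word run that could in principle be served in 10.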
Simpler cache controllers may supply the data from main memory, and then repeat the search in the cache for each subsequent word requested in the block; this can lead to poor performance on a long block of data, where each address is checked with the cache contents, then requested from memory, the initialisation of the memory address taking several clock cycles more than the actual data read operation.
The general object of the present invention is to provide a cache memory controller which alleviates the above problems.
EP-A-0782079 describes a mechanism to allow burst access by a processor to a non-cacheable area of processor address space, typically IO locations, for improved performance. U.S. Pat. No. 5,423,016 describes how, in some circumstances, prefetching of instructions may be detrimental to system performance and suggests the use of an additional block buffer to alleviate this loss of performance. Data is first loaded into this block buffer and only transferred to the cache memory when the next cache miss occurs. Data in the block buffer is immediately available to the instruction processor. EP-A-0509676 describes a system for discovering instructions which have or cause a low cache hit ratio, keeping these instructions in a private memory, and preventing such instructions from being cached in subsequent accesses. U.S. Pat. No. 5,625,794 describes a cached disc controller application. The cache has distinct modes of operation which are selected by recording and analysing the usage of the datasets that it loads. Subsequent data accesses will be fetched from cache or disc depending on these usage statistics; in this way the controller infers the characteristics of the data. U.S. Pat. No. 5,586,296 is similar to EP-A-0509676 and relates to an improved cache control mechanism which uses a "history" buffer to determine whether or not to cache particular data.
The invention effectively consists of a cache controller which includes means for sequential monitoring of the addresses being accessed by the processor and a means for supplying data from the main memory while those addresses are sequential. Cache systems which prefetch data are usually very effective; however, depending upon the programs, prefetching may be detrimental to system performance if the prefetched data is not subsequently used. The invention balances the prefetching of data with the memory system latency. The cache supplies data to the processor if it has the data within it, but in the event of a cache miss, where the cache does not contain valid data, the processor is responsible for fetching the data that it wants and the cache will enter a snoop mode whereby it keeps a copy of the data (whether instructions or operands) that the processor is fetching. The cache is purely passive in this role and does not initiate any memory accesses itself. During processor write operations the cache acts as a write-through cache, and the cache does not initiate any memory accesses of its own.
This mode of operation is effective when coupled with a memory system which provides write buffers and read ahead buffers. The combination of these additional buffers and the cache mode of operation results in a system which provides coherent main storage with minimal latencies due to prefetching.
Since the reading of sequential data from main memory is relatively fast once the initial address has been set up, the processor can be supplied with data relatively rapidly, by effectively discounting the cache as a source of data for the remainder of the block read. Thus, in some cases, the processor will be supplied with data from main memory even when the data is available in the cache. It is somewhat surprising that a performance improvement can be gained by disregarding the cache in these circumstances, but it is found that an overall improvement can result from removing the overheads associated with initiating a non-sequential memory access.
The updating burst automatically ends on the occurrence of a non-sequential address or a write, because the reading of sequential addresses from the main memory must then terminate. The updating burst will also end in the event of a gap or pause in the sequence of addresses, although the cache controller may in some cases be arranged to ignore gaps or pauses of less than a certain length.
It is important to note that in this system, once a cache updating burst has been initiated, cache intervention is suspended. Words are read from successive addresses in the main memory, and are passed simultaneously to the processor and the cache. As noted above, the major delays in main memory accesses are in the initial setting up of the main memory, taking several clock cycles. Once a main memory access is under way, reading of a sequence of successive addresses then occurs in successive clock cycles, i.e. one clock cycle per access. Thus for as long as the cache updating burst is maintained, the access rate from the main memory will be comparable with the access rate to the cache.
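The behaviour described above may be sketched, purely by way of illustration, as a small simulation. The class name SequentialBurstCache, the set-up cost of four cycles, the single cycle charged for a cache hit, and the use of word addresses that advance by one are all assumptions of the sketch. Write timing and the tolerance of short gaps or pauses are not modelled.

```python
class SequentialBurstCache:
    """Toy model: after a read miss, sequential reads stream from main memory."""

    def __init__(self, main_memory, setup_cycles=4):
        self.main_memory = main_memory    # dict: word address -> word (assumed)
        self.lines = {}                   # cached copies: address -> word
        self.setup_cycles = setup_cycles  # hypothetical main memory set-up cost
        self.next_burst_address = None    # None means no updating burst is active
        self.cycles = 0                   # clock cycles spent on reads

    def read(self, address):
        """Return (word, source), source being 'cache', 'miss' or 'burst'."""
        if address == self.next_burst_address:
            # Burst continues: the cache is not consulted, even if it holds the word.
            self.cycles += 1
            source = "burst"
        elif address in self.lines:
            # Non-sequential address that hits: any burst ends, the cache supplies it.
            self.next_burst_address = None
            self.cycles += 1
            return self.lines[address], "cache"
        else:
            # Read miss: pay the set-up cost and start a new updating burst.
            self.cycles += self.setup_cycles + 1
            source = "miss"
        word = self.main_memory[address]
        self.lines[address] = word              # the cache keeps a copy as it passes
        self.next_burst_address = address + 1
        return word, source

    def write(self, address, word):
        """Write-through; a write always ends the updating burst (not timed here)."""
        self.next_burst_address = None
        self.lines[address] = word
        self.main_memory[address] = word


memory = {a: a * 10 for a in range(16)}  # hypothetical contents
cache = SequentialBurstCache(memory)
cache.lines[2] = memory[2]               # word 2 happens to be cached already
trace = [cache.read(a)[1] for a in (0, 1, 2, 3)]
jump = cache.read(2)[1]                  # non-sequential address ends the burst
```

In this sketch the four sequential reads are reported as one miss followed by three burst reads, including the read of address 2 which the cache already held, and take eight cycles in total under the assumed timing. The subsequent non-sequential read of address 2 is then served by the cache.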
It is also important to note that the present cache memory does not require any modifications to the processor or the main memory.
The processor issues its address requests in the usual way. It knows that the time taken for response to those requests is variable, and it automatically waits for responses when necessary. When it issues the read request on which there is a cache miss, the response will be delayed. If it happens to continue by issuing a continuous sequence of read addresses following sequentially on from the address which produced the original miss, it will find that the responses follow on with little or no delay. So as far as the processor is concerned, the responses will appear to be cache hits. As soon as the sequence ends, the next processor request will be dealt with as either a true cache hit or a cache miss.
As far as the main memory is concerned, when a cache updating burst occurs, it will see a continuous sequence of sequential read requests, and will respond to these in the usual way; obviously, there will be no aborts interrupting the sequence. When the sequence ends, it will receive a non-sequential address which it will respond to in the usual way, and that response may or may not be aborted, depending on whether that memory access is a read with a cache hit.