The present invention relates to cache structures for computers and in particular to a cache structure that allows dynamic control of the size and configuration of the data block fetched by the cache from memory.
Standard electronic computers include a processor, executing arithmetic and logical instructions, and a memory system communicating with the processor and holding instructions and data used by the processor. Typically, the memory system will include a range of memory types from disk drives to solid state memory each reflecting a different trade-off between storage cost (per data word), access speed and ultimately storage capacity. A hierarchy is formed of these devices with data being moved from the generally larger and slower memory devices to the smaller and faster memory devices at times when frequent access to the data by the processor is needed.
Cache memory (henceforth termed "cache") is solid-state memory in direct communication with the processor typically both on and off the processor chip. Data is moved to the cache from a larger solid-state memory (henceforth termed "memory") to provide faster access to that data by the processor.
The effectiveness of cache depends on how well it is managed. Time saved by faster access between the processor and the cache can be lost if the desired data is not in the cache (a cache "miss") and an updating of the cache from the memory must be performed prior to the data being available to the processor.
For this reason, proper management of the cache attempts to ensure that data is moved to the cache from the memory prior to being needed by the processor. This can be done by moving not only the data requested by the processor, but also data having addresses near the address of the data requested by the processor. The expectation is that requests of data by the processor will cluster in address. The data moved to the cache upon a cache miss will be termed the "fetch block".
Larger fetch blocks reduce the number of cache misses (until cache pollution causes the miss rate to rise again). Larger fetch blocks, however, also increase the traffic between the memory and the cache, reducing the performance of the system. Accordingly, computer designers attempt to pick a fetch block size that strikes a compromise between the competing requirements of minimizing cache misses and minimizing superfluous traffic between the memory and the cache.
The present inventors have recognized that the tradeoffs between avoiding cache misses and minimizing data traffic between the cache and memory can be improved by dynamically changing the fetch block size based on historical measurement of the success of previous fetch block sizes in satisfying processor requests. The fetch blocks may include data from discontinuous address ranges.
The statistics about the success of a fetch block size will depend on the particular data contained in the fetch block (and thus generally on the address of the data in the memory), and hence statistics about the fetch blocks must be linked to particular memory addresses, which adds storage overhead. Nevertheless, simulations indicate that, for large cache sizes, this storage overhead is justified by the performance gains.
Specifically, the present invention provides a cache structure for a computer having a processor and associated memory. The cache structure includes a cache communicating with the memory for receiving data from the memory and communicating with the processor for providing data to the processor. The cache is divided into blocks, each holding data from an address range of the memory, and each block is divided into subblocks. The cache structure also includes a "subblock use table" having entries indicating which subblocks have had their data used by the processor since the block was loaded. A "fetch size controller" provides a fetch size value for a given address range of the memory based on the subblock use table for the data of the given address range. "Miss processing circuitry" responds to a request from the processor for data in a given address range (when the data are not found in the cache) by loading the requested data into a number of subblocks of a block of the cache determined by the fetch size value for that address range.
Thus it is one object of the invention to provide for a dynamically changing fetch block size for updating the cache based on statistical data as to how well a previous fetch block size was utilized by the processor. Generally, if the subblock use table shows a large number of subblocks of the block being accessed by the processor, a larger fetch block size is chosen.
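The subblock use table described above may be pictured, purely for illustration, as one bit vector per cached block in which bit i is set once subblock i has been used by the processor. The following is a minimal software sketch and not the hardware of the invention; the class name SubblockUseTable, its method names, and the use of a block tag as the lookup key are all assumptions introduced here.

```python
class SubblockUseTable:
    """Hypothetical software model: one use-bit vector per cached block."""

    def __init__(self, subblocks_per_block: int):
        self.subblocks_per_block = subblocks_per_block
        self.entries = {}  # block tag -> integer whose bit i marks subblock i as used

    def on_load(self, tag: int) -> None:
        # A freshly loaded block starts with no subblocks marked as used.
        self.entries[tag] = 0

    def on_access(self, tag: int, subblock: int) -> None:
        # Record that the processor used data from this subblock.
        if not 0 <= subblock < self.subblocks_per_block:
            raise ValueError("subblock index out of range")
        self.entries[tag] |= 1 << subblock

    def used_count(self, tag: int) -> int:
        # Number of distinct subblocks used since the block was loaded.
        return bin(self.entries[tag]).count("1")
```

Repeated accesses to the same subblock leave the entry unchanged, so the count reflects how many distinct subblocks were used rather than how many accesses occurred.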
The fetch size value may be a single bit and the number of subblocks may be selected from the group consisting of one subblock and all of the subblocks of the block.
Thus it is another object of the invention to provide for an extremely low overhead dynamic system in which only two sizes of fetch block are used.
The fetch size controller may determine the fetch size value by comparing the number of subblocks of the block of the cache having their data used by the processor against a predetermined threshold.
Thus it is another object of the invention to provide a simple metric for determining effectiveness of a fetch block size that may be used to decide dynamically the size of future fetch blocks for data of a particular memory address range.
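The threshold comparison and the single-bit fetch size value described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the use entry is modeled as an integer bit vector, and the choice of a strict "greater than" comparison is an assumption, since the text says only that the count is compared against a predetermined threshold.

```python
def count_used(use_bits: int) -> int:
    """Number of subblocks marked as used in a subblock use table entry."""
    return bin(use_bits).count("1")


def fetch_size_bit(use_bits: int, threshold: int) -> int:
    """Return 1 (fetch the whole block) when the used count exceeds the
    threshold, else 0 (fetch one subblock). Strict '>' is an assumption."""
    return 1 if count_used(use_bits) > threshold else 0


def subblocks_to_fetch(size_bit: int, subblocks_per_block: int) -> int:
    """Translate the single-bit fetch size value into a subblock count."""
    return subblocks_per_block if size_bit else 1
```

With a threshold of 2, for example, an entry showing three used subblocks selects the full block on the next miss, while an entry showing one used subblock selects a single subblock.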
The fetch size controller may determine the fetch size value for a given address range based on the subblock use table for data previously loaded for the given address range over several previous loadings of the given address range.
Thus it is another object of the invention to provide for a greater statistical base in making a dynamic fetch block size determination by looking at several cycles of use of data from a particular address range.
The fetch size controller may determine the fetch size value for a given address range based on whether the number of subblocks of the block of the cache having their data provided to the processor since the block was last loaded principally exceeds or falls short of a predetermined threshold for a predetermined number of loadings of the given address range.
Thus it is another object of the invention to provide for a simple statistical evaluation of the success of different fetch block sizes that may be implemented in fast hardware and that may evolve with use toward increasing or decreasing fetch block size.
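One way to picture the evaluation over several loadings is a majority vote over a fixed window of recent loadings. The sketch below is an interpretation rather than the disclosed circuit: reading "principally" as a simple majority, keeping the previous value on a tie or before the window fills, and the names FetchSizeHistory and record_loading are all assumptions.

```python
from collections import deque


class FetchSizeHistory:
    """Hypothetical per-address-range history of threshold comparisons."""

    def __init__(self, threshold: int, window: int, initial_bit: int = 0):
        self.threshold = threshold
        self.window = window
        self.outcomes = deque(maxlen=window)  # True when use count exceeded threshold
        self.size_bit = initial_bit

    def record_loading(self, use_bits: int) -> int:
        """Record the use entry observed for one loading (for example at
        eviction) and return the fetch size bit for the next miss."""
        used = bin(use_bits).count("1")
        self.outcomes.append(used > self.threshold)
        if len(self.outcomes) == self.window:
            above = sum(self.outcomes)
            below = self.window - above
            if above > below:
                self.size_bit = 1  # mostly well used: fetch the whole block
            elif below > above:
                self.size_bit = 0  # mostly poorly used: fetch one subblock
        return self.size_bit
```

Because the window slides, the fetch size value can move in either direction as the observed use of an address range changes.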
In an alternative embodiment, the cache and subblock use table may be associated with a "fetch pattern controller" which analyzes patterns of subblock use indicated by the subblock use table for a given address range to provide a fetch pattern associated with the given address range. In this case, the miss processing circuitry responds to a request from the processor for data of the given address range that is not in the cache by loading the requested data into particular subblocks of a block of the cache according to the fetch pattern and the request.
Thus it is another object of the invention to provide for a dynamic changing of fetch block size that does not require the subblocks to have contiguous address ranges.
The fetch pattern may be the pattern of the entry of the subblock use table associated with the given address range including a subblock holding the requested data.
Thus it is another object of the invention to provide a simple determination of a fetch pattern when discontinuous subblocks are indicated but one that always includes the actual requested data from the processor.
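The fetch pattern just described, namely the stored use pattern together with the subblock holding the requested data, amounts to a bitwise OR. The following minimal sketch assumes an integer bit-vector representation; the function names are hypothetical.

```python
def fetch_pattern(stored_use_bits: int, requested_subblock: int,
                  subblocks_per_block: int) -> int:
    """Combine the stored use pattern with the requested subblock so that
    the requested data is always part of the fetch."""
    if not 0 <= requested_subblock < subblocks_per_block:
        raise ValueError("requested subblock out of range")
    block_mask = (1 << subblocks_per_block) - 1
    return (stored_use_bits | (1 << requested_subblock)) & block_mask


def subblocks_in_pattern(pattern: int) -> list:
    """List the subblock indices selected by a fetch pattern."""
    return [i for i in range(pattern.bit_length()) if (pattern >> i) & 1]
```

For a four-subblock block with stored pattern 0101 and a request falling in subblock 3, the resulting pattern is 1101, that is, subblocks 0, 2 and 3, which need not be contiguous.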
The cache structure may include a "previous subblock use table" having at least one entry indicating which of the subblocks of the block of the cache have had their data provided to the processor since the block was previously loaded. The fetch pattern controller may then compare the patterns of the subblock use between the subblock use table and the previous subblock use table for a given address range to determine the fetch pattern.
Thus it is another object of the invention to provide a simple mechanism for evaluating historical correlations between successful fetch blocks holding discontinuous subblocks.
The fetch pattern controller may evaluate the Hamming distance between the entries of the subblock use table and the previous subblock use table and compare that Hamming distance to a predetermined threshold in determining the fetch pattern.
Thus it is another object of the invention to provide a simple metric for correlation of discontinuous subblock patterns that may be easily implemented at the chip level. As before, this process may be extended over a number of loadings of the cache for the given address range and may allow for evolution toward either discontinuous subblock fetching or continuous block fetching, as the historical statistics indicate.
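The Hamming distance comparison may be sketched as an exclusive-OR followed by a population count. In the sketch below, the decision taken on each side of the threshold is an assumption made for illustration: a small distance is treated as a stable pattern worth reusing, and a large distance falls back to fetching the whole block. The function names and the "less than or equal" comparison are likewise hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of subblock positions in which two use entries differ."""
    return bin(a ^ b).count("1")


def choose_fetch_pattern(current_use: int, previous_use: int,
                         max_distance: int, subblocks_per_block: int) -> int:
    """Reuse the current use pattern when it closely matches the previous
    one; otherwise fall back to the full block (an assumed policy)."""
    full_block = (1 << subblocks_per_block) - 1
    if hamming_distance(current_use, previous_use) <= max_distance:
        return current_use & full_block
    return full_block
```

With entries 1010 and 1000 the distance is 1, so a threshold of 1 would keep the discontinuous pattern 1010, whereas entries 1010 and 0101 differ in all four positions and would select the whole block.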
The foregoing and other objects and advantages of the invention will appear from the following description. In this description, reference is made to the accompanying drawings, which form a part hereof, and in which there is shown, by way of illustration, a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention, however, and reference must therefore be made to the claims for interpreting the scope of the invention.