1. Field of the Invention
This invention relates to the field of computer systems, and in particular to a system and method for minimizing memory access delays via the use of memory partitioning, sequential prefetch, and multiple independent buffers.
2. Description of Related Art
A variety of techniques are commonly available for minimizing the effects of the delay associated with retrieving program code and data from memory elements. Generally, program and data items are stored in a memory device that is external to the processor, and the time to access an item from the external memory is substantially longer than the time to access an item from memory that is collocated with the processor (internal memory). For ease of reference, the term memory is used herein to denote storage means having a long access time relative to the speed of the processor, and the term buffer is used to denote storage means having a short access time relative to the speed of the processor.
A common technique is the use of a cache buffer. When an item in memory is accessed, a block of memory containing the item is read into a cache that is local to the processor. Subsequently addressed items that are also contained in the block of memory that has been read into the cache are accessed directly from cache, thereby avoiding the delay associated with an access to an item stored in memory. When a subsequently addressed item is not in cache, the appropriate block of memory is read into cache, incurring the memory access delay. The larger the size of the cache, the more likely it will be that an addressed item will be within the cache. Other factors also affect the likelihood of an item being within the cache. For example, one routine may repeatedly call another routine. If the two routines are in proximity with each other, they may both lie within the cache, and no memory access delays will be incurred; otherwise, a memory access will be required with each call and return between the routines. Commonly, multiple independent caches are used, so that different blocks of memory, from potentially distant parts of memory, can be stored. In the example of one routine repeatedly calling another, one cache may contain the first routine, and another cache may contain the second routine, and an access to either routine, via the corresponding cache, will avoid a memory access delay. A particular problem with cache buffering occurs when routines such as loops extend across the boundary between blocks. Regardless of how small the routine is, both blocks must be stored, occupying two caches. To minimize the likelihood of routines extending across boundaries, the block/cache size is typically large, thereby reducing the number of boundaries.
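The boundary problem can be illustrated with a short Python sketch. It assumes fixed-size, aligned memory blocks and simply computes which blocks an address range touches; the function name `blocks_spanned` and the 64-byte block size are hypothetical choices made for this illustration only.

```python
def blocks_spanned(start: int, length: int, block_size: int) -> range:
    """Return the indices of the aligned memory blocks touched by the
    address range [start, start + length)."""
    if length <= 0 or block_size <= 0:
        raise ValueError("length and block_size must be positive")
    first = start // block_size               # block holding the first item
    last = (start + length - 1) // block_size  # block holding the last item
    return range(first, last + 1)


# A 16-byte loop lying entirely inside one 64-byte block needs one cache.
print(list(blocks_spanned(start=0x100, length=16, block_size=64)))  # [4]
# The same loop shifted to straddle a block boundary needs two caches.
print(list(blocks_spanned(start=0x138, length=16, block_size=64)))  # [4, 5]
```

The loop's size is identical in both calls; only its position relative to a block boundary changes, and with it the number of caches the loop occupies.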
To be effective, cache buffering generally requires fairly large cache buffers, typically on the order of hundreds or thousands of bytes. An alternative to conventional cache buffering is prefetch buffering, wherein subsequent instructions are read from memory into a buffer, while the processor is accessing a prior instruction from the buffer. Because the contents of the buffer are continually updated based on the address of the current instruction being executed, or based on a subsequent branch instruction, the size of the prefetch buffer can be substantially less than the size of a cache buffer and yet achieve the same effectiveness. The efficiency of a prefetch scheme can be further enhanced by applying predictive techniques to conditional branch instructions, to optimize the likelihood that the appropriate code is in the prefetch buffer when the conditional branch instruction is executed. For example, loop structures can be identified, and the prefetch algorithm can be structured to assume that the program will return to the start of the loop more often than it will exit the loop, and thereby place the instruction at the start of the loop immediately after the conditional branch instruction that controls whether the loop is re-executed or exited. Only when the conditional branch instruction results in an exit will the processor be delayed, while the instructions after the loop are loaded into the buffer from memory.
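One simple form of the loop-oriented prediction described above is a static rule that treats a backward branch as loop-closing and predicts it taken. The Python sketch below is a hypothetical illustration of that rule, not a specific processor's algorithm: addresses are counted in instructions, and the names `predict_next` and `count_stalls` are invented for this example. It shows that a loop executed many times incurs a prefetch miss only when the loop exits.

```python
def predict_next(branch_addr: int, target_addr: int) -> int:
    """Predict which address to prefetch after a conditional branch.

    Assumption: a branch whose target lies at or before the branch itself
    closes a loop and is predicted taken; any other branch is predicted to
    fall through to the next sequential instruction.
    """
    if target_addr <= branch_addr:
        return target_addr      # predict a return to the start of the loop
    return branch_addr + 1      # predict fall-through


def count_stalls(iterations: int, loop_start: int, branch_addr: int) -> int:
    """Count prefetch mispredictions for a loop executed `iterations` times.

    The loop-closing branch is taken at the end of every iteration except
    the last, where execution falls through and the loop exits.
    """
    predicted = predict_next(branch_addr, loop_start)
    stalls = 0
    for i in range(iterations):
        actual = loop_start if i < iterations - 1 else branch_addr + 1
        if actual != predicted:
            stalls += 1         # the wrong instructions were prefetched
    return stalls


print(count_stalls(iterations=100, loop_start=10, branch_addr=20))  # 1
```

Regardless of the iteration count, only the final (exit) branch is mispredicted, matching the single delay on loop exit noted above.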
In both the cache and prefetch buffering approaches, the time required to execute a program is substantially indeterminate, because the likelihood of a required item being in the local buffer is indeterminate, and therefore the number of times a memory access will be required is indeterminate.
It is an object of this invention to provide a memory access technique that is substantially deterministic. It is a further object of this invention to provide a memory access technique that is efficient with regard to the size of the internal buffer. It is a further object of this invention to provide a memory access technique that is efficient with regard to overall memory access time. It is a further object of this invention to provide a memory access technique that can be combined with other memory access techniques, such as caching.
These objects and others are achieved by providing a memory access architecture and technique that employs multiple independent buffers that are configured to store items from memory sequentially. The memory is logically partitioned, and each independent buffer is associated with a corresponding memory partition. The partitioning is cyclically sequential, based on the total number of buffers K, and the size of the buffers N. The first N memory locations are allocated to the first partition; the next N memory locations to the second partition; and so on until the Kth partition, and the allocation is repeated. The next N memory locations, after the K*N memory locations allocated to the K partitions, are allocated to the first partition; the next N locations are allocated to the second partition; and so on. When an item is accessed from memory, the buffer corresponding to the item's memory location is loaded from memory, and a prefetch of the next sequential partition commences to load the next buffer. During program execution, the ‘steady state’ of the buffer contents corresponds to a buffer containing the current instruction, one or more buffers containing instructions immediately following the current instruction, and one or more buffers containing instructions immediately preceding the current instruction. This steady state condition is particularly well suited for executing program loops, or a continuous sequence of program instructions, and other common program structures. The parameters K and N are selected to accommodate typically sized program loops.
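The cyclic partitioning reduces to a simple address computation: the partition (and hence the buffer) for address a is floor(a / N) mod K. The Python sketch below models that mapping together with the load-and-prefetch behavior. It is a simplified model under stated assumptions: addresses are counted in items, partitions are numbered from 0 (partition 0 being the first partition), each buffer holds one N-item segment, and every prefetch is assumed to complete before the processor reaches the prefetched segment, so timing is not modeled. The class and function names are hypothetical.

```python
from typing import List, Optional


class PartitionedBuffers:
    """K independent buffers, each holding one N-item segment of its partition."""

    def __init__(self, k: int, n: int) -> None:
        self.k = k
        self.n = n
        self.tags: List[Optional[int]] = [None] * k  # segment held by each buffer
        self.demand_loads = 0  # loads that stall the processor
        self.prefetches = 0    # background loads of the next sequential segment

    def partition_of(self, addr: int) -> int:
        # Cyclically sequential mapping: segment floor(addr / N), modulo K.
        return (addr // self.n) % self.k

    def access(self, addr: int) -> bool:
        """Access one item; return True if it was already buffered."""
        seg = addr // self.n
        buf = seg % self.k
        hit = self.tags[buf] == seg
        if not hit:
            self.tags[buf] = seg       # load the buffer from memory (stall)
            self.demand_loads += 1
        nxt = seg + 1                  # prefetch the next sequential partition
        if self.tags[nxt % self.k] != nxt:
            self.tags[nxt % self.k] = nxt
            self.prefetches += 1
        return hit


def run_loop(k: int, n: int, start: int, length: int, iterations: int) -> int:
    """Execute a loop of `length` sequential items and return the demand loads."""
    bufs = PartitionedBuffers(k, n)
    for _ in range(iterations):
        for addr in range(start, start + length):
            bufs.access(addr)
    return bufs.demand_loads


print([PartitionedBuffers(4, 8).partition_of(a) for a in (0, 8, 16, 24, 32)])
# [0, 1, 2, 3, 0]
print(run_loop(k=4, n=8, start=0, length=24, iterations=10))  # 1
print(run_loop(k=2, n=8, start=0, length=24, iterations=10))  # 10
```

With K = 4 and N = 8, a 24-item loop occupies three consecutive segments that all remain resident, so only the very first access stalls. With K = 2 the loop needs more segments than there are buffers, and every iteration incurs one demand load. In both cases the count depends only on the sequence of addresses executed, which reflects the deterministic behavior the partitioned scheme aims for.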