1. Field of the Invention
The present invention relates generally to computer processors and more particularly to the addressing of memory locations in computer processors.
2. Related Art
FIG. 1A shows a prior art computer system 100, including a processor 110, a system memory 120 and a cache memory 130. Processor 110, system memory 120 and cache memory 130 are connected by means of buses 115 and 125. The contents of portions of system memory 120 can be mapped to cache memory 130 to reduce memory access time. In addition, a bank of registers 140 is provided on processor 110. Computer processors such as processor 110 often perform operations on data stored in sequences of orderly spaced memory locations of system memory 120. Examples of such operations include displaying an image on a screen of a computer system, performing a calculation on a spreadsheet, etc. When these operations are performed, the processor repeatedly executes the same instruction on data stored in successive memory locations of the sequence. For example, consider a running total of expenses entered on a spreadsheet, where the data representing each entry is stored in a sequence of memory locations orderly spaced at a predetermined interval (e.g., every 1, 2, 4 . . . locations) starting at an arbitrary address in system memory 120. The running total can be computed as shown in FIG. 1B. First, in stage 150, the starting address of the sequence of memory locations is stored in a first register (R1) of processor 110. In stage 160, the contents of the memory location at that address (i.e., the first entry in the spreadsheet) are stored in a second register (R3) of processor 110. In stage 170, a displacement value is added to R1, so that at the end of stage 170, R1 contains the address of the second entry of the spreadsheet. In stage 180, the contents of the memory location pointed to by R1 are retrieved and stored in a third register (R2). In stage 190, the contents of registers R2 and R3 are added and the result is stored in R3. Stage 195 then determines whether the last entry in the spreadsheet has been added to the running total, in which case the operation terminates.
Otherwise, stages 170-195 are repeated until all entries have been added to the running total.
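The loop of FIG. 1B can be sketched in Python as follows. This is an illustrative model only: system memory is assumed to be a Python list indexed by address, and the function name `running_total` and its parameters (`memory`, `start_address`, `displacement`, `count`) are hypothetical. The local variables mirror registers R1, R2 and R3, and each comment names the stage it models.

```python
def running_total(memory, start_address, displacement, count):
    """Hypothetical model of FIG. 1B: sum `count` entries spaced `displacement` apart."""
    if count <= 0:
        return 0
    r1 = start_address          # stage 150: R1 <- starting address of the sequence
    r3 = memory[r1]             # stage 160: R3 <- first entry
    for _ in range(count - 1):  # stage 195: repeat until the last entry is added
        r1 += displacement      # stage 170: advance R1 to the next entry
        r2 = memory[r1]         # stage 180: R2 <- contents of the location R1 points to
        r3 = r2 + r3            # stage 190: R3 <- R2 + R3 (running total)
    return r3
```

For instance, with entries 10, 20, 30 and 40 stored at addresses 3, 5, 7 and 9, `running_total(memory, 3, 2, 4)` returns 100.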
Since retrieving the contents of each memory location in the sequence from system memory 120 requires a substantial amount of processing time, cache memory 130 can be used to speed up the process. Cache memories are typically faster and more expensive than other computer memories and are used to temporarily store a subset of the information stored in system memory 120. If the data used most frequently by processor 110 is stored in cache memory 130, the time required to perform operations on processor 110 can be substantially reduced. Several schemes are used to control which data is stored in the cache memory. When the processor executes an instruction referencing the contents of a location in the computer memory, cache memory 130 is first checked to determine whether the contents of that memory location are already stored there. Data stored in cache memory 130 can be read directly from cache memory 130 without accessing system memory 120. However, if the data is not stored in cache memory 130, the contents of the memory location must be retrieved from system memory 120 and (optionally) stored in cache memory 130.
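The check-then-fetch behavior described above can be illustrated with one of the simplest placement schemes, a direct-mapped cache. The following is a minimal sketch under stated assumptions: each cache line holds a single word, the line index is the address modulo the number of lines, and the remaining high-order part of the address serves as the tag. The class `DirectMappedCache` and its attributes are hypothetical, and real caches use multi-word lines and write policies that are not modeled here.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache with one word per line (illustration only)."""

    def __init__(self, memory, num_lines):
        self.memory = memory              # backing "system memory": a list indexed by address
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # tag field of each cache entry
        self.data = [None] * num_lines    # cached contents
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines  # cache line the address maps to
        tag = address // self.num_lines   # identifies which address occupies that line
        if self.tags[index] == tag:       # hit: read directly from the cache
            self.hits += 1
            return self.data[index]
        self.misses += 1                  # miss: retrieve from system memory...
        value = self.memory[address]
        self.tags[index] = tag            # ...and store it in the cache line
        self.data[index] = value
        return value
```

With eight lines, addresses 5 and 13 map to the same line, so reading 5, 5, 13 and then 5 again yields one hit and three misses: the read of 13 evicts the entry for address 5.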
When an operation is to be conducted on a sequence of orderly spaced memory locations, such as in the running total example above, it is desirable to load as many of the memory locations in the sequence into cache memory 130 as possible. However, using the addressing techniques of prior art processors, when an instruction referencing a sequence of memory locations is decoded by processor 110, the address of each memory location is read from a register of processor 110. The address is then compared to the tag field of the cache memory entries to determine whether the contents of the memory location are already stored in cache memory 130, and the data is retrieved either from cache memory 130 or from system memory 120. Even when the entire sequence is already stored in cache memory 130, stages 150-195 must still be performed sequentially (i.e., as part of the critical path), because each address computed in stage 170 depends on the value of R1 produced in the previous iteration. In multiscalar processors (i.e., processors that can execute multiple instructions in parallel), this scheme results in an inefficient use of processor resources, due to the length of the critical path.
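The point that caching alone does not shorten the critical path can be illustrated by running the FIG. 1B loop against a cache that already holds the whole sequence. In this sketch, which is an assumption for illustration, the cache is modeled as a Python `dict` mapping addresses to contents (in effect a fully associative cache of unlimited size), and the function name `running_total_via_cache` is hypothetical. Every load hits, yet the returned list of addresses shows that each address is still produced one at a time from the previous value of R1.

```python
def running_total_via_cache(memory, cache, start_address, displacement, count):
    """Hypothetical FIG. 1B loop in which every load first checks `cache`."""
    if count <= 0:
        return 0, 0, []
    hits = 0
    addresses = []                    # addresses in the order the loop computes them

    def load(address):
        nonlocal hits
        if address in cache:          # tag match: read directly from the cache
            hits += 1
            return cache[address]
        value = memory[address]       # miss: retrieve from system memory
        cache[address] = value        # and store it in the cache
        return value

    r1 = start_address                # stage 150
    r3 = load(r1)                     # stage 160
    addresses.append(r1)
    for _ in range(count - 1):
        r1 += displacement            # stage 170: needs the R1 from the previous iteration
        r3 = load(r1) + r3            # stages 180-190: needs the R3 from the previous iteration
        addresses.append(r1)
    return r3, hits, addresses
```

With the four spreadsheet entries from the earlier example preloaded into the cache, all four loads hit, but the four addresses 3, 5, 7 and 9 are still generated strictly in sequence.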