1. Field of the Invention
The present invention relates to register addressing in a data processing apparatus, and is particularly relevant to the field of digital signal processing, although its use is not restricted to digital signal processing.
2. Description of the Prior Art
Typically, a data processing apparatus, such as a microprocessor, is arranged to apply received instructions to data items read from memory. A processor core is provided within the microprocessor to process the instructions, and generally the microprocessor will have a plurality of registers into which data items are preloaded before they are required by the processor core. As the instructions are received by the microprocessor, they will typically pass through an instruction decoder before being passed on to the processor core for processing. The processor core will then apply the decoded instruction to the relevant data item read from its registers. The result of the processed instruction can then be written back to one of the registers, or provided directly to a memory such as a cache memory.
It is often the case in data processing apparatus that a specific data item is required for a number of instructions. Hence, that data item must be retained in one of the registers until all such instructions have been processed, and the instructions must all refer to the particular register in which that data item is located. Additionally, it is common for a range of instructions (hereafter referred to as an instruction loop) to be repeated a number of times, and for certain data items to be reused by different instructions within the instruction loop. This is particularly commonplace in digital signal processing apparatus, where algorithms such as a block filter algorithm contain an instruction loop which needs to be repeated a number of times in order to be applied to a set of data items. Any particular data item may be used a number of times by different instructions within the instruction loop.
Instruction loops are used extensively in the relatively complex arithmetic and logical operations that digital signal processing apparatus are required to perform, often on high volumes of data, in order to produce a real time output data stream. Typical applications of digital signal processing techniques include mobile telephones, which are required to perform real time transformation between analogue audio signals and coded digital data for transmission.
An instruction will typically include one or more operands identifying registers containing the data items required by the instruction. For example, a typical instruction may include two source operands and one destination operand, identifying two data items required by the instruction, and a register into which the result of the instruction should be placed. In situations where the instruction forms part of an instruction loop to be applied repetitively to a set of data items as discussed above, it is necessary for each of the instruction's operands to refer to the correct register every time the instruction is executed. This requirement often means that the same instruction has to be reproduced a number of times in the instruction loop code to refer to different registers each time. For example, if four data items are loaded into the registers and, as part of an instruction loop, an instruction has to be applied to each of these four data items, then the instruction has to be reproduced four times within the instruction loop, the operand of the instruction referring to a different register each time. This clearly has a detrimental effect on code density.
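The code density cost described above can be illustrated with a minimal sketch. The tuple-based instruction format, the `MUL` opcode, the register contents and the `execute` helper are all hypothetical, chosen purely for illustration; they do not correspond to any particular instruction set.

```python
# Hypothetical toy machine: each instruction is (opcode, dest, src1, src2),
# and every operand is a fixed register number.
def execute(program, registers):
    """Run each instruction once, in order; return how many were executed."""
    executed = 0
    for opcode, dest, src1, src2 in program:
        if opcode != "MUL":
            raise ValueError(f"unsupported opcode: {opcode}")
        registers[dest] = registers[src1] * registers[src2]
        executed += 1
    return executed


# Registers 0 to 3 hold four preloaded data items; register 4 holds a coefficient.
registers = [10, 20, 30, 40, 2, 0, 0, 0]

# The same operation must be written out four times, because each copy
# can only name one fixed register.
loop_body = [
    ("MUL", 0, 0, 4),
    ("MUL", 1, 1, 4),
    ("MUL", 2, 2, 4),
    ("MUL", 3, 3, 4),
]

executed = execute(loop_body, registers)
```

In this sketch the four entries of `loop_body` differ only in their register numbers, which is the duplication the passage above identifies as harmful to code density.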
In microprocessor design, it has been known in certain situations to use the concept of logical register references, and to then map these to actual physical register references. For example, superscalar processors often implement register renaming to allow speculative and out-of-order execution of instructions. The processor has more registers than the programmer's model provides, and a mapping table is employed to map the logical registers to their current physical equivalents. When the processor speculatively executes instructions, it assigns physical register numbers to hold the results. Other instructions may also speculatively read these results. Several possible instruction "flows" may thus execute in parallel. At some point, the processor will determine which one of the multiple flows is the correct one, generally based on the outcome of a conditional branch. At that point the mapping table is permanently updated and the incorrect versions of any registers are discarded. The purpose of this register renaming is to increase instruction throughput whilst being completely transparent to the programmer. It will be apparent that it does not help with the above problem of code density.
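The register renaming scheme outlined above might be sketched roughly as follows. This is a simplified illustration only: the `RenameTable` class, its method names and the free-list policy are assumptions made for the example, and real superscalar processors implement renaming in hardware with considerably more machinery.

```python
class RenameTable:
    """Hypothetical mapping table from logical to physical registers."""

    def __init__(self, num_logical, num_physical):
        # Committed mapping: initially logical register r lives in physical r.
        self.committed = {r: r for r in range(num_logical)}
        # Physical registers beyond the programmer's model are free for renaming.
        self.free = list(range(num_logical, num_physical))

    def speculate(self):
        """Start a speculative flow with its own copy of the mapping."""
        return dict(self.committed)

    def allocate(self, flow, logical):
        """Assign a fresh physical register to hold a speculative result."""
        physical = self.free.pop(0)
        flow[logical] = physical
        return physical

    def commit(self, winner, losers):
        """Make the correct flow permanent and discard the incorrect ones."""
        old = self.committed
        for loser in losers:
            for logical, physical in loser.items():
                if physical != old[logical]:
                    self.free.append(physical)  # discard incorrect version
        for logical, physical in winner.items():
            if physical != old[logical]:
                self.free.append(old[logical])  # superseded committed version
        self.committed = dict(winner)


table = RenameTable(num_logical=4, num_physical=8)
taken, not_taken = table.speculate(), table.speculate()
table.allocate(taken, 2)      # both flows write logical register 2
table.allocate(not_taken, 2)
table.commit(winner=taken, losers=[not_taken])
```

Note that nothing in this sketch changes the instructions the programmer writes, which is consistent with the observation that renaming of this kind does not address code density.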
As another example of the use of logical and physical register references, the Advanced Micro Devices (AMD) Am29200 microprocessor incorporates 192 general purpose registers. 128 of these registers are designated as local registers, and an additional register is designated as a local register stack pointer, the stack pointer register providing an offset into the 128 local registers. Hence, whenever any instruction references a local register, it uses the value of the stack pointer register to calculate an absolute register number. Thus, for example, if an instruction wishes to access local register 1 (the local register sequence starting with local register 0), and the stack pointer points to absolute register number 131 (this actually being the 4th register in the sequence of 128 local registers according to the Am29200 design), the instruction will actually access the register identified by absolute register number 132.
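The address calculation in the example above can be expressed as a short sketch. The function and constant names are hypothetical, the base value of 128 is inferred from the statement that absolute register 131 is the 4th local register, and the wrap-around within the local register block is an assumption not stated in the description above.

```python
LOCAL_BASE = 128   # absolute number of the first local register (inferred from the text)
LOCAL_COUNT = 128  # number of local registers


def absolute_register(local_number, stack_pointer):
    """Map a local register number to an absolute register number.

    stack_pointer is the absolute register number that the stack pointer
    currently points to. Wrap-around within the local block is assumed.
    """
    offset = stack_pointer - LOCAL_BASE
    return LOCAL_BASE + (offset + local_number) % LOCAL_COUNT


# The example from the text: local register 1 with the stack pointer at 131.
example = absolute_register(1, 131)
```

Under these assumptions, `example` evaluates to 132, matching the absolute register number given in the description.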
An instruction can be executed on the Am29200 microprocessor to change the value in the stack pointer register, after which the revised value in the stack pointer register will be used for any instructions referencing the local registers. This mapping of the register requested by the instruction to the absolute register number within the 128 local registers will be employed for every subsequent instruction until another instruction is executed to alter the stack pointer register.
It can be seen that the above AMD design allows a certain degree of remapping in instances where an instruction needs to be repeated for a number of different data items in different registers. However, it requires that a separate instruction be issued to change the value in the stack pointer register each time the register identified in the operand of an instruction is to be mapped to a different local register. This causes significant overhead in certain situations, for example where an instruction loop is to be repeated a number of times upon a shifted set of data items, such as is typical in digital signal processing apparatus. The overhead becomes particularly unacceptable when the instruction loop only contains a small number of instructions, since prior to the loop being repeated, a separate instruction is required to change the value in the stack pointer register. For example, if a single instruction is to be repeated ten times so as to be applied to data items 1 to 10, then prior to each execution of the instruction, a separate instruction would be required to change the value in the stack pointer register. An instruction loop containing a single instruction thus effectively becomes an instruction loop having two instructions, which clearly adds a large overhead to the execution of the instruction loop.
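The overhead in the ten-item example can be made concrete by counting executed instructions in a minimal sketch. The function name, the doubling operation standing in for the "single instruction", and the simplified offset arithmetic are all hypothetical and serve only to count instructions; they are not a model of the actual Am29200 instruction set.

```python
def run_single_instruction_loop(data_items):
    """Apply one operation to each data item via stack pointer remapping.

    Returns (results, executed), where executed counts every instruction,
    including those that only change the stack pointer.
    """
    local_registers = list(data_items)  # data items preloaded into local registers
    results = []
    executed = 0
    for item_index in range(len(local_registers)):
        # Separate instruction: change the stack pointer so that
        # local register 0 now maps onto the next data item.
        stack_pointer = item_index
        executed += 1

        # The single useful instruction, which always names local register 0.
        results.append(local_registers[stack_pointer + 0] * 2)
        executed += 1
    return results, executed


results, executed = run_single_instruction_loop(range(1, 11))
```

With ten data items the sketch executes twenty instructions, half of which do nothing but update the stack pointer.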