1. Field of Invention
The present invention relates to a method and a circuit implementation for multiple-word transfer between a processor register file and a memory subsystem.
2. Description of Related Art
Compiled code and hand-written assembly programs often need to access contiguous memory locations in sequence, for example when pushing onto and popping from a function stack, or when copying a block of memory from one location to another. In such memory subsystems the smallest addressable unit is one byte. Most modern processors access one memory location per instruction. As a result, a multiple-word transfer can only be achieved by a sequence of one-word transfer instructions. For example, a word load instruction moves one word, which contains 4 bytes, from a memory location to a processor register. Three such word load instructions, which can have a format like the following, together perform a 3-word transfer:
ld_word r1, [BASE]
ld_word r2, [BASE+4]
ld_word r14, [BASE+8]
In the above example, ld_word refers to a word load instruction; r1, r2 and r14 refer to the indexes of the processor registers that receive the values of the corresponding addressed memory locations; and BASE refers to the index of the processor register whose content is used as the base address of the memory locations. The base address is usually added to another value, called an offset, to form the final memory address. In common programming practice, when multiple memory locations are accessed, the registers involved usually have consecutive indexes. Further, special registers, such as the general data pointer (dp), frame pointer (fp), function call return pointer (rp) and stack pointer (sp), are often involved.
However, using a sequence of one-word transfer instructions to achieve multi-word transfer may lead to larger program code size and lower performance.
Some processors with a small register file can access multiple words in a single instruction by specifying a selection mask for the registers involved in the memory accesses. Each memory word access needs one register as the source (for a memory store) or the destination (for a memory load) of that access.
For example, a multi-word transfer instruction in a processor with 16 registers, numbered r0 through r15, can be specified as:
ld_multi_word 0b0100000000000110, [BASE].
In this example, 0b0100000000000110 is a full 16-bit selection mask to specify that registers r1, r2 and r14 are used in the transfer.
However, this instruction specification mechanism cannot be used in processors with N registers and an M-bit instruction format if N is greater than or equal to M, because the register selection mask alone needs N bits and leaves no room to encode other information, such as the opcode and the base register, in the instruction format. The mechanism is also inefficient for such processors if M-N, the number of bits remaining in the instruction format to encode other information, is a small positive number. For example, many common processors with a 32-bit instruction format have 32 general-purpose registers, and such an instruction format cannot accommodate a full register selection mask.
Therefore, it is useful to have a memory transfer instruction mechanism, for processors with many registers, that moves multiple words between consecutive memory locations and processor registers, in order to achieve smaller program code size, improved instruction access bandwidth and higher performance.