1. Technical Field
The technical field of the present specification relates in general to a method and system for data processing and in particular to a method and system for efficiently executing a multiple-register instruction. Still more particularly, the technical field of the present specification relates to a method and system for efficiently executing a multiple-register instruction, which permit multiple registers within a processor to be accessed within a single cycle.
2. Description of the Related Art
A typical state-of-the-art processor comprises multiple execution units, which are each optimized to execute a corresponding type of instruction. Thus, for example, a processor may contain a fixed-point unit (FXU), a floating-point unit (FPU), a branch processing unit (BPU), and a load-store unit (LSU) for executing fixed-point, floating-point, branch, and load and store instructions, respectively. In addition, the processor may include a number of architected registers for temporarily storing instruction operands and result data, as well as on-board cache memory and an interface unit for interfacing the processor to a data processing system bus.
While processing data, it is frequently necessary to transfer large blocks of data between memory and the architected registers within a processor. For example, to perform mathematical operations on matrices, the values of the matrix elements must be loaded into the architected registers of the processor for subsequent use by the FPU or FXU. In order to simplify such transfers of large data blocks, some processors support load and store multiple instructions, which load data to and store data from multiple architected registers. Processors supporting load and store multiple instructions may also support string instructions that are utilized to transfer large blocks of data between multiple architected registers and unaligned memory addresses (i.e., memory addresses that are neither doubleword aligned nor word aligned).
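By way of illustration only, the effect of a load multiple or store multiple instruction can be modeled in software as a transfer between a run of consecutive registers and a run of consecutive memory words. The following Python sketch is a simplified model under stated assumptions, not the behavior of any particular processor: the 4-byte word size, the big-endian byte order, and the names `load_multiple` and `store_multiple` are all hypothetical and chosen only for the example.

```python
WORD_BYTES = 4  # assumed word size; the specification does not fix one


def load_multiple(memory, base_addr, regs, first_reg, count):
    """Load `count` consecutive words starting at `base_addr` into
    regs[first_reg], regs[first_reg + 1], ... (hypothetical model)."""
    for i in range(count):
        addr = base_addr + i * WORD_BYTES
        # Big-endian byte order is an assumption made for this sketch.
        regs[first_reg + i] = int.from_bytes(memory[addr:addr + WORD_BYTES], "big")


def store_multiple(memory, base_addr, regs, first_reg, count):
    """Store `count` consecutive registers starting at regs[first_reg]
    into memory beginning at `base_addr` (hypothetical model)."""
    for i in range(count):
        addr = base_addr + i * WORD_BYTES
        value = regs[first_reg + i] & 0xFFFFFFFF  # keep the low 32 bits
        memory[addr:addr + WORD_BYTES] = value.to_bytes(WORD_BYTES, "big")
```

Because `base_addr` is an arbitrary byte offset in this model, the same loop also approximates a string instruction operating on an address that is neither doubleword aligned nor word aligned.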
String instructions and load and store multiple instructions (all hereinafter referred to simply as multiple-register instructions) simplify the transfer of large blocks of data between memory and a processor's architected registers from a programming standpoint, in that only a single instruction is required. However, multiple-register instructions often have a greater latency and take longer to execute than a sequence of individual load or store instructions that produces the same result. One reason for this inefficiency in prior art processors is that such processors typically permit all load and store instructions, including multiple-register instructions, to access only a single architected register each cycle.
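The cost of the single-register-per-cycle restriction can be made concrete with a rough count of register-file access cycles. The following Python sketch is an illustrative model only: the function name `register_access_cycles` and the parameter `registers_per_cycle` are hypothetical, and the model deliberately ignores memory latency, cache behavior, and dispatch overhead, all of which affect real execution time.

```python
def register_access_cycles(num_registers, registers_per_cycle=1):
    """Return the minimum number of cycles needed to access `num_registers`
    architected registers when at most `registers_per_cycle` registers can
    be accessed in any one cycle (simplified, hypothetical model)."""
    if num_registers < 0 or registers_per_cycle < 1:
        raise ValueError("invalid register or port count")
    # Ceiling division: a partially filled final cycle still costs a cycle.
    return -(-num_registers // registers_per_cycle)
```

Under this model, a multiple-register instruction that targets eight registers needs eight register-access cycles when only one register can be accessed per cycle, and four such cycles if two registers could be accessed per cycle.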
As should thus be apparent, an improved method and system are needed for executing multiple-register instructions with greater efficiency.