1. Technical Field
The present invention relates to the field of microprocessors and, more particularly, to a method and apparatus for a rotate circuit.
2. Description of Related Art
It is well known in the data processing art to provide data processing systems with means for rotating multi-bit binary data. Rotation of data is typically used in data field manipulation operations such as field extraction, insertion, or data alignment. For example, use of a rotator for data alignment is described below.
Current microprocessors typically employ cache memory to improve the operating performance of the microprocessor. Both data and instructions are cached in many modern microprocessor designs. Such caching techniques are well known in the art. However, one problem frequently encountered in cached processor designs is data misalignment.
Cache memory is generally arranged in blocks, or lines, consisting of several bytes of memory. For example, in the exemplary IBM “PowerPC” architecture, each cache block consists of two words, each word consisting of four bytes, for a total of 8 bytes per block. Each word of each block is individually addressable.
FIG. 1 shows an example of a cache 100 that is n bytes wide. Cache 100 includes blocks 0 and 1, each consisting of words 0 and 1. Word 0 of block 0 consists of bytes 0-3, word 1 consists of bytes 4-7, word 0 of block 1 consists of bytes 8-B, and word 1 consists of bytes C-F.
The execution of certain instructions can cause data in the cache to be misaligned, as will be described with respect to FIG. 1. For example, on the execution of a load word instruction, address data from two general purpose registers (“GPRs”) is added, and data is retrieved from the cache at the resulting address and stored into a third general purpose register. To illustrate how such an instruction can cause data in the cache to become misaligned, it is assumed that the load word instruction at issue requires two addresses stored in GPR 1 and GPR 2, respectively, to be summed and the data from the cache at the resulting address to be stored in GPR 3. If GPR 1 equals 0, and GPR 2 equals 1, then the word beginning at address 1 in block 0 of cache 100 will be written in GPR 3. As shown in FIG. 1, this word comprises bytes 1-4, which are stored partly in word 0 and partly in word 1. Thus, to store this word in GPR 3, two reads from cache 100 are required. In the first read, bytes 0-3 are retrieved from word 0. In the second read, bytes 4-7 are retrieved from word 1. This data is then merged to form a single word comprising bytes 1-4, and stored in GPR 3. Of course, to properly merge the desired data from words 0 and 1, the relevant bytes must be aligned. Therefore, an alignment circuit or rotator must be employed, as is well known in the art.
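The two-read merge described above can be illustrated with a minimal software sketch. It is not taken from any particular hardware design: the function names `rotl32` and `load_word_unaligned`, the big-endian byte order, and the use of a byte string to stand in for cache 100 are all assumptions made for illustration.

```python
WORD_BYTES = 4
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def load_word_unaligned(cache, addr):
    """Hypothetical model of a misaligned word load (big-endian assumed).

    Reads the two aligned words that straddle addr, rotates each so the
    wanted bytes land in their final positions, then merges them.
    """
    offset = addr % WORD_BYTES
    base = addr - offset
    first = int.from_bytes(cache[base:base + WORD_BYTES], "big")
    if offset == 0:
        return first  # aligned: a single read suffices
    second = int.from_bytes(
        cache[base + WORD_BYTES:base + 2 * WORD_BYTES], "big"
    )
    shift = 8 * offset
    # After rotation, the high bytes come from the first word and the
    # low `offset` bytes come from the second word.
    high_mask = (MASK32 << shift) & MASK32
    low_mask = ~high_mask & MASK32
    return (rotl32(first, shift) & high_mask) | (rotl32(second, shift) & low_mask)


cache = bytes(range(16))              # bytes 0-F, as in FIG. 1
word = load_word_unaligned(cache, 1)  # GPR 1 + GPR 2 = 0 + 1
print(hex(word))                      # bytes 1-4 merged into one word
```

In this sketch, both words are rotated by the same amount, so a single rotate amount derived from the address offset serves for the alignment of both reads.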
Sometimes, 32-bit instructions must be executed on a 64-bit machine, thus requiring a 64-bit rotator to perform 32-bit rotation. In some computer architectures, the higher order 32 bits of the 32-bit rotation result are required to have the same values as the lower order 32 bits. A common method of implementing this requirement is to duplicate the 32-bit rotate data inputs when a 64-bit rotator performs 32-bit rotation. That is, the 32-bit rotate data inputs are applied to the higher order 32 bits as well as to the lower order 32 bits, and rotated. However, this increases the data input load and/or imposes a penalty on the speed of the rotation. Therefore, a faster method of performing 32-bit rotation on a 64-bit machine with a lower data input load is desirable.
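The input-duplication method described above can be modeled in a few lines. This is a behavioral sketch only; the names `rotl64` and `rotl32_by_duplication` are hypothetical, and the model says nothing about the electrical loading that motivates the invention.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def rotl64(x, n):
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotl32_by_duplication(x32, n):
    """32-bit rotation on a 64-bit rotator by duplicating the inputs.

    The 32-bit operand is applied to both halves of the 64-bit input.
    Because the doubled pattern repeats every 32 bits, the rotated
    result carries the 32-bit rotation result in both halves.
    """
    x32 &= MASK32
    doubled = (x32 << 32) | x32
    return rotl64(doubled, n % 32)


result = rotl32_by_duplication(0x80000001, 1)
print(hex(result))  # upper and lower halves hold the same 32-bit result
```

Each operand bit drives two rotator inputs in this scheme, which corresponds to the increased data input load noted above.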
The present invention provides a dual mode rotator capable of performing 32-bit and 64-bit rotation. According to a preferred embodiment, the dual mode rotator includes first, second, and third rotator units, wherein each rotator unit has a plurality of inputs and outputs. The inputs of the second rotator unit are operatively connected to the corresponding outputs of the first rotator unit. The inputs of the third rotator unit are operatively connected to the corresponding outputs of the second rotator unit. Responsive to selection of 32-bit rotation mode, the upper half of the inputs to the first rotator unit are zero and the lower half of the outputs of the third rotator unit are replicated in the upper half of the outputs of the third rotator unit.
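The externally visible behavior of such a dual mode rotator can be sketched as follows. This is a behavioral model under stated assumptions, not the circuit of the preferred embodiment: the split of the rotate amount into coarse, medium, and fine portions across the three units, and the modeling of the wrap-around at 32 bits in 32-bit mode, are hypothetical choices that this passage does not specify. The names `rotl` and `dual_mode_rotate` are likewise illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, n, width):
    """Rotate the low `width` bits of x left by n bits."""
    mask = (1 << width) - 1
    n %= width
    x &= mask
    return ((x << n) | (x >> (width - n))) & mask


def dual_mode_rotate(data, amount, mode32):
    """Behavioral sketch of a three-unit, dual mode rotator."""
    width = 32 if mode32 else 64
    amount %= width
    if mode32:
        # Upper half of the first unit's inputs is zero in 32-bit mode.
        data &= MASK32
    # Assumed (hypothetical) division of the rotate amount among units.
    coarse = amount & 0x30  # multiples of 16
    medium = amount & 0x0C  # multiples of 4
    fine = amount & 0x03    # 0 to 3
    stage1 = rotl(data, coarse, width)    # first rotator unit
    stage2 = rotl(stage1, medium, width)  # second rotator unit
    stage3 = rotl(stage2, fine, width)    # third rotator unit
    if mode32:
        # Lower half of the third unit's outputs is replicated in the
        # upper half of the outputs.
        stage3 = (stage3 << 32) | stage3
    return stage3


print(hex(dual_mode_rotate(0x80000001, 1, mode32=True)))
print(hex(dual_mode_rotate(0x8000000000000001, 1, mode32=False)))
```

In this model the 32-bit operand enters only the lower half of the inputs, and the replication occurs once at the outputs rather than at the inputs.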