The present invention relates to computers and, more particularly, to shifter units for use therein.
Computer processors are typically optimized for a particular word size. For example, processors optimized for 32-bit or 64-bit words are commercially available. Processor registers and data paths are likewise designed to efficiently process operands that are as wide as the design word width. While narrower operands can be processed in the same registers and data paths, they use those data paths less efficiently.
For a large class of applications, the precision of the data is often less than the word size, in bits, of the computers on which the data is manipulated. In such systems, it is advantageous to pack multiple data entries into a single word. This reduces the storage needed for the data and reduces memory access time, since memory is normally accessed in units of words. Much of this advantage is lost, however, if the items in a word must be rearranged, because the additional processing time may exceed the memory access time that would be incurred if the data were not packed into words.
Image applications, for example, often involve a large number of small data words. Black-and-white images are often represented as arrays of pixel values in which each pixel is an 8-bit integer representing the intensity of one point in the image. A 1000×1000 pixel image therefore requires 1 million bytes of storage. To provide efficient storage and movement of image data, the data is often packed into larger words. In a computer system optimized for 64-bit data words, for instance, 8 pixel values may be packed into each 64-bit word. This allows 8 pixels to be moved from memory in a single memory cycle and also reduces the number of memory words needed to store the image. Unfortunately, when computations are to be performed on individual pixel values, some form of unpacking operation must be used to isolate each value from the remaining pixel values in the word.
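The packing and unpacking described above can be sketched as follows. This is an illustrative sketch only; the helper names `pack_pixels` and `unpack_pixel`, and the convention of placing pixel 0 in the least significant byte, are assumptions rather than anything specified here.

```python
# Hypothetical helpers: pack eight 8-bit pixel values into one 64-bit word
# (assumed convention: pixel 0 occupies the least significant byte).

PIXEL_BITS = 8
PIXELS_PER_WORD = 64 // PIXEL_BITS  # 8 pixels per 64-bit word


def pack_pixels(pixels: list[int]) -> int:
    """Pack up to eight 8-bit pixel values into a single 64-bit integer."""
    if len(pixels) > PIXELS_PER_WORD:
        raise ValueError("at most 8 pixels fit in a 64-bit word")
    word = 0
    for i, p in enumerate(pixels):
        if not 0 <= p <= 0xFF:
            raise ValueError(f"pixel {p} is not an 8-bit value")
        word |= p << (i * PIXEL_BITS)  # place pixel i in byte i
    return word


def unpack_pixel(word: int, index: int) -> int:
    """Isolate one pixel: shift it down to bit 0, then mask off its neighbours."""
    return (word >> (index * PIXEL_BITS)) & 0xFF


# A 1000 x 1000 image of 8-bit pixels occupies 1,000,000 bytes,
# i.e. 125,000 64-bit words when packed eight pixels to a word.
words_needed = (1000 * 1000) // PIXELS_PER_WORD

w = pack_pixels([10, 20, 30, 40, 50, 60, 70, 80])
fourth = unpack_pixel(w, 3)  # 40
```

Note that every access to a single pixel costs a shift and a mask; this per-item overhead is what erodes the benefit of packing when items must be rearranged.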
In general, a conventional computer processor takes two operands, which are usually stored in two registers. The processor performs a transformation on these operands, as specified by an instruction, and then writes the result back to another register. If multiple operands are packed into a register, it is often desirable to be able to rearrange, or permute, these operands within the register. In prior art systems, such rearrangements are time-consuming because the processors are optimized to treat the contents of a register as a basic unit. Typically, a specified field in a register can be extracted only by a series of shifting and masking operations. For example, a general permutation of 4 data items in a register typically requires a significant number of instructions on most general-purpose processors.
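To make the shift-and-mask cost concrete, the following sketch permutes four 16-bit fields of a 64-bit register using only shift, AND and OR operations, as a conventional processor might. The 16-bit field width, the function name `permute_with_shifts`, and the operation tally (one count per shift, AND and OR, including shifts by zero) are illustrative assumptions, not a measurement of any particular processor.

```python
FIELD_BITS = 16                      # assumed field width
FIELD_MASK = (1 << FIELD_BITS) - 1   # 0xFFFF


def permute_with_shifts(word: int, perm: list[int]) -> tuple[int, int]:
    """Return (result, op_count); output field i receives input field perm[i].

    Each field is extracted with a shift and an AND, then moved into place
    with a shift and an OR, so the rough tally is 4 operations per field.
    """
    result = 0
    ops = 0
    for dst, src in enumerate(perm):
        field = (word >> (src * FIELD_BITS)) & FIELD_MASK  # shift + AND
        ops += 2
        result |= field << (dst * FIELD_BITS)              # shift + OR
        ops += 2
    return result, ops


# Reverse the four fields 0x1111, 0x2222, 0x3333, 0x4444 (low to high).
reversed_word, op_count = permute_with_shifts(0x4444333322221111, [3, 2, 1, 0])
# reversed_word == 0x1111222233334444, op_count == 16
```

Even this simple reversal takes on the order of a dozen or more single-cycle operations in this model, versus the single instruction that the present invention aims to provide.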
Broadly, it is the object of the present invention to provide an improved functional unit.
It is a further object of the present invention to provide a functional unit that can generate any permutation, with or without repetitions, of the sub-fields of a data word in a single instruction.
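The difference between a shift and a general permutation "with or without repetitions" can be sketched at the level of sub-words. In this hypothetical model, a single permute operation is described by a selector vector: output position i receives a copy of input sub-word `selectors[i]`. The names `apply_selectors` and `rotations`, and the use of a rotate as the best case for a single shift, are assumptions made for illustration.

```python
from itertools import permutations, product


def apply_selectors(subwords: list, selectors: list[int]) -> list:
    """Output position i receives a copy of subwords[selectors[i]]; repeats are allowed."""
    return [subwords[s] for s in selectors]


def rotations(subwords: list) -> set:
    """Every ordering that a single rotate of the sub-words can produce."""
    n = len(subwords)
    return {tuple(subwords[k:] + subwords[:k]) for k in range(n)}


src = ["a", "b", "c", "d"]
n = len(src)

# Orderings without repetition: selector vectors that are permutations of 0..n-1.
no_repeat = {tuple(apply_selectors(src, p)) for p in permutations(range(n))}

# Orderings with repetition: any selector vector at all.
with_repeat = {tuple(apply_selectors(src, s)) for s in product(range(n), repeat=n)}

print(len(rotations(src)), len(no_repeat), len(with_repeat))  # 4 24 256
```

With four distinct sub-words, a rotate reaches only 4 orderings, whereas a selector-driven unit could in principle produce all 24 permutations or, allowing repetition, all 4^4 = 256 arrangements; for example, `apply_selectors(src, [1, 0, 3, 3])` gives `['b', 'a', 'd', 'd']`.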
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.
The present invention is an apparatus for operating on the contents of an input register to generate the contents of an output register that contains a permutation of the contents of the input register. The apparatus partitions the input register into a plurality of sub-words, each sub-word being characterized by a location in the input register and a length greater than one bit. In response to an instruction specifying a rearrangement of the input register, the present invention directs at least one of the sub-words in the input register to a location in the output register that differs from the location occupied by that sub-word in the input register. The ordering of the sub-words in the output register differs from any ordering obtainable by a single shift instruction. In the preferred embodiment of the present invention, the invention is implemented by modifying a conventional shifter comprising a plurality of layers of multiplexers. The modification comprises independently setting the control signals for at least one of the multiplexers in at least one of the layers.
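The effect of independently setting multiplexer controls can be illustrated with a toy model. The sketch below assumes a logarithmic rotator operating on sub-words rather than bits, in which layer k lets each output position i take either its own input (control 0) or the input 2^k positions away, (i + 2^k) mod n (control 1). This topology and the names `mux_layers` and `shared_controls` are assumptions for illustration, not the circuit of the preferred embodiment.

```python
def mux_layers(subwords: list, controls: list[list[int]]) -> list:
    """Pass sub-words through len(controls) layers of 2:1 multiplexers.

    In layer k, output position i takes data[i] when its control bit is 0,
    or data[(i + 2**k) % n] when its control bit is 1.
    """
    n = len(subwords)
    data = list(subwords)
    for k, layer in enumerate(controls):
        dist = 1 << k
        data = [data[(i + dist) % n] if layer[i] else data[i] for i in range(n)]
    return data


def shared_controls(amount: int, n: int) -> list[list[int]]:
    """Conventional rotator: every mux in layer k receives bit k of the amount."""
    num_layers = n.bit_length() - 1  # log2(n), assuming n is a power of two
    return [[(amount >> k) & 1] * n for k in range(num_layers)]


src = ["a", "b", "c", "d"]

# Shared controls can only rotate: an amount of 1 moves every sub-word one place.
rotated = mux_layers(src, shared_controls(1, len(src)))  # ['b', 'c', 'd', 'a']

# Setting the second-layer controls independently gives an ordering no rotation produces.
swapped = mux_layers(src, [[1, 1, 1, 1], [0, 1, 0, 1]])  # ['b', 'a', 'd', 'c']
```

With shared controls, the network reaches only the 4 rotations; with per-multiplexer controls it reaches orderings such as `['b', 'a', 'd', 'c']`. This two-layer toy network still cannot reach every permutation (for example, `['b', 'a', 'c', 'd']` is unreachable), so it illustrates only the narrower claim that independently controlled multiplexers yield orderings a single shift cannot.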