The present invention relates to data processing and, more particularly, to instructions executable by data processors. A major objective of the present invention is to provide parallel subword compare instructions that improve processor performance in certain situations common in multimedia applications.
Much of modern progress is associated with advances in computer technology. Typical computers include one or more processors that perform operations on data in accordance with a program of instructions. Associated with each processor is an instruction set, i.e., a set of instructions that the processor can execute. The performance of a processor in the context of a particular application is largely dependent on how efficiently the tasks demanded by the application can be expressed using the processor's instruction set.
As computers have become more powerful, they have been confronted with ever more demanding applications, such as real-time video manipulation. Video is typically presented as a synchronous stream of images. Each image can be described as a two-dimensional array of picture elements (pixels). Each pixel is typically described by one multi-bit (e.g., 8-bit) color value per color dimension. Many applications, e.g., video compression, group the pixels into blocks (e.g., 8×8-pixel blocks).
Common binary (i.e., two-operand) image operations, such as comparisons between blocks of the same or different images, can require 2×64×8 bits (two blocks, 64 pixels per block, 8 bits per pixel), or 1,024 bits, of operand data to be handled concurrently. Using the increasingly prevalent 64-bit processors, a binary image block comparison can be implemented using eight pairs of 64-bit registers for the operands, with additional registers being used for storing intermediate and final results.
In all, sixty-four comparisons are called for. Each comparison results in one bit of information. In total, the sixty-four comparisons can result in a single 64-bit word that can be stored in a single result register. How efficiently these sixty-four comparisons can be made is dependent on the instruction set associated with the processor. When a word compare instruction is used, each 8-bit data value must first be placed in a word-wide operand whose remaining bits are filled out with fixed values. The two words are then compared, and the resulting one-bit value is stored in the result register at a suitable bit position. Successive results must be stored in appropriate positions in the result register without erasing prior results. Thus, sixty-four compare instructions and many other instructions are required to complete an image block compare using word compare instructions.
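The word-compare approach described above can be sketched in C as follows. This is an illustrative software model only, not a description of any particular processor. It assumes an equality comparison, zero as the fixed fill value, and a layout in which pixel `col` of row `row` occupies byte `col` of the 64-bit word for that row and maps to result bit `8 * row + col`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract pixel i (0..7) from a packed 64-bit register and return it as a
 * full word whose upper 56 bits are filled with zeros (assumed fill value). */
static uint64_t extract_pixel(uint64_t reg, unsigned i)
{
    assert(i < 8);
    return (reg >> (8 * i)) & 0xFFu;
}

/* Compare two 8x8 blocks of 8-bit pixels one pixel at a time.
 * Each of the 64 iterations needs two extractions, one word compare,
 * one shift, and one OR to deposit the result bit without erasing
 * the bits already stored. */
uint64_t block_compare_wordwise(const uint64_t a[8], const uint64_t b[8])
{
    uint64_t result = 0;
    for (unsigned row = 0; row < 8; row++) {
        for (unsigned col = 0; col < 8; col++) {
            uint64_t wa = extract_pixel(a[row], col);
            uint64_t wb = extract_pixel(b[row], col);
            uint64_t bit = (wa == wb);          /* word compare, 1 bit out */
            result |= bit << (8 * row + col);   /* deposit at its position */
        }
    }
    return result;
}
```

Under these assumptions, the inner loop body runs sixty-four times, which is where the sixty-four compare instructions and the accompanying extract, shift, and OR instructions come from.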
Some processors provide for a parallel subword compare instruction. As applied to a pair of 64-bit registers, eight pairs of 8-bit subwords stored in the registers can be compared in parallel to generate an 8-bit result that can be, for example, stored in the eight least-significant bit positions of a result register. This result can be shifted to more-significant bit positions to make room for the results of parallel subword compare operations on other pairs of registers. To complete an image block compare, eight parallel subword compare instructions, seven shift instructions and seven OR instructions (to combine results) are required, for a total of twenty-two instructions.
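The twenty-two-instruction sequence can be modeled in C as a rough sketch. The helper `pcmp8_eq_bits` is a hypothetical software emulation of a parallel subword compare that returns an 8-bit result; it assumes an equality comparison and that bit `i` of the result corresponds to byte `i` of the operands. The `ops` counter is added purely for illustration and tallies one operation per emulated instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates a hypothetical parallel subword compare: bit i of the 8-bit
 * result is 1 when byte i of x equals byte i of y. */
static uint64_t pcmp8_eq_bits(uint64_t x, uint64_t y)
{
    uint64_t bits = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t xb = (uint8_t)(x >> (8 * i));
        uint8_t yb = (uint8_t)(y >> (8 * i));
        bits |= (uint64_t)(xb == yb) << i;
    }
    return bits;
}

/* Compare two 8x8 blocks using eight parallel compares. The accumulated
 * result is shifted left by 8 before each new 8-bit result is ORed in,
 * so the result for register pair 0 ends up in the most-significant byte. */
uint64_t block_compare_packed(const uint64_t a[8], const uint64_t b[8],
                              unsigned *ops)
{
    assert(ops != NULL);
    uint64_t result = pcmp8_eq_bits(a[0], b[0]);   /* compare #1 */
    *ops = 1;
    for (unsigned k = 1; k < 8; k++) {
        result <<= 8;                              /* shift   */
        result |= pcmp8_eq_bits(a[k], b[k]);       /* compare, then OR */
        *ops += 3;
    }
    return result;
}
```

With this tally, `*ops` ends at 1 + 7 × 3 = 22, matching the count of eight compares, seven shifts, and seven ORs.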
The total number of instructions is dependent on the particular implementation of the parallel subword compare instruction. For example, an alternative parallel subword compare instruction, used in the Intel Itanium processor, can store eight replicas of each subword compare result in the corresponding subword location of the result register. In this case, many more ancillary instructions are required to arrange the results of the eight parallel subword compare instructions in a single result. Accordingly, the total number of instructions required for an image block compare might be more than double that given for the first example.
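To illustrate why the replicated-result form needs extra work, the following C sketch emulates a compare that writes eight copies of each result bit into the corresponding byte, then reduces that 64-bit mask to one bit per subword. Both function names are hypothetical, equality is an assumed comparison, and the AND/multiply/shift reduction is only one possible way to gather the bits; it is not taken from the text and may not map to single instructions on a given processor.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates a compare that replicates each result: byte i of the output is
 * 0xFF when byte i of x equals byte i of y, and 0x00 otherwise. */
uint64_t pcmp8_eq_mask(uint64_t x, uint64_t y)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t xb = (uint8_t)(x >> (8 * i));
        uint8_t yb = (uint8_t)(y >> (8 * i));
        if (xb == yb)
            mask |= (uint64_t)0xFF << (8 * i);
    }
    return mask;
}

/* Ancillary step: collapse a replicated mask to 8 bits, with bit i taken
 * from byte i. */
uint64_t mask_to_bits(uint64_t mask)
{
    /* Precondition: every byte is either 0xFF or 0x00. */
    assert(((mask >> 7) & 0x0101010101010101ULL) * 0xFF == mask);

    /* Keep only bit i of byte i. */
    uint64_t picked = mask & 0x8040201008040201ULL;

    /* Multiplying by 0x0101...01 sums all bytes into the top byte; the
     * kept bits are distinct, so the sum never carries. */
    return (picked * 0x0101010101010101ULL) >> 56;
}
```

Under these assumptions, each register pair needs this reduction in addition to the compare itself, before the shift and OR steps that combine the eight partial results can begin.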
In either case, the number of instructions required for implementing an image block compare is larger than desired. If the instructions are performed serially, the time consumed is excessive. If the instructions are performed in parallel, the ability of the processor to perform other operations in parallel with the image block compare operation is limited.
Of course, if a 16-register parallel subword compare instruction were available, the image block compare could be performed in one instruction. However, such an instruction would be complex to implement from a hardware standpoint and perhaps overly specialized from a software standpoint. What is needed is a parallel subword compare instruction that permits more efficient image block compares, as well as related operations.