The present invention relates to data processing and, more particularly, to instructions executable by data processors. A major objective of the present invention is to provide parallel subword compare instructions that achieve improved processor performance in certain situations common in multimedia applications.
Much of modern progress is associated with advances in computer technology. Typical computers include one or more processors that perform operations on data in accordance with a program of instructions. Associated with each processor is an instruction set, i.e., the set of instructions that the processor can execute. The performance of a processor in the context of a particular application depends largely on how efficiently the tasks demanded by the application can be expressed using the processor's instruction set.
As computers have become more powerful, they have been confronted with ever more demanding applications, such as real-time video manipulation. Video is typically presented as a synchronous stream of images. Each image can be described as a two-dimensional array of picture elements (pixels). Each pixel is typically described by one multi-bit (e.g., 8-bit) color value per color dimension. Many applications, e.g., video compression, group the pixels into blocks (e.g., 8×8-pixel blocks).
Common binary (two-operand) image operations, such as comparisons between blocks of the same or different images, can require 2×64×8 = 1024 bits of operand data (two blocks, 64 pixels per block, 8 bits per pixel) to be handled concurrently. On the increasingly prevalent 64-bit processors, an image block comparison can be implemented using eight pairs of 64-bit registers (sixteen registers of 64 bits each) for the operands, with additional registers used for intermediate and final results.
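As a minimal illustration of the operand arithmetic, the following Python sketch packs an 8×8 block of 8-bit pixels into eight 64-bit words, so that two blocks occupy eight register pairs. The byte ordering (pixel c of a row in byte c, byte 0 least significant) and the function name `pack_block` are illustrative assumptions, not features of any particular processor.

```python
def pack_block(block):
    """Pack an 8x8 block of 8-bit pixels into eight 64-bit words.

    Assumed layout (illustrative only): row r goes to word r, and pixel c of
    that row occupies byte c, with byte 0 the least-significant byte.
    """
    if len(block) != 8 or any(len(row) != 8 for row in block):
        raise ValueError("expected an 8x8 block")
    words = []
    for row in block:
        word = 0
        for c, pixel in enumerate(row):
            if not 0 <= pixel <= 0xFF:
                raise ValueError("pixel values must fit in 8 bits")
            word |= pixel << (8 * c)
        words.append(word)
    return words


# Two blocks of 64 pixels at 8 bits each: 2 * 64 * 8 = 1024 bits,
# which is exactly sixteen 64-bit registers (eight pairs).
OPERAND_BITS = 2 * 64 * 8
```

For example, packing the block whose pixel (r, c) has value 8r + c yields a first word of `0x0706050403020100`.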
In all, sixty-four comparisons are called for. Each comparison yields one bit of information, so the sixty-four results together fit in a single 64-bit word that can be stored in a single result register. How efficiently these sixty-four comparisons can be made and combined depends on the instruction set associated with the processor. When a word compare instruction is used, each 8-bit pixel must first be extracted into an operand whose remaining bits are filled with fixed values. The two operand words are then compared, and the resulting one-bit value is stored at a suitable bit position in the result register. Each successive result must be placed in its own position without erasing prior results. Thus, an image block compare using word compare instructions requires sixty-four compare instructions plus many other instructions.
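The word-compare approach can be sketched as follows, emulating each word compare in Python. The comparison (byte equality), the result layout (pixel c of row r mapped to bit 8r + c), and the name `block_compare_scalar` are assumptions made for illustration. The sketch counts the compare operations to show that sixty-four are needed, but it does not attempt to count the extraction and placement work, which on real hardware adds further instructions per pixel.

```python
def block_compare_scalar(a_words, b_words):
    """Emulate a block compare using only one-word compares.

    Assumptions (illustrative): the compare tests byte equality, and the
    result for pixel c of row r lands in bit 8*r + c of one 64-bit word.
    Returns (result_word, number_of_compares).
    """
    result = 0
    compares = 0
    for r in range(8):
        for c in range(8):
            # Extract one 8-bit pixel from each operand word; the upper bits
            # of the operand are filled with a fixed value (zero here).
            pa = (a_words[r] >> (8 * c)) & 0xFF
            pb = (b_words[r] >> (8 * c)) & 0xFF
            compares += 1  # one word compare per pixel pair
            bit = 1 if pa == pb else 0
            # Place the 1-bit result at its own position with OR so that
            # results already stored are not erased.
            result |= bit << (8 * r + c)
    return result, compares
```

With two identical sets of eight words the function returns an all-ones 64-bit result and a count of 64.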
Some processors provide a parallel subword compare instruction. As applied to a pair of 64-bit registers, the eight pairs of 8-bit subwords stored in the registers can be compared in parallel to generate an 8-bit result that can, for example, be stored in the eight least-significant bit positions of a result register. This result can be shifted to more-significant bit positions to make room for the results of parallel subword compare operations on other pairs of registers. To complete an image block compare, eight parallel subword compare instructions, seven shift instructions, and seven OR instructions (to combine results) are required, for a total of twenty-two instructions.
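The shift-and-OR scheme can be sketched as below. `psub_compare_eq` is a hypothetical emulation of a parallel subword compare that tests byte equality and returns its 8-bit result in the least-significant bits; the combining loop counts each operation so that the total of eight compares, seven shifts, and seven ORs is visible.

```python
MASK64 = (1 << 64) - 1


def psub_compare_eq(a, b):
    """Hypothetical parallel subword compare on two 64-bit words.

    Compares the eight byte pairs (equality is an assumed test) and returns
    an 8-bit mask in the least-significant bits: bit i is 1 if byte i matches.
    """
    mask = 0
    for i in range(8):
        if ((a >> (8 * i)) & 0xFF) == ((b >> (8 * i)) & 0xFF):
            mask |= 1 << i
    return mask


def block_compare_shift_or(a_words, b_words):
    """Combine eight parallel compares with shifts and ORs, counting each kind."""
    counts = {"compare": 0, "shift": 0, "or": 0}
    # Start with the last register pair; its result ends up in the top byte.
    acc = psub_compare_eq(a_words[7], b_words[7])
    counts["compare"] += 1
    for r in range(6, -1, -1):
        acc = (acc << 8) & MASK64  # move earlier results up to free the low byte
        counts["shift"] += 1
        acc |= psub_compare_eq(a_words[r], b_words[r])
        counts["compare"] += 1
        counts["or"] += 1
    return acc, counts
```

Processing the last register pair first is a design choice that leaves the result for register pair r in byte r, matching the bit layout 8r + c used for per-pixel results.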
The total number of instructions depends on the particular implementation of the parallel subword compare instruction. For example, an alternative parallel subword compare instruction, used in the Intel Itanium processor, stores eight replicas of each one-bit subword compare result (i.e., all ones or all zeros) in the corresponding 8-bit subword location of the result register. In this case, many more ancillary instructions are required to compact and arrange the results of the eight parallel subword compare instructions into a single result. Accordingly, the total number of instructions required for an image block compare might be more than double the twenty-two given for the first example.
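A replicating compare of the kind described can be emulated as follows. Each 8-bit subword of the result holds eight copies of its one-bit result, so before results from several register pairs can be merged, each byte must first be collapsed to a single bit. The compaction shown uses a well-known multiply-based bit-gathering trick purely as an illustration; it is not presented as the Itanium's own instruction sequence, and both function names are hypothetical.

```python
def psub_compare_eq_replicated(a, b):
    """Emulate a replicating parallel compare on two 64-bit words.

    Each byte of the result is 0xFF where the corresponding bytes of a and b
    are equal and 0x00 otherwise (equality is an assumed test).
    """
    out = 0
    for i in range(8):
        if ((a >> (8 * i)) & 0xFF) == ((b >> (8 * i)) & 0xFF):
            out |= 0xFF << (8 * i)
    return out


def compact_bytes(mask):
    """Collapse a 0x00/0xFF-per-byte mask into 8 bits (bit i <- byte i).

    Keeps the top bit of each byte, then multiplies by 0x0002040810204081,
    which moves the top bit of byte i to bit 56 + i without carries.
    Python integers do not wrap at 64 bits, so the product is shifted and
    masked explicitly.
    """
    return (((mask & 0x8080808080808080) * 0x0002040810204081) >> 56) & 0xFF
```

Even after compaction, each 8-bit value still has to be shifted and ORed into place as in the shift-and-OR scheme, which is why this style of compare needs more ancillary work.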
A related patent application, Ser. No. 10/403,977 filed 2003 Mar. 31, discloses parallel subword compare instructions that cause results to be stored at different subword locations within result registers so that results do not have to be shifted before they are combined. In a targeting approach, an instruction can specify a subword location for a result. In a shifting approach, the previous contents of a result register are shifted to a new subword location so that the results of a current operation can be stored in the original location without losing the previous results. In a replicating approach, complete replicas of the results are stored at multiple subword locations. Results from multiple parallel subword compare instructions can then be combined using OR or MIX instructions without separate shift instructions. This yields a saving of ⅓ to ⅔ in the number of instructions required to combine the results of a series of parallel compare instructions.
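The targeting and shifting approaches can be sketched by giving the compare operation itself responsibility for placing its result. The functions `pcmp_targeted` and `pcmp_shifting` are hypothetical stand-ins (byte-equality compare, results for register pair r landing in byte r), not the instruction definitions of the referenced application. Under these assumptions the targeting version needs eight compares and seven ORs, and the shifting version needs only the eight compares.

```python
MASK64 = (1 << 64) - 1


def _byte_eq_mask(a, b):
    """8-bit mask: bit i set when byte i of a equals byte i of b (assumed test)."""
    return sum(1 << i for i in range(8)
               if ((a >> (8 * i)) & 0xFF) == ((b >> (8 * i)) & 0xFF))


def pcmp_targeted(a, b, k):
    """Hypothetical targeting compare: writes the 8-bit result into byte k
    of the destination, with every other bit zero."""
    return _byte_eq_mask(a, b) << (8 * k)


def pcmp_shifting(acc, a, b):
    """Hypothetical shifting compare: shifts the previous destination contents
    up one byte, then writes the new 8-bit result into byte 0."""
    return ((acc << 8) & MASK64) | _byte_eq_mask(a, b)


def block_compare_targeted(a_words, b_words):
    """Combine targeted results with OR only; returns (result, instruction_count)."""
    result, n = 0, 0
    for r in range(8):
        t = pcmp_targeted(a_words[r], b_words[r], r)
        n += 1  # one compare
        if r == 0:
            result = t
        else:
            result |= t
            n += 1  # one OR, no separate shift
    return result, n


def block_compare_shifting(a_words, b_words):
    """Let each compare shift the accumulator; returns (result, instruction_count)."""
    acc, n = 0, 0
    for r in range(7, -1, -1):  # last pair first, so pair 0 ends up in byte 0
        acc = pcmp_shifting(acc, a_words[r], b_words[r])
        n += 1  # the compare itself performs the shift
    return acc, n
```

Against the twenty-two instructions of the shift-and-OR scheme, counts of 15 and 8 correspond to savings of 7/22 (about ⅓) and 14/22 (about ⅔), consistent with the range stated above; the counts are a property of this simplified model, not measured figures.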
While the foregoing approach provides a substantial advance in computer performance, further advances are desired.