1. Field of the Invention
The present invention relates to computers that have a superscalar architecture or that can execute multiple instructions in parallel. More specifically, the present invention relates to the implementation of the operation of generating the absolute value of a signed value by a computing device using instructions processed in parallel.
2. The Prior Art
In computing devices, there is an increasing need for better performance in the implementation of video signal processing functions to meet the demand for video conferencing, 3-D visualization, animation, electronic publishing, video service, etc. To adequately provide these graphics capabilities, the computing device must process very large quantities of data at very high throughput to implement real time processing of video images, as well as image compression and decompression. For example, in a "multi-media" desktop, it is imperative that processors accommodate high speed graphics, video processing, and image compression/decompression to execute multi-media applications.
The amount of information contained in graphics or video signals for manipulation by signal processing is not trivial. For example, a digital NTSC signal generates approximately 10.4 million pixels per second. Since each pixel contains information for three colors, the total amount of information is more than 30 million pieces of data per second. At a CPU clock rate of 200 megahertz, only about 19 clock cycles are available for processing each pixel. This results in fewer than 7 clock cycles available per color component.
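The cycle budget stated above can be checked with a short calculation. The following C sketch is illustrative only; the function names cycles_per_pixel and cycles_per_component are hypothetical and do not appear in the original text. The inputs are the 200 megahertz clock rate, the 10.4 million pixels per second, and the three color components quoted above.

```c
/* Clock cycles available for each pixel when a processor running at
 * clock_hz must keep pace with pixels_per_second pixels.
 * (Hypothetical helper, introduced for illustration.) */
double cycles_per_pixel(double clock_hz, double pixels_per_second)
{
    return clock_hz / pixels_per_second;
}

/* Clock cycles available for each color component of a pixel, given
 * the number of components carried by each pixel.
 * (Hypothetical helper, introduced for illustration.) */
double cycles_per_component(double clock_hz, double pixels_per_second,
                            double components_per_pixel)
{
    return cycles_per_pixel(clock_hz, pixels_per_second) / components_per_pixel;
}
```

Calling cycles_per_pixel(200e6, 10.4e6) gives a value between 19 and 20, and cycles_per_component(200e6, 10.4e6, 3.0) gives a value between 6 and 7, consistent with the budget described above.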
It is difficult to implement real time processing of video images, as well as image compression and decompression, because graphics and video processing may include several operations on a single color component of a single pixel. Accordingly, sheer throughput is not enough for efficient manipulation of the video image in real time.
This throughput problem can be alleviated by processing the graphics or video data in parallel. For example, computing devices having a superscalar architecture are able to execute several instructions concurrently. As such, a single signal processing operation may be executed in one clock cycle because the instructions directing the computing device to implement the signal processing operation may be executed in parallel. The increase in speed achieved by the superscalar architecture, however, depends upon whether the instructions directing the computing device to perform the desired signal processing operation are capable of being executed in parallel.
When the computing device is directed to implement the desired operation with a branch instruction, the advantages of the superscalar architecture provided by parallel processing are not being used to their fullest extent even when the superscalar architecture provides for dynamic branch prediction. When a branch instruction is one of the instructions employed to direct the computing device to perform an operation, the instructions following the branch instruction are dependent upon the branch instruction, and cannot be executed until the branch has been executed.
Further, it does not matter whether the branch is taken or not, because the branch instruction must still first be executed prior to the execution of the following instructions. Accordingly, directing a computing device with a superscalar architecture to perform a signal processing operation by using a branch instruction is not an effective way to increase the signal processing throughput of the computing device.
In the real time processing of video images, as well as in image compression and decompression, there are several operations that the computing device is repeatedly instructed to perform. Because these operations are performed repeatedly, the instructions that are used to direct the computing device to perform these operations should be those that direct the computing device to perform the desired operation most efficiently. To do otherwise will significantly degrade the signal processing capability of the computing device.
In real time processing of video images, one of the operations that must be performed repeatedly in digital signal processing is setting signed values to their absolute values. For example, the dominant operation in video compression algorithms such as MPEG-2 or H.261 is motion estimation. Most motion estimation takes advantage of the minimal changes in the position of images from one frame to the next. To perform motion estimation, hundreds of comparisons must be made for a region of an image to determine a motion value that minimizes the estimation error. The error is calculated by summing the differences for each pixel in the region between a reference frame and a newer frame. The signal processing required to determine the area error includes the performance of subtractions, additions, loads, and the setting of signed values to their absolute values.
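The area error computation described above may be sketched in C as follows. This is a minimal illustration, not a definitive implementation. It assumes that each per-pixel difference is set to its absolute value before being summed, as the list of required operations suggests, and that pixels are stored as 8-bit values in row-major order. The function name region_error and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sums the absolute per-pixel differences between a region of a
 * reference frame and the same-sized region of a newer frame.
 * ref and cur point to the top-left pixel of each region, and stride
 * is the number of pixels from one row to the next (assumed layout). */
uint32_t region_error(const uint8_t *ref, const uint8_t *cur,
                      size_t stride, size_t width, size_t height)
{
    uint32_t error = 0;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Two loads and one subtraction per pixel. */
            int diff = (int)ref[y * stride + x] - (int)cur[y * stride + x];

            /* Set the signed difference to its absolute value, then add. */
            error += (uint32_t)abs(diff);
        }
    }
    return error;
}
```

Because the inner loop body runs once for every pixel of every candidate region, the cost of the absolute value step is multiplied by the hundreds of comparisons made for each region.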
In the prior art, the instructions directing the computing device to perform the operation of setting a signed number to its absolute value have included a branch instruction. As discussed above, using a branch instruction to direct a computing device that can execute instructions in parallel is inefficient, and as such, does not take advantage of the critical increase in the throughput of data in video signal processing provided by a parallel processing computing device. Accordingly, there is a need to implement the operation of generating the absolute value of a signed value by a computing device using instructions that can be processed in parallel.
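The contrast between the two approaches may be sketched in C as follows. The first function illustrates the branch-based form of the kind described above, in which the negation is guarded by a conditional. The second function illustrates a widely known sign-mask idiom that contains no conditional. The second function is shown for contrast only and is not asserted to be the instruction sequence of the present invention. Both function names are hypothetical, a 32-bit two's complement representation is assumed, and a compiler may translate either function into different machine instructions than the source suggests.

```c
#include <stdint.h>

/* Branch-based form: the negation executes only when the test on the
 * sign succeeds, so later work depends on the outcome of the test.
 * The result is returned as unsigned so that the most negative input
 * is well defined. */
uint32_t abs_with_branch(int32_t x)
{
    if (x < 0) {
        return 0u - (uint32_t)x;   /* negate using unsigned arithmetic */
    }
    return (uint32_t)x;
}

/* Branch-free form (illustrative sign-mask idiom): every operation is
 * plain arithmetic or logic on the input, with no conditional. */
uint32_t abs_branch_free(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);   /* all ones if x is negative, else zero */

    /* For negative x this inverts the bits and adds one, which is
     * two's complement negation. For non-negative x it leaves the
     * value unchanged. */
    return (ux ^ mask) - mask;
}
```

Both functions return the same result for every 32-bit input. The second form trades the conditional for a fixed sequence of shift, subtract, and exclusive-OR operations.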