1. Field of the Invention
This invention relates to a method and apparatus for optimizing overflow checking and reduction of data signals represented as signed 16-bit integers.
2. Description of Related Art
The explosion of graphics, audio, and video ("multimedia") applications in computer systems has fueled efforts to improve processor efficiency in processing multimedia signals. Multimedia signals include, among other things, audio and pixel ("picture element") signals, which can often be adequately represented using binary data of no more than eight bits of resolution. Wider binary data may also be used, but it is often limited to intermediate results of advanced data manipulation, because wider data formats increase the load on instruction execution and slow the rate at which a processor manipulates data.
A computer system running a video application may represent a color pixel with four signed 16-bit signals: one for each of the three primary color values (red, green, and blue) and one for an intensity value. A large number of data signals is needed to represent an image for display on a computer screen, even when pixel data widths are restricted to eight bits. For example, displaying a digital NTSC video signal in real time on a computer monitor requires a pixel rate of 10.4 million pixels per second. With three data signals to manipulate per pixel, this amounts to about 30 million pieces of data to manipulate per second. A processor with a clock rate of 200 MHz would therefore have only about 20 clock cycles available to process each pixel, which is fewer than seven clock cycles for each primary color value.
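Worked through with the figures given above (taking the clock rate as 200 MHz and three data values per pixel), the cycle budget is roughly:

```latex
\begin{aligned}
10.4\times10^{6}\ \tfrac{\text{pixels}}{\text{s}} \times 3\ \tfrac{\text{values}}{\text{pixel}}
  &= 31.2\times10^{6}\ \tfrac{\text{values}}{\text{s}} \approx 30\times10^{6}\ \tfrac{\text{values}}{\text{s}} \\
\frac{200\times10^{6}\ \text{cycles/s}}{10.4\times10^{6}\ \text{pixels/s}}
  &\approx 19.2\ \text{cycles/pixel} \\
\frac{19.2\ \text{cycles/pixel}}{3\ \text{values/pixel}}
  &\approx 6.4\ \text{cycles/value} < 7
\end{aligned}
```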
Manipulating pixel data that are represented as signed 16-bit integers usually requires that the resulting pixel data remain within the maximum positive and negative boundaries of a signed 16-bit integer. A signed 16-bit integer in two's complement format can represent values from -32768 to 32767; in this description, the range boundaries are taken as a maximum of 32767 and a minimum of -32767. For example, scaling or rotating an image requires combining the incoming signal being processed with other internally generated signal data to obtain the resulting pixel data, and that combination can exceed the range boundaries. Overflow checking ensures that, if an overflow does occur, the resulting pixel data is reduced to a value supported by the data format in which the pixel is represented: a result above the maximum range boundary becomes 32767, and a result below the minimum range boundary becomes -32767.
In the past, checking resulting pixel data for an overflow condition involved conditional branches. For example, one such method uses branch operations in the "C" programming language as follows: if (dst > 32767) dst = 32767; if (dst < -32767) dst = -32767; The first operation compares the resulting pixel data, represented by the variable "dst," with the upper range boundary of 32767. If the resulting pixel data exceeds the upper range boundary, the second operation transfers the upper range boundary value into the resulting pixel data; otherwise, a conditional branch bypasses the second operation. The third operation compares the resulting pixel data with the lower range boundary of -32767. If the resulting pixel data falls below the lower range boundary, the fourth operation transfers the lower range boundary value into the resulting pixel data; otherwise, that is, when the resulting pixel data falls within the range boundaries, another conditional branch bypasses the fourth operation.
The use of conditional branches in a superscalar pipelined processor decreases execution throughput because the branches interrupt the pipelined processing of instructions. Conditional branches can also require the processor to fetch instructions from intermediate or main memory when the branch target misses in the cache. Because intermediate or main memory is typically much slower than the processor's internal storage, such branches take much longer to complete than instructions that do not require fetches from intermediate or main memory. Thus, the processor not only incurs increased fetch latency but also loses efficiency because the branches have interrupted the pipelining of instructions.
Accordingly, it would be desirable to provide an apparatus that keeps resulting pixel data within the range boundaries of a signed 16-bit integer without using conditional branches in the instruction stream. This advantage is achieved by performing two shift operations, two logical multiplications (AND operations), one addition, one load, and one logical addition (OR operation) to obtain a result that lies within the range boundaries of a signed 16-bit integer, thereby improving the instruction throughput of a processor.