The sign(x) function:

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \qquad (1)$$

is encountered in many common applications.
In applications that use single-instruction, single-data (SISD) processors, the sign(x) function is frequently implemented as a series of logical tests, each executed as an individual processor instruction, e.g., a greater-than test followed by an equals test. If any test in the series returns true, the remaining tests need not be performed since, in a SISD embodiment, the output of the sign(x) function can be generated from a true outcome of any one of the logical tests (>, =, <) used to implement the function.
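The sequential-test approach described above can be sketched as follows. This is a minimal illustration in Python, assuming plain integer inputs; the function name `sign_sisd` is hypothetical and the code does not reproduce any particular processor's instruction sequence. Because the function returns at the first true test, at most two comparisons are evaluated for any input.

```python
def sign_sisd(x: int) -> int:
    """Scalar sign(x) as a chain of tests that stops at the first true result."""
    if x > 0:      # first test: greater than zero
        return 1
    if x == 0:     # second test, reached only when the first was false
        return 0
    return -1      # both tests were false, so x < 0
```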
Accordingly, on a common SISD processor, the sign(x) function can be determined with relative ease using software and hardware-supported logic tests. For this reason, among others, application designers have felt little need to avoid the sign(x) function when designing applications such as video processing operations.
One standard for the coding of motion pictures, commonly referred to as the MPEG-2 standard and described in ISO/IEC 13818-2 (Nov. 9, 1994), "Generic Coding of Moving Pictures and Associated Audio Information: Video" (hereinafter the "MPEG reference"), relies heavily on discrete cosine transforms (DCTs), data quantization, and motion-compensated prediction to code video data. In this patent application, references to MPEG-2 compliant data streams and MPEG-2 compliant inverse quantization operations are intended to refer to data streams and inverse quantization operations that are implemented in accordance with the requirements set forth in the MPEG reference.
The MPEG reference describes in detail the processes involved in decoding a video bitstream that is compliant with the MPEG-2 standard. Many processes are involved in decoding such a bitstream, and efficient implementations of them are important to the development of low-cost video decoders. One of the processes involved in decoding an MPEG-2 image is called inverse quantization.
Quantization is a process used in the digital processing of signals, e.g., video encoding, in which an element from a finite set of digital codewords is used to approximately represent the value of a sampled signal. The digital codewords produced by the quantization process for an input sample represent an approximation of the original amplitudes of the signal being processed.
Inverse quantization is the inverse of quantization. The inverse quantization process takes as its input a digital codeword from a finite set of codewords and produces a so-called reconstruction level that approximates the original amplitude of the sample.
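To make the quantization and inverse quantization definitions above concrete, the following is a minimal sketch of a uniform scalar quantizer, assuming a single fixed step size `step` and round-to-nearest. It is a generic textbook quantizer, not the MPEG-2 quantizer, and the names `quantize` and `inverse_quantize` are hypothetical.

```python
def quantize(sample: float, step: float) -> int:
    """Map a sample to the codeword (index) of the nearest reconstruction level."""
    return round(sample / step)


def inverse_quantize(codeword: int, step: float) -> float:
    """Return the reconstruction level that the codeword stands for."""
    return codeword * step
```

For example, with a step size of 4.0 the sample 13.7 maps to codeword 3, whose reconstruction level is 12.0; the reconstruction error of 1.7 is within half a step. Python's `round` breaks exact ties toward the even integer, which matters only for samples lying exactly halfway between two levels.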
The MPEG-2 standard defines methods for the inverse quantization of DCT coefficients. A significant problem encountered when implementing the MPEG-2 inverse quantization process is computing the sign(x) function that the process requires.
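The inverse quantization of one 8×8 block of coefficients, in accordance with the MPEG-2 standard, is described by equations (2)-(6) below.

$$F''[v][u] = \frac{\left(2 \times QF[v][u] + k\right) \times W[w][v][u] \times \text{quantizer\_scale}}{32} \qquad (2)$$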
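where:

$$k = 0 \quad \text{for intra blocks} \qquad (3)$$

$$k = \operatorname{sign}(QF[v][u]) \quad \text{for non-intra blocks} \qquad (4)$$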
QF[v][u] is a two-dimensional array of digital codewords, i.e., quantized DCT coefficients; W[w][v][u] is a quantizer matrix; and quantizer_scale is a common scaling factor used for one or more macroblocks. The parameters v and u index each DCT coefficient, and the parameter w depends upon the coding type (INTRA or NON-INTRA) and the color component (luminance or chrominance). Following this step, the results undergo a saturation stage that ensures the reconstructed values lie within the allowed range, as shown in equation (5) below.

$$F'[v][u] = \begin{cases} 2047 & \text{if } F''[v][u] > 2047 \\ F''[v][u] & \text{if } -2048 \le F''[v][u] \le 2047 \\ -2048 & \text{if } F''[v][u] < -2048 \end{cases} \qquad (5)$$
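Equations (2)-(5) can be sketched for one block as follows. This is an illustrative Python sketch under stated assumptions: the caller passes the 8×8 matrix W[w] already selected for the block's coding type and color component (indexed here as `W[v][u]`); the division in equation (2) is taken as integer division truncating toward zero, which is the meaning of "/" in the MPEG reference; and, like the equations as reproduced here, it applies equation (2) to every coefficient. The MPEG reference handles the intra DC coefficient separately, and that case is omitted. All function names are hypothetical.

```python
def sign(x: int) -> int:
    """Equation (1): returns 1, 0 or -1."""
    return (x > 0) - (x < 0)


def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero, for b > 0."""
    q = abs(a) // b
    return q if a >= 0 else -q


def saturate(value: int) -> int:
    """Equation (5): clamp to the range [-2048, 2047]."""
    return max(-2048, min(2047, value))


def inverse_quantize_block(QF, W, quantizer_scale, intra):
    """Equations (2)-(5) for one 8x8 block; returns the saturated block F'."""
    F_prime = [[0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            k = 0 if intra else sign(QF[v][u])                  # equations (3)-(4)
            numerator = (2 * QF[v][u] + k) * W[v][u] * quantizer_scale
            F_prime[v][u] = saturate(div_trunc(numerator, 32))  # equations (2), (5)
    return F_prime
```

For a non-intra coefficient of -1 with W = 16 and quantizer_scale = 1, the numerator is -48; truncation toward zero yields -1, whereas Python's floor division `//` alone would yield -2, which is why `div_trunc` is used.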
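The final step in the inverse quantization process is mismatch control, as shown below:

$$\text{sum} = \sum_{v=0}^{7} \sum_{u=0}^{7} F'[v][u]$$

$$F[v][u] = F'[v][u] \quad \forall\, u, v \text{ except } u = v = 7$$

$$F[7][7] = \begin{cases} F'[7][7] & \text{if sum is odd} \\ F'[7][7] - 1 & \text{if sum is even and } F'[7][7] \text{ is odd} \\ F'[7][7] + 1 & \text{if sum is even and } F'[7][7] \text{ is even} \end{cases} \qquad (6)$$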
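The mismatch control of equation (6) can be sketched as follows, again as a hypothetical Python illustration operating on the saturated block F' represented as a list of lists. Only F[7][7] can change, and only when the sum of all coefficients is even; adjusting it by one makes the sum odd.

```python
def mismatch_control(F_prime):
    """Equation (6): return the block F given the saturated block F'."""
    F = [row[:] for row in F_prime]            # F[v][u] = F'[v][u] except u = v = 7
    total = sum(sum(row) for row in F_prime)
    # Python's % 2 yields 0 or 1 even for negative values.
    if total % 2 == 0:                         # sum even: toggle the parity of F[7][7]
        if F[7][7] % 2 != 0:
            F[7][7] -= 1                       # F'[7][7] odd: subtract 1
        else:
            F[7][7] += 1                       # F'[7][7] even: add 1
    return F
```

Because 2047 is odd and -2048 is even, the adjustment never moves F[7][7] outside the saturation range of equation (5).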
The steps described by equations (2)-(6) are required for an inverse quantization process that is truly compliant with the MPEG-2 standard. Table I, illustrated in FIG. 1, shows the approximate number of discrete operations required to perform one particular known MPEG-2 inverse quantization operation on a block of 64 coefficients representing 64 values to be processed. Note that Table I assumes that 2 compare operations are used to implement the sign(x) function for each processed coefficient.
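Under the Table I assumption of two compare operations per coefficient, the compare count attributable to the sign(x) function for one 64-coefficient block follows directly:

$$N_{\text{cmp}} = 2\ \frac{\text{compares}}{\text{coefficient}} \times 64\ \text{coefficients} = 128\ \text{compares per block}$$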
Notably, while the mismatch control operation expressed as equation (6) appears to be the most complicated of all the steps in the MPEG-2 inverse quantization process, it actually requires the least computation, about 10% of the total. Conversely, while the sign(x) function appears much less complicated than the mismatch control, it accounts for about 20% of the total number of computations required.
Accordingly, when attempting to reduce the number of computations required to implement an inverse quantization operation, the sign(x) function presents a clear opportunity to reduce the number of computations that must be performed.
To increase computational efficiency and throughput, single-instruction, multiple-data (SIMD) processor designs and systems are becoming more common. SIMD architectures allow multiple data elements to be processed simultaneously by treating a single n-bit word as comprising multiple, e.g., k, distinct sub-words which are processed separately. A well-designed SIMD architecture system offers considerable performance advantages over more traditional single-instruction, single-data (SISD) architecture systems. One example of a SIMD architecture is the MMX technology currently used in microprocessors.
For purposes of explanation, suppose that there is a system based on a SIMD architecture that operates on four data samples at the same time. In such a system, the data samples would have to be presented to the processing unit in the arrangement shown in the diagram of FIG. 2. Here, one n-bit word contains four sub-words, each n/4 bits in length. Accordingly, even though one n-bit word is presented, e.g., to the processor, four pieces of data are actually embedded in that word. When presented to the SIMD processing unit, each of these quarter-words is treated independently of the others. The independent processing of data elements included in a single word is one of the primary features of SIMD processing.
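The sub-word arrangement described above can be sketched by packing four signed values into one integer. The sketch assumes n = 64 (so each sub-word is 16 bits), two's-complement sub-words, and that sub-word 0 occupies the least significant bits; FIG. 2 may order the sub-words differently. The names `pack` and `unpack` are hypothetical.

```python
LANES = 4
LANE_BITS = 16                          # assumed n = 64, so n/4 = 16 bits per sub-word
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Place four signed sub-words into one n-bit word, sub-word 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)   # store each as 16-bit two's complement
    return word


def unpack(word):
    """Recover the four signed sub-words from an n-bit word."""
    out = []
    for i in range(LANES):
        lane = (word >> (i * LANE_BITS)) & LANE_MASK
        # Reinterpret the 16-bit pattern as signed if its top bit is set.
        out.append(lane - (1 << LANE_BITS) if lane & (1 << (LANE_BITS - 1)) else lane)
    return out
```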
As an example of SIMD processing, suppose that it is desired to multiply two sets of numbers, {a, b, c, d} and {e, f, g, h}, to produce the products a·e, b·f, c·g and d·h. In the exemplary SIMD architecture, it is possible to set up two data elements similar to the ones shown in FIG. 4, one containing the set {a, b, c, d} and the other containing the set {e, f, g, h}. These may be presented to the SIMD processing unit for the desired multiplication. The processing unit treats the four quarters of the input data words as independent quantities during the computation. An important consequence is that if the multiplication for any quarter overflows, the overflow does not affect the adjacent quarter. The four multiplications occur simultaneously, which, for this operation, provides up to a fourfold increase in performance over a SISD processing unit operating at the same clock rate. This example shows that the SIMD architecture is highly beneficial for processing multiple pieces of data in parallel.
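The lane independence described above can be illustrated with a hypothetical `simd_mul_low` function that multiplies corresponding 16-bit sub-words of two 64-bit words and keeps only the low 16 bits of each product, so any overflow is discarded inside its own lane rather than carrying into the next. This mirrors the behavior of low-half packed multiplies such as MMX's PMULLW, but it is a sequential emulation, not a real SIMD instruction.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def simd_mul_low(word_a: int, word_b: int, lanes: int = 4) -> int:
    """Multiply corresponding sub-words, keeping the low 16 bits of each product."""
    result = 0
    for i in range(lanes):
        shift = i * LANE_BITS
        a = (word_a >> shift) & LANE_MASK
        b = (word_b >> shift) & LANE_MASK
        # Masking drops any overflow, so it cannot spill into the neighboring lane.
        result |= ((a * b) & LANE_MASK) << shift
    return result
```

For example, multiplying 0x8000 by 2 in one lane yields 0x10000, whose low 16 bits are 0; the carry is dropped and the neighboring lane's product is unchanged. The low 16 bits of a product are the same whether the sub-words are read as signed or unsigned, so the sketch works for either.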
Implementing the sign(x) function in a SISD processor embodiment as a series of processor instructions is relatively straightforward. However, it is comparatively complicated to implement the sign(x) function in a SIMD processor environment.
The complexity of implementing the sign(x) function in a SIMD architecture results from the fact that a single SIMD comparison (<, =, or >) applied to an n-bit word may produce a different outcome for each of the word's sub-words; a true result in one sub-word says nothing about the others, so no test can be skipped. Accordingly, when implementing a sign(x) function in a SIMD processor, usually at least two logic tests, each requiring one processor clock cycle, must be performed to determine the appropriate value for each of the sub-words in an n-bit word. Thus, performing a sign(x) operation in a SIMD environment using software and conventional processor logic operations usually requires two or more processor clock cycles to generate the desired sign(x) output.
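The two-test requirement can be sketched with an emulated four-lane SIMD unit. In this hypothetical Python sketch, each packed word is represented as a list of four signed 16-bit lanes; `simd_cmpgt` stands in for a packed signed greater-than compare that sets each lane to all ones (-1) or zero, in the manner of MMX's PCMPGTW, and `simd_sub` stands in for a wrapping packed subtract. Both compares always execute, because each lane may need a different answer, and a subtraction then combines the two masks into 1, 0, or -1 per lane. This is one conventional formulation, shown only to illustrate the cost described above.

```python
LANE_MASK = 0xFFFF
SIGN_BIT = 0x8000
ALL_ONES = -1    # a "true" lane of a packed compare: all bits set, i.e. -1 as signed


def simd_cmpgt(a, b):
    """Packed signed greater-than: each lane becomes -1 if a > b, else 0."""
    return [ALL_ONES if x > y else 0 for x, y in zip(a, b)]


def simd_sub(a, b):
    """Packed subtraction, wrapping each lane to 16-bit two's complement."""
    out = []
    for x, y in zip(a, b):
        d = (x - y) & LANE_MASK
        out.append(d - 0x10000 if d & SIGN_BIT else d)
    return out


def simd_sign(x):
    """sign(x) for every lane: two compares plus one subtract, for any data."""
    zero = [0] * len(x)
    pos = simd_cmpgt(x, zero)    # test 1: -1 in lanes where x > 0
    neg = simd_cmpgt(zero, x)    # test 2: -1 in lanes where x < 0
    return simd_sub(neg, pos)    # (0 - -1) = 1, (-1 - 0) = -1, (0 - 0) = 0
```

For a 64-coefficient block processed four sub-words at a time, this amounts to 16 words × 2 compares = 32 compare instructions per block, plus the subtractions, regardless of the data values.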
In video decoding, and particularly real-time video decoding, it is desirable to reduce the number of clock cycles required to decode a video signal, thereby increasing throughput for a given processor speed. Accordingly, particularly in video decoder embodiments, it is desirable to implement the sign(x) function in a manner that requires the minimum possible number of clock cycles.
In view of the above discussion, there is a need for new and improved methods of implementing the sign(x) function. It is desirable that any new methods be capable of performing the sign(x) function efficiently in terms of the number of processor instructions that must be executed. It is also desirable that the sign(x) function be performed in relatively few processor clock cycles. In addition, it is desirable that any new methods and apparatus for implementing the sign(x) function be well suited for use in SIMD architectures, and in SIMD processors in particular.
New SIMD and SISD processor instructions capable of taking advantage of the processing capabilities of any new methods and apparatus are also desirable.