International Standards followed for viewing moving pictures and video signals according to the MPEG format define the overall need for encoding and decoding video signals, whose representation may take the form of successive individual pictures. Each picture may be treated as a two-dimensional array comprising picture elements, known as pels. As is conventionally known and as shown in FIG. 1 as prior art, a sequence of operations 1 must be performed before moving pictures can be viewed. A source 2 of unencoded video images may exist in a variety of forms, such as the CCIR 601 format. As is conventionally known by International Standards, the input video signal is digitized and represented according to luminance and two color difference signals (Y, Cr, Cb). Some type of preprocessing 3 is done on the source 2 to convert the video data into an appropriate resolution for subsequent encoding 4. For example, the color difference signals (Cr, Cb) are subsampled with respect to the luminance by a 2:1 ratio in both the vertical and horizontal directions. The signal is then reformatted, if necessary, as a non-interlaced signal. During encoding 4, a picture type must be determined for each picture in the sequence. The encoder may then estimate motion vectors for each 16-by-16 macroblock in a picture. Depending upon the picture type used, one or more vectors are needed, and reordering of the picture sequence is necessary prior to encoding. After the encoding process, the bitstream may be transferred to a storage medium 5. In order to view the moving picture, a decoder 6 must be used to access the video bitstream. Subsequent to decoding, postprocessing 7 of the video signal is done in order to display 8 the moving pictures.
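As an illustration of the 2:1 subsampling of the color difference signals in both directions, the following is a minimal sketch only. The function name subsample_chroma_2x2 and the choice of a rounded 2-by-2 average as the filter are assumptions made for illustration; the text above does not specify which filter the preprocessing uses.

```python
def subsample_chroma_2x2(plane):
    """Halve a chroma plane (Cr or Cb) in both directions.

    Assumption: each output sample is the rounded average of a 2x2 block
    of input samples. Other filters are equally possible.
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # add 2 before dividing to round
        out.append(row)
    return out


# A 2x4 chroma plane becomes a 1x2 plane.
cb = [[100, 102, 50, 52],
      [104, 106, 54, 56]]
cb_sub = subsample_chroma_2x2(cb)  # [[103, 53]]
```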
Although International Standards require the encoder to be aware of both the capacity of the decoder buffer and the need for the decoder to match the rate of the media to the rate of filling a buffer for holding successive pictures, the International Standards do not specify the encoding process. Rather, they merely indicate the syntax and semantics of the bitstream and the signal processing done in the decoder. As such, there are many options available for encoding the video signal.
In general, FIG. 2 shows a functional block diagram for an encoder. As shown in FIG. 2, the modules of interest include a discrete cosine transform (DCT) 10, inverse discrete cosine transform (DCT−1) 12, quantization (Q) 14, inverse quantization (Q−1) 16, and variable length coding (VLC) 18.
In a digital system, quantization, which may be represented as a matrix table Z[i], means the division of a range of values into a single integer, code or classification, as indicated by the following Equation (1):
$$Z[i] = \frac{d[i]}{q\rho[i]}, \quad \text{where } i = 0 \ldots 63, \qquad (1)$$
where d[i] represents a block of data in matrix format, and qρ[i] represents the quantization matrix table taken either from a default table of standard values or constructed as needed by one of ordinary skill in the art. The variable i represents the number of data entries within the respective matrix, and is selected to range from 0 to 63 in order to represent 64 bits for illustrative purposes.
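The element-wise division of Equation (1) may be sketched as follows. This is an illustrative sketch only: the function name quantize_by_division is hypothetical, and truncation toward zero is an assumed rounding rule, since Equation (1) does not state how the quotient is rounded to an integer.

```python
def quantize_by_division(d, q):
    """Equation (1): Z[i] = d[i] / q[i] for each entry i.

    Assumption: the quotient is truncated toward zero.
    """
    if len(d) != len(q):
        raise ValueError("data block and quantization table differ in length")
    z = []
    for coeff, step in zip(d, q):
        magnitude = abs(coeff) // step          # divide the magnitude
        z.append(magnitude if coeff >= 0 else -magnitude)  # restore the sign
    return z


# Four entries are shown for brevity; a full block would have i = 0 ... 63.
d = [160, -45, 7, 0]
q = [16, 11, 10, 16]
z = quantize_by_division(d, q)  # [10, -4, 0, 0]
```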
At the register level, this division operation is realized with a multiplication operation followed by a shift operation, as indicated by the following Equation (2):
$$Z[i] = d[i] \times \frac{1}{q\rho[i]}, \quad \text{where } i = 0 \ldots 63, \qquad (2)$$
where 1/qρ[i] is a multiplier, that is, a number less than one represented as a fixed-point value found by taking the reciprocal of qρ[i] and completing a shift up (<<) by a certain number of bits.
Thus, the inversion operation may be represented according to the following equation,
$$\overline{q\rho[i]} = \frac{1}{q\rho[i]} \ll \text{shift}, \qquad (3)$$
where the shift up of a certain number of bits is done in order to represent an integer value. Accordingly, Equation (1) may be realized by,
$$Z[i] = d[i] \times \overline{q\rho[i]} \gg \text{shift}, \qquad (4)$$
where a shift down (>>) of a certain number of bits is performed in response to the shift up previously implemented for the inversion in Equation (3).
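The multiply-and-shift realization of Equations (3) and (4) may be sketched as follows. The function names, the shift value of 16, and the truncation of the scaled reciprocal are all assumptions chosen for illustration and are not taken from the text above.

```python
def reciprocal_table(q, shift):
    """Equation (3): integer form of (1 / q[i]) << shift.

    Assumption: the scaled reciprocal is truncated to an integer.
    """
    return [(1 << shift) // step for step in q]


def quantize_by_multiply_shift(d, recip, shift):
    """Equation (4): Z[i] = (d[i] * recip[i]) >> shift.

    The sign is handled separately so that the shift acts on a magnitude.
    """
    z = []
    for coeff, r in zip(d, recip):
        magnitude = (abs(coeff) * r) >> shift
        z.append(magnitude if coeff >= 0 else -magnitude)
    return z


SHIFT = 16  # illustrative value only
q = [16, 11, 10, 16]
recip = reciprocal_table(q, SHIFT)   # [4096, 5957, 6553, 4096]
z = quantize_by_multiply_shift([160, -45, 7, 0], recip, SHIFT)  # [10, -4, 0, 0]
```

Because the reciprocal is truncated in this sketch, the result can fall one below the exact quotient for some inputs. For example, 22 with a quantizer entry of 11 yields 1 rather than 2 at a shift of 16.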
The present inventors have observed that conventional encoders and quantization systems lack speed and require more instruction cycles than are necessary. It would thus be ideal to improve the speed of encoding by using faster processors for computing the quantization. Certainly, by using a faster processor, such as a parallel processor, fewer instruction cycles are required to perform the quantization of Equation (4). What is needed is a way to implement the quantization within the physical constraints of the parallel processing architecture.
Yet a further problem encountered with using parallel processors concerns performing the quantization so as to achieve a high precision of coefficients. With conventional parallel processors, this challenge stems from the physical processor constraint whereby the multiplier and shift operations and registers cannot exceed a certain number of bits. Accordingly, what is needed is a way to obtain the highest precision of the quantizer implemented with parallel processors by using the largest multipliers, that is, with a maximum number of bits, followed by the largest feasible shift in bits.
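The trade-off between multiplier width and precision may be illustrated with the following sketch. The helper names largest_shift and mismatch_count, the 16-bit multiplier limit, and the coefficient range of 0 to 2047 are hypothetical values chosen for illustration and are not constraints stated above.

```python
def largest_shift(q, multiplier_bits):
    """Largest shift for which every truncated scaled reciprocal
    (1 << shift) // q[i] still fits in multiplier_bits bits."""
    limit = 1 << multiplier_bits
    shift = 0
    while all(((1 << (shift + 1)) // step) < limit for step in q):
        shift += 1
    return shift


def mismatch_count(step, shift, max_coeff):
    """Count coefficients in 0..max_coeff for which multiply-and-shift
    differs from exact integer division by step."""
    recip = (1 << shift) // step
    return sum(1 for c in range(max_coeff + 1)
               if (c * recip) >> shift != c // step)


q = [8, 16, 11]
best = largest_shift(q, 16)            # 18 for this table
coarse = mismatch_count(11, 8, 2047)   # small shift: more mismatches
fine = mismatch_count(11, best, 2047)  # largest shift: fewer mismatches
```

In this sketch the smallest quantizer entry determines the largest usable shift, because it produces the largest reciprocal. A larger shift reduces, but with a truncated reciprocal does not eliminate, the disagreement with exact division.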