1. Field of the Invention
The present invention generally relates to an apparatus and a method for performing a decoding process and, more particularly, to an apparatus and a method for performing a video decoding process in parallel.
2. Description of Related Art
According to most video standards, e.g., JPEG, MPEG-1, MPEG-2, and H.263, variable length decoding (VLD), inverse zig-zag (IZ), inverse quantization (IQ), and inverse discrete cosine transform (IDCT) processes are required for video decoding. The VLD process looks up a table according to codewords retrieved from a data stream, so as to decode the codewords and obtain the DC or AC coefficients they represent. The IZ process arranges the decoded coefficients into an N×N block in an appropriate order. The IQ process scales the decoded coefficients back up so as to recover the real DC/AC coefficient values. The IDCT process transforms the N×N block from coefficient values in the frequency domain into pixel values in the spatial domain. Among all of the above video decoding processes, the VLD and IDCT processes are the most time-consuming.
In order to improve the decoding efficiency in view of the aforementioned video decoding characteristics, some algorithms focus on improving the table look-up method of the VLD process or the calculation of the IDCT process. However, although those algorithms may accelerate the decoding speed, the resulting improvement in efficiency is still not satisfactory.
Another conventional method does not attempt to accelerate the individual decoding processes, but instead performs them in parallel to shorten the decoding time, taking advantage of processors that support a very long instruction word (VLIW). A typical parallel processing structure is usually constructed to perform the VLD, IZ, and IQ processes in parallel.
In detail, in a video decoding process, the VLD process retrieves codewords from a data stream and, by looking up an appropriate table, decodes a set of Run Level values including a Run value and a Level value. The Run value represents the number of zeros appearing before the coefficient, and the Level value represents the value of the coefficient, so that the DC or AC coefficient represented by the codewords can be calculated accordingly.
Taking an 8×8 block as an example, when performing a VLD process and supposing that the data stream is 111011010010 . . . , the first codeword retrieved from the data stream would be 1110. Table 1 as shown below is taken as the reference table to be looked up, so as to obtain a Run value of 2 and a Size value of 3 corresponding to the codeword, in which the Size value of 3 indicates that the value of the following 3 bits (110) is the Level value. In this way, the coefficients are obtained as 0, 0, and 6. Next, the second codeword retrieved from the data stream is 100, which is decoded to obtain a Run value of 1 and a Size value of 2, in which the Size value of 2 indicates that the value of the following 2 bits (10) is the Level value. Accordingly, the decoded coefficients are extended to 0, 0, 6, 0, 2. Likewise, the decoding process is repeated until the whole 8×8 block, including 1 DC coefficient and 63 AC coefficients, has been decoded. Generally, there are a large number of combinations of the Run value and the Size value, and therefore the reference table usually contains a large amount of data. Moreover, looking up the coefficients one by one and comparing data usually consumes a relatively long time.
TABLE 1

Run/Size    Codeword
0/1         00
0/2         010
1/1         011
1/2         100
2/1         101
2/2         110
2/3         1110
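The look-up in the example above can be sketched in Python as follows. This is a minimal illustration rather than a decoder for any particular standard: the function name `vld_decode` and its `max_codewords` parameter are hypothetical, and the Level value is assumed to be the plain unsigned binary value of the Size bits that follow the codeword, as in the worked example.

```python
# Table 1 from the text: codeword -> (Run, Size)
TABLE_1 = {
    "00": (0, 1),
    "010": (0, 2),
    "011": (1, 1),
    "100": (1, 2),
    "101": (2, 1),
    "110": (2, 2),
    "1110": (2, 3),
}


def vld_decode(bits, max_codewords):
    """Decode up to max_codewords codewords from a string of '0'/'1' characters.

    Hypothetical helper: returns the coefficient sequence in scan order.
    """
    coeffs = []
    pos = 0
    for _ in range(max_codewords):
        # Try successively longer prefixes until one matches a table entry.
        code = None
        for length in range(2, 5):
            candidate = bits[pos:pos + length]
            if candidate in TABLE_1:
                code = candidate
                break
        if code is None:
            break  # no codeword matches; stop decoding
        run, size = TABLE_1[code]
        pos += len(code)
        if pos + size > len(bits):
            break  # not enough bits left for the Level value
        # Assumption: Level is the unsigned value of the next `size` bits.
        level = int(bits[pos:pos + size], 2)
        pos += size
        coeffs.extend([0] * run)  # Run zeros precede the coefficient
        coeffs.append(level)
    return coeffs


print(vld_decode("111011010010", 2))  # [0, 0, 6, 0, 2]
```

Each codeword costs one or more table probes plus a separate read of the Level bits, which illustrates why coefficient-by-coefficient look-up is slow when the table is large.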
After the VLD process is completed, the IZ and IQ processes are then performed. The main objective of the IZ process is to distribute the VLD coefficients into the 8×8 block according to the corresponding positions in a zig-zag order table. The IQ process scales the VLD coefficients back to real DC/AC coefficients according to the corresponding quantization values in a quantization table. The IZ and IQ processes involve only simple memory accesses and arithmetic calculations, and can therefore be completed together in a single step, so as to accelerate the decoding speed and reduce the complexity of the program.
As discussed above, the conventional parallel processing structure performs the VLD, IZ, and IQ processes in parallel; that is, when one cluster of a processor performs a VLD process on an Nth coefficient, another cluster of the processor performs an IZ process and an IQ process on the (N−1)th coefficient, in which N is a positive integer.
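A minimal Python sketch of performing the IZ and IQ processes in a single step is given below. The names `zigzag_order` and `iz_iq` are hypothetical, the zig-zag order is assumed to be the common JPEG-style scan, and the quantization table holds made-up illustrative values indexed by block position; none of these details are specified in the text.

```python
def zigzag_order(n=8):
    """Return block positions (row * n + col) in an assumed JPEG-style zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run top-to-bottom, even ones bottom-to-top.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in cells]


def iz_iq(coeffs, zigzag, quant, n=8):
    """Place each scan-order coefficient in the block and rescale it in one step."""
    block = [0] * (n * n)  # cleared block
    for i, level in enumerate(coeffs):
        if level:
            pos = zigzag[i]                   # IZ: scan index -> block position
            block[pos] = level * quant[pos]   # IQ: multiply by the quantization value
    return block


ZIGZAG = zigzag_order()
# Hypothetical quantization table, for illustration only.
QUANT = [1 + r + c for r in range(8) for c in range(8)]

block = iz_iq([0, 0, 6, 0, 2], ZIGZAG, QUANT)
print(block[8], block[9])  # 12 6
```

Because each coefficient needs only one table read for its position and one multiplication, the two processes fold naturally into the same loop body.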
For example, FIG. 1 is a schematic diagram illustrating a conventional parallel processing structure for video decoding. FIG. 2 is a schematic diagram illustrating the time allocation of the conventional parallel processing structure for video decoding. Referring to FIG. 1 and FIG. 2, in a first stage, cluster 0 first decodes a first codeword of the data stream, so as to obtain a Run value and a Level value of the first codeword, and finally provides a calculated coefficient to cluster 1. In the meantime, cluster 1 has no data to process, and therefore performs a block-clearing operation.
Then, in a second stage, cluster 0 decodes a second codeword of the data stream, so as to obtain a Run value and a Level value of the second codeword. In the meantime, cluster 1 has already obtained the coefficient corresponding to the first codeword from cluster 0, and therefore performs an IZ process and an IQ process on it. Because the coefficients obtained by the IZ and IQ processes are coefficient values in the frequency domain, they must be stored in a memory temporarily. When all the coefficients in the 8×8 block have been calculated, the coefficients are processed together by the IDCT, so as to obtain pixel values in the spatial domain.
As described in the foregoing, although the conventional parallel processing structure is able to achieve parallel processing and thereby accelerate the decoding speed, it still requires considerable additional time for the IDCT process, and therefore the improvement is still limited.
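The staged time allocation described above can be sketched in Python as a simple schedule. The function name `conventional_schedule` and the task labels are hypothetical, and the final stage in which cluster 0 is idle while cluster 1 handles the last coefficient is an assumption made to complete the illustration; the text describes only the first two stages explicitly.

```python
def conventional_schedule(num_codewords):
    """List (stage, cluster 0 task, cluster 1 task) for the two-cluster structure."""
    stages = []
    for stage in range(1, num_codewords + 2):
        if stage <= num_codewords:
            cluster0 = f"VLD codeword {stage}"
        else:
            cluster0 = "idle"  # assumption: nothing is left to decode
        if stage == 1:
            cluster1 = "clear block"  # no coefficient is available yet
        else:
            cluster1 = f"IZ+IQ coefficient {stage - 1}"
        stages.append((stage, cluster0, cluster1))
    # The IDCT is not part of this schedule: it runs only after every
    # coefficient of the block has been stored in memory.
    return stages


for row in conventional_schedule(2):
    print(row)
```

In this sketch the IDCT remains entirely outside the overlapped stages, which corresponds to the additional time noted for the conventional structure.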