The Discrete Cosine Transform
A two-dimensional DCT converts an input matrix of N×N spatial-domain elements into a matrix of N×N DCT coefficients. In many standard video compression schemes N=8. A two-dimensional DCT may be implemented by applying a one-dimensional DCT over the rows of the input matrix to provide a row-transformed matrix and then applying a one-dimensional DCT over the columns of the row-transformed matrix. Those of skill in the art will appreciate that the same result is achieved when applying a one-dimensional DCT over the columns of the input matrix to provide a column-transformed matrix and then applying a one-dimensional DCT over the rows of the column-transformed matrix to provide the matrix of N×N DCT coefficients.
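The row-column decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: it assumes the orthonormal DCT-II scaling, and the function names and the test block are hypothetical.

```python
import math

def dct_1d(vec):
    """Orthonormal one-dimensional DCT-II of a list of N numbers."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i, v in enumerate(vec)))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_rows(m):
    """Apply the one-dimensional DCT over every row."""
    return [dct_1d(row) for row in m]

def dct_cols(m):
    """Apply the one-dimensional DCT over every column."""
    return transpose(dct_rows(transpose(m)))

# Arbitrary 8x8 test block (illustrative values only).
block = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]

a = dct_cols(dct_rows(block))   # rows first, then columns
b = dct_rows(dct_cols(block))   # columns first, then rows

# Largest difference between the two orderings (rounding error only).
max_diff = max(abs(a[r][c] - b[r][c]) for r in range(8) for c in range(8))
```

Both orderings give the same coefficients up to floating-point rounding, because the row transform and the column transform act on different indices of the matrix.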
A plurality of fast algorithms for performing the DCT was introduced by C. Loeffler, A. Ligtenberg and G. S. Moschytz (“Practical fast 1-D DCT algorithms with 11 multiplications”, in Proceedings of ICASSP 1989, pp. 988-991). Loeffler et al. suggest a four-stage DCT for converting an 8×8 input matrix. The stages are executed in series, while the calculations within each stage may be performed in parallel. The above-mentioned article includes a graphical illustration of the four-stage algorithms. One of these algorithms may also be illustrated by the following sets of equations:
A0=I0+I7; A1=I1+I6; A2=I2+I5; A3=I3+I4; A4=I3−I4; A5=I2−I5; A6=I1−I6; A7=I0−I7.
B0=A0+A3; B1=A1+A2; B2=A1−A2; B3=A0−A3; B4,B7=ROT(C3)[A4,A7]; B5,B6=ROT(C1)[A5,A6].
C0=B0+B1; C1=B0−B1; C2,C3=ROT(√2·C6)[B2,B3]; C4=B4+B6; C5=B7−B5; C6=B4−B6; C7=B5+B7.
O0=C0; O1=C7+C4; O2=C2; O3=√2·C5; O4=C1; O5=√2·C6; O6=C3; O7=C7−C4.
Each of these equation sets corresponds to a single stage. I0–I7 are the input signals to the DCT; O0–O7 are further divided by the constant √8 to provide the outputs of the DCT; A0–A7 are the intermediate results of the first stage of the DCT; B0–B7 are the intermediate results of the second stage of the DCT; C0–C7 are the intermediate results of the third stage of the DCT; and ROT denotes a rotation operation. A rotation operation by k·Cn is illustrated by the following equations, where E1 and E2 are the inputs of the rotation operation and F1 and F2 are its outputs:
F1=E1*k*cos(nπ/2N)+E2*k*sin(nπ/2N)
F2=−E1*k*sin(nπ/2N)+E2*k*cos(nπ/2N)
The upper equation is also referred to as UP_ROT, whereas the lower equation is also referred to as LOW_ROT.
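As a plausibility check on the four equation sets and the rotation definition, the following Python sketch evaluates them for one 8-sample input and compares the result with a directly computed orthonormal DCT. It is an illustration only. The sketch takes the odd-part rotations as ROT(C3) on [A4,A7] and ROT(C1) on [A5,A6], and the even-part rotation as ROT(√2·C6) on [B2,B3]; these index choices are assumptions made here, because they are the ones for which the comparison agrees. The function names and the sample values are hypothetical.

```python
import math

N = 8  # transform length

def rot(e1, e2, k, n):
    """Rotation by k*Cn; returns (F1, F2), i.e. (UP_ROT, LOW_ROT)."""
    c = k * math.cos(n * math.pi / (2 * N))
    s = k * math.sin(n * math.pi / (2 * N))
    return e1 * c + e2 * s, -e1 * s + e2 * c

def four_stage_dct(i):
    """Four-stage 8-point DCT following the equation sets in the text."""
    r2 = math.sqrt(2.0)
    # Stage 1: sums and differences of mirrored input pairs.
    a = [i[0] + i[7], i[1] + i[6], i[2] + i[5], i[3] + i[4],
         i[3] - i[4], i[2] - i[5], i[1] - i[6], i[0] - i[7]]
    # Stage 2: even-part butterflies, odd-part rotations.
    b = [0.0] * 8
    b[0], b[1] = a[0] + a[3], a[1] + a[2]
    b[2], b[3] = a[1] - a[2], a[0] - a[3]
    b[4], b[7] = rot(a[4], a[7], 1.0, 3)
    b[5], b[6] = rot(a[5], a[6], 1.0, 1)   # assumed index 1 on [A5, A6]
    # Stage 3: even-part rotation, odd-part butterflies.
    c = [0.0] * 8
    c[0], c[1] = b[0] + b[1], b[0] - b[1]
    c[2], c[3] = rot(b[2], b[3], r2, 6)    # assumed index 6
    c[4], c[5] = b[4] + b[6], b[7] - b[5]
    c[6], c[7] = b[4] - b[6], b[5] + b[7]
    # Stage 4: final outputs, then division by sqrt(8).
    o = [c[0], c[7] + c[4], c[2], r2 * c[5],
         c[1], r2 * c[6], c[3], c[7] - c[4]]
    return [v / math.sqrt(8.0) for v in o]

def direct_dct(x):
    """Orthonormal 8-point DCT-II evaluated from its definition."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(v * math.cos((2 * j + 1) * k * math.pi / (2 * N))
                               for j, v in enumerate(x)))
    return out

samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
fast = four_stage_dct(samples)
ref = direct_dct(samples)
max_err = max(abs(f - r) for f, r in zip(fast, ref))
```

The three rotations account for nine multiplications and the two √2 scalings in the last stage for two more, which matches the eleven multiplications named in the title of the cited article.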
Loeffler et al. suggested four other first stages, which are illustrated by the following equation sets:
A0=I0+I7; A1=I1+I2; A2=I1−I2; A3=I3+I4; A4=I3−I4; A5=I5+I6; A6=I5−I6; A7=I0−I7.
A0=I0+I7; A1=I5+I1; A2=I6+I2; A3=I3+I4; A4=I3−I4; A5=I1−I5; A6=I2−I6; A7=I0−I7.
A0=I0+I3; A1=I6+I1; A2=I5+I2; A3=I0−I3; A4=I7+I4; A5=I2−I5; A6=I1−I6; A7=I4−I7.
A0=I0+I4; A1=I6+I1; A2=I5+I2; A3=I3+I7; A4=I0−I4; A5=I2−I5; A6=I1−I6; A7=I3−I7.
For convenience of explanation these first stages are referred to as S12, S13, S14 and S15 respectively.
Loeffler et al. further suggest reversing the second and third stages of the odd part (the part that calculates outputs O1, O3, O5 and O7). They also suggest sixteen additional combinations of a sequence that includes the second, third and fourth stages of that part.
It is noted that the inverse DCT (IDCT) can be implemented by the same algorithms, but in reverse order.
Video Compression Schemes
Digital video must be extensively compressed prior to transmission and storage, as each picture includes a large number of pixels, and each pixel is represented by three multi-bit color component values.
Standard compression schemes (such as the MPEG compression standards, JPEG, H.263 and others) utilize multiple compression techniques to achieve a very significant compression ratio.
The JPEG compression scheme includes the following steps: (i) color space conversion, in which a matrix of RGB pixel values is converted to a matrix of luminance and chrominance values (YUV); (ii) spatial transform, in which a Discrete Cosine Transform (DCT) is applied to the YUV matrix to provide a matrix of frequency coefficients, each of which describes how much of a given spatial frequency is present; (iii) quantization, during which each frequency coefficient is divided by a quantizing factor such that small coefficients are truncated to zero; (iv) zig-zag scanning and run-length coding of the quantized matrix, to achieve a compressed representation of the quantized matrix, as a typical quantized matrix includes many zero-valued coefficients; and (v) variable-length coding, such as Huffman coding, to provide a compressed matrix.
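Steps (iii) and (iv) above can be sketched as follows. This is a simplified illustration under stated assumptions: a single quantizing factor is used for every coefficient (practical encoders use a table of factors), the run-length output is a list of (zero-run, value) pairs closed by a hypothetical "EOB" marker, and the Huffman step is omitted. All names and sample values are illustrative.

```python
def quantize(coeffs, q):
    """Divide each coefficient by q and truncate toward zero."""
    return [[int(c / q) for c in row] for row in coeffs]

def zigzag_order(n):
    """(row, col) index pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        # All cells on the anti-diagonal row + col == s, top to bottom.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run top to bottom, even diagonals bottom to top.
        order.extend(diag if s % 2 else reversed(diag))
    return order

def run_length(seq):
    """Code a sequence as (number of preceding zeros, non-zero value) pairs."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end marker
    return pairs

# Illustrative 8x8 coefficient matrix with a few low-frequency terms.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0] = 240.0, -33.0, 17.0
coeffs[2][0], coeffs[1][1] = 4.0, -9.0

quantized = quantize(coeffs, 8)
scanned = [quantized[r][c] for r, c in zigzag_order(8)]
pairs = run_length(scanned)
# pairs == [(0, 30), (0, -4), (0, 2), (1, -1), "EOB"]
```

The coefficient 4.0 is truncated to zero by the quantization, and the 59 trailing zeros of the scanned sequence collapse into the single end marker, which is where most of the size reduction in this example comes from.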
K. Froitzheim and H. Wolf, in “A knowledge based approach to JPEG acceleration”, IS&T/SPIE Symposium on Electronic Imaging: Science and Technology, San Jose, USA, Feb. '95, offer an algorithm for reducing the complexity of JPEG-compliant image processing. Froitzheim et al. suggest reducing the number of operations involved in calculating IDCTs by applying the IDCT only to those rows or columns of a matrix of DCT coefficients (or of a semi-transformed matrix) that include at least one non-zero element. Accordingly, before applying a row-wise IDCT, the rows of the matrix are checked to locate rows that include at least one non-zero element. Before applying a column-wise IDCT, the columns of the matrix are checked to locate columns that include at least one non-zero element.
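The row and column checks described above can be illustrated with the following Python sketch. It is not the implementation of Froitzheim and Wolf; it only shows the idea that an all-zero row or column transforms to all zeros and can therefore be skipped. The orthonormal IDCT scaling, the function names and the sample matrix are assumptions.

```python
import math

def idct_1d(coeffs):
    """Orthonormal one-dimensional inverse DCT (DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        out.append(total)
    return out

def idct_pass(vectors):
    """Transform each vector, skipping those that are entirely zero.

    Returns the transformed vectors and the number of 1-D IDCTs performed.
    """
    result, calls = [], 0
    for vec in vectors:
        if any(vec):                      # at least one non-zero element
            result.append(idct_1d(list(vec)))
            calls += 1
        else:                             # zero in, zero out: no work needed
            result.append([0.0] * len(vec))
    return result, calls

def idct_2d_skip_zeros(m):
    rows, row_calls = idct_pass(m)                 # row-wise pass
    cols, col_calls = idct_pass(list(zip(*rows)))  # column-wise pass
    return [list(r) for r in zip(*cols)], row_calls + col_calls

def idct_2d_full(m):
    """Reference version that transforms every row and every column."""
    rows = [idct_1d(r) for r in m]
    cols = [idct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Illustrative coefficient matrix: only the first two rows are non-zero.
m = [[0.0] * 8 for _ in range(8)]
m[0][0], m[0][1], m[1][0] = 16.0, -3.0, 5.0

fast, calls = idct_2d_skip_zeros(m)
full = idct_2d_full(m)
max_err = max(abs(fast[r][c] - full[r][c]) for r in range(8) for c in range(8))
```

For this matrix the row pass performs two one-dimensional IDCTs instead of eight, so the whole transform needs ten instead of sixteen, with an identical result.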
Various compression schemes (such as the MPEG compression schemes) further improve compression rates by addressing a sequence of video frames and taking advantage of the temporal redundancy. Accordingly, compressed video includes target video elements (such as 8×8 blocks, slices, or frames) that may be predicted by a reference video element and additional information representative of the difference between the reference video element and the target video element. This prediction is also termed motion compensation.
For example, the MPEG-2 standard defines three types of frames: I-frames, B-frames and P-frames. I-frames are independent in the sense that they include all of the information that is required for displaying a picture. A P-frame is decoded in response to information embedded within a previous frame, while a B-frame is decoded in response to information embedded within both a preceding and a succeeding frame. The prediction is done in the picture domain on an 8×8 block basis. Each 8×8 target block is compared to the content of the reference frame (the previous frame in the case of a P-frame) to find the best-matching group of 8×8 reference elements (i.e., the reference block). The offset between each 8×8 target block and the reference block is embedded within a motion vector. It is noted that the reference block may not be aligned with the 8×8 blocks of the reference frame, and may intersect with up to four blocks of the reference frame (i.e., reference blocks).
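The block search and the four-block intersection described above can be sketched in Python as follows. The sketch is an illustration under stated assumptions: it uses an exhaustive search with the sum of absolute differences as the matching cost and a search range of ±4 pixels, none of which is prescribed by the text. The frame contents and all names are hypothetical.

```python
def sad(frame_a, ay, ax, frame_b, by, bx, size):
    """Sum of absolute differences between two size x size regions."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))

def best_motion_vector(target, ty, tx, reference, size=8, search=4):
    """Exhaustive search; returns (cost, dy, dx) of the best match."""
    h, w = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(target, ty, tx, reference, ry, rx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best

def overlapped_blocks(ry, rx, size=8):
    """Aligned blocks (block_row, block_col) that a region at (ry, rx) touches."""
    rows = {ry // size, (ry + size - 1) // size}
    cols = {rx // size, (rx + size - 1) // size}
    return sorted((r, c) for r in rows for c in cols)

# Synthetic 24x24 reference frame (illustrative pixel values).
SIZE = 24
reference = [[(7 * y * y + 13 * x * x + 5 * x * y) % 251 for x in range(SIZE)]
             for y in range(SIZE)]

# Target frame: the reference content moved 2 rows down and 3 columns left.
target = [[reference[y - 2][x + 3] if 0 <= y - 2 and x + 3 < SIZE else 0
           for x in range(SIZE)] for y in range(SIZE)]

# Motion vector for the aligned target block whose corner is at (8, 8).
best = best_motion_vector(target, 8, 8, reference)
cost, dy, dx = best
touched = overlapped_blocks(8 + dy, 8 + dx)
```

Here the best match lies at row 6, column 11 of the reference frame, which is not aligned to the 8×8 grid, so the reference block intersects four aligned blocks. A reference block that happens to be aligned, such as one at row 8, column 16, touches only one.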
N. Merhav and V. Bhaskaran acknowledge that in order to perform various manipulations on compressed video streams there is a need to include motion compensation elements within blocks. They suggest an algorithm for motion compensation in “A Fast Algorithm for DCT-Domain Motion Compensation”, HP Labs Technical Reports, HPL-95-17, 1995, and in U.S. Pat. No. 5,708,732. As noted above, the target block may intersect with multiple reference blocks. In the case of 8×8 blocks, the horizontal offset (as well as the vertical offset) between the target block and a reference block can vary between 0 and 7. The offset is embedded within the motion vector. Merhav and Bhaskaran suggest implementing the insertion of motion compensation elements by performing matrix multiplications, where some of the matrices are pre-calculated and are selected in response to the horizontal as well as the vertical offset between the target block and a reference block. It is noted that this algorithm is based upon multiple matrix multiplications and is relatively complex and resource-consuming.
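The general idea of offset-selected, pre-calculated matrices can be illustrated with the following Python sketch. It is not the fast algorithm of Merhav and Bhaskaran; it shows only the plain matrix formulation, in which the target block is the sum of four products of a vertical shift matrix, a reference block and a horizontal shift matrix, and the same relation holds between the DCTs of these matrices when the DCT is orthonormal. The names, the table layout and the sample data are assumptions.

```python
import math

N = 8

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Orthonormal DCT matrix S, so that the 2-D DCT of X is S * X * S^T.
S = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * i + 1) * k * math.pi / (2 * N)) for i in range(N)]
     for k in range(N)]
ST = transpose(S)

def dct2(x):
    return matmul(matmul(S, x), ST)

def shift_pair(offset):
    """Matrices that pick rows offset..7 of the first block and
    rows 0..offset-1 of the second block, placed in target order."""
    first = [[1.0 if c == r + offset else 0.0 for c in range(N)] for r in range(N)]
    second = [[1.0 if c == r + offset - N else 0.0 for c in range(N)] for r in range(N)]
    return first, second

# Pre-calculated DCT-domain shift matrices, indexed by offset 0..7.
DCT_SHIFT = {off: tuple(dct2(m) for m in shift_pair(off)) for off in range(N)}

def mc_dct_domain(dct_blocks, v, h):
    """DCT of the target block from the DCTs of a 2x2 group of reference blocks."""
    total = [[0.0] * N for _ in range(N)]
    for a in range(2):          # upper / lower reference blocks
        for b in range(2):      # left / right reference blocks
            term = matmul(matmul(DCT_SHIFT[v][a], dct_blocks[a][b]),
                          transpose(DCT_SHIFT[h][b]))
            total = [[t + u for t, u in zip(tr, ur)] for tr, ur in zip(total, term)]
    return total

# Synthetic 16x16 region covering four aligned reference blocks.
region = [[float((7 * y * y + 13 * x * x + 5 * x * y) % 251) for x in range(2 * N)]
          for y in range(2 * N)]
dct_blocks = [[dct2([row[N * b:N * b + N] for row in region[N * a:N * a + N]])
               for b in range(2)] for a in range(2)]

v, h = 3, 5  # vertical and horizontal offsets, each between 0 and 7
result = mc_dct_domain(dct_blocks, v, h)
expected = dct2([row[h:h + N] for row in region[v:v + N]])
max_err = max(abs(result[r][c] - expected[r][c]) for r in range(N) for c in range(N))
```

Even with the shift matrices pre-calculated, each target block costs eight 8×8 matrix multiplications in this plain form, which illustrates why the approach is regarded as resource-consuming.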