The present invention relates to video signal encoding and decoding, and, more particularly, is directed to high precision encoding and decoding of orthogonally transformed coefficients with efficient compression.
Recently, orthogonal transformation techniques have been used to encode a video signal. One such orthogonal transformation is a discrete cosine transformation (DCT). In a two-dimensional DCT, pictures or images represented in the video signal are divided into blocks having a predetermined pixel count, and then each block is orthogonally transformed into a block of coefficients.
FIG. 1 shows a two-dimensional block of DCT coefficients corresponding to an image block of 8 pixels × 8 lines. The coefficient F(0,0) corresponds to a direct-current component representing an average luminance value of the two-dimensional block.
Row coefficients such as F(1,0), F(2,0) . . . F(6,0), F(7,0), and similarly F(1,1), F(2,1) . . . F(6,1), F(7,1), represent high-frequency components in the vertical direction of the two-dimensional block. Column coefficients such as F(0,1), F(0,2) . . . F(0,6), F(0,7), and similarly F(1,1), F(1,2) . . . F(1,6), F(1,7), represent high-frequency components in the horizontal direction of the two-dimensional block.
The DCT encoding technique utilizes the two-dimensional correlation within an image to concentrate encoded signal power around a specific frequency component. The amount of information needed to represent the image can be compressed significantly if only coefficients distributed in this concentration are encoded.
For example, in the case of a flat picture, its blocks exhibit good autocorrelation, that is, amplitude levels of the pixels in the block are almost equal to each other. Therefore, DCT coefficients corresponding to the low-frequency components in the block, such as F(0,0), F(1,0), F(0,1), F(1,1), have large values while most of the other coefficients have very low or zero values. Accordingly, Huffman encoding, which compacts series of contiguous identical coefficients, significantly compresses the amount of information needed to represent the image.
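The energy concentration described for a flat picture can be illustrated with a small sketch. It assumes the conventional DCT-II normalization for 8 × 8 blocks, which the text does not state, and the names `dct_2d`, `flat_block` and `flat_coeffs` are hypothetical. The index order of the output is likewise an assumption and does not matter for a flat block.

```python
import math

N = 8  # block size, taken from the 8 x 8 blocks described in the text


def scale(k):
    """Normalization factor of the DCT-II (assumed conventional form)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_2d(block):
    """Return the two-dimensional DCT-II of an N x N block (direct form)."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = (2 / N) * scale(u) * scale(v) * total
    return coeffs


# A perfectly flat block: every pixel has the same amplitude level.
flat_block = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat_block)

# Under this normalization the DC coefficient is eight times the block mean,
# and every AC coefficient is zero up to floating-point error.
print(round(flat_coeffs[0][0], 6))
```

For a block that is only nearly flat, the low-frequency coefficients around F(0,0) would be expected to dominate in the same way, with small non-zero values elsewhere.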
A standard for encoding video signals with motion, popularly known as MPEG1, has been defined by ISO-IEC/JTC1/SC2/WG11. In the MPEG1 technique, a picture is either an "intra picture", meaning that it is encoded as a standalone picture, or an "inter picture", meaning that it is predictively encoded relative to at least one other picture.
The structure used for representing a video signal in MPEG1 format will now be explained with reference to FIG. 2.
As shown in FIG. 2, a block layer comprises luminance and chrominance blocks having 8 lines × 8 pixels.
A macroblock layer comprises the luminance and chrominance blocks grouped into macroblocks (MB), that is, four luminance blocks Y0, Y1, Y2, Y3, and two chrominance blocks Cb and Cr, at the same spatial position of a picture as the luminance blocks. The six blocks in each macroblock are transmitted in the following sequence: Y0, Y1, Y2, Y3, Cb, Cr. Decisions as to what prediction data is to be used and whether or not a prediction error is to be transmitted are made for each of these block units.
A slice comprises a single macroblock or a plurality of macroblocks appearing in the scanning direction of the picture. At the head of a slice, differential values of direct-current component coefficients and motion vectors in the picture are reset. The first macroblock includes data indicating a position in the picture so as to allow for recovery in the event of an error. Accordingly, a slice can have any arbitrary length and start position which can be changed if an error occurs during transmission.
A picture layer comprises frames or fields of an image. A picture includes at least one slice. Each picture is an I (intra field), P (predictive), B (bidirectional) or D picture, depending on the technique used to encode it. An I picture is encoded relative to itself, that is, without motion compensation relative to a previously encoded picture. A P picture is encoded with forward-prediction relative to a previously encoded I or P picture which temporally precedes the P picture being encoded. A B picture is encoded with bidirectional-prediction relative to two previously encoded I or P pictures, which temporally precede and succeed the B picture.
A group of pictures (GOP) layer includes at least one I picture, and may also have at least one non-I picture.
A video sequence layer includes at least one GOP.
The MPEG1 standard defines different techniques for encoding the direct-current (DC) and alternating-current (AC) component coefficients of a two-dimensional DCT coefficient block. Representative MPEG1 techniques for encoding and decoding two-dimensional DCT DC component coefficients in an intra picture encoding process will now be described.
FIG. 3A shows an encoding apparatus comprising a DCT circuit 2, a quantizer 3, a differentiator 4 and a variable length coding (VLC) circuit 5. An input picture 1 is supplied to the DCT circuit 2 as blocks of 8 pixels × 8 lines. The DCT circuit 2 is adapted to orthogonally transform each block of 8 pixels × 8 lines into a block of DCT coefficients (e1), which is applied to the quantizer 3. The quantizer 3 linearly quantizes the DC component coefficient of each block using a predetermined quantization step width, which in the case of MPEG1 has a value of 8, to produce quantized DC component coefficients (e2). In the linear-quantization process, fractions of 0.5 and over are rounded up, while fractions less than 0.5 are disregarded.
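The linear quantization of the DC component coefficient can be sketched as follows. The step width of 8 and the rounding rule come from the description above; the function name `quantize_dc` is hypothetical, and the sketch assumes a non-negative DC coefficient, as is the case for an intra picture with pixel values of 0 to 255.

```python
import math

DC_STEP = 8  # quantization step width stated in the text for MPEG1


def quantize_dc(coefficient, step=DC_STEP):
    """Linearly quantize a non-negative DC coefficient.

    Fractions of 0.5 and over are rounded up; smaller fractions are dropped.
    """
    return math.floor(coefficient / step + 0.5)


print(quantize_dc(800))  # exact multiple of the step
print(quantize_dc(803))  # fraction 0.375 is disregarded
print(quantize_dc(804))  # fraction 0.5 is rounded up
```

Adding 0.5 before taking the floor is one common way to express this rounding rule; it should not be read as the circuit's actual implementation.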
The quantized DC component coefficients (e2) are supplied to the differentiator 4, which is adapted to differentiate blocks adjacent to each other, using different techniques for a luminance (Y) block and the two corresponding chrominance (Cb and Cr) blocks, to produce differentiated coefficients (e3).
FIG. 4A shows a block diagram of the differentiator 4. An input is applied to a delay and a subtractor. The subtractor subtracts the delayed input from the current input and outputs the result as a differentiated signal.
FIG. 5A shows a differentiation technique for luminance blocks. A DC component coefficient of a luminance block is differentiated from direct-current component coefficients of right, left, upper and lower adjacent blocks in a zigzag order and the differentiated result replaces the DC component coefficient in the respective luminance coefficient blocks.
FIG. 5B shows a differentiation technique for chrominance blocks. DC component coefficients of right and left blocks adjacent to each other are differentiated and the result replaces the original DC component coefficient in the respective chrominance coefficient blocks.
Since a first block, that is, a first block of an I picture or a first block of a slice, cannot be differentiated, a predetermined number is used as an initial value in the delay element of the differentiator 4. In the case of the MPEG1 standard, a value of 128 is used as the initial value.
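The delay-and-subtract operation of the differentiator 4, including the initial value of 128, can be sketched as below. The function name `differentiate` and the sample values are hypothetical. The sketch assumes the quantized DC coefficients are already arranged in the differentiation order for one component, and that a separate delay element is kept for each of Y, Cb and Cr.

```python
INITIAL_PREDICTOR = 128  # initial value of the delay element stated for MPEG1


def differentiate(quantized_dc, initial=INITIAL_PREDICTOR):
    """Subtract the delayed (previous) DC value from each current DC value."""
    delayed = initial  # reset at the first block of an I picture or slice
    differences = []
    for current in quantized_dc:
        differences.append(current - delayed)
        delayed = current  # the current input becomes the next delayed input
    return differences


# Quantized DC values of four successive blocks of one component.
print(differentiate([130, 135, 135, 120]))
```

Because neighboring blocks tend to have similar average levels, the differences are typically much smaller in magnitude than the coefficients themselves.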
The differentiated coefficients (e3) are applied to a VLC circuit 5 of FIG. 3A which functions to encode the coefficients using a variable length code to produce an encoded video signal (6).
The VLC circuit 5 uses the differential DC component coefficient value to obtain a corresponding size value, that is, number of bits that will be used to encode the differential DC component coefficient, from a table shown as FIG. 6A. For example, a differential DC component coefficient (e3) with a value of +5 corresponds to a size of 3 bits.
Next, the VLC circuit 5 encodes the size value using, for luminance blocks, a table shown as FIG. 6B, and for chrominance blocks, a table shown as FIG. 6C. Continuing with the example, a size of 3 bits is encoded as 101, for a luminance block, or 110, for a chrominance block.
Then, the VLC circuit 5 encodes the differential DC component coefficient using a fixed-length code from the table shown in FIG. 6A. The fixed length code has a unique code value for each unencoded value. In the example, a differential DC component coefficient (e3) with a value of +5 corresponds to a fixed-length encoded value of 101.
Finally, the encoded differential DC component coefficient value is the result of concatenating the variable-length code representing the number of bits that are used to represent the differential DC component coefficient and the fixed-length code representing the differential DC component coefficient. In the example, for a luminance block, the encoded differential DC component coefficient value is 101101, and for a chrominance block, the encoded value is 110101.
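The three encoding steps above can be sketched as follows. The tables of FIGS. 6A to 6C are not reproduced in this text, so the size codes below are filled in from the MPEG1 standard as an assumption; only the entries for a size of 3 are confirmed by the example. The handling of negative values (the low-order bits of the value minus one) is likewise assumed from the standard, and the function name `encode_dc_differential` is hypothetical.

```python
# Variable-length codes for the size value (assumed contents of FIGS. 6B, 6C).
LUMA_SIZE_CODES = {0: "100", 1: "00", 2: "01", 3: "101", 4: "110",
                   5: "1110", 6: "11110", 7: "111110", 8: "1111110"}
CHROMA_SIZE_CODES = {0: "00", 1: "01", 2: "10", 3: "110", 4: "1110",
                     5: "11110", 6: "111110", 7: "1111110", 8: "11111110"}


def encode_dc_differential(diff, component):
    """Encode one differential DC coefficient as a string of bits."""
    # Step 1: the size is the number of bits needed for the magnitude.
    size = abs(diff).bit_length()
    if size > 8:
        raise ValueError("differential outside the range -255 to +255")

    # Step 2: variable-length code for the size value.
    table = LUMA_SIZE_CODES if component == "luma" else CHROMA_SIZE_CODES
    size_code = table[size]
    if size == 0:
        return size_code  # a zero differential needs no further bits

    # Step 3: fixed-length code of 'size' bits for the value itself.
    value = diff if diff > 0 else diff + (1 << size) - 1
    fixed_code = format(value, "0{}b".format(size))

    # Final step: concatenate the two codes.
    return size_code + fixed_code


print(encode_dc_differential(5, "luma"))    # 101101, as in the example
print(encode_dc_differential(5, "chroma"))  # 110101, as in the example
```

The `ValueError` branch reflects the limitation discussed later in this section: a differential outside the range -255 to +255 has no entry in the size table.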
FIG. 3B shows a decoding apparatus comprising a variable length decoding circuit 8, an inverse differentiator 9, an inverse quantizer 10 and an inverse DCT circuit 11. These elements operate in a complementary manner to the corresponding elements shown in FIG. 3A. FIG. 4B shows a block diagram of the inverse differentiator 9.
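The complementary operation of the inverse differentiator 9 and the inverse quantizer 10 on DC coefficients might look like the sketch below. The text does not detail FIG. 4B, so the accumulate-and-delay structure is inferred from the encoder side; the function names are hypothetical, and the initial value of 128 and step width of 8 are the MPEG1 values given for the encoder.

```python
def inverse_differentiate(differences, initial=128):
    """Rebuild quantized DC values by accumulating the differentials."""
    previous = initial  # must match the initial value used by the encoder
    quantized_dc = []
    for difference in differences:
        previous += difference  # add the delayed output to the current input
        quantized_dc.append(previous)
    return quantized_dc


def dequantize_dc(level, step=8):
    """Scale a quantized DC value back by the quantization step width."""
    return level * step


levels = inverse_differentiate([2, 5, 0, -15])
print(levels)
print([dequantize_dc(level) for level in levels])
```

The reconstructed coefficients are always multiples of the step width, so any remainder discarded by the quantizer is not recovered.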
A problem with the encoding tables defined in the MPEG1 standard is that they do not necessarily cover all coefficient values.
To be more specific, in the one-dimensional DCT processing, an output resulting from the DCT processing is about 2√2 times the value prior to the processing. In an intra picture encoding process using the MPEG1 technique, a pixel value of an input picture is in the range 0 to 255 or a number comprising 8 bits. Accordingly, the direct-current component coefficient of a two-dimensional DCT transformation coefficient is in the range 0 to 2047 or a number comprising 11 bits. The range 0 to 2047 is approximately 8 (= 2√2 · 2√2) times the range 0 to 255.
In the MPEG1 technique, a value with this 11-bit precision always undergoes a linear quantization process for transformation into an 8-bit number in the range 0 to 255, thus reducing its precision to 8 bits, and is then differentiated. Accordingly, the table shown as FIG. 6A provides numbers in the range -255 to +255. That is, a fixed encoding precision of 8 bits for DCT DC component coefficients reduces the quality of a high grade picture encoded with the MPEG1 technique.
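The loss of precision can be made concrete with a short calculation. It is a sketch only: it assumes the DC coefficient equals eight times the block mean, so that the largest value for 8-bit pixels is 8 × 255 = 2040 rather than the full 11-bit limit of 2047, and the names `round_trip`, `levels` and `max_error` are hypothetical.

```python
STEP = 8  # quantization step width stated in the text for MPEG1


def round_trip(dc):
    """Quantize an integer DC coefficient and scale it back again."""
    level = (dc + STEP // 2) // STEP  # fractions of 0.5 and over round up
    return level * STEP


dc_values = range(0, 2041)  # assumed upper bound: 8 * 255 = 2040

# Distinct quantized levels that survive, and the worst reconstruction error.
levels = {(dc + STEP // 2) // STEP for dc in dc_values}
max_error = max(abs(dc - round_trip(dc)) for dc in dc_values)

print(len(levels))
print(max_error)
```

Under these assumptions the 2041 possible DC values collapse onto 256 levels, and a reconstructed DC coefficient can differ from the original by up to half the step width.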
For an input picture having an eight-bit precision, simply enhancing the encoding precision of the DCT DC component coefficients from the conventional eight bits to a higher precision such as eleven bits results in inefficient encoding in some cases. To be more specific, if an encoding technique with a precision of, for example, eleven bits is applied to a picture whose gradation requirements are modest enough to be satisfied by eight-bit precision, unnecessary codes are inevitably output.
Thus, known encoding techniques for a high quality video signal either degrade the picture or result in inefficient compression of the encoded picture.