The present invention relates to a method and an apparatus for encoding a video signal; and, more particularly, to a method and an apparatus for effectively encoding texture information of the video signal based on the re-formed shape information according to the encoding type selected to encode the texture information.
In digital video systems such as video-telephone and teleconference systems, a large amount of digital data is needed to define each video signal since the video signal comprises a sequence of digital data referred to as pixel values. Since, however, the available frequency bandwidth of a conventional transmission channel is limited, in order to transmit the substantial amount of digital data therethrough, it is necessary to compress or reduce the volume of data through the use of various data compression techniques, especially, in the case of such low bit-rate video signal encoders as video-telephone and teleconference systems.
One of such techniques for encoding video signals for a low bit-rate encoding system is an object-oriented analysis-synthesis coding technique wherein an input video image is divided into objects and three sets of parameters for defining the motion, contour and pixel data of each object are processed through different encoding channels.
One example of such object-oriented coding schemes is the so-called MPEG (Moving Picture Experts Group) phase 4 (MPEG-4), which is designed to provide an audio-visual coding standard for allowing a content-based interactivity, improved coding efficiency and/or universal accessibility in such applications as low bit-rate communication, interactive multimedia (e.g., games, interactive TV, etc.) and area surveillance (see, for instance, MPEG-4 Video Verification Model Version 7.0, International Organization for Standardization, Coding of Moving Pictures and Associated Audio Information, ISO/IEC JTC1/SC29/WG11 MPEG97/N1642, Bristol, April 1997).
According to MPEG-4, an input video image is divided into a plurality of video object planes (VOPs), which correspond to entities in a bitstream that a user can access and manipulate. A VOP can be represented by a bounding rectangle whose width and height may be the smallest multiples of 16 pixels (a macroblock size) surrounding each object so that the encoder processes the input video image on a VOP-by-VOP basis.
A VOP disclosed in MPEG-4 includes shape information and texture information for an object therein which are represented by a plurality of macroblocks on the VOP, each of the macroblocks having, e.g., 16×16 pixels. Each of the macroblocks on the VOP can be classified into one of a background, a boundary and an object macroblocks. The background macroblock contains only background pixels located outside an object in the VOP; the boundary macroblock includes at least one background pixel and at least one object pixel located inside the object; and the object macroblock has only object pixels. The shape information is encoded by using, e.g., a context-based arithmetic encoding (CAE) technique on a macroblock basis, while the texture information is encoded through the use of conventional encoding techniques such as DCT (discrete cosine transform), quantization and statistical coding processes on the macroblock basis. Specifically, the DCT process for transforming the texture information is performed on a DCT-block basis, wherein a macroblock is divided into 4 DCT-blocks of 8×8 pixels.
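The three-way macroblock classification described above can be illustrated with a minimal sketch. It assumes the shape information is available as a binary mask (1 for an object pixel, 0 for a background pixel); the function name and the mask representation are hypothetical and are not taken from MPEG-4.

```python
def classify_macroblock(shape):
    """Classify a macroblock from its binary shape mask.

    shape: list of rows, each a list of 0/1 values (1 = object pixel).
    This mask representation is an assumption made for illustration.
    """
    total_pixels = sum(len(row) for row in shape)
    object_pixels = sum(sum(row) for row in shape)
    if object_pixels == 0:
        return "background"   # only background pixels
    if object_pixels == total_pixels:
        return "object"       # only object pixels
    return "boundary"         # at least one pixel of each kind


# Example: a 16x16 macroblock with a single object pixel is a boundary macroblock.
example = [[0] * 16 for _ in range(16)]
example[3][5] = 1
print(classify_macroblock(example))
```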
Through the DCT and the quantization processes, one DC component and a multiplicity of AC components are produced for each DCT-block, wherein each AC component has much more information than the DC component so that it requires many bits to represent itself. However, if the texture information for a DCT-block can be represented as constant, there will be no corresponding non-zero AC component for the DCT-block. Therefore, CBPY (coded block pattern type) information has been proposed to represent whether a DCT-block has at least one corresponding non-zero AC component. To be more specific, if there exists at least one non-zero AC component corresponding to a DCT-block, the CBPY information obtains a bit of, e.g., "1", and, if otherwise, a bit of, e.g., "0". Accordingly, a decoding part can tell the existence of any non-zero AC component for a corresponding DCT-block by simply detecting the CBPY information transmitted thereto through a transmission channel without any further information for the corresponding DCT-block and before encoded texture information for the corresponding DCT-block is transmitted thereto.
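The per-block CBPY bit described above can be sketched as follows. The sketch assumes the quantized coefficients of one DCT-block are held in an 8×8 array with the DC component at position (0, 0); the function name and this layout are illustrative assumptions.

```python
def cbpy_bit(quantized_block):
    """Return 1 if the block has at least one non-zero AC component, else 0.

    quantized_block: 8x8 list of lists of quantized coefficients, with the
    DC component assumed to sit at [0][0] and all other entries being AC.
    """
    for u, row in enumerate(quantized_block):
        for v, coeff in enumerate(row):
            if (u, v) != (0, 0) and coeff != 0:
                return 1
    return 0


# A block whose texture is constant has only a DC component, so its bit is 0.
flat = [[0] * 8 for _ in range(8)]
flat[0][0] = 64
print(cbpy_bit(flat))
```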
Conventionally, the CBPY information is determined based on only the shape information of each macroblock. For instance, a background macroblock has no object pixel so that no CBPY information is generated. Also, the CBPY information of an object macroblock will have 4-bit data, each bit corresponding to one of the 4 DCT-blocks within the macroblock, since the object macroblock has 4 non-transparent DCT-blocks, wherein a non-transparent DCT-block has a DCT-block size and contains at least one object pixel to be encoded.
In addition, a boundary macroblock can include both a transparent DCT-block and a non-transparent DCT-block together, wherein the transparent DCT-block has only background pixels therein and need not be encoded so that the CBPY information corresponding to the boundary macroblock may have i-bit data, i being a positive integer ranging from 1 to 4, and the respective bits corresponding to the respective non-transparent DCT-blocks in the macroblock. Referring to FIG. 3A, each of the 4 DCT-blocks of the boundary macroblock P1 has at least one object pixel so that 4-bit CBPY information is generated, wherein each of the squares represents a pixel, each shaded square being an object pixel and each white one being a background pixel. Similarly, in FIGS. 3B and 3C, only 2 DCT-blocks of each of the boundary macroblocks P2 and P3 have at least one object pixel, so that only 2-bit CBPY information is generated.
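As a minimal sketch of the conventional, shape-only determination described above, the following marks which of the 4 DCT-blocks of a macroblock are non-transparent; the number of marked blocks is the bit-number of the CBPY information. The binary-mask representation, the raster ordering of the blocks and the function name are assumptions made for illustration.

```python
def non_transparent_mask(shape):
    """Return four booleans, one per 8x8 DCT-block in raster order
    (top-left, top-right, bottom-left, bottom-right).

    shape: 16x16 list of lists of 0/1 values (1 = object pixel).
    A block is non-transparent if it contains at least one object pixel.
    """
    mask = []
    for top in (0, 8):
        for left in (0, 8):
            has_object_pixel = any(
                shape[y][x]
                for y in range(top, top + 8)
                for x in range(left, left + 8)
            )
            mask.append(has_object_pixel)
    return mask


# A macroblock whose object pixels all lie in the left half has 2 CBPY bits.
left_half = [[1 if x < 8 else 0 for x in range(16)] for _ in range(16)]
print(sum(non_transparent_mask(left_half)))
```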
In the meantime, in order to encode the texture information for the VOP, the texture information on each of the macroblocks has been processed by adaptively using a progressive and an interlaced coding techniques to thereby enhance the coding efficiency. Therefore, DCT_type information representing a coding condition, i.e., a DCT_type, of the texture information has been employed, wherein the DCT_type has been determined on the macroblock basis using the texture information. For example, a frame and a field correlation coefficients are calculated, wherein the frame correlation coefficient is a sum of absolute first differences, each first difference being an error between a line pair including an even line and an adjacent odd line of the macroblock, and the field correlation coefficient is a sum of absolute second differences and absolute third differences, each second difference and each third difference being errors between a consecutive even-line pair and between a consecutive odd-line pair, respectively, of the macroblock; and, then, the DCT_type is determined by comparing the frame correlation coefficient with the field correlation coefficient. In another preferred embodiment, each absolute difference can be replaced with a square error (see MPEG-4 Video Verification Model Version 7.0, supra, p. 54). The smaller the correlation coefficient is, the higher the degree of the correlation is. If the frame correlation is equal to or higher than the field correlation so that the progressive coding technique is determined to be more effective, the DCT_type information on the macroblock will have a bit of, e.g., "0", and, if otherwise, a bit of, e.g., "1".
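The DCT_type decision described above can be sketched as a comparison of two sums of absolute differences. The exact set of line pairs entering each sum is an assumption here (14 adjacent-line pairs for the frame coefficient and 14 same-parity pairs for the field coefficient); the function names are hypothetical.

```python
def line_sad(line_a, line_b):
    """Sum of absolute differences between two lines of pixel values."""
    return sum(abs(a - b) for a, b in zip(line_a, line_b))


def dct_type(texture):
    """Return 0 (progressive) or 1 (interlaced) for a 16-line macroblock.

    frame coefficient: differences between adjacent lines (one even, one odd).
    field coefficient: differences between consecutive even lines and between
    consecutive odd lines, i.e. lines two apart.
    The index ranges below are an assumption, chosen so that both sums
    contain the same number of line pairs.
    """
    frame = sum(line_sad(texture[k], texture[k + 1]) for k in range(14))
    field = sum(line_sad(texture[k], texture[k + 2]) for k in range(14))
    # A smaller coefficient means a higher correlation; ties favour progressive.
    return 0 if frame <= field else 1


# Lines that alternate strongly between the two fields favour interlaced coding.
combed = [[0] * 16 if k % 2 == 0 else [100] * 16 for k in range(16)]
print(dct_type(combed))
```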
However, the bit-number of the CBPY information in the boundary macroblock depends on the DCT_type thereof. In FIGS. 3B and 3C, the numbers of non-transparent DCT-blocks within a progressive and an interlaced type macroblocks are different from each other depending on their DCT_types. Consequently, the bit-number of the CBPY information is also changed according to the DCT_type. To be more specific, when the macroblock P2 is encoded through the progressive coding technique, 2-bit CBPY information is generated and, if otherwise, 4-bit CBPY information is produced. Meanwhile, when the macroblock P3 is encoded through the progressive coding technique, 2-bit CBPY information is generated and, if otherwise, 1-bit CBPY information is produced.
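The dependence of the CBPY bit-number on the DCT_type can be illustrated with a minimal sketch. It assumes that interlaced coding regroups the macroblock so that the even lines form the upper 8 lines and the odd lines the lower 8 lines; this ordering, the function names and the two test shapes are assumptions. The shapes are hypothetical patterns chosen to reproduce the bit-numbers stated for P2 and P3, not the actual contents of FIGS. 3B and 3C.

```python
def to_field_order(macroblock):
    """Regroup 16 lines so that even lines come first, then odd lines."""
    return macroblock[0::2] + macroblock[1::2]


def count_non_transparent(shape):
    """Count 8x8 blocks that contain at least one object pixel."""
    count = 0
    for top in (0, 8):
        for left in (0, 8):
            if any(shape[y][x]
                   for y in range(top, top + 8)
                   for x in range(left, left + 8)):
                count += 1
    return count


def cbpy_bit_number(shape, dct_type):
    """Bit-number of the CBPY information for dct_type 0 or 1."""
    reformed = to_field_order(shape) if dct_type == 1 else shape
    return count_non_transparent(reformed)


# Hypothetical P2-like shape: object pixels fill the upper 8 lines.
p2_like = [[1] * 16 if y < 8 else [0] * 16 for y in range(16)]
# Hypothetical P3-like shape: two object pixels, both on odd lines.
p3_like = [[0] * 16 for _ in range(16)]
p3_like[1][0] = 1
p3_like[9][0] = 1

for name, shape in (("P2-like", p2_like), ("P3-like", p3_like)):
    print(name, cbpy_bit_number(shape, 0), cbpy_bit_number(shape, 1))
```

With these shapes the progressive and interlaced counts are 2 and 4 for the P2-like pattern and 2 and 1 for the P3-like pattern, matching the bit-numbers given in the text.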
As can be noted above, if a macroblock to be processed is a boundary macroblock, the bit-number of the CBPY information, i.e., the number of non-transparent DCT-blocks therein, should be determined depending on its DCT_type.
Since, furthermore, a data stream to be transmitted to the decoding part has a sequence of CBPY information and DCT_type information, the decoding part may not correctly predict the bit-number of the CBPY information, i.e., the number of non-transparent DCT-blocks within the processed macroblock and, consequently, may not accurately reconstruct the CBPY information.
It is, therefore, a primary object of the invention to provide a method and an apparatus, for use in a video signal encoder, for effectively encoding texture information of a video signal by generating CBPY information based on an encoding type determined by the texture information.
In accordance with the present invention, there is provided a method, for use in a video signal encoder, for coding texture information of a video signal which includes the texture information and shape information on each of macroblocks, each macroblock having M×M pixels and being dividable into P number of equal-sized subblocks, M and P being positive integers, respectively, comprising the steps of:
(a) determining an encoding_type of a target macroblock based on the texture information thereof, wherein the encoding_type represents the more effective coding technique between a progressive and an interlaced coding techniques for encoding the texture information thereof;
(b) re-forming the shape information and the texture information on the target macroblock in response to the encoding_type to generate re-formed shape information and re-formed texture information thereof, respectively;
(c) detecting the re-formed shape information on a DCT-block basis to find a CBPY bit number of the target macroblock, wherein the CBPY bit number is the number of bits for non-transparent subblocks, each non-transparent subblock having a subblock size and containing at least one object pixel;
(d) if the CBPY bit number is not zero, transforming the re-formed texture information of the target macroblock into a set of transformation coefficients for each non-transparent subblock based on the CBPY bit number and quantizing the set of transformation coefficients to thereby produce a set of quantized transformation coefficients;
(e) detecting the set of quantized transformation coefficients for said each non-transparent subblock to generate CBPY information for the target macroblock, wherein the CBPY information represents whether or not the set of quantized transformation coefficients for said each non-transparent subblock contains at least one non-zero component therein; and
(f) multiplexing the encoding_type and the CBPY information for the target macroblock to generate a bit stream.
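Steps (a) to (f) can be sketched end to end for the case M = 16 and P = 4. This is a simplified illustration under several assumptions, not the claimed implementation: the encoding_type test uses sums of absolute differences, re-forming places even lines above odd lines, the transform is a naive 8×8 DCT followed by a uniform quantizer with a hypothetical step size, the texture is assumed to be already padded outside the object, and the bit stream is modeled as a list with the encoding_type placed before the CBPY bits.

```python
import math

N = 8  # subblock (DCT-block) size


def choose_encoding_type(tex):
    """Step (a): 0 = progressive, 1 = interlaced (assumed SAD comparison)."""
    frame = sum(abs(tex[k][j] - tex[k + 1][j]) for k in range(14) for j in range(16))
    field = sum(abs(tex[k][j] - tex[k + 2][j]) for k in range(14) for j in range(16))
    return 0 if frame <= field else 1


def reform(mb, encoding_type):
    """Step (b): keep line order, or group even lines above odd lines."""
    return [row[:] for row in mb] if encoding_type == 0 else mb[0::2] + mb[1::2]


def blocks(mb):
    """Split a 16x16 macroblock into four 8x8 subblocks in raster order."""
    return [[row[c:c + N] for row in mb[r:r + N]] for r in (0, N) for c in (0, N)]


def dct_quantize(block, qstep):
    """Step (d): naive 2-D DCT-II, then uniform quantization (assumed)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = []
    for u in range(N):
        out_row = []
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out_row.append(int(round(scale(u) * scale(v) * s / qstep)))
        out.append(out_row)
    return out


def encode_macroblock(shape, texture, qstep=16):
    encoding_type = choose_encoding_type(texture)             # (a)
    shape_r = reform(shape, encoding_type)                    # (b)
    texture_r = reform(texture, encoding_type)
    coded = [t for s, t in zip(blocks(shape_r), blocks(texture_r))
             if any(any(row) for row in s)]                   # (c) non-transparent
    cbpy = []
    for tex_block in coded:                                   # (d) and (e)
        q = dct_quantize(tex_block, qstep)
        has_ac = any(q[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0))
        cbpy.append(1 if has_ac else 0)
    return [encoding_type] + cbpy                             # (f) assumed order


full_shape = [[1] * 16 for _ in range(16)]
step_texture = [[0 if x < 4 else 255 for x in range(16)] for _ in range(16)]
print(encode_macroblock(full_shape, step_texture))
```

In this sketch the length of the CBPY part follows from the re-formed shape information, so a decoder that reads the encoding_type first and applies the same re-forming to the decoded shape could determine how many CBPY bits to expect.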